
Python 3.13 Gets a JIT


I think it’s really cool that Haoran Xu and Fredrik Kjolstad’s copy-and-patch technique[0] is catching on. I remember discovering it through Xu’s blog posts about his LuaJIT remake project[1][2], where he intends to apply these techniques to Lua (and I probably found those through a post here). I was just blown away by how they “recycled” all these battle-tested techniques and technologies and used them to synthesize something novel. I’m not a compiler writer, but it felt really clever to me.

I highly recommend the blog posts if you’re into learning how languages are implemented, by the way. They’re incredible deep dives, but he uses the details element to keep the metaphorical descents into the Mariana Trench optional, so it doesn’t get too overwhelming.

I even had the privilege of congratulating him on the 1000th star of the GH repo[3], where he reassured me and others that he’s still working on it despite the long pause after the last blog post, and that this mainly has to do with behind-the-scenes rewrites that make no sense to publish piecemeal.

[0] https://arxiv.org/abs/2011.13127

[1] https://sillycross.github.io/2022/11/22/2022-11-22/

[2] https://sillycross.github.io/2023/05/12/2023-05-12/

[3] https://github.com/luajit-remake/luajit-remake/issues/11

While it bears a significant resemblance, Ertl and Gregg’s approach is not automatic, and every additional architecture requires a significant understanding of the target architecture, including the ability to ensure that fully relocatable code can be generated and extracted. In comparison, the copy-and-patch approach can be thought of as a simple dynamic linker, and objects generated by unmodified C compilers are far more predictable and need much less architecture-specific information for linking.

Copy-and-patch also assumes the compiler will generate patchable code. For example, on some architectures, having a zero operand might produce a smaller or different opcode compared to a more general operand. The same issue arises for relative jumps or offset ranges. It seems the main difference is that the copy-and-patch approach also patches jumps to absolute addresses instead of requiring instruction-counter-relative code.

Does Ertl and Gregg’s approach have any “upsides” over copy-and-patch? Or is it a case of just missing those one or two insights (or technologies) that make the whole thing a lot simpler to implement?

Anton Ertl! <3

Context: I’ve been on a concatenative language binge recently, and his work on Forth is awesome. In my defense he doesn’t seem to list this paper among his publications[0]. Will give this paper a read, thanks for linking it! 🙂

If they missed the boat on getting credit for their contributions then at least the approach finally starts to catch on I guess?

(I wonder if he got the idea from his work on optimizing Forth somehow?)

[0] https://informatics.tuwien.ac.at/people/anton-ertl

Thanks a lot!! I’m something of a beginner language developer and I’ve been collecting papers, articles, blog posts, anything that provides accessible, high-level descriptions of these optimization techniques.

It’s interesting to see these 2-9% improvements from version to version. They are always talked about with disappointment, as if they are too small, but they also keep coming, with each version being faster than the previous one. I prefer a steady 10% per version over breaking things because you are hoping for bigger numbers. Those percentages add up!

This is happening mostly because Guido left, right? The take that CPython should be a reference implementation and thus slow always aggravated me (because, see, no other implementation can compete, since every package depends on CPython quirks, to the point that we’re now removing the GIL from CPython rather than migrating to PyPy, for example).

Partly, yes, but do note he is still very much involved with the faster-cpython project via Microsoft. Google faster cpython and van rossum to find some interviews and talks. You can also check out the faster-cpython project on github to read more.

It’s fascinating to me that this process seems to rhyme with the path PHP took, with HHVM being built as a second implementation, proving that PHP could be much faster, and the main project eventually adopting similar approaches. I wonder if that’s always likely to happen when talking about languages as big as these are? Can a new implementation of it ever really compete?

Seems a bit silly to think that – Guido is still involved with Python… and in fact is the one heading the Faster CPython project at Microsoft, which is responsible for many of these improvements.

5.5% compounded over 5 years is a bit over 30%: not a huge amount but an easily noticeable speed-up. What were you thinking of when you typed “significantly faster”?

Compounding a decrease works differently than an increase. If something gets 10% faster twice, it actually got 19% faster. In other words, the runtime is 90% of 90%, i.e. 81%.

The link seems fairly clear to me – One explanation given is that python3 represents all integers in a “long” type, whereas python2 defaulted to small ints. This gave (gives?) python2 an advantage on tasks involving manipulating lots of small integers. Most real-world python code isn’t like this, though.

Interestingly they singled out pyaes as one of the worst offenders. I’ve also written a pure-python AES implementation, one that deliberately takes advantage of the “long” integer representation, and it beats pyaes by about 2000%.

I envy these small and steady improvements!!

I spent about one week implementing PyPy’s storage strategies in my language’s collection types. When I finished the vector type modifications, I benchmarked it and saw the ~10% speed up claimed in the paper¹. The catch is performance increased only for unusually large vectors, like thousands of elements. Small vectors were actually slowed down by about the same amount. For some reason I decided to press on and implement it on my hash table type too which is used everywhere. That slowed the entire interpreter down by nearly 20%. The branch is still sitting there, unmerged.

I can’t imagine how difficult it must have been for these guys to write a compiler and succeed at speeding up the Python interpreter.

¹ https://tratt.net/laurie/research/pubs/html/bolz_diekmann_tr…

I’d rather they add up. Minus 5% runtime here, another minus 5% there… Soon enough, Python will be so fast my scripts terminate before I even run them, allowing me to send messages to my past self.

log(2)/log(1.1) ~= 7.27, so in principle sustained 10% improvements could double performance every 7 releases. But at some point we’re bound to face diminishing returns.
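
A quick sanity check of that arithmetic in Python:

    import math

    # Number of compounding 10% speedups needed to double performance
    print(math.log(2) / math.log(1.1))  # ~7.27

    # Seven releases of +10% each gets close to 2x, eight overshoots it
    print(1.1 ** 7)  # ~1.95
    print(1.1 ** 8)  # ~2.14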

Finally!

Regardless of the work being done in PyPy, Jython, GraalPy and IronPython, having a JIT in CPython seems to be the only way beyond the “C/C++/Fortran libs are Python” mindset.

Looking forward to its evolution, from 3.13 onwards.

The only way to achieve C/C++/Fortran efficiency is a statically compiled, strongly typed language. Witness the effort put into Java JITC and the rest of the modern Java (and Graal) runtime. Still well short of the promised “C equivalence”.

To me, Mojo looks like the best approach to fusing that with the Python ecosystem! (I have no doubt about it being open sourced at some point.)

I love Python and use it for everything other than web development.

One reason is performance. So if Python has a faster future ahead of it: Hurray!

The other reason is that the Python ecosystem moved away from stateless requests like CGI or mod_php use and now is completely set on long running processes.

Does this still mean you have to restart your local web application after any change you make to it? I heard that some developers automate that, so that every time they save a file, the web application is restarted. That seems pretty expensive in terms of resource consumption. And complex as you would have to run some kind of watcher process which handles watching your files and restarting the application?

I personally like Quart, which is like Flask, but with asyncio. Django is also incredibly popular and has been around forever, so it is very battle-tested.

I’ve been using FastAPI, which is made by the same guy as Flask but taking it seriously this time, using asyncio and making space for multithreading. It’s almost a drop-in replacement for Flask.

> Does this still mean you have to restart your local web application after any change you make to it? I heard that some developers automate that, so that every time they save a file, the web application is restarted. That seems pretty expensive in terms of resource consumption.

All of the popular frameworks automatically reload. It’s not instantaneous but with e.g. Django it was less than the time I needed to switch windows a decade ago, and it hasn’t gotten worse. If you’re used to things like NextJS it will likely be noticeably faster.

With `reload(module)` you don’t even have to restart the server if you structure it properly.

Think server.py and server_handlers.py, where server.py contains logic to detect a modification of server_handlers.py (like via inotify) and the base handlers which then call the “modifiable” handlers in server_handlers.py.

This is not limited to servers (it works for anything that loops or reacts to events), can be nested multiple levels deep, and is among the top 3 reasons why I use Python.

Reloading is instantaneous and can gracefully handle errors in the file (just print an error message or stack trace and keep running the old code).
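
A minimal sketch of that pattern, using the file names from the comment above; it polls mtimes instead of using inotify to stay stdlib-only, and the server_handlers.handle() entry point is just an assumed name:

    # server.py -- reload-on-change sketch; server_handlers.py holds the
    # "modifiable" handlers and exposes a handle(request) function (assumed).
    import importlib
    import os
    import time
    import traceback

    import server_handlers

    HANDLERS_FILE = server_handlers.__file__
    last_mtime = os.path.getmtime(HANDLERS_FILE)

    def maybe_reload():
        """Reload server_handlers if its file changed; keep old code on error."""
        global last_mtime
        mtime = os.path.getmtime(HANDLERS_FILE)
        if mtime != last_mtime:
            last_mtime = mtime
            try:
                importlib.reload(server_handlers)
                print("reloaded server_handlers")
            except Exception:
                traceback.print_exc()  # old module object stays in use

    def handle_request(request):
        maybe_reload()
        return server_handlers.handle(request)

    if __name__ == "__main__":
        while True:  # stand-in for a real request/event loop
            print(handle_request({"path": "/"}))
            time.sleep(1)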

The restart isn’t expensive in absolute terms, on a human level it’s practically instant. You would only do this during development, hopefully your local machine isn’t the production environment.

It’s also very easy, often just adding a CLI flag to your local run command.

edit: Regarding performance, Python today can easily handle at least 1k requests per second. The vast vast vast majority of web applications today don’t need anywhere near that kind of performance.

The thing is, I don’t run my applications locally with a “local run command”.

I prefer to have a local system set up just like the production server, but in a container.

Maybe using WSGI with MaxConnectionsPerChild=1 could be a solution? But that would start a new (for example) Django instance for every request. Not sure how fast Django starts.

Another option might be to send a HUP signal to Apache:

    apachectl -k restart

That will only kill the worker threads. And when there are none (because another file save triggered it already), this operation might be almost free in terms of resource usage. This also would require WSGI or similar. Not sure if that is the standard approach for Django+Apache.

Is the problem you’re having that you feel the need to expose a WSGI/ASGI interface instead of just a reverse proxy? Take a look at gunicorn, and for serving static files you can use whitenoise.

With those two you can just stand up a Python program in a container that serves HTML, and put it behind whatever reverse proxy you want.

I would still recommend running it properly locally, but whatever. Pseudo-devcontainer it is. I assume the code is properly volume mounted.

In production, you would want to run your app through gunicorn/uvicorn/whatever on an internal-only port, and reverse-proxy to it with a public-facing apache or similar.

Set up apache to reverse proxy like you would on prod, and run gunicorn/uvicorn/whatever like you would on prod, except you also add the autoreload flag. E.g.

    uvicorn main:app --host 0.0.0.0 --port 12345 --reload

If production uses containers, you should keep the python image slim and simple, including only gunicorn/uvicorn and have the reverse proxy in another container. Etc.

I hate this argument that “most web apps don’t need that kind of performance.” For one thing, with the responsive apps that are the norm now, it wouldn’t be surprising for a session to begin with multiple requests or even to make multiple requests per second. At that point all it takes is a few hundred active users to hit that 1k limit.

But even leaving that aside, you never know when your application will be linked somewhere or go semi-viral and not being able to serve 1000 users is all it takes for your app to go down and your one shot at a successful company to die a sad death.

I didn’t say python can handle <=1K, I was saying >=1K. I feel confident that I am orders of magnitude off the real limit you’d meet.

The specifics of that aside, any unprepared application is going to buckle at a sudden mega-surge of users. The solution remains largely the same, regardless of technology: Make sure everything that can be cached is cached, scale the hardware vertically until it stops helping, optimize your code, scale horizontally until you run out of money. I imagine the DB will be the actual bottleneck, most of the time.

There are other reasons to not choose python for greenfield application, but performance should rarely be one IMO.

> The other reason is that the Python ecosystem moved away from stateless requests like CGI or mod_php use and now is completely set on long running processes.

The long-running process is a WSGI/ASGI process that handles spawning the actual code, similar to CGI. The benefit is that it can handle how it spawns the request workers via multiple runtimes, process/threads, etc. It’s similar to CGI but instead of nginx handling it, it’s a special program that specializes in the different options for python specifically.
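
For illustration, the WSGI contract described here is just a callable that the server invokes once per request; a minimal app (module and names are mine, not from the thread) looks like this:

    # app.py -- a minimal WSGI application. The long-running process belongs
    # to the server (e.g. Gunicorn), which calls this function per request.
    def application(environ, start_response):
        body = b"Hello from a WSGI worker\n"
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    # Typically run with something like:  gunicorn app:application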

> Does this still mean you have to restart your local web application after any change you make to it? I heard that some developers automate that, so that every time they save a file, the web application is restarted. That seems pretty expensive in terms of resource consumption. And complex as you would have to run some kind of watcher process which handles watching your files and restarting the application?

Only for development!

To update your code in production you first deploy the new code onto the machine, and then you tell the WSGI/ASGI server such as Gunicorn to reload. This will cause it to use the new code for new requests, without killing current requests.

It’s a graceful reload, with no file watching needed. Just a “systemctl reload gunicorn”

If you run the debug web server command (e.g. Django’s `manage.py runserver`), yes, it has a watcher that will automatically restart the web server process if there is a code change.

Once you deploy it to production, you usually run it using a WSGI/ASGI server such as Gunicorn or Uvicorn and let whatever deployment process you use handle the lifecycle. You usually don’t use a watcher in production.

Basically similar stuff with nodejs, rails, etc.

In dev, this is handled mostly by the OS with things like inotify, so it has little perf impact.

In prod, you don’t do it. Deployment implies sending a signal like HUP to your app, so that it reloads the code gracefully.

All in all, everybody is moving to this, even PHP. This allows for persistent connections, function memoization, delegation to thread pools, etc.

> That seems pretty expensive in terms of resource consumption. And complex as you would have to run some kind of watcher process which handles watching your files and restarting the application?

What? No, in reality it’s just running your app in debug mode (just a cli flag), and when you save the files the next refresh of the browser has the live version of the app. It’s neither expensive nor complex.

I wish the money could be spent on PyPy but pypy has its problems – you don’t get a big boost on small programs that run often because the warmup time isn’t that fabulous.

For larger programs, you sometimes hit some incredibly complicated incompatibility problem. For me bitbake was one of those – it could REALLY benefit from PyPy but didn’t work properly and I couldn’t fix it.

If this works more reliably or has a faster warmup then… well, it could help to fill in some gaps.

Wasn’t CPython supposed to remain very simple in its codebase, with the heavy optimization left for other implementations to tackle? I seem to remember hearing as much a few years back.

The problem is that:
* CPython is slow, making extension modules written in C(++) very attractive
* The CPython extension API exposes many implementation details
* Making use of those implementation details helps those extension modules be even faster

This resulted in a situation where the ecosystem is locked-in to those implementation details: CPython can’t change many aspects of its own implementation without breaking the ecosystem; and other implementations are forced to introduce complex and slow emulation layers if they want to be compatible with existing CPython extension modules.

The end result is that alternative implementations are not viable in practice, as most existing libraries don’t work without their CPython extension modules — users of alternative implementations are essentially stuck in their own tiny ecosystem and cannot make use of the large existing (C)Python ecosystem.

CPython at least is in a position where they can push a breaking change to the extension API and most libraries will be forced to adapt. But there’s very little incentive for library authors to add separate code paths for other Python implementations, so I don’t think other implementations can become viable until CPython cleans up their API.

PyPy was released 17 years ago

Jython was released 22 years ago

IronPython was released 17 years ago

To date, no Python implementation has managed to hit all three:

1. Stay compatible with any recent, modern CPython version

2. Maintain performance for general-purpose usage (it’s fast enough without a warmup, and doesn’t need to be heavily parallelized to see a performance benefit)

3. Stay alive

Which, frankly, is kind of a shame. But the truth of the matter is that it was a high bar to hit in the first place, and even PyPy (which arguably had the biggest advantages: interest, mindshare, compatibility, meaningful wins) managed to barely crack a fraction of a percent of Python market share.

If you bet on other implementations being the source of performance wins, you’re betting on something which essentially doesn’t exist at this point.

That was the original idea, when Python started attracting interest from big corporations. It has however become clear that maintaining alternative implementations is very difficult and resource-intensive; and if you have to maintain compatibility with the wider ecosystem anyway (because that’s what users want), you might as well work with upstream to find solutions that work for everyone.

Does Python even have a language specification? I’ve been told that CPython IS the specification. I don’t know if this is still true. In the Java world there is a specification and a set of tests to check for conformance, so it’s easier to have alternative implementations of the JVM. If what I said is correct, then I can see how the optimized alternative implementation idea is less likely to happen.

There is a pretty detailed reference that distinguishes between CPython implementation details and language features, at least. There was even a JVM Python implementation. The problem is more that a lot of the libraries that everyone wants to use are very dependent on CPython’s FFI, which bleeds a lot of internals.

Well, for Python the language reference in the docs[0] is the specification, and many things there are described as CPython implementation details. Like: “CPython implementation detail: For CPython, id(x) is the memory address where x is stored.” And as another example, dicts remembering insertion order was CPython’s implementation detail in 3.6, but from 3.7 it’s part of the language.

[0] https://docs.python.org/3/reference/index.html
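
Both examples are easy to poke at interactively; the id() behaviour is CPython-specific, while dict insertion order is guaranteed language behaviour since 3.7:

    # CPython implementation detail: id() is the object's memory address.
    x = object()
    print(hex(id(x)))

    # Insertion order of dicts: an implementation detail in 3.6,
    # part of the language since 3.7.
    d = {}
    d["b"] = 1
    d["a"] = 2
    print(list(d))  # ['b', 'a'] -- insertion order, not sorted order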

The article presents a copy-and-patch JIT as something new, but I remember DOS’s QuickBASIC doing the same thing. It generated very bad assembly code in memory by patching together template assembly blocks with filled-in values, with a lot of INT instructions into the QuickBASIC runtime, but it did compile, not interpret.

Template JITs in general aren’t a new technique, but Copy-and-Patch is a specific method of implementing it (leveraging a build time step to generate the templates from C code + ELF relocations).

QBasic was a slightly cut down version of Quickbasic that didn’t include the compiler, so your assumption was correct in that case. QBasic was bundled with DOS but you had to buy Quickbasic.

The last two-ish years have been insane for Python performance. Something clicked with the core team and they obviously made this a serious goal of theirs and the last few years have been incredible to see.

It’s because the total dollars of capitalized software deployed in the world using Python has absolutely exploded from AI stuff. Just like how the total dollars of business conducted on the web was a big driver of JS performance earlier.

AI heavy lifting isn’t just model training. There’s about a million data pipelines and processes before the training data gets loaded into a PyTorch tensor.

Ehhh… if you’re lucky. I’ve seen (and maybe even written) plenty of we-didn’t-have-time-to-write-this-properly-with-dataframes Python data munging code, banged out once and then deployed to production. I’ll take performance gains there.

That is the exact value proposition of Mojo.

Pythonesque (sane) syntax, great AOTC language features, great performance, memory safety, excellent Python interoperability. What’s not to like?

If you think of something, contact Modular…

Which is strange considering how bad the tooling to use Python on Windows is. There’s a few workflows where people have gone down the beaten path before (Conda, etc.), but outside of that you have to just pretend you’re on Linux and use the cygwin toolchains and even that doesn’t always work so well. Better support on Linux was a top 5 reason for me making the switch to using it full time when I went off to college, and it hasn’t changed substantially in the 8 years since then.

> There were no noticeable performance improvements in the course of the last two years.

In fairness, Python did get faster. Python 3.9 took 82 seconds for sudoku solving and 62 seconds for interval query. Python 3.11 took 53 and 43 seconds, respectively [1]. v3.12 may be better. That said, whether the speedup is noticeable can be subjective. 10x vs 15x slower than v8 may not make much difference mentally.

[1] https://github.com/attractivechaos/plb2

That’s why I said another.

It was a different time. Microsoft had a different strategy towards languages not developed by Microsoft. Similar to how there also used to be JScript, but now Node.js is basically Microsoft’s pet project.

There are actually plenty of popular Microsoft projects that took even more than two tries. Azure is like their third attempt at cloud services, iirc. Credit where credit is due, they learn from mistakes… unfortunately, that only makes them more insidious.

This was a fantastic, very clear, write-up on the subject. Thanks for sharing!

If the further optimizations that this change allows, as explained at the end of this post, are covered as well as this one, it promises to be a very interesting series of blog posts.

What is it really JIT-ing? Given it says that it’s only relevant for those building CPython. So it’s not JIT-ing my Python code, right? And the interpreter is in C. So what is it JIT-ing? Or am I misunderstanding something?

> A copy-and-patch JIT only requires the LLVM JIT tools be installed on the machine where CPython is compiled from source, and for most people that means the machines of the CI that builds and packages CPython

The code fragments that implement each opcode in the core interpreter loop are additionally compiled so that each fragment becomes a relocatable binary. Once processed that way, the runtime code generator can join the required fragments by patching relocations, essentially doing the job of a dynamic linker. So it is compiling your Python code, but the compiled result is composed of pre-baked fragments with patches.
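
A toy model of that “copy pre-compiled fragments and patch the holes” idea, with made-up byte templates standing in for the real relocatable opcode stubs (nothing below is actual machine code or CPython API):

    import struct

    # Pretend each opcode has a pre-compiled template with an 8-byte "hole"
    # where a runtime value (a constant, a jump target, ...) gets patched in,
    # the way a dynamic linker fills in a relocation.
    HOLE = b"\x00" * 8

    TEMPLATES = {
        "LOAD_CONST": b"\xAA\xBB" + HOLE,  # fake bytes; hole = the constant
        "JUMP":       b"\xCC" + HOLE,      # fake bytes; hole = absolute target
    }

    def emit(opcode, value):
        template = TEMPLATES[opcode]
        hole_at = template.index(HOLE)
        patch = struct.pack("<Q", value)   # fill the 8-byte hole
        return template[:hole_at] + patch + template[hole_at + len(HOLE):]

    # "Compiling" a tiny trace is just copying templates and patching them:
    code = emit("LOAD_CONST", 42) + emit("JUMP", 0x7F00DEADBEEF)
    print(code.hex())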

The article describes that the new JIT is a “copy-and-patch JIT” (I’ve previously heard this called a “splat JIT”). This is a relatively simple JIT architecture where you have essentially pre-compiled blobs of machine code for each interpreter instruction that you patch immediate arguments into by copying over them.

I once wrote an article about very simple JITs, and the first example in my article uses this style: https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-…

I take some issue with this statement, made later in the article, about the pros/cons vs a “full” JIT:

> The big downside with a “full” JIT is that the process of compiling once into IL and then again into machine code is slow. Not only is it slow, but it is memory intensive.

I used to think this was true also, because my main exposure to JITs was the JVM, which is indeed memory-intensive and slow.

But then in 2013, a miraculous thing happened. LuaJIT 2.0 was released, and it was incredibly fast to JIT compile.

LuaJIT is undoubtedly a “full” JIT compiler. It uses SSA form and performs many optimizations (https://github.com/tarantool/tarantool/wiki/LuaJIT-Optimizat…). And yet feels no more heavyweight than an interpreter when you run it. It does not have any noticeable warm up time, unlike the JVM.

Ever since then, I’ve rejected the idea that JIT compilers have to be slow and heavyweight.

»LuaJIT is undoubtedly a “full” JIT compiler.«

Yes, and it’s practically unmaintained. Pull requests to add support for various architectures have remained largely unanswered, including RISC-V.

I think Mike Pall has done enough work on LuaJIT for several lifetimes. If nobody else wants to merge pull requests and make sure everything still works then maybe LuaJIT isn’t important enough to the world.

I like python but I would never choose it for anything more than trivial on the backend. I want to know what types are being passed around from one middleware function to the next. Yes python has annotations but that’s not enough.

Just wait until you see what enterprise Java developers are passing around with type Object and encoded XML blobs. Type checking is really useful but it can be defeated in any language if you don’t have a healthy technical culture.

Is that why I see so much object serialization/deserialization in Java?

They’re trying to pass data between layers of middleware, but Java has very strict typing, and the middleware doesn’t know what kind of object it will get, so it has to do tons of type introspection and reflection to do anything with the data?

Honestly I don’t understand the pessimistic view here. I think every release since Microsoft started funding Python has increased best-case performance by high single digits.

Rather than focusing on the raw number, compare to Python 3.5 or so. It’s still getting significantly faster.

If they keep doing this steady pace they are slowly saving the planet!

Because it only increases by high single digits each release. If they keep up the 10% improvement for the next 10 releases, we will reach a speedup of around 2.5 times. That’s very small, considering Python is like 10-20 times slower than JS (not even talking about C- or Java-like speeds).

I think the pessimism really comes from a dislike for Python

While very very very popular, Python is, I think, a very disliked language. It doesn’t have, or isn’t built around, the current programming language features that programmers like: it’s not functional or immutable by default, it’s not fast, the tooling is complex, and it uses indentation for code blocks (a feature that was cool in the 90s, but dreaded since at least 2010).

So I guess if Python becomes faster, this will ensure its continued dominance, and all those hoping that one day it will be replaced by a nicer, faster language are disappointed.

This pessimism is the aching voice of the developers who were hoping for a big Python replacement.

> (this feature was cool in the 90s, but dreaded since at least 2010)

LOL this is a dead giveaway you haven’t been around long. There have been people kvetching about the whitespace since the beginning. Haskell went on to be the next big thing for reddit/HN/etc for years and it also uses whitespace.

To each his own, but the things you list are largely subjective/inaccurate, and there are many, many, many developers who use Python because they enjoy it and like it a lot.

Python is a very widely used language, and like any popular thing, yes, many many many like it, and many many many dislike it… it is that big. Python can be disliked by a million developers and still be a lot more liked than disliked.

But I also think it’s true that Python is not, and has not been for a while, considered a modern or technically advanced language.

The hype currently is for typed or gradually typed languages, functional languages, immutable data, systems languages, type-safe languages, languages with advanced parallelism and concurrency support, etc.

Python is old, boring OOP. If you like it, then like millions of developers you are not picky about programming languages; you use what works, what pays.

But for devs passionate about programming languages, Python is a relic they hope will vanish.

Python is designed to be “boring” (in other words, straightforward and easy to understand). It is admittedly less so, now that it has gained many features since the 2.x days, but it is still part of its pedigree that it is supposed to be teachable as a beginner language.

It is still the only beginner language that is also an industrial-strength production language. You can learn Python as your first language and also make an entire career out of it. That can’t really be said about the currently “hyped” languages, even though those are very fun and cool and interesting!

> but for devs passionate about programming languages, python is a relic they hope vanish

If you asked me what language I would consider to be a relic that I hope would vanish, I’d go with Perl.

> devs passionate about programming languages, python is a relic they hope vanish

Statements like this are obviously untrue for large numbers of people, so I’m not sure of the point you’re trying to make.

But certainly it’s true that there are both objective and subjective reasons for using a particular tool, so I hope you are in a position to use the tools that you prefer the most. Have a great day!

When did Microsoft start funding Python?

Also, such a shame that it takes sooo long for crucial open source to be funded properly. Kudos to Microsoft for doing it, shame on everyone else for not pitching in sooner.

FYI Python was launched 32 years ago, Python 2 was released 24 years ago and Python 3 was released 16 years ago.

To be clear Microsoft isn’t directly funding Python, excluding any PyCon sponsorship.

Microsoft hired Guido in late 2020, giving him freedom to choose what project he wanted. Guido decided to go back to core Python development and, with the approval of Microsoft, created a “faster-cpython” project; at this point that project has hired several developers, including some core CPython developers. This is all at the discretion of Microsoft, and is not some arms-length funding arrangement.

Meta has a somewhat similar situation, they hired Sam Gross (not the cartoonist) to work on a Python non-gil project, and contribute it directly to CPython if they accept it (which they have), and they have publicly committed to support it, which if I remember right was something like funding two engineering years of an experienced CPython internals developer.

Julia is my source of pessimism. Julia is super fast once it’s warmed up, but before it gets there, it’s painfully slow. They seem to be making progress on this, but it’s been gradual. I understand that Java had similar growing pains, but it’s better now. Combined with the boondoggle of py3, I’m worried for the future of my beloved language as it enters another phase of transformation.

I’m not that up to date on the language, it’s been a few years since I did anything nontrivial with it because the experience was so poor. And while that might not seem fair to Julia, it’s my honest experience: my concern isn’t a pissing match between Julia and the world, it’s that bad JIT experience is a huge turnoff and I’m worried about Python’s future as it goes down this road.

There has been so much progress in Julia’s startup performance in the past “few years” that someone’s qualitative impressions from several major releases before the current one are of limited relevance.

You’re making this about Julia despite my repeated statements to the contrary. Please reread what I’ve written, you aren’t responding to the actual point I’ve made twice now. A reminder: I’m talking specifically about my outlook on the future of Python, vis a vis my historical experience with how other JIT languages have developed.

If you wanted to rebut this, you’d need to argue that Julia has always been awesome and that my experience with a slow warmup was atypical. But that would be a lie, right?

And, subtext: when I wrote my first comment in this thread, its highest sibling led with

> I think the pessimism really comes from a dislike for Python

So I weighed in as a Python lover who is pessimistic for reasons other than a bias against the language.

> I’m talking specifically about my outlook on the future of Python, vis a vis my historical experience with how other JIT languages have developed.

But your assessment of the other language you mentioned is several years out of date and made largely irrelevant by the fast pace of progress. Therefore your conclusions about the probable future of Python, which may be correct, nevertheless do not follow.

I was sharing feelings and opinions, when you refer to my “conclusions” you’re speaking to elements of the empty set. I get that you’re a big Julia evangelist, but if you hope to reach people, you must learn to listen.

How long did it take Julia to solve its warmup issue? The language is about 12, and I last tried in earnest two years ago. So, more than a decade? You speak from the top of a mountain, and you say the view is nice. Sitting at the base of a similar mountain, it’s the journey that I dread, because Python’s recent long-term journeys have been pretty rough. And I’m just not convinced that the destination is so great.

It’s not that simple.

Amdahl’s Law is about expected speedup/decrease in latency. That actually isn’t strongly correlated to “saving the planet” afaik (where I interpret that as reducing direct energy usage, as well as embodied energy usage by reducing the need to upgrade hardware).

If anything, increasing speed and/or decreasing latency of the whole system often involves adding some form of parallelism, which brings extra overhead and requires extra hardware. Note that prefetching/speculative execution kind of counts here as well, since that is essentially doing potentially wasted work in parallel. In the past, boosting the clock rate of the CPU was also a thing, until thermodynamics said no.

OTOH, letting your CPU go to sleep faster should save energy, so repeated single-digit perf improvements via wasting fewer instructions do matter.

But then again, that could lead to Jevons Paradox[0] (the situation where increasing efficiency encourages more wasteful use than the increase in efficiency saves – Wirth’s Law but generalized and older, basically).

So I’d say there’s too many interconnected dynamics at play to really simply state “optimization good” or “optimization useless”. I’m erring on the side of “faster Python probably good”.

[0] https://en.wikipedia.org/wiki/Jevons_paradox

Why has it taken so much longer for CPython to get a JIT than, say, PyPy? I would imagine the latter has far less engineering effort and funding put into it.

For the longest time, CPython was deliberately optimized for simplicity. That’s a perfectly reasonable choice: it’s easier to reason about, easier for new maintainers to learn it, easier to alter, easier to fix when it breaks, etc. Also, CPUs are pretty good at running simple code very quickly.

It’s only fairly recently that there’s been critical mass of people who thought that performance trumps simplicity, and even then, it’s only to a point.

> It’s only fairly recently that there’s been critical mass of people who thought that performance trumps simplicity

This definitely wasn’t true, from the user perspective. And I’m not even convinced it’s some “critical mass” of developers. These changes aren’t coming from some mass of developers; they’re coming from a few experts who had a clear plan, backed by the sanity of recognizing the huge disconnect: languages are actually meant for the users of the language, not the developers of the language.

Basically, a JIT (Just In Time) compiler is also known as a dynamic compiler.

It is an approach that traces back to the original Lisp and BASIC systems, among other lesser-known ones.

The compiler is part of the language runtime, and code gets dynamically compiled into native code.

Why is this a good approach?

It allows for experiences that are much harder to implement in languages that traditionally compile straight to native code like C (note there are C interpreters).

So you can have an interpreter-like experience, and code gets compiled to native code before execution on the REPL, either straight away or after the execution gets beyond a specific threshold.

Additionally, since dynamic languages by definition can change all the time, a JIT can profit from code instrumentation and generate machine code that takes into account the types actually being used, something that an AOT approach for a dynamic language cannot predict, so optimizations are hardly an option there in most cases.
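
A crude illustration of that “instrument first, then specialize for the observed types” idea, written in plain Python; the threshold, names and guard logic are invented, and a real JIT does this at the machine-code level:

    class SpecializingAdd:
        """Toy profile-then-specialize dispatcher: count calls, record operand
        types, and past a threshold swap in a guarded fast path."""

        THRESHOLD = 100  # invented number; real JITs tune this carefully

        def __init__(self):
            self.calls = 0
            self.seen_types = set()
            self.impl = self._generic

        def __call__(self, a, b):
            return self.impl(a, b)

        def _generic(self, a, b):
            self.calls += 1
            self.seen_types.update((type(a), type(b)))
            if self.calls >= self.THRESHOLD and self.seen_types == {int}:
                self.impl = self._int_specialized   # "tier up"
            return a + b

        def _int_specialized(self, a, b):
            if type(a) is int and type(b) is int:   # guard
                return a + b                        # stand-in for a native fast path
            self.impl = self._generic               # guard failed: deoptimize
            return self._generic(a, b)

    add = SpecializingAdd()
    for i in range(200):
        add(i, i)
    print(add.impl.__name__)  # _int_specialized
    add(1.5, 2.5)             # non-int operands trip the guard
    print(add.impl.__name__)  # _generic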

The article will be a confusing read to someone who does not know what a JIT is.

Look at the part after the heading “What is a JIT?”

The first paragraph moves towards an answer – “compilation design that implies that compilation happens on demand when the code is run the first time.” But then it backtracks on this and says that it could mean many things, gets wishy-washy, and says that Python is already a JIT.

The second paragraph says, “What people tend to mean when they say a JIT compiler, is a compiler that emits machine code.” What point is the author trying to make here? An Ahead of Time compiler also emits machine code, and the article then goes on to say exactly that. So what is a JIT?

The third paragraph starts talking about mechanism, which is a distraction from the question it posed above – what is a JIT?

The article talks around points instead of making points.

Yeah, as a junior without a CS degree I was reading this article thinking “this is very interesting” but found it very hard to really grasp the difference between Ahead of Time and JIT from their explanations. Just that it was different from the previous Python interpreter method, which seems woefully inefficient. I do know that Java has a JIT and I’ve read about this, but I guess it quickly became clear that I didn’t really understand it since I couldn’t follow this article. I think I will need to read more about this elsewhere and come back to fully grasp the impact.

> very hard to really grasp the difference between Ahead of Time and JIT

JIT = “just in time” = bytecode is converted to native code while the program is running, either at the startup of the program or just before a particular function is called. Sometimes even after the function is called (since the JIT process itself takes time, it may be optimal to only run it once a function has been called N times or taken M microseconds total run time)

AOT = “ahead of time” = bytecode is converted to native code before the program starts. i.e. by the developer during their distribution or deployment process. AOT compilation knows nothing about the specific run time conditions.

It’s pretty clear to me.

>> JIT, or “Just in Time” is a compilation design that implies that compilation happens on demand when the code is run the first time.

>> What people tend to mean when they say a JIT compiler, is a compiler that emits machine code.

A JIT compiler is a compiler that emits machine code the first time that code is run, vs an AOT compiler which emits machine code when the code is built.

Did you already know what a JIT was before reading the article, though? Confirming what you already know is a different thing than grokking it the first time. Plus, in my brief but intense experience as someone teaching programming to artistic types who are scared of maths, it’s more useful to evaluate explanations by how likely they are to be misunderstood or overwhelming than by whether they can be correctly interpreted.

> At the moment, the JIT is only used if the function contains the JUMP_BACKWARD opcode which is used in the while statement but that will change in the future.

Isn’t this the main reason why it’s only a 2-9% improvement? Not much Python code uses the while statement in my experience.

I’ve found the startup time for Graal Python to be terrible compared with other Graal languages like JS. When I did some profiling, it seemed that the vast majority of the time was spent loading the standard library. If implemented lazily, that should have a negligible performance impact.

This is referenced in the article:

> The big downside with a “full” JIT is that the process of compiling once into IL and then again into machine code is slow. Not only is it slow, but it is memory intensive.

Python is a convenient friendly syntax for calling code implemented in C. While you can easily re-implement the syntax, you then have to decide how much of that C to re-implement. A few of the builtin types are easy (eg strings and lists), but it soon becomes a mountain of code and interoperability, especially if you want to get the semantics exactly right. And that is just the beginning – a lot of the value of Python is in the extensions, and many popular ones (eg numpy, sqlite3) are implemented in C and need to interoperate with your re-implementation. Trying to bridge from Java or .NET to those extensions will overwhelm any performance advantages you got.

This JIT approach is improving the performance of bits of the interpreter while maintaining 100% compatibility with the rest of the C code base, its object model, and all the extensions.

Given how many Microsoft employees today steer the Python decision-making process, I’m sure that in the not-so-distant future we might see a new CLR-based Python implementation.

Maybe Microsoft don’t know yet how to sell this thing, or maybe they are just boiling the frog. Time will tell. But I’m pretty sure your question will be repeated as soon as people get used to the idea of Python with a JIT.

“Python code runs 15% faster and 20% cheaper on Azure than AWS, thanks to our optimized azurePython runtime. Use it for Azure Functions and ML training”

Just a guess at the pitch.

It’s a lot about popularity / stigma.

Microsoft developed both JScript and Node.js. They could’ve continued with JScript, but obviously decided against it because JScript didn’t earn the reputation they might have hoped for. Even if they invested efforts into rectifying the flaws of JScript, it would’ve been just too hard to undo the reputation damage.

Microsoft made multiple attempts to “befriend” Python. IronPython was one of the failures. They also tried to provide editing tools (eg. intellisense in MSVS), but kind of given up on that too (but succeeded to a large degree with VSCode).

The whole long-term Microsoft’s strategy is to capture and put the developers on a leash. They won’t rest until there’s a popular language they don’t control.

Not sure, these optimizations multiply in power when used together. Propagate constants and fold constants, after that you can remove things like “if 0 > 0”, both the conditional check and the whole block below it, and so on.

What are those future optimizations he talks about?

He talks about an IL, but what’s that IL? Does that mean the future optimizations will involve that IL?

I still don’t get why they didn’t reduce the API surface of the interpreter internals in Python 3 so that things like this would be more achievable.

If you’re going to break backwards compatibility, it’s not like Unicode was the only foundational problem Python 2 had.

They did change the API for Python modules implemented in C. That was actually part of the reason why the 2->3 transition went so badly.

It wasn’t realistic to switch to 3.x when the libraries either weren’t there or were a lot slower (due to using pure Python instead of C code).

It also wasn’t realistic to rewrite the libraries when the users weren’t there.

It was in many respects a perfect case study in how not to do version upgrades.

So they compile the C implementation of every opcode into templates and then patch in the actual values from the functions being compiled. That’s genius, massive inspiration for me. It’s automatically ABI compatible with the rest of CPython too.

Is there a similarly accessible article about the specializing adaptive interpreter? It’s mentioned in this article but not much detail is given, only that the JIT builds upon it.
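
Not an article, but you can at least watch the specializing adaptive interpreter at work from a CPython 3.11+ session, since dis grew an adaptive flag; after warming a function up, the disassembly should show the quickened (specialized) instructions in use:

    import dis

    def f(x, y):
        return x + y

    # Warm the function up so the adaptive interpreter can specialize it.
    for _ in range(10_000):
        f(1, 2)

    # adaptive=True (3.11+) shows the specialized instructions in place of
    # the generic bytecode that dis would normally print.
    dis.dis(f, adaptive=True)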

I wonder if I can skip the bytecode compilation phase.

For the lazy who just want to know if this makes Python faster yet, this is foundational work to enable later improvements:

> The initial benchmarks show something of a 2-9% performance improvement.

> I think that whilst the first version of this JIT isn’t going to seriously dent any benchmarks (yet), it opens the door to some huge optimizations and not just ones that benefit the toy benchmark programs in the standard benchmark suite.

You’re right, and in this case “foundational work” even undersells how minimal this work really is compared to the results it already gets.

I recommend that people watch Brandt Bucher’s “A JIT Compiler for CPython” from last year’s CPython Core Developer Sprint[0]. It gives a good impression of the current implementation and its limitations, and some hints at what may or may not work out. It also indirectly gives a glimpse into the process of getting this into Python through the exchanges during the Q&A discussion.

One thing to especially highlight is that this copy-and-patch has a much, much lower implementation complexity for the maintainers, as a lot of the heavy lifting is offloaded to LLVM.

Case in point: as of the talk this was all just Brandt Bucher’s work. The implementation at the time was ~700 lines of “complex” Python, ~100 lines of “complex” C, plus of course the LLVM dependency. This produces ~3000 lines of “simple” generated C, requires an additional ~300 lines of “simple” hand-written C to come together, and no further dependencies (so no LLVM necessary to run the JIT. Also “complex” and “simple” qualifiers are Bucher’s terms, not mine).

Another thing to note is that these initial performance improvements are just from getting this first version of the copy-and-patch JIT to work at all, without really doing any further fine-tuning or optimization.

This may have changed a bit in the months since, but the situation is probably still comparable.

So if one person can get this up and running in a few klocs, most of which are generated, I think it’s reasonable to have good hopes for its future.

[0] https://www.youtube.com/watch?v=HxSHIpEQRjs

Important context here is that the same code was reused for the interpreter and JIT implementations (that’s a main selling point of copy-and-patch JIT). In other words, this 2–9% improvement mostly represents the core interpreter overhead that the JIT should significantly reduce. It was even possible that the JIT might have had no performance impact at all, so this result is actually very encouraging; any future opcode specialization and refinement should directly translate to a measurable improvement.

Copy&patch seems not much worse than compiling pure Python with Cython, which roughly corresponds to “just call whatever CPython API functions the bytecode interpreter would call for this bunch of Python”, so that’s roughly a baseline for how much overhead you get from the interpreter bit.

There would be no reason to use a copy-and-patch JIT if that were the case, because the good old threaded interpreter would have been fine. There is other optimization work in parallel with this JIT effort, including finer-grained micro-operations (uops) that can replace the usual opcodes at higher tiers. Uops themselves can be used without the JIT, but the interpreter overhead is proportional to the number of (u)ops executed and would be too large for uops. The hope is that the copy-and-patch JIT combined with uops will be much faster than threaded code.

From the write-up, I honestly don’t understand how this paves the way. I don’t see an architectural path from a cut-and-paste JIT to something optimizing. That’s the whole point of a cut-and-paste JIT.

> I don’t see an architectural path from a cut-and-paste JIT to something optimizing.

One approach used in V8 is to have a dumb-but-very-fast JIT (i.e. this), keep counters of how often each block of code runs (perhaps actual counters, perhaps using CPU sampling features), and then run any block of code executed more than a few thousand times through a far more complex yet slower optimizing JIT.

That has the benefit that the 0.2% of your code which uses 95% of the runtime is the only part that has to undergo the expensive optimization passes.

Note that V8 didn’t have a dumb-but-very-fast JIT (Sparkplug) until 2021; the interpreter (Ignition) did that block counting and sent it straight to the optimizing JIT (TurboFan).

V8 pre-2021 (i.e., only Ignition+TurboFan) was significantly faster than current CPython is, and the full current four-tier bundle (Ignition+Sparkplug+Maglev+TurboFan) only scores roughly twice as well on Speedometer as pure Ignition does. (Ignition+Sparkplug is about 40% faster than Ignition alone; compare that “dumbness” with CPython’s 2–9%.) The relevant lesson should be that things like a very carefully designed value representation and IR are a much more important piece of the puzzle than having as many tiers of compilation as possible.

> keep counters of how often each block of code runs … and then any block of code running more than a few thousand times run through a far more complex yet slower optimizing jit.

That’s just all JITs. Sometimes it’s counters for going from interpreter -> JIT rather than levels of JITs, but this idea is as old as JITs.

Isn’t it the case that Python has allowed type specifiers (type hints) since 3.5, albeit the CPython interpreter ignores them? The JIT might take advantage of them, which ought to improve performance significantly for some code.

What makes Python flexible is what makes it slow. Restricting the flexibility where possible offers opportunities to improve performance (and allows tools and humans to spot errors more easily).

AFAIK good JITs like V8 can do runtime introspection and recompile on the fly if types change. Maybe using the type hints will be helpful but I don’t think they are necessary for significant improvement.

Well, GraalPython is a Python JIT compiler which can exploit dynamically determined types, and it advertises a 4.3x speedup, so it’s possible to do drastically better than a few percent. I think that’s state of the art but I might be wrong.

That’s for this benchmark:

https://pyperformance.readthedocs.io/

Note that this is with a relatively small investment as these things go, the GraalPython team is about ~3 people I guess, looking at the GH repo. It’s an independent implementation so most of the work went into being compatible with Python including native extensions (the hard part).

But this speedup depends a lot on what you’re doing. Some types of code can go much faster. Others will be slower even than CPython, for example if you want to sandbox the native code extensions.

I doubt it with a copy-and-patch JIT, not the way they work now. I’m a serious mypy/python-static-types user and, as is, they currently wouldn’t allow you to do much optimization-wise.

– All integers are still big integers

– Use of the typing opt-out ‘Any’ is very common

– All functions/methods can still be overwritten at runtime

– Fields can still be added and removed from objects at runtime

The combination basically makes it mandatory to avoid native arithmetic, allocate everything on the heap, and go through multiple levels of indirection when looking up any variable/field/function. A CPU perf nightmare. You need a real optimizing JIT to track when integers are in a narrow range and things aren’t getting redefined at runtime.
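
Concrete examples of that dynamism, all perfectly legal even in fully annotated code, which is why the hints alone can’t be trusted without runtime guards:

    # Integers are arbitrary precision: no fixed-width native arithmetic.
    print(2 ** 200 + 1)

    class Point:
        x: int
        y: int
        def __init__(self) -> None:
            self.x = 0
            self.y = 0

    def area(p: Point) -> int:
        return p.x * p.y

    # Functions can be rebound at runtime, annotations or not.
    area = lambda p: 42            # type: ignore[assignment]

    # Fields can be added and removed from objects at runtime.
    p = Point()
    p.z = "surprise"               # type: ignore[attr-defined]
    del p.x
    print(area(p))                 # 42, despite the original annotation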

To the contrary. In CL some flexibility was given up (compared to other LISP dialects) in favor of enabling optimizing compilers, e.g. the standard symbols cannot be reassigned (also preserving the sanity of human readers). CL also offers what some now call ‘gradual typing’, i.e. optional type declarations. And remaining flexibility, e.g. around the OO support, limits how well the compiler can optimize the code.

But type declarations in Python are not required to be correct, are they? You are allowed to write

    def twice(x: int) -> int:
        return x + x

    print(twice("nope"))

and it should print “nopenope”. Right?

Surely this is the job for a linter or code generator (or perhaps even a hypothetical ‘checked’ mode in the interpreter itself)? Ain’t nobody got time to add manual type checks to every single function.

Of course, this is not a good example of good, high-performance code, only an answer to the specific question… the questioner certainly also knows MyPy.

I actually don’t know anything about MyPy, only that it exists. Does it run that example correctly, that is, does it print “nopenope”? Because I think that’s the correct behaviour; type hints should not actually affect evaluation (well, beyond the fact that they must be names that are visible in the scopes they’re used in, obviously), although I could be wrong.

Besides, my point was that one of the reasons why languages with (sound-ish) static types manage to have better performance is that they can omit all of those run-time type checks (and the supporting machinery) because they’d never fail. And if you have to put in those explicit checks, then the type hints are actually entirely redundant: e.g. Erlang’s JIT ignores type specs; it instead looks at the type guards in the code to generate specialized code for the function bodies.

It should be fairly easy to add instruction fusing, where they recognize often-used instruction pairs, combine their C code, and then let the compiler optimize the combined code. Combining LOAD_CONST with the instruction following it if that instruction pops the const from the stack seems an easy win, for example.
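
The “which pairs are worth fusing” question is easy to explore with the stdlib; here is a rough way to count adjacent opcode pairs in a function (purely an analysis sketch, not something CPython itself does):

    import dis
    from collections import Counter

    def count_opcode_pairs(func):
        """Count adjacent bytecode opcode pairs: the kind of data you'd want
        before deciding which pairs deserve a fused instruction."""
        ops = [ins.opname for ins in dis.get_instructions(func)]
        return Counter(zip(ops, ops[1:]))

    def example(xs):
        total = 0
        for x in xs:
            total += x * 2 + 1
        return total

    for pair, n in count_opcode_pairs(example).most_common(5):
        print(n, pair)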

In the interpreter, I don’t think it would reduce overhead much, if at all. You’d still have to recognize the two byte codes, and your interpreter would spend additional time deciding, for most byte code pairs, that it doesn’t know how to combine them.

With a compiler, that part is done once and, potentially, run zillions of times.

If fusing a certain pair would significantly improve performance of most code, you’d just add that fused instruction to your bytecode and let the C compiler optimize the combined code in the interpreter. I have to assume CPython has already done that for all the low-hanging fruit.

In fact, for such a fused instruction to be optimized that way in a copy-and-patch JIT, it’d need to exist as a new bytecode in the interpreter. A JIT that fuses instructions is no longer a copy-and-patch JIT.

A copy-and-patch JIT reduces interpretation overhead by making sure the branches in the executed machine code are the branches in the code to be interpreted, not branches in the interpreter.

This makes a huge difference in more naive interpreters, not so much in a heavily optimized threaded-code interpreter.

The 10% is great, and nothing to sneeze at for a first commit. But I’d actually like some realistic analysis of next steps for improvement, because I’m skeptical instruction fusing and other things being hand waved are it. Certainly not on a copy-and-patch JIT.

For context: I spent significant effort trying to add such instruction fusing to a simple WASM AOT compiler and got nowhere (the equivalent of constant loading was precisely one of the pairs). Only moving to a much smarter JIT (capable of looking at whole basic blocks of instructions) started making a difference.

Support for generating machine code at all seems like a necessary building block to me and probably is quite a bit of effort to work on top of a portable interpreter code base.

I wouldn’t be so enthusiastic. Look at other languages that have JITs now: Ruby and PHP. After years of effort, they are still an order of magnitude slower than V8 and even PyPy [1]. It seems to me that you need to design a JIT implementation from the ground up to get good performance – V8, Dart and LuaJIT are like this; if you start with a pure interpreter, it may be difficult to speed it up later.

[1] https://github.com/attractivechaos/plb2

PyPy is designed from the ground up and is still slower than V8 AFAIK. Don’t forget that V8 has enormous amounts of investment from professionally paid developers, whereas PyPy is funded by government grants. Not sure about Ruby & PHP, and it’s entirely possible that those JIT implementations are choosing simplicity of maintenance over eking out every single bit of performance.

Python also has structural challenges, like native extensions (which don’t exist in JavaScript), where the API forces slow code or massive hacks like avoiding the C API at all costs (if I recall correctly, I read that’s being worked on), and the GIL.

One advantage Python had is the ability to use multiple cores way before JS, but the JS ecosystem remained single-threaded longer and decided to use message passing instead, in the form of Web Workers, which let the JIT remain fast.

PyPy is only about twice as slow as V8 and is about an order of magnitude faster than CPython. It is quite an achievement. I would be very happy if CPython could get this performance, but I doubt it.

Anyone know if there will be any better tools for cross-compiling python projects?

The package management and build tools for Python have been so atrociously bad (environments add far too much complexity to the ecosystem) that they turn many developers away from the language altogether.
A system like Rust’s package management, build tools, and cross-compilation capability is an enormous draw, even without the memory safety. The fact that it actually works (because of the package management and build tools) is the main reason to use the language, really. Python used to do that ~10 years ago. Now absolutely nothing works. It takes weeks to get simple packages working, and you can only do anything under extremely brittle conditions that nullify the project you’re trying to use the other package for, etc.

If Python could ever get its act together and make better package management, and allow for cross-compiling, it could make a big difference.
(I am aware of the very basic fact that it’s interpreted rather than compiled, yada yada – there are still ways to make executables, they are just awful.) Since Python is data-science-centric, it would be good to have decent data management capabilities too, but perhaps that could come after the fundamental problems are dealt with.

I tried looking at mojo, but it’s not open source, so I’m quite certain that kills any hope of it ever being useful at all to anyone. The fact that I couldn’t even install it without making an account made me run away as fast as possible.

I can’t answer your initial question, but I would like to pile onto the package management points.

Package consumption sucks so bad, since the sensible way of consuming packages is a virtual env where you copy all dependencies. Then freezing the venv or dumping package versions, so you can port your project to a different system, doesn’t consider only the packages actually used/imported in the code; it just dumps everything in the venv. The fact that you need external tools for this is frustrating.

Then there is package creation. Legacy vs modern approach, cryptic __init__ files, multiple packaging backends, endless sections in pyproject.toml, manually specifying dependencies and dev-dependencies, convoluted ways of getting package metadata actually in code without having it in two places (such as CLI programs with --version).
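On the metadata point specifically, one common way to avoid writing the version in two places is to read it back from the installed distribution’s metadata via the standard library (the “mytool” name below is made up for illustration):

    from importlib.metadata import version

    def cli():
        # Reads the version recorded by the packaging tools at install time,
        # so it only has to live in pyproject.toml.
        print(version("mytool"))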

Cross-compilation really would be a nice feature, to simply distribute a single-file executable. I haven’t tested it, but a Linux system with Wine should in theory be capable of “cross”-compiling between Linux and Windows.

Still, like you, to begin with I would prefer a sensible package management and package creation process.

“It takes weeks to get simple packages working”

Can you expand on what you mean by that? I have trouble imagining a Python packaging problem that takes weeks to resolve – I’d expect them to either be resolvable in relatively short order or for them to prove effectively impossible such that people give up.

– Trying to figure out what versions the scripts used and specifying them in a new poetry project
– Realizing some OS-dependent software is needed so making a docker file/docker-compose.yml
– Getting some of it working in the container with a poetry environment
– Realizing that other parts of the code work with other versions, so making a different poetry environment for those parts
– Trying to tie this package/container as a dependency of another project
– Oh actually, this is a dependency of a dependency
– How do you call a function from a package running in a container with multiple poetry environments in a package?
– What was I doing again?
– 2 weeks have passed trying to get this to work, perhaps I’ll just do something else

Rinse and repeat.

¯\_(ツ)_/¯
That’s python!

Honestly, 2-9% already seems like a very significant improvement, especially since, as they mention, “remember that CPython is already written in C”. Whilst it’s great to look at the potential for even greater gains by building upon this work, I feel we shouldn’t undersell what’s been accomplished.

> “remember that CPython is already written in C”

What is this supposed to say? Most scripting language interpreters are written in low level languages (or assembly), but that alone doesn’t say anything about the performance of the language itself.

I think they mean that a lot of runtime of any benchmark is going to be spent in the C bits of the standard library, and therefore not subject to the JIT. Only the glue code and the bookkeeping or whatnot that the benchmark introduces would be improved by the JIT. This reduces the impact that the JIT can make.

This means that a lot of Python libraries, like Polars or TensorFlow, are not written in Python.

So Python programs that already spend most of their CPU time running those libraries’ code won’t see much of an impact.

Isn’t the point that if pure Python was faster they wouldn’t need to be written in other [compiled] languages? Having dealt with Cython it’s not bad, but if I could write more of my code in native Python my development experience would be a lot simpler.

Granted we’re still very far from that and probably won’t ever reach it, but there definitely seems to be a lot of progress.

Since Nim compiles to C, a middle step worth being aware of is Nim + nimporter which isn’t anywhere near “just python” but is (maybe?) closer than “compile a C binary and call it from python”.

Or maybe it’s just syntactic sugar around that. But sugar can be nice.

Also recall that a 50% speed improvement in SQLite was achieved through 50-100 different optimisations that each eked out 0.5-1% speedups. I’m on my phone now and don’t have the ref, but it all adds up.

Many small improvements is the way to go in most situations. It’s not great clickbait, but we should remember that we got from a single cell at some time to humans through many small changes. The world would be a lot better if people just embraced the grind of many small improvements…

That’s true, and Rust compiler speed has seen similar speedups from lots of 1% improvements.

But even if you can get a 2x improvement from lots of 1% improvements (if you work really really hard), you’re never going to get a 10x improvement.

Rust is never going to compile remotely as quickly as Go.

Python is never going to be remotely as fast as Rust, C++, Go, Java, C#, Dart, etc.

Does it matter?

Trains are never going to beat jets in pure speed. But in certain scenarios, trains make a lot more sense to use than jets, and in those scenarios, it is usually preferable having a 150 mph train to a 75 mph train.

Looking at the world of railways, high-speed rail has attracted a lot more paying customers than legacy railways, even though it doesn’t even try to achieve flight-like speeds.

Same with programming languages, I guess.

What is the programming analogy here?

Two decades ago, you could (as e.g. Paul Graham did at the time) argue that dynamically typed languages can get your ideas to market faster so you become viable and figure out optimization later.

It’s been a long time since that argument held. Almost every dynamic programming language still under active development is adding some form of gradual typing because the maintainability benefits alone are clearly recognized, though such languages still struggle to optimize well. Now there are several statically typed languages to choose from that get those maintainability benefits up-front and optimize very well.

Different languages can still be a better fit for different projects, e.g. Rust, Go, and Swift are all statically typed compiled languages better fit for different purposes, but in your analogy they’re all jets designed for different tactical roles, none of them are “trains” of any speed.

Analogies about how different programming languages are like different vehicles or power tools etc. go way back and have their place, but they have to recognize that sometimes one design approach largely supersedes another for practical purposes. Maybe the analogy would be clearer comparing jets and trains, which each have their place, to horse-drawn carriages, which still exist but are virtually never chosen for their functional benefits.

I cut my teeth on C/C++, and I still develop the same stuff faster in Python, with which I have less overall experience by almost 18 years. Python is also much easier to learn than, say, Rust, or the current standard of C++ which is a veritable and intimidating behemoth.

In many domains, it doesn’t really matter if the resulting program runs in 0.01 seconds or 0.1 seconds, because the dominant time cost will be in user input, DB connection etc. anyway. But it matters if you can crank out your basic model in a week vs. two.

> Python is also much easier to learn than, say, Rust

I don’t doubt it, but learning is only the first step to using a technology for a series of projects over years or even decades, and that step doesn’t last that long.

People report being able to pick up Rust in a few weeks and being very productive. I was one of them, if you already got over the hill that was C++ then it sounds like you would be too. The point is that you and your team stay that productive as the project gets larger, because you can all enforce invariants for yourselves rather than have to carry their cognitive load and make up the extra slack with more testing that would be redundant with types.

Outside of maybe a 3 month internship, when is it worthwhile to penalize years of software maintenance to save a few weeks of once-off up-front learning? And it’s not like you save it completely: writing correct Python still takes some learning too, e.g. beginners easily get confused about when mutable data structures are silently being shared and thus modified when they don’t expect it. People who are already very comfortable with Python forget this part of their own learning curve, just like people very comfortable with Rust forget their first borrow-check head-scratcher.

I never made a performance argument in this thread so I’m not sure why 0.01 or 0.1 seconds matters here. Even the software that got you into a commercial market has to be maintained once you get there. Ask Meta how they feel about the PHP they’re stuck with, for example.

I removed the SDKs of some big (big for the wrong reasons) open source projects which generate a lot of code using Python 3 scripts.

In those custom SDKs, I generate all the code at the start of the build, which takes a significant amount of time for code generation that is mostly no longer pertinent or was inappropriately done. I will really feel the Python 3 speed improvement for those builds.


