|
|
Seems a bit silly to think that: Guido is still involved with Python, and in fact heads the Faster CPython project at Microsoft, which is responsible for many of these improvements.
|
|
5.5% compounded over 5 years is a bit over 30%: not a huge amount, but an easily noticeable speed-up. What were you thinking of when you typed “significantly faster”?
|
|
Compounding a decrease works differently than an increase. If something gets 10% faster twice, it actually got 19% faster. In other words, the runtime is 90% of 90%, i.e. 81%.
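A quick sanity check of that arithmetic:

```python
# Two successive 10% speed-ups multiply on runtime; they don't add.
runtime = 1.0
for _ in range(2):
    runtime *= 0.90  # each improvement cuts runtime by 10%

remaining = round(runtime, 4)      # 0.81 of the original runtime (19% faster)
throughput = round(1 / runtime, 4)  # ~1.2346x operations per second
```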
|
|
I envy these small and steady improvements!!
I spent about one week implementing PyPy’s storage strategies in my language’s collection types. When I finished the vector type modifications, I benchmarked it and saw the ~10% speed up claimed in the paperš. The catch is performance increased only for unusually large vectors, like thousands of elements. Small vectors were actually slowed down by about the same amount. For some reason I decided to press on and implement it on my hash table type too which is used everywhere. That slowed the entire interpreter down by nearly 20%. The branch is still sitting there, unmerged. I can’t imagine how difficult it must have been for these guys to write a compiler and succeed at speeding up the Python interpreter. š https://tratt.net/laurie/research/pubs/html/bolz_diekmann_tr… |
|
I’d rather they add up: −5% runtime here, another −5% there… Soon enough, Python will be so fast my scripts terminate before I even run them, allowing me to send messages to my past self.
|
|
log(2)/log(1.1) ≈ 7.27, so in principle sustained 10% improvements could double performance every 7 releases. But at some point we’re bound to face diminishing returns.
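Checking that arithmetic:

```python
import math

# How many sustained 10% improvements does it take to compound into 2x?
releases = math.log(2) / math.log(1.1)
print(round(releases, 2))  # 7.27
```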
|
|
I personally like Quart, which is like Flask, but with asyncio. Django is also incredibly popular and has been around forever, so it is very battle-tested.
|
|
I’ve been using FastAPI, which is essentially the Flask style of API taken seriously this time around, built on asyncio and leaving room for multithreading. It’s almost a drop-in replacement for Flask.
|
|
Wasn’t CPython supposed to remain very simple in its codebase, with the heavy optimization left for other implementations to tackle? I seem to remember hearing as much a few years back.
|
|
Template JITs in general aren’t a new technique, but Copy-and-Patch is a specific method of implementing it (leveraging a build time step to generate the templates from C code + ELF relocations).
|
|
QBasic was a slightly cut-down version of QuickBASIC that didn’t include the compiler, so your assumption was correct in that case. QBasic was bundled with DOS, but you had to buy QuickBASIC.
|
|
AI heavy lifting isn’t just model training. There’s about a million data pipelines and processes before the training data gets loaded into a PyTorch tensor.
|
|
The article describes that the new JIT is a “copy-and-patch JIT” (I’ve previously heard this called a “splat JIT”). This is a relatively simple JIT architecture where you have essentially pre-compiled blobs of machine code for each interpreter instruction that you patch immediate arguments into by copying over them.
I once wrote an article about very simple JITs, and the first example in my article uses this style: https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-…

I take some issue with this statement, made later in the article, about the pros/cons vs a “full” JIT:

> The big downside with a “full” JIT is that the process of compiling once into IL and then again into machine code is slow. Not only is it slow, but it is memory intensive.

I used to think this was true also, because my main exposure to JITs was the JVM, which is indeed memory-intensive and slow. But then in 2013, a miraculous thing happened: LuaJIT 2.0 was released, and it was incredibly fast to JIT compile.

LuaJIT is undoubtedly a “full” JIT compiler. It uses SSA form and performs many optimizations (https://github.com/tarantool/tarantool/wiki/LuaJIT-Optimizat…), and yet it feels no more heavyweight than an interpreter when you run it. It does not have any noticeable warm-up time, unlike the JVM.

Ever since then, I’ve rejected the idea that JIT compilers have to be slow and heavyweight.
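To make the “copy pre-compiled blobs and patch immediates” idea concrete, here is a toy sketch in Python. The opcode and template are hypothetical stand-ins; a real copy-and-patch JIT generates its templates at build time from compiled C code and locates the holes via ELF relocations rather than a magic byte pattern:

```python
import struct

# Hypothetical pre-built "machine code" templates, one per interpreter
# instruction, each with a 4-byte hole where the immediate goes.
HOLE = b"\xde\xad\xbe\xef"  # placeholder bytes marking the hole
TEMPLATES = {
    "PUSH_CONST": b"\x68" + HOLE,  # x86 "push imm32" as an example
}

def copy_and_patch(op, imm):
    """Copy the template for `op` and patch `imm` into its hole."""
    blob = bytearray(TEMPLATES[op])          # copy
    idx = blob.find(HOLE)
    blob[idx:idx + 4] = struct.pack("<i", imm)  # patch (little-endian i32)
    return bytes(blob)

code = copy_and_patch("PUSH_CONST", 42)  # b"\x68\x2a\x00\x00\x00" = push 42
```

Emitting code for a whole function is then just concatenating patched blobs in instruction order, which is why this style compiles so quickly.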
|
»LuaJIT is undoubtedly a “full” JIT compiler.«
Yes, and it’s practically unmaintained. Pull requests to add support for various architectures, including RISC-V, have remained largely unanswered.
|
To each his own, but the things you list are largely subjective/inaccurate, and there are many, many, many developers who use Python because they enjoy it and like it a lot.
|
|
> but for devs passionate about programming languages, python is a relic they hope vanish
If you asked me what language I would consider to be a relic that I hope would vanish, I’d go with Perl.
|
Why has it taken so much longer for CPython to get a JIT than, say, PyPy? I would imagine the latter has far less engineering effort and funding put into it.
|
|
“Python code runs 15% faster and 20% cheaper on Azure than AWS, thanks to our optimized azurePython runtime. Use it for Azure Functions and ML training.”
Just a guess at the pitch.
|
What are those future optimizations he talks about?
He mentions an IL, but what is that IL? Does that mean the future optimizations will involve it?
|
Well, GraalPython is a Python JIT compiler which can exploit dynamically determined types, and it advertises 4.3x faster, so it’s possible to do drastically better than a few percent. I think that’s state of the art but might be wrong.
That’s for this benchmark: https://pyperformance.readthedocs.io/

Note that this is with a relatively small investment as these things go; the GraalPython team is about ~3 people, I guess, looking at the GH repo. It’s an independent implementation, so most of the work went into being compatible with Python, including native extensions (the hard part).

But this speedup depends a lot on what you’re doing. Some types of code can go much faster. Others will be slower even than CPython, for example if you want to sandbox the native code extensions.
|
But type declarations in Python are not required to be correct, are they? You are allowed to write
and it should print “nopenope”. Right?
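Concretely, a snippet along these lines (an illustrative example; the annotations deliberately lie, and CPython never checks them at runtime):

```python
def double(x: int) -> int:
    return x * 2

# The annotation claims int, but nothing enforces it at runtime:
print(double("nope"))  # prints "nopenope"
```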
|
Of course, this is not an example of good, high-performance code, only an answer to the specific question… the questioner certainly also knows MyPy.
|
|
Support for generating machine code at all seems like a necessary building block to me and probably is quite a bit of effort to work on top of a portable interpreter code base.
|
|
PyPy is only about twice as slow as V8 and is about an order of magnitude faster than CPython. It is quite an achievement. I would be very happy if CPython could reach this performance, but I doubt it.
|
|
Also recall that a 50% speed improvement in SQLite was the result of 50-100 different optimisations that each eked out 0.5-1% speedups. On my phone now, so I don’t have the ref, but it all adds up.
|
I highly recommend the blog posts if you’re into learning how languages are implemented, by the way. They’re incredible deep dives, but he uses the <details> element to keep the metaphorical descents into the Mariana Trench optional, so it doesn’t get too overwhelming.
I even had the privilege of congratulating him on the 1000th star of the GH repo[3], where he reassured me and others that he’s still working on it despite the long pause since the last blog post, and that this mainly has to do with behind-the-scenes rewrites that make no sense to publish piecemeal.
[0] https://arxiv.org/abs/2011.13127
[1] https://sillycross.github.io/2022/11/22/2022-11-22/
[2] https://sillycross.github.io/2023/05/12/2023-05-12/
[3] https://github.com/luajit-remake/luajit-remake/issues/11