Uncategorized

Python Packaging, One Year Later: A Look Back at 2023 in Python Packaging


A year ago, I wrote about the sad state of Python packaging. The large number of tools in the space, the emphasis on writing vague standards instead of rallying around the One True Tool, and the complicated venv-based ecosystem instead of a solution similar to node_modules. What has changed in the past year? Has anything improved, is everything the same, or are things worse than they were before?

The standards

Last time we looked at packaging standards, we focused on PEP 582. It proposed the introduction of __pypackages__, which would be a place for third-party packages to be installed to locally, on a per-project basis, without involving virtual environments, similarly to what node_modules is for node. The PEP was ultimately rejected in March 2023. The PEP wasn’t perfect, and some of its choices were questionable or insufficient (such as not recursively searching for __pypackages__ in parent directories, or focusing on simple use-cases only). No new standards for something in that vein (with a better design) were proposed to this day.

Another contentious topic is lock files. Lock files for packaging systems are useful for reproducible dependency installations. The lock file records all installed packages (i.e. includes transitive dependencies) and their versions. Lock files often include checksums (like sha512) of the installed packages, and they often support telling apart packages installed via different groups of dependencies (runtime, buildtime, optional, development, etc.).

The classic way of achieving this goal are requirements.txt files. They are specific to pip, and they only contain a list of packages, versions, and possibly checksums. Those files can be generated by pip freeze, or the third-party pip-compile from pip-tools. pip freeze is very basic, pip-compile can’t handle different groups of dependencies other than making multiple requirements.in files, compiling them, and hoping there are no conflicts.

Pipenv, Poetry, and PDM have their own lockfile implementations, incompatible with one another. Rye piggybacks on top of pip-tools. Hatch doesn’t have anything in core; they’re waiting for a standard implementation (there are some plugins though). PEP 665 was rejected in January 2022. Its author, Brett Cannon, is working on a PoC of something that might become a standard (named mousebender).

This is the danger of the working model adopted by the Python packaging world. Even for something as simple as lock files, there are at least four incompatible standards. An attempt at a specification was rejected due to “lukewarm reception”, even though there exist at least four implementations which are achieving roughly the same goals, and other ecosystems also went through this before.

Another thing important to Python are extension modules. Extension modules are written in C, and they are usually used to interact with libraries written in other languages (and also sometimes for performance). Poetry, PDM, and Hatchling don’t really support building extension modules. Setuptools does; SciPy and NumPy migrated from their custom numpy.distutils to Meson. The team behind the PyO3 Rust bindings for Python develops Maturin, which allows for building Rust-based extension modules — but it’s not useful if you’re working with C.

There weren’t many packaging-related standards that were accepted in 2023. A standard worth mentioning is PEP 668, which allows distributors to prevent pip from working (to avoid breaking distro-owned site packages) by adding an EXTERNALLY-MANAGED file. It was accepted in June 2022, but pip only implemented support for it in January 2023, and many distros already have enabled this feature in 2023. Preventing broken systems is a good thing.

But some standards did make it through. Minor and small ones aside, the most prominent 2023 standard would be PEP 723: inline script metadata. It allows to add a comment block at the top of the file, that specifies the dependencies and the minimum Python version in a way that can be consumed by tools. Is it super useful? I don’t think so; setting up a project with pyproject.toml would easily allow things to grow. If you’re sending something via a GitHub gist, just make a repo. If you’re sending something by e-mail, just tar the folder. That approach promotes messy programming without source control.

Learning curves and the deception of “simple”

Microsoft Word is simple, and a great beginner’s writing tool. You can make text bold with a single click. You can also make it blue in two clicks. But it’s easy to make an inconsistent mess. To make section headers, many users may just make the text bold and a bit bigger, without any consistency or semantics . Making a consistent document with semantic formatting is hard in Word. Adding section numbering requires you to select a heading and turn it into a list. There’s also supposedly some magic involved, that magic doesn’t work for me, and I have to tell Word to update the heading style. Even if you try doing things nicely, Word will randomly break, mess up the styles, mix up styles and inline ad-hoc formatting, and your document may look differently on different computers.

LaTeX is very confusing to a beginner, and has a massive learning curve. And you can certainly write \textbf{hello} everywhere. But with some learning, you’ll be producing beautiful documents. You’ll define a \code{} command that makes code monospace and adds a border today, but it might change the background and typeset in Comic Sans tomorrow if you so desire. You’ll use packages that can render code from external files with syntax highlighting. Heading numbering is on by default, but it can easily be disabled for a section. LaTeX can also automatically put new sections on new pages, for example. LaTeX was built for scientific publishing, so it has stellar support for maths and bibliographies, among other things.

Let’s now talk about programming. Python is simple, and a great beginner’s programming language. You can write hello world in a single line of code. The syntax is simpler, there are no confusing leftovers from C (like the index-based for loop) or machine-level code (like break in switch), no pointers in sight. You also don’t need to write classes at all; you don’t need to write a class only to put a public static void main(String[] args) method there . You don’t need an IDE, you can just write code using any editor (even notepad.exe will do for the first day or so), you can save it as a .py file and run it using python whatever.py.

Your code got more complicated? No worry, you can split it into multiple .py files, use import name_of_other_file_without_py and it will just work. Do you need more structure, grouping into folders perhaps? Well, forget about python whatever.py, you must use python -m whatever, and you must cd to where your code is, or mess with PYTHONPATH, or install your thing with pip. This simple yet common action (grouping things into folders) has massively increased complexity.

The standard library is not enough and you need a third-party dependency? You find some tutorial that tells you to pip install, but pip will now tell you to use apt. And apt may work, but it may give you an ancient version that does not match the tutorial you’re reading. Or it may not have the package. Or the Internet will tell you not to use Python packages from apt. So now you need to learn about venvs (which add more complexity, more things to remember; most tutorials teach activation, venvs are easy to mess up via basic operations like renaming a folder, and you may end up with a venv in git or your code in a venv). Or you need to pick one of the many one-stop-shop tools to manage things.

In other ecosystems, an IDE is often a necessity, even for beginners. The IDE will force you into a project system (maybe not the best or most common one by default, but it will still be a coherent project system). Java will force you to make more than one file with the “1 public class = 1 file” rule, and it will be easy to do so, you won’t even need an import.

Do you want folders? In Java or C#, you just create a folder in the IDE, and create a class there. The new file may have a different package/namespace, but the IDE will help you to add the correct import/using to the codebase, and there is no risk of you using too many directories (including something like src) or using too few (not making a top-level package for all your code) that will require correcting all imports. The disruption from adding a folder in Java or C# is minimal.

The project system will also handle third-party packages without you needing to think about where they’re downloaded or what a virtual environment is and how to activate it from different contexts. A few clicks and you’re done. And if you don’t like IDEs? Living in the CLI is certainly possible in many ecosystems, they have reasonable CLI tools for common management tasks, as well as building and running your project.

PEP 723 solves a very niche problem: dependency management for single-file programs. Improving life for one-off things and messy code was apparently more important to the packaging community than any other improvements for big projects.

By the way, you could adapt this lesson to static and dynamic typing. Dynamic typing is easier to get started with and requires less typing, but compile-type checking can prevent many bugs — bugs that require higher test coverage to catch with dynamic typing. That’s why the JS world has TypeScript, that’s why mypy/pyright/typing has gained a lot of mindshare in the Python world.

The future…

Looking at the Python Packaging Discourse, there were some discussions about ways to improve things.

For example, this discussion about porting off setup.py was started by Gregory Szorc, who had a long list of complaints, pointing out the issues with the communication from the packaging world, and documentation mess (his post is worth a read, or at least a skim, because it’s long and full of packaging failures). There’s one page which recommends setuptools, another which has four options with Hatchling as a default, and another still promoting Pipenv. We’ve seen this a year ago, nothing changed in that regard. Some people tried finding solutions, some people shared their opinions… and then the Discourse moderator decided to protect his PyPA friends from having to read user feedback and locked the thread.

Many other threads about visions were had, like the one about 10-year views or about singular packaging tools. The strategy discussions, based on the user survey, had a second part (the first one concluded in January 2023), but it saw less posts than the first one, and discussions did not continue (and there were discussions about how to hold the discussions). There are plans to create a packaging council — design-by-committee at its finest.

But all those discussions, even when not locked by an overzealous moderator, haven’t had any meaningful effect. The packaging ecosystem is still severely fragmented and confusing. The PyPA docs and tutorials still contradict each other. The PyPA-affiliated tools still have less features than the unaffiliated competition (even the upstart Rye has some form of lockfiles, unlike Hatch or Flit), and going by the PEP 517 build backend usage statistics, they are more popular than the modern PyPA tools. The authors of similar yet competing tools have not joined forces to produce the One True Packaging Tool.

…is looking pretty bleak

On the other hand, if you look at the 2023 contribution graphs for most packaging tools, you might be worried about the state of the packaging ecosystem.

  • Pip has had a healthy mix of contributors and a lot of commits going into it.

  • Pipenv and setuptools have two lead committers, but still a healthy amount of commits.

  • Hatch, however, is a one-man-show: Ofek Lev (the project founder) made 184 commits, the second place belongs to Dependabot with 6 commits, and the third-place contributor (who is a human) has five commits. The bus factor of Hatch and Hatchling is 1.

The non-PyPA tools aren’t doing much better:

  • Poetry has two top contributors, but at least there are four human contributors with a double-digit number of commits.

  • PDM is a one-man-show, like Hatch.

  • Rye has one main contributor, and three with a double-digit number of commits; note it’s pretty new (started in late April 2023) and it’s not as popular as the others.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *