Know your Python container types


Published: December 24, 2023. Filed under: Django, Python.

This is the last of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.

Python contains multitudes

There are a lot of container types available in the Python standard library, and it can be confusing sometimes to keep track of them all. So since it’s Christmas Eve and time to wrap any last-minute gifts, I’d like to wrap up this “Advent calendar” series with a guide to the most common types of data containers and what kinds of things you might want to wrap in them.

  • list is a mutable data type, and often used to store multiple objects of the same type, though unlike, say, arrays in languages like C, there’s no requirement that all values in a list be of the same type. Just note that many static type checkers for Python will default to assuming a list’s contents are homogeneous (i.e., if a checker sees you put an int in a given list, it will infer a type of list[int] and report an error if you then add a value of another type).
  • tuple is an immutable/heterogeneous data type, closer in purpose to the “record” types or structs of other languages. Very often you’ll see code which generates lots of tuples which each have the same “structure” — for example, a color library might represent color values as 3-tuples of int.
  • collections.namedtuple and typing.NamedTuple are two different ways to write the same thing: tuple subclasses with fields that can be accessed by name as well as by numeric index, and instantiated using keyword-argument syntax. The key difference is the typing.NamedTuple version supports a type-hint-based declarative syntax. I like using named tuples as a way to define tuple types that will be reused a lot (in the example above, it would probably make sense to define an RGBColor named tuple with red, green, and blue fields).
  • set is a container which enforces uniqueness of its elements. No matter how many times you add the same value to a set, it still ends up with only one copy.
  • dict (short for “dictionary”) is a hash table, mapping keys to values; there is no requirement that all keys or values be the same type, but type checkers will generally still assume such a requirement. There’s also typing.TypedDict for explicitly type-hinting the expected structure of a dictionary.
  • dataclasses.dataclass is not really a “container” at all, though it sometimes gets used as one. The dataclass decorator is primarily a shortcut for declaring a class with a set of attributes (using type-hint syntax) and having it auto-derive a constructor for you which will expect arguments for those attributes and set them appropriately (though it will not do runtime type-checking of the values of those arguments).
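As a rough sketch of how each of these looks in practice (assuming Python 3.9 or later; the names RGBColor, LegacyRGBColor, Movie, and Point are purely illustrative and not from any real library):

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

# list: mutable, and usually holds values of a single type.
scores: list[int] = [90, 85]
scores.append(77)

# tuple: immutable, fixed structure; here a color as a 3-tuple of int.
red: tuple[int, int, int] = (255, 0, 0)

# Named tuples, written both ways (hypothetical example types).
LegacyRGBColor = namedtuple("LegacyRGBColor", ["red", "green", "blue"])


class RGBColor(NamedTuple):
    red: int
    green: int
    blue: int


teal = RGBColor(red=0, green=128, blue=128)
# Fields are reachable by name (teal.green) or by index (teal[1]).

# set: adding a value that is already present changes nothing.
seen = {1, 2, 2, 3}
seen.add(3)


# dict, with a TypedDict describing its expected shape.
# The shape is only checked by static type checkers, not at runtime.
class Movie(TypedDict):
    title: str
    year: int


movie: Movie = {"title": "Brazil", "year": 1985}


# dataclass: the constructor is generated from the annotated attributes.
@dataclass
class Point:
    x: int
    y: int


# No runtime type-checking: this runs, though a type checker would object.
p = Point(x=1, y="two")
```

Note that the last line runs without complaint; only a static type checker would flag the string passed for y.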

There are also other container types in the standard library — the collections module and the array module, for example, provide some specialized container types like Counter, which acts as a histogram structure, or array.array which works like a numeric-type array in C — but they’re more rarely used.
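For a feel of what those two specialized types do, here is a minimal sketch (the word list and variable names are made up for illustration):

```python
from array import array
from collections import Counter

# Counter: counts how many times each value occurs.
words = ["gift", "tree", "gift", "star", "gift"]
counts = Counter(words)
most_common_word, occurrences = counts.most_common(1)[0]

# array.array: the type code "i" restricts the contents to signed ints.
numbers = array("i", [1, 2, 3])
numbers.append(4)

# Unlike a list, the array refuses values of the wrong type.
try:
    numbers.append(1.5)
    rejected = False
except TypeError:
    rejected = True
```

Here counts["gift"] is 3, and the attempt to append a float to the integer array raises TypeError rather than silently storing it.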

In general, my advice is:

  • Use a list for most cases where you just want an iterable/indexable sequence.
  • Use a tuple as a struct-like type where multiple instances will have the same structure, but consider using named tuples to make that structure clearer (a lot of people don’t like named tuples because they support iteration and numeric indexing as well as named field access, but I don’t personally mind this).
  • Use dict for key-value mappings.
  • Use set when uniqueness matters, though this is not super common; most of the value of sets is in the union/intersection/etc. operations they support.
  • Don’t use dataclass as a “super-tuple”; if what you really want is just a plain data container with named field access, just use a named tuple. Use dataclass when you also want the result to be an ordinary mutable Python class (tuples are immutable) with potentially extra behavior attached via methods.
  • Avoid most of the other container types unless you know they’re the right thing for your specific use case. And if you’re not sure whether they’re right, they aren’t; you generally will just know when one of them is the right fit (for example, collections.Counter is very useful for the exact specific thing it does, and not really useful at all for anything else).
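To make the named tuple versus dataclass distinction concrete, here is a small sketch under my own assumptions (RGBColor, Account, and the tag sets are hypothetical examples), along with the set operations mentioned above:

```python
from dataclasses import dataclass
from typing import NamedTuple


# Plain data with named fields: a named tuple is enough.
class RGBColor(NamedTuple):
    red: int
    green: int
    blue: int


# Mutable state plus behavior: this is where a dataclass fits.
@dataclass
class Account:
    owner: str
    balance: int = 0

    def deposit(self, amount: int) -> None:
        self.balance += amount


color = RGBColor(255, 128, 0)
r, g, b = color  # still a tuple, so unpacking and indexing work

# Tuples are immutable, so assigning to a field fails.
try:
    color.red = 0
    immutable = False
except AttributeError:
    immutable = True

account = Account("sam")
account.deposit(50)  # dataclass instances are ordinary mutable objects

# Sets earn their keep through intersection, union, and friends.
django_tags = {"orm", "templates", "forms"}
flask_tags = {"templates", "routing"}
shared = django_tags & flask_tags
combined = django_tags | flask_tags
```

The named tuple stays a fixed record, while the dataclass instance changes over time through its own method.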
