
Conversion between Python and C integers – Core Development



Python integers have unlimited range (limited only by available memory), while C has several integer types with limited ranges. There are several ways to convert a Python integer to a C integer and back:

  • Dedicated C API functions like PyLong_AsLong() and PyLong_FromLong().
  • PyArg_Parse() with the corresponding format unit, like 'l', and Py_BuildValue() with a similar format unit.
  • PyMemberDef with the corresponding type, like Py_T_LONG.

These sets are not equivalent, especially for unsigned integers.

  1. Most C API functions, except PyNumber_AsSsize_t(), have the PyLong_ prefix. There are usually three variants for conversion to a C integer:

    • PyLong_AsLong() converts integers in the range LONG_MIN to LONG_MAX to signed long.
    • PyLong_AsUnsignedLong() converts integers in the range 0 to ULONG_MAX to unsigned long.
    • PyLong_AsUnsignedLongMask() accepts arbitrary integers and converts them to unsigned long modulo ULONG_MAX+1.
  2. PyArg_Parse() has format units for both signed and unsigned types. For example, 'l' works like PyLong_AsLong() and 'k' works like PyLong_AsUnsignedLongMask(). There is no format unit that works like PyLong_AsUnsignedLong(); the only way to convert to unsigned long with a range check is to use a custom converter (the 'O&' format unit).

  3. The PyMemberDef API also has variants for signed and unsigned types. Py_T_LONG is equivalent to PyLong_AsLong(), but Py_T_ULONG, which converts to unsigned long, is trickier. It accepts Python integers in the range LONG_MIN to ULONG_MAX. This is larger than the range of unsigned long, so negative integers in the range LONG_MIN to -1 are converted modulo ULONG_MAX+1.
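To make the differences concrete, the following sketch models the conversions listed above in plain C. It is not CPython code: so that it compiles without Python.h, it assumes a Python int can be approximated by int64_t and uses int32_t/uint32_t as stand-ins for long/unsigned long. The function names are hypothetical, and a false return marks the cases where the corresponding API would raise an exception (typically OverflowError).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model only: int64_t stands in for a Python int, int32_t/uint32_t stand in
   for long/unsigned long. All function names are hypothetical. */

/* Like PyLong_AsLong() and the 'l' format unit: accept [INT32_MIN, INT32_MAX]. */
static bool as_signed(int64_t v, int32_t *out)
{
    if (v < INT32_MIN || v > INT32_MAX)
        return false;               /* would raise OverflowError */
    *out = (int32_t)v;
    return true;
}

/* Like PyLong_AsUnsignedLong(): accept [0, UINT32_MAX]. */
static bool as_unsigned(int64_t v, uint32_t *out)
{
    if (v < 0 || v > UINT32_MAX)
        return false;
    *out = (uint32_t)v;
    return true;
}

/* Like PyLong_AsUnsignedLongMask() and the 'k' format unit: never fails;
   C defines conversion to an unsigned type as reduction modulo 2^32 here. */
static uint32_t as_unsigned_mask(int64_t v)
{
    return (uint32_t)v;
}

/* Like Py_T_ULONG as described above: accept [INT32_MIN, UINT32_MAX];
   negative values wrap modulo UINT32_MAX+1. */
static bool as_member_ulong(int64_t v, uint32_t *out)
{
    if (v < INT32_MIN || v > UINT32_MAX)
        return false;
    *out = (uint32_t)v;
    return true;
}
```

For example, as_member_ulong(-1, &u) succeeds and stores 4294967295, while as_unsigned(-1, &u) fails, and as_unsigned_mask() silently wraps 4294967296 to 0.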

Why is the API for unsigned types so strange? I think there are several reasons:

  • It is not clear whether some types like uid_t or dev_t are implemented as signed or unsigned types (it varies between OSes).
  • Even if some types are unsigned and support values larger than the maximum of the corresponding signed type (like uid_t or dev_t on some OSes), some negative values can still be used as special markers for an unknown or unavailable value, so you can see (uid_t)-1 or (size_t)-1 in C code. It is better to accept the Python integer -1 as a special value than to require 4294967295 or 18446744073709551615.
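The numbers in the second point come from how C converts a negative value to an unsigned type: the result is taken modulo 2^N, so -1 becomes the all-ones maximum value. A minimal sketch with fixed-width types; the constants NO_ID_32/NO_ID_64 and the helper id32_to_public(), which reports the sentinel back as -1 instead of 4294967295, are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Converting -1 to an unsigned type yields its maximum value (modulo 2^N). */
static const uint32_t NO_ID_32 = (uint32_t)-1;  /* 4294967295 */
static const uint64_t NO_ID_64 = (uint64_t)-1;  /* 18446744073709551615 */

/* Hypothetical helper: expose a 32-bit unsigned id field as a signed value,
   mapping the all-ones sentinel back to -1 so callers can use -1 directly. */
static int64_t id32_to_public(uint32_t id)
{
    return id == NO_ID_32 ? -1 : (int64_t)id;
}
```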

There are also differences in support for int-like objects with an __index__() method, but that is a separate painful issue.

Due to the differences between these three sets, it is difficult to write code that accepts the same range for a function argument as for an attribute setter. It is also difficult to switch code from PyArg_Parse() to manual parsing with the C API, and vice versa.

How can we unify these APIs? An API like PyLong_AsUnsignedLongMask() is the most lenient, but it silently allows integer overflow. Should we limit its range as in Py_T_ULONG? Or limit it even more, allowing only -1 as a negative value? There is a specialized private C API like _Py_Uid_Converter() which accepts only -1 as a negative value. In some cases any negative value is invalid (for example, when specifying a length) and all values from 0 up to the maximum of the target type are valid, so the stricter PyLong_AsUnsignedLong() is also useful. Should we add corresponding strict format units to PyArg_Parse() and strict types to PyMemberDef?
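One of the options above, allowing only -1 as a negative value, can be sketched for a 32-bit unsigned target as follows. It reuses the modelling assumption that a Python int is approximated by int64_t, and the function name is hypothetical; it follows the idea described for _Py_Uid_Converter() but is not a copy of that function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: accept [0, UINT32_MAX] plus exactly -1, which maps to
   the all-ones sentinel (uint32_t)-1. Any other negative value, or a value
   above UINT32_MAX, is rejected (where the C API would raise an exception). */
static bool as_unsigned_or_minus_one(int64_t v, uint32_t *out)
{
    if (v == -1) {
        *out = (uint32_t)-1;    /* special "unknown/unset" marker */
        return true;
    }
    if (v < 0 || v > UINT32_MAX)
        return false;
    *out = (uint32_t)v;
    return true;
}
```

In this model -1 and 4294967295 produce the same C value; whether the all-ones positive value should also be accepted is a separate policy choice. Dropping the `v == -1` branch gives the strict PyLong_AsUnsignedLong()-style behaviour.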

I am going to add wrappers for some C structs and need support for types like uint32_t and off_t, so I need to resolve these questions for the existing types before adding support for new ones.


