Python ports: should we use wheel files?

Sat Mar 31 10:11:55 UTC 2018

Hi all,

While updating the py-tensorflow port to 1.6.0 I started to think about this question: we briefly discussed it on the corresponding PR (https://github.com/macports/macports-ports/pull/1499) and decided to bring it up on the mailing list.

Safe harbour statement: I'm a Python noob.

The definition of a wheel file is the following:

> Wheel (in this context) is a project that adds the bdist_wheel command to distutils/setuptools. This produces a cross platform binary packaging format (called “wheels” or “wheel files” and defined in PEP 427) that allows Python libraries, even those including binary extensions, to be installed on a system without needing to be built locally.

In the case in point: one TensorFlow dependency has a native component that needs to be built locally.
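
As a rough illustration of the definition above: PEP 427 names wheel files as `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, so the file name alone tells whether a wheel is pure Python or carries platform-specific binaries. The sketch below is only that, a sketch: the helper names (`parse_wheel_filename`, `is_platform_specific`) are made up for this example, and the file names in the usage lines are illustrative, not taken from the PR.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    """Components of a PEP 427 wheel file name."""
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its PEP 427 components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelTags(distribution, version, build,
                     python_tag, abi_tag, platform_tag)


def is_platform_specific(tags: WheelTags) -> bool:
    """A pure-Python wheel is tagged abi 'none' and platform 'any'."""
    return tags.abi_tag != "none" or tags.platform_tag != "any"


# Illustrative file names (assumptions for the example).
native = parse_wheel_filename("grpcio-1.10.0-cp36-cp36m-macosx_10_7_intel.whl")
pure = parse_wheel_filename("six-1.11.0-py2.py3-none-any.whl")
print(native.platform_tag, is_platform_specific(native))
print(pure.platform_tag, is_platform_specific(pure))
```

A port that consumed wheels would presumably need a check along these lines to decide whether a given wheel matches the target macOS version and architecture.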

Now, the problem.  macOS is a platform where TensorFlow is _built_ and _tested_ by upstream.  Some of its dependencies (e.g.: grpcio) are packaged and distributed as wheel files.  Hence, what upstream supports is the state of a system after the dependencies are installed from wheel files.

Given the above assumption, my impression is that we have an opportunity to leverage wheel files instead of rebuilding each dependency from source.  The advantages in my opinion are:

  * The _state_ of the system after the installation of the wheel files will be _exactly_ as seen, tested and validated by upstream.

  * We're not duplicating effort already put in by upstream.

  * We are less likely to introduce subtle problems.

I'm well aware that the philosophy of MacPorts (and BSD ports in general) is to rebuild every package from source.  But I'm wondering, for the sake of argument, whether a parallel can be drawn between (Python, wheel files, PyPI) and (Java, *AR files, Maven Central).  After all we already ship Java byte-code without recompiling it, and sometimes Java projects ship precompiled native libraries.

Closing remarks: my expertise with Python is very limited, so I apologise if my reasoning is flaky.  But in the last few months I couldn't help but observe that the general guideline given by most Python projects I'm using is to (a) install Python packages (using pip and/or wheel files) or (b) even build a virtual env (again, using a package-driven process).

I like the fact that MacPorts helps us centralise package installations and lets us extend our Python installation using ports.  And as a Portfile author I really appreciate how easy it is to build a port from a PyPI package that uses setuptools to get installed.  But as far as wheel files are concerned, especially wheel files that contain native compiled code, isn't it worthwhile to at least leverage all the compiling, testing, packaging and distribution work that has already been performed by upstream when macOS is a supported platform (as in this case)?

Thanks for sharing your point of view.

Cheers,
-- 
Enrico