Discussion of idea: Auto-detection of build dependencies [GSoC][Application Process]

Mojca Miklavec mojca at macports.org
Thu Feb 28 07:35:15 UTC 2019


On Thu, 28 Feb 2019 at 06:25, arghya bhattacharay
<arghya.b at research.iiit.ac.in> wrote:
>>
>> Hello all,
>>
>> On Feb 28, 2019 4:21 AM, Mojca Miklavec <mojca at macports.org> wrote:
>>
>> Dear Arghya,
>>
>> Welcome to the MacPorts community!
>
>> On Wed, 27 Feb 2019 at 17:37, Arghya Bhattacharya wrote:
>> >
>> > - Is the project in reference to changing "port reclaim", more specifically change port reclaim to not delete build dependencies?
>>
>> These two are not related, but both tasks could be worked on. The
>> potential mentor for either of those tasks would be Marcus, so I hope
>> that he jumps in with more relevant answers.
>>
>> I assume that you took auto-detection of build dependencies from our
>> list of ideas, and the "port reclaim" from a recent discussion on the
>> mailing list?
>
> Yes, I went through the archives hoping to find more insight into the projects from the discussions.
>
> Would Port reclaim be a separate project in that case?

It's a different and relatively independent task, but neither of those
two tasks is probably enough on its own to fill the entire summer,
so the two could easily be combined.

> Seems like an extension of computing build dependencies to me. Maybe I'm missing the point here.

# RECLAIM

"port reclaim" was a GSOC project a few years back:
    https://trac.macports.org/wiki/ksammons

What it does is take care of removing stuff you no longer need.
If you've been using MacPorts for a long time and never run a cleanup,
old versions start accumulating on the machine.

The "port reclaim" command interactively guides you through the
removal of:
- all the sources you no longer need
- all the binary packages that are no longer active
- all the installed packages that you have never explicitly requested
and that have no dependent ports

The last one is sometimes problematic and was recently discussed on the list.

For example, if you install some port that depends on perl5.26, then
update that port so that it now requires perl5.28 (and you never
actually needed perl5.26 yourself), "port reclaim" would offer to
remove perl5.26.

This is all well and good, except that it might also offer to
remove the latest version of a compiler (which you have painfully built
yourself on a super slow 15-year-old PowerPC machine) or git or
something else that will be needed as soon as you update your next
port. Technically the compiler is not needed to run the software, but
it's painful to have to rebuild it the next time you update any other
port.

To fix this behaviour you may need to improve the database of
installed ports (and fix the install phase, so that it writes the
necessary information to the database), so that it would be easy
enough to figure out whether any installed ports have a build
dependency on the port in question. I believe we are currently lacking
that information in the database, but all the information is already
encoded in the Portfiles, so it's not like you would need to invent
that information yourself. And once you have that information in the
database, fix port reclaim, so that it can easily include or exclude
removal of such build dependencies.
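
A minimal sketch of that idea, in Python with an in-memory SQLite
database. The table layout (a "ports" table and a "dependencies" table
with a "kind" column), the function name and the sample ports are all
made up for illustration; this is not the actual MacPorts registry
schema, and MacPorts itself is written in Tcl.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the real MacPorts registry.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ports (name TEXT PRIMARY KEY, requested INTEGER NOT NULL);
    CREATE TABLE dependencies (
        port TEXT NOT NULL,        -- installed port that has the dependency
        depends_on TEXT NOT NULL,  -- port it depends on
        kind TEXT NOT NULL         -- 'build' or 'run'
    );
""")
# Sample data: "myapp" was requested by the user, everything else was
# pulled in as a dependency at some point.
conn.executemany("INSERT INTO ports VALUES (?, ?)", [
    ("myapp", 1), ("perl5.28", 0), ("perl5.26", 0), ("clang-7.0", 0),
])
conn.executemany("INSERT INTO dependencies VALUES (?, ?, ?)", [
    ("myapp", "perl5.28", "run"),
    ("myapp", "clang-7.0", "build"),
])


def reclaim_candidates(conn, keep_build_deps):
    """Return unrequested ports that no installed port depends on."""
    # Only dependency kinds listed here protect a port from removal.
    kinds = ("run", "build") if keep_build_deps else ("run",)
    placeholders = ",".join("?" for _ in kinds)
    rows = conn.execute(
        f"""
        SELECT name FROM ports
        WHERE requested = 0
          AND name NOT IN (
              SELECT depends_on FROM dependencies
              WHERE kind IN ({placeholders})
          )
        ORDER BY name
        """,
        kinds,
    )
    return [name for (name,) in rows]


# Without build-dependency information the compiler is offered for removal.
print(reclaim_candidates(conn, keep_build_deps=False))
# With it, only the truly unused perl5.26 is left.
print(reclaim_candidates(conn, keep_build_deps=True))
```

The point of the "kind" column is that reclaim can then include or
exclude build dependencies with a single switch, instead of having to
re-read every Portfile.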

# BUILD DEPENDENCIES

This is a different scenario. Suppose that you want to package a new
piece of software (or update to a new version which just switched to
completely different dependencies). In this case you need to assume
that all the information about dependencies that you might have in
your Portfile is plain wrong (either lacking or listing too many
dependencies; with "port reclaim" on the other hand you assume that
information in the Portfile is 100% correct).

The idea is to observe which files are being accessed during the
installation process, then consult the port registry (database) to find
out which port each accessed file belongs to, and finally assemble
the list of ports that have been used during the installation. That
list would be reported to the user, who could then manually compare
it against the dependencies specified in the Portfile. Or rather,
something like "port --something install" would tell the user:
- here's a list of dependencies that you declared and were used: ...
- here's a list of dependencies that were used, but are missing in the
Portfile: ...
- here's a list of ports that you declared, but were never accessed: ...
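
A rough sketch of that comparison in Python. It assumes the list of
accessed files has already been recorded somehow and that a
file-to-port mapping is available from the registry; how to obtain
either is the actual project. The function name, the file paths and
the sample data are made up for illustration.

```python
def compare_dependencies(accessed_files, file_owner, declared):
    """Return (used and declared, used but missing, declared but unused)."""
    # Files that no port owns (system files, the build tree) are ignored.
    used = {file_owner[path] for path in accessed_files if path in file_owner}
    declared = set(declared)
    return (
        sorted(used & declared),   # declared and actually used
        sorted(used - declared),   # used, but missing in the Portfile
        sorted(declared - used),   # declared, but never accessed
    )


# Hypothetical registry lookup: which port installed which file.
file_owner = {
    "/opt/local/bin/pkg-config": "pkgconfig",
    "/opt/local/include/zlib.h": "zlib",
    "/opt/local/lib/libz.dylib": "zlib",
    "/opt/local/include/png.h": "libpng",
    "/opt/local/bin/cmake": "cmake",
}
# Hypothetical record of files touched during the build.
accessed_files = [
    "/usr/bin/make",
    "/opt/local/bin/pkg-config",
    "/opt/local/include/zlib.h",
    "/opt/local/lib/libz.dylib",
    "/opt/local/include/png.h",
]
# Dependencies as (incompletely) declared in the Portfile.
declared = ["zlib", "cmake"]

ok, missing, unused = compare_dependencies(accessed_files, file_owner, declared)
print("declared and used:    ", ok)
print("used but not declared:", missing)
print("declared but unused:  ", unused)
```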

Ideas for how to start should come from the way "trace mode" is
implemented. Trace mode forbids access to dependencies that were not
declared beforehand, which is a slightly different strategy, but the
approach is similar in nature.

> Any Port would have compulsory dependencies and optional dependencies.
> So the issue is with detecting which optional dependencies are being used?

The issue is that the first time you try to package software you don't
even know which dependencies are compulsory before you spend a lot of
time researching, checking the sources, configure script, Makefiles
etc. Even if you don't specify any dependencies at all, the software
will still build.

At this point I would ignore the notion of compulsory vs. optional
dependencies, that's for later. Treat every dependency as a
compulsory dependency. It's just that currently we have absolutely
no idea which dependencies are needed. We start from the assumption
that the Portfile is incomplete.

> How does the problem persist with compulsory dependencies too? Aren't they listed in the Port configuration files?

If talking about this project idea, you need to assume that the
dependencies are not listed anywhere.

>> * many dependencies link opportunistically: they'll be used if
>> present, and skipped if absent.
>
> these are the dependencies to install the Port or the ones needed to run them?

Dependencies needed to run the port (like an optional python library)
are not problematic. They might be missing during the build, but you
may install that library any time later and you'll get full
functionality.

Opportunistic linking is a problem during the build.

It's problematic if you have software that will link against Qt if Qt
is installed, but install without complaining (with crippled /
different functionality) when Qt is missing. I don't know what
software you know, but if you have ever used "gnuplot": it can either
work with Aqua, with X11, with Qt, with wxWidgets ... any of those
will work, but if you don't have X11 or Qt or wxWidgets, it will just
build without support for that UI. If you install X11 after installing
gnuplot, you won't be able to use gnuplot with X11, but if you install
X11 before gnuplot, you would be able to use it. (The portfile for
gnuplot doesn't have that issue since we explicitly enable and disable
functionality, it's just to demonstrate what I meant.)

>> We would want the "port" command to automatically report which
>> dependencies were used during the build (either build dependencies
>> like pkg-config or libraries, ...)
>
> By keeping track of what files are accessed during build ?

Yes.

>> (2)
>> We recently discussed the issue that "port reclaim" removes build
>> dependencies. This would be also nice to fix, yes.
>
>
> So, if AutoDetecting & storing all dependencies of a port becomes a method attribute of the Port.

What do you mean by method attribute?

> Then this would be as trivial as accessing the list on a trigger of port reclaim?

I'm not sure what you are asking, but I tried to explain above. Yes, you
need to read from the database and simply not let reclaim delete the
build dependencies (you must have assembled that list/database
before). This step should be easy.

> I've used the git method of install, updated head of my installation to my fork of port-base.
> I just don't find any instructions on how I can compile my changes to the tcl scripts so I can test the commands with my changes ?

In case this particular question / answer gets too long & unresolved,
please open a new thread on the mailing list. There might be some
people who are not particularly interested in GSoC and might skip this
overly long thread, but they might be able to help you with this small
independent question.

What I usually did was to simply install MacPorts from the installer
first (just to get everything in place; but also because I had MP
installed long before I even started touching anything), and then do
    ./configure
    make
    sudo make install
from the git clone of macports-base. That would overwrite the existing
MacPorts installation and you'll start using your own changes.

If you modify something, just run "make && sudo make install" again.

You probably don't need to install from package first, you just need
to make sure that you have macports-ports somewhere. By default it's
in
    /opt/local/var/macports/sources/rsync.macports.org/macports/release/tarballs/ports
but to me it's useful to have a git clone instead and point to it from
    /opt/local/etc/macports/sources.conf

I would probably start by
    sudo port selfupdate # or one of alternatives like sync
    sudo port install any-port-of-your-choice
and keep playing with the port command. You could increase the revision
of the just-installed port, then run install or upgrade again to get an
inactive port etc.

Mojca

