GSoC 15 - Dependency Calculation using SAT

Clemens Lang cal at macports.org
Sat Mar 21 11:04:27 PDT 2015


Hi,

----- On 21 Mar, 2015, at 14:00, Jackson Isaac jacksonisaac2008 at gmail.com wrote:

>> The problem with these is that currently, like for so many other things,
>> Portfiles must be completely evaluated (that is, the TCL code executed)
>> in order to have a complete list of dependencies -- and this "complete
>> list" isn't static but depends on the deactivated or activated variants,
>> the platform(!!! -- like darwin, linux, freebsd) ... and subplatform
>> (like macosx) and their specific versions.
>> […]
>> The problem is that this is not static. See above. There are 2^n*
>> different states for n variants. For a port like mplayer-devel (or my
>> own ports, mpv and audacious-plugins), the amount of variant
>> combinations is *very* high and for each combination the dependencies
>> differ.
>>
>> You do NOT want to record this in PortIndex, which is supposed to be
>> fast cache.

Mihi is right on the general analysis. As you see, there's a problem
there, because most SAT solving methods expect you to have a
representation of the complete package universe and find a selection
of necessary dependencies given a list of packages that should be
installed.

Now, if the complete package universe can only be accurately computed
by executing all Portfiles in it, that would increase the time to
compute it to about 15 minutes (per installation!). Since this is
obviously not practical, we need
 (a) a cache, which might become complex due to the combinatorial
     explosion of variants
 (b) a simplification that allows us to avoid the execution of Portfiles
     for accurate dependency calculation (which is basically the same
     as a cache, it's just in a different place). That's what the
     Portindex already is to a certain extent, it's just not used for
     dependency calculation.

A possible solution to the dilemma would be to limit the semantics of
what variants can do and where you are allowed to modify dependencies.
For example:

Find a representation of the effects that a variant has on the set of
dependencies and store that, rather than storing the set of dependencies
after evaluating all variants. That removes the exponential from time
and space requirements of keeping a cache on variants. This data could
be generated by instrumenting a couple of keywords currently used in
Portfiles (variant, variant_isset, if, and all depends_* operations).
The downsides would be
 - we'd have to change semantics so the order of evaluation of the
   effects of variants is undefined (I don't think that's unreasonable)
 - since we would no longer execute variants for dependency info, the
   behavior of variants must not differ depending on a user's environment
   (i.e. depending on whether a user has a certain port installed). I
   think this behavior is bad style anyway, so I would support dropping
   it.
With this in place, we could stop evaluating Portfiles, because we would
be able to compute accurate dependency information *without* running the
Portfile. I'm aware of one case that is often used but would break using
this approach, and that's setting the default_variant of a port
depending on whether other ports are installed. We could support a
workaround for the few ports that do this, e.g. by supporting a small
snippet of code that – if defined – is executed and returns the default
variant selection in the current environment.


> So we will have to calculate the dependency tree while we install.
> Since there can be many variants possible and each will have different
> dependencies.
> 
> Currently, we are fetching the dependencies from the Portfile correct?
> based on the dependencies defined by the maintainer of the port.
> 
> Using SAT method we will be defining minimal dependencies and these
> dependencies will be further checked for their dependencies or we need
> to still define all the dependencies for each package?
> 
> Eg. x depends on y. so in portfile of x say it depends on y. Now we
> will check if y depends on any a,b,c further. Like this compute
> different possible combinations and select the best course and also
> taking the variant into account (using Boolean).
> 
> What if while the port files are indexed we also create another file
> where these dependency trees are pre-computed and can be checked when
> the package needs to be installed. It would save computation time
> during installation. But also question about space that would be
> needed for this arises.

Yeah, that's exactly what I was thinking of. As mentioned, special care
needs to be taken w.r.t. how variants are handled to avoid the 2^n growth
in both space and time requirements for n variants.

Once we have this data, we'd also have to find a suitable representation
of variants in the SAT solver input, if we want to support dependencies
on variants (i.e. portA +variantA depends on portB +variantB -variantC).

-- 
Clemens Lang


More information about the macports-dev mailing list