GSoC 15 - Dependency Calculation using SAT

Clemens Lang cal at macports.org
Sun Mar 22 04:42:02 PDT 2015


Hi,

----- On 21 Mar, 2015, at 22:42, Mihai Moldovan ionic at macports.org wrote:

> On 21.03.2015 07:04 PM, Clemens Lang wrote:
>> Mihi is right on the general analysis. As you see, there's a problem

Sorry for getting your name wrong.

>> there, because most SAT solving methods expect you to have a
>> representation of the complete package universe and find a selection
>> of necessary dependencies given a list of packages that should be
>> installed.
>> 
>> Now, if the complete package universe can only be accurately computed
>> by executing all Portfiles in it, that would increase the time to
>> compute it to about 15 minutes (per installation!). Since this is
>> obviously not practical, we need
> 
> Well, one does not need to calculate a solution for *all* ports, merely
> the selected ones and their dependencies, taking variants into account.

But the whole point of that would be that you don't need to know which
ports are dependencies of the set of selected ports. That's what the SAT
solver should tell you. Of course you only need the dependency information
of the selected ports and their dependencies, but if you already know what
the dependencies are, you don't need to do the SAT solving.


> If something like proper variant...ic dependencies come into play, also
> that (like, depends_libs foo+variant or foo-variant.)
> 
> This solution can be cached in either memory or disk, transformed into a
> SAT boolean expression and solved using a SAT solver.
> 
> This can then be transformed back into an ordered list of ports and
> variants and cached in memory or disk.
> 
> All in all, this operation could take up to 10 minutes, but is only
> needed to be calculated once for every "port phase foo" command. I think
> that's reasonable.
> 
> Package managers on Gentoo do just that.

Oh, so you're suggesting we should pre-compute the solutions to the SAT
problem for each port? How would that work with private port trees?


> If we want to support resuming a previous calculated solution, which
> happens to fail to build due to bugs in specific ports, this cached
> ordered list MUST be on a non-volatile storage and updated regularly
> after each operation. Operation SHALL BE defined as an atomic phase like
> fetch, extract, configure, build, destroot, install.
> 
> BTW, this could also help with rolling back interrupted phases, because
> we know the last phase to succeed and the interrupted phase MUST be the
> next one.
> 
> 
>> A possible solution to the dilemma would be to limit the semantics of
>> what variants can do and where you are allowed to modify dependencies.
>> For example:
>> 
>> Find a representation of the effects that a variant has on the set of
>> dependencies and store that, rather than storing the set of dependencies
>> after evaluating all variants.
> 
> This. I'd do it by evaluating the Portfiles and remembering the specific
> selected variants and dependencies.

I'd rather do it by replacing a couple of instructions with different
implementations that record their action – e.g. depends_lib-append foo
inside a variant block is stored as:
  portA:
    variant var:
      appends "foo" to depends_lib


>> That removes the exponential from time
>> and space requirements of keeping a cache on variants. This data could
>> be generated by instrumenting a couple of keywords currently used in
>> Portfiles (variant, variant_isset, if, and all depends_* operations).
>> The downsides would be
>>  - we'd have to change semantics so the order of evaluation of the
>>    effects of variants is undefined (I don't think that's unreasonable)
>>  - since we would no longer execute variants for dependency info, the
>>    behavior of variants must not differ depending on a user's environment
>>    (i.e. depending on whether a user has a certain port installed). I
>>    think this behavior is bad style anyway, so I would support dropping
>>    it.
>> With this in place, we could stop evaluating Portfiles, because we would
>> be able to compute accurate dependency information *without* running the
>> Portfile.
> 
> I honestly don't think that's a good idea. That basically requires
> implementing an own parser for the TCL code to avoid evaluating it.
> Doesn't really make too much sense in terms of speed and less so in
> terms of accuracy (e.g., you'd have yet another potential buggy
> component to worry about.)

No, it requires you to use the normal Tcl interpreter, but replace a couple
of instructions and builtins. Since that's possible in Tcl I don't see why
reimplementing is necessary.

We're in the situation that our input format is turing complete. That was a
bad idea to begin with, at least for the package metadata such as
dependencies. I'm drafting a way out of this dilemma, and there will be use
cases that break when we do this, but I still think it's worth the effort
in the long run.


>> I'm aware of one case that is often used but would break using
>> this approach, and that's setting the default_variant of a port
>> depending on whether other ports are installed. We could support a
>> workaround for the few ports that do this, e.g. by supporting a small
>> snippet of code that – if defined – is executed and returns the default
>> variant selection in the current environment.
> 
> Isn't this use case more or less very common due to the universal variant?

No. The universal variant is treated specially by the MacPorts code.
Basically the dependency calculation adds it where necessary, and while it
does that, basically emulates variant dependencies for this special case.
This could easily be implemented using
  depends_lib port:foo
  variant universal {
    depends_lib port:foo:+universal
  }
and that doesn't require executing code.


> With all that said both by Clemens and me, I'd like to again point out
> that NOT all of these tasks are required to be implemented in the 2015
> GSoC, that would be pure madness. Any start is good.

Correct. We're still discussing this now, because these things might
influence how your implementation should be started. It does not mean you
have to implement all of this.

Btw, this complexity is also why I don't think you should try working two
topics over the summer… this one is complex enough as-is.

-- 
Clemens Lang


More information about the macports-dev mailing list