GSoC 2019 - trace mode improvements

Davide Gerhard rainbow at irh.it
Sat Apr 6 15:02:00 UTC 2019


Sorry for being late, but I only noticed the other day that MacPorts
participates in the Google Summer of Code.

Any comments will be appreciated.


* Personal Details

Name: Davide Gerhard
Email: rainbow at irh.it
IRC: ra1nb0w at freenode
Github: @ra1nb0w
University: University of Trento (Italy)
Location: Italy
Short Bio:

I have always been interested in computer science, security and
networking, particularly as developed by the BSD/free-software
community. This passion has led me to study information security.
I earned a Bachelor's degree in Security of Computer Systems and
Networks and I am now pursuing a Master's degree in Computer Science.

In the last few months I contributed to macports-ports, mainly in the
following areas:

- SDR, mainly with @michaelld and @mf2k
- mercurial and related, mainly with @mojca


* Project Idea

** Goal

The main goals of my GSoC project are to:

- implement a function that auto-detects the build dependencies of a
  new port and prints out which ones are needed; this is especially
  useful when the developer is working in their everyday installation.

- speed up trace mode: improve the performance of the sandbox,
  particularly during the build phase, which issues many syscalls and
  a lot of I/O.
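
As a rough illustration of the first goal, here is a minimal sketch in
Python (chosen only for brevity, not the intended implementation). It
assumes two inputs that are hypothetical here: the list of paths a build
touched, and a table saying which port installed each file. The function
name, the table and the sample paths are all made up for the example.

```python
def missing_build_deps(accessed_paths, file_owner, declared_deps):
    """Return the ports that own an accessed file but are not declared."""
    # Paths with no owning port (system files, the work dir) are skipped.
    used = {file_owner[p] for p in accessed_paths if p in file_owner}
    return sorted(used - set(declared_deps))


# Hypothetical data: which port installed which file.
file_owner = {
    "/opt/local/include/zlib.h": "zlib",
    "/opt/local/bin/pkg-config": "pkgconfig",
    "/opt/local/include/openssl/ssl.h": "openssl",
}
# Hypothetical list of paths seen while tracing a configure run.
accessed = [
    "/usr/include/stdio.h",
    "/opt/local/include/zlib.h",
    "/opt/local/bin/pkg-config",
]

# The port declares only zlib, so pkgconfig is reported as missing.
print(missing_build_deps(accessed, file_owner, ["zlib"]))
```

Files that no port owns are simply ignored, which is one way to avoid
reporting things that are not MacPorts dependencies at all.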

** Abstract (with) Methodology

The main focus of my GSoC project will be understanding and improving
the trace mode used in MacPorts and learning the related low-level
functionality of macOS.  This will be done in three main steps. First,
understand how trace mode works, which functionality it already
implements and how. Second, trace which files a configuration script
reads (read(2)), stats (stat(2)) or tries to access, for example during
the automake/autoconf or cmake phase, and build a complete and minimal
dependency tree that will be shown to the developer. At this stage, care
must be taken to present only direct build dependencies and to leave out
those that are not pertinent to the macOS configuration. Third, analyze
trace mode with dynamic tracing tools, such as dtrace and flame graphs,
to identify which functions slow down the process, rank them by
importance and cost, and implement a solution. I will iterate on this
step until the execution time is acceptable for normal usage. At the end
of the process I expect a dylib that can be used easily and quickly for
other purposes as well, such as enabling fakeroot or phasing out Xcode.
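
To give an idea of the third step, the sketch below (Python, for
illustration only) sums samples per function from profiler output in the
"folded stacks" text format that flame graph tools consume, one line per
stack in the form `frame;frame;frame count`. The function names in the
sample data are invented and do not come from the real tracelib.

```python
from collections import Counter


def inclusive_samples(folded_lines):
    """Count samples per function from 'frame;frame;... count' lines."""
    totals = Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        # A set avoids counting a recursive frame twice for one stack.
        for frame in set(stack.split(";")):
            totals[frame] += int(count)
    return totals


# Hypothetical samples; the frame names are placeholders.
folded = [
    "open;check_path;normalize 40",
    "open;check_path;send_query 25",
    "stat;check_path;normalize 10",
]
totals = inclusive_samples(folded)
for frame, samples in totals.most_common(3):
    print(frame, samples)
```

Ranking functions by inclusive sample count is only a first
approximation of "which one to optimize first"; the flame graph itself
remains the better tool for seeing where the time goes inside a call.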

* Technical Details

** Schedule

At this stage I am not drawing up a strict timetable, because some
steps will take longer than others and I cannot yet estimate the effort
each one requires.

Phase 1:

- gain a deep understanding of macports-base and its architecture;
- study the darwintracelib1.0 code and experiment with some of its
  functionality to better understand which sections I should change.

Phase 2:

- implement the functions that trace the configuration phase and build
  an approximate dependency tree;
- try that code on many different ports and languages to detect
  glitches or cases not considered at implementation time; manually
  verify that the build dependency tree is complete and minimal;
- review the implementation: add further changes, if they are small, or
  try new approaches;
- run many tests with real packages and verify that everything is
  correct.
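
The "complete and minimal" check in this phase could be made repeatable
with a small helper like the one below. This is only a sketch in Python:
it assumes that, for a handful of ports, a hand-verified list of build
dependencies exists to compare against. The helper name and the port
lists are hypothetical.

```python
def compare_deps(detected, verified):
    """Compare detected dependencies with a hand-verified list."""
    detected, verified = set(detected), set(verified)
    return {
        # In the verified list but not detected: the result is incomplete.
        "missing": sorted(verified - detected),
        # Detected but not in the verified list: the result is not minimal.
        "superfluous": sorted(detected - verified),
    }


# Hypothetical result for one port.
report = compare_deps(
    detected=["zlib", "pkgconfig", "perl5"],
    verified=["zlib", "pkgconfig", "openssl"],
)
print(report)
```

Both lists being empty would mean that the detected set matches the
manual one exactly.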

Phase 3:

- after understanding the tracelib, I can start to analyze the
  performance of the injected library; as I already know dtrace and how
  to plot the results as flame graphs, this will be my first way to
  find out which functions/calls are slow. Considering that many
  low-level parts of macOS are unknown to me, I will be happy to learn
  new ways to analyze problems like this;
- determine which functions could be sped up and which kinds of
  algorithms/data structures could be used to improve performance;
  generally these changes require caches or new approaches to the
  problem;
- implement one improvement at a time and create some edge-case tests
  to verify that the new code works; after this, identify a few
  packages that use that function heavily and verify the improvement
  with the tool used during the analysis;
- repeat the previous step for every function that was a source of
  bottlenecks until acceptable performance is reached; this must be
  judged only with real ports and not with test cases;
- test whether the changes work well with the goal of phase 2 and see
  whether further changes are needed to generalize the usage, e.g. for
  fakeroot or phasing out Xcode.
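
As an example of the kind of cache mentioned above, the sketch below
memoizes the answer to "may the build access this path?" so that
repeated lookups of the same path skip the expensive check. It is
written in Python for illustration only; the allow-list, the function
name and the paths are hypothetical, and the real check in the tracelib
may work quite differently.

```python
from functools import lru_cache

# Hypothetical allow-list; a real sandbox would derive it from the
# dependencies of the port being built.
ALLOWED_PREFIXES = ("/opt/local/include/", "/opt/local/lib/", "/usr/")

uncached_checks = 0


@lru_cache(maxsize=4096)
def is_allowed(path):
    """Decide whether a path may be accessed, remembering the answer."""
    global uncached_checks
    uncached_checks += 1  # counts only the lookups that miss the cache
    return path.startswith(ALLOWED_PREFIXES)


# A build tends to touch the same few paths over and over.
for _ in range(1000):
    is_allowed("/opt/local/include/zlib.h")
    is_allowed("/opt/local/var/secret")

print(uncached_checks, is_allowed.cache_info().hits)
```

A cache like this is only safe while the rules stay fixed; if the set of
allowed paths changes during a build, the cache has to be cleared
(`is_allowed.cache_clear()` here).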


* Additional Questions

** What are your experiences with macOS so far? (How long do you use
   it, did you switch from Windows/Linux, etc.)

For fifteen years I used only OpenBSD/FreeBSD/Linux (Gentoo, Arch Linux
and Debian).  I have now been using macOS on my main laptop for 9 months.

** How long have you been using MacPorts?

7 months

** Do you have experience with other package management systems?

Yes

** How much experience do you have with Tcl and C?

Many years with C, but very little with Tcl

** Will you be available after the project ends?

Yes


* Availability

** Do you plan to go on vacations, have exams, internship or be
   otherwise absent during the GSOC? If so, when?

probably the second week of August

** To which other organisations did you send a GSOC proposal?

No one else



Thank you
/davide

