GSoC 2019 - trace mode improvements
Davide Gerhard
rainbow at irh.it
Sat Apr 6 15:02:00 UTC 2019
Sorry for the late submission; I only noticed the other day that
MacPorts participates in the Google Summer of Code.
Any comments will be appreciated.
* Personal Details
Name: Davide Gerhard
Email: rainbow at irh.it
IRC: ra1nb0w at freenode
Github: @ra1nb0w
University: University of Trento (Italy)
Location: Italy
Short Bio:
I have always been interested in informatics, security and networking,
particularly as developed by the BSD/free-software community.
This passion has led me to study information security.
I earned a Bachelor's degree in Security of Computer Systems and Networks
and am now pursuing a Master's degree in Computer Science.
In the last few months I contributed to macports-ports, mainly in the
following areas:
- SDR, mainly with @michaelld and @mf2k
- mercurial and related, mainly with @mojca
* Project Idea
** Goal
The main goals of my GSoC project are to:
- implement a function that auto-detects the build dependencies of a new
port and prints out which are needed; this is especially useful when
the developer is working on their everyday installation.
- speed up trace mode: improve the performance of the sandbox,
particularly during the build phase, which involves many syscalls and
much I/O.
** Abstract (with) Methodology
The main subject of my GSoC project is understanding and improving the
trace mode used in MacPorts and learning the related low-level
functionality of macOS. This will be done in three main steps. First,
understand how trace mode works, which functionality it already
implements, and how. Second, trace which files a configuration script
reads, stats (read(2)/stat(2)) or otherwise tries to access, for example
during the automake/autoconf or cmake phase, and build a complete and
minimal dependency tree that will be shown to the developer. At this
stage, care must be taken to present only direct build dependencies and
to leave out those that are not pertinent to the macOS configuration.
Third, analyze trace mode with dynamic tracing tools, such as dtrace and
flame graphs, to identify which functions slow down the process; decide
which of them matter most or are slowest, and implement a solution.
Iterate on this until execution time is acceptable for normal usage. At
the end of the process I expect a dylib that is fast and can easily be
reused for other purposes, such as enabling fakeroot or phasing out
Xcode.
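
To make the second step more concrete, here is a minimal sketch in
Python (chosen only for brevity; the real work would live in the C/Tcl
code of macports-base). It assumes the trace yields a plain list of
accessed paths and that a path-to-port ownership table is available.
The names detect_build_deps, SYSTEM_PREFIXES and file_owner are
hypothetical and do not come from MacPorts.

```python
from typing import Dict, Iterable, Set

# Hypothetical prefixes treated as part of the base system, not as ports.
SYSTEM_PREFIXES = ("/usr/", "/System/", "/bin/", "/sbin/", "/Library/")


def detect_build_deps(accessed_paths: Iterable[str],
                      file_owner: Dict[str, str],
                      current_port: str) -> Set[str]:
    """Return the ports that own files touched during configuration."""
    deps: Set[str] = set()
    for path in accessed_paths:
        if path.startswith(SYSTEM_PREFIXES):
            continue  # provided by macOS itself, not a port dependency
        owner = file_owner.get(path)
        # Skip files nobody owns (failed probes) and the port's own files.
        if owner is not None and owner != current_port:
            deps.add(owner)
    return deps


# Illustrative data only: paths and port names are made up for the example.
owners = {
    "/opt/local/include/zlib.h": "zlib",
    "/opt/local/bin/pkg-config": "pkgconfig",
}
trace = [
    "/usr/include/stdio.h",
    "/opt/local/include/zlib.h",
    "/opt/local/bin/pkg-config",
    "/opt/local/include/missing.h",
]
print(sorted(detect_build_deps(trace, owners, "myport")))
# prints ['pkgconfig', 'zlib']
```

This sketch reports only the direct owners of accessed files, which
matches the aim of showing direct build dependencies; deciding which
accesses are not pertinent on macOS would need more than a prefix list.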
* Technical Details
** Schedule
At this stage I am not laying out a strict timetable, because some
steps will take longer than others and I cannot yet estimate the effort
each one requires.
Phase 1:
- gain a deep understanding of macports-base and its architecture;
- study the darwintracelib1.0 code and try hacking on some functionality
to better understand which sections I should change.
Phase 2:
- implement the functions that trace the configuration phase and build
an approximate dependency tree;
- try that code with many different ports and languages to detect
glitches or cases not considered at implementation time; verify
manually that the build dependency tree is complete and minimal;
- review the implementation: make the needed changes, if they are small,
or try new approaches;
- run many tests with real packages and verify that everything is
correct.
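
The "complete and minimal" check in Phase 2 could be partly automated by
comparing the detected dependencies with those a port already declares.
The following Python sketch assumes both are available as simple lists
of port names; compare_deps and its result keys are hypothetical names
used only for illustration.

```python
from typing import Dict, Iterable, List


def compare_deps(declared: Iterable[str],
                 detected: Iterable[str]) -> Dict[str, List[str]]:
    """Report dependencies that are detected but undeclared, and vice versa."""
    declared_set, detected_set = set(declared), set(detected)
    return {
        # Used during the build but not listed: the tree is incomplete.
        "missing": sorted(detected_set - declared_set),
        # Listed but never touched during the build: possibly not minimal.
        "unused": sorted(declared_set - detected_set),
    }


# Illustrative port names only.
report = compare_deps(declared=["zlib", "gettext"],
                      detected=["zlib", "pkgconfig"])
print(report)
# prints {'missing': ['pkgconfig'], 'unused': ['gettext']}
```

An "unused" entry is only a hint: a declared dependency may be needed at
run time rather than build time, so manual review is still required.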
Phase 3:
- once I understand the tracelib, I can start analyzing the
performance of the injected library; as I already know dtrace and how
to plot its results as flame graphs, this will be my first way to
find out which functions/calls are slow. Considering that many
low-level aspects of macOS are unknown to me, I will be happy to learn
new ways to analyze problems like this;
- determine which functions could be sped up and which kinds of
algorithms/structures could be used to improve performance;
generally these changes require caches or new approaches to the
problem;
- implement one improvement at a time and create some edge-case tests
to verify that the new code works; after this, identify a few packages
that use that function heavily and verify the improvement with the
tool used during the analysis;
- repeat the above step for every function that was a source of
bottlenecks until acceptable performance is reached; this goal must
be considered reached only with real ports and not with test cases;
- test whether the changes work well with the goal of phase 2 and see
whether changes are needed to generalize the usage, for example for
fakeroot or phasing out Xcode.
* Additional Questions
** What are your experiences with macOS so far? (How long do you use
it, did you switch from Windows/Linux, etc.)
For fifteen years I used only OpenBSD/FreeBSD/Linux (Gentoo, Arch Linux
and Debian). I have now been using macOS on my main laptop for 9 months.
** How long have you been using MacPorts?
7 months
** Do you have experience with other package management systems?
Yes
** How much experience do you have with Tcl and C?
Many years of C, but very little Tcl.
** Will you be available after the project ends?
Yes
* Availability
** Do you plan to go on vacations, have exams, internship or be
otherwise absent during the GSOC? If so, when?
Probably the second week of August.
** To which other organisations did you send a GSOC proposal?
None.
Thank you
/davide