"Updating database of binaries" step very slow under Yosemite?

William H. Magill magill at mac.com
Wed Oct 22 14:46:32 PDT 2014

> On Oct 22, 2014, at 4:24 PM, James Berry <jberry at macports.org> wrote:
>> On Oct 22, 2014, at 1:06 PM, René J.V. Bertin <rjvbertin at gmail.com> wrote:
>> On Wednesday October 22 2014 15:51:38 Jeremy Lavergne wrote:
>>> On Oct 22, 2014, at 3:47 PM, Arno Hautala wrote:
>>>> I'm pretty sure that the scan only checks newly installed files.
>>> Correct.
>>> Only "new" (yet-to-be-scanned) files get added, then only the applicable files are checked during rev-upgrade.
>> What's the point in scanning newly installed files? I thought the whole idea was to scan already installed files for ABI issues due to the newly installed (dylib) files?
> The phase that seems to take some time, “Updating database of binaries” is really just recording which of the newly installed files are binary; this is then used for later checking of whether those binaries have been broken.
> The code basically gets from the database the list of files whose state is not yet known (binary null), determines which are binaries, and writes that information to the database.
> Something in that process takes longer than one would expect; that’s what needs to be investigated.
> To investigate:
> 	- It’s pretty easy to null out the binary field in the database
> 	- I would run the code three times with the binary fields all nulled: once as it normally is; once by stubbing out the binary check (but writing a static value to the database); and once by performing the binary check but not writing to the database.
> 	- This would at least give some information about what aspect of the task is taking so much time, and might give some ideas on where we need to look next.
> I haven’t had a chance to do this, but it would be interesting to do…
>> Or are we talking about the (other?) scan that bugs you after you cleaned out all those useless doc or translation files, claiming that such or so  file is missing? :)
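For anyone who wants to try the three-run comparison James describes, here is a minimal Python sketch. It is not MacPorts code: the table name `file`, the `binary` column, the throwaway registry built by build_registry(), and the magic-number test in is_binary() are all assumptions made for illustration. It only shows the shape of the experiment: null the flags, then time one pass with the check and the database write each switched on or off.

```python
import os
import sqlite3
import time

# Mach-O magic numbers (32/64-bit, both byte orders) plus the fat-binary magic.
MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
    b"\xca\xfe\xba\xbe",
}

def is_binary(path):
    """Stand-in check (an assumption): compare the first four bytes to known magics."""
    with open(path, "rb") as fh:
        return fh.read(4) in MAGICS

def build_registry(directory, n):
    """Create n small files (odd-numbered ones look like Mach-O) and a hypothetical registry."""
    conn = sqlite3.connect(os.path.join(directory, "registry.db"))
    conn.execute("CREATE TABLE file (id INTEGER PRIMARY KEY, path TEXT, binary BOOL)")
    for i in range(n):
        path = os.path.join(directory, f"f{i}")
        with open(path, "wb") as fh:
            fh.write(b"\xcf\xfa\xed\xfe" if i % 2 else b"text")
        conn.execute("INSERT INTO file (path) VALUES (?)", (path,))
    conn.commit()
    return conn

def run(conn, check=True, write=True):
    """Null every binary flag, then time one pass over the files whose state is unknown."""
    conn.execute("UPDATE file SET binary = NULL")
    conn.commit()
    start = time.perf_counter()
    rows = conn.execute("SELECT id, path FROM file WHERE binary IS NULL").fetchall()
    for file_id, path in rows:
        # With check=False the binary test is stubbed out and a static value is used.
        flag = is_binary(path) if check else False
        if write:
            conn.execute("UPDATE file SET binary = ? WHERE id = ?", (int(flag), file_id))
    conn.commit()
    return time.perf_counter() - start
```

Calling run(conn), run(conn, check=False) and run(conn, write=False) against the same registry gives the three timings. If the stubbed run is nearly as slow as the normal one, the database writes are the likely suspect; if the no-write run is the slow one, the binary check is.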

What follows is somewhat orthogonal to this train of thought, but may possibly relate to the issue, or it may be completely worthless.
And bear with me: I've been retired now for 10 years, but spent A LOT of time as an OSF/1 System Administrator on DEC Alphas, so I may not have the details exactly right anymore.
1- This release of OS X TIGHTLY integrates local I/O with iCloud (i.e. network) I/O. This implies that there was probably some significant work done in the kernel-level I/O routines.
    (As I understand it, I/O in OS X isn't done entirely by the kernel, since it did not start out as a monolithic kernel; I have no idea what it has morphed into today.)

2- Long ago, before there was OSF/1, RCA had an operating system called VS/9. It was discovered that a dramatic performance improvement could be obtained by "fixing" the I/O modules to increase the read and write block sizes to actually FIT the much larger modern hardware, rather than doing multiple writes of smaller records to fill up a single track on the disk. I/O was optimized for the hardware, if you will.

3- It is "well known" that the NUMBER of I/Os is a significant determining factor in read and write performance, which is one of the basic concepts behind I/O caches.
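
To put rough numbers on points 2 and 3, here is a small Python sketch that counts how many read calls it takes to consume the same data at different block sizes. It works on an in-memory buffer, so it says nothing about real disk or network timings; the function name count_reads and the sizes used are only illustrative assumptions.

```python
import io

def count_reads(data, block_size):
    """Count the non-empty read() calls needed to consume data block_size bytes at a time."""
    stream = io.BytesIO(data)
    calls = 0
    while stream.read(block_size):
        calls += 1
    return calls

payload = b"x" * 1_000_000
small_blocks = count_reads(payload, 512)      # 1954 calls
large_blocks = count_reads(payload, 65536)    # 16 calls
```

The same megabyte costs about 120 times as many calls with 512-byte blocks as with 64 KB blocks. Any fixed per-call overhead, such as a check on each I/O, is multiplied by that count.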

4- Here I'm into serious speculation as this is all "new technology" in my experience quiver...

Given that reads and writes to iCloud are most likely much smaller than those to local hardware, AND given the issues around acknowledging the success or failure of such reads and writes, the potential exists for significant issues with "timings" in general. The question becomes: how much have these considerations actually affected the functionality of OS X and various applications?

What I have no knowledge of is HOW (if at all) OS X completely turns off or otherwise ignores or bypasses iCloud I/O if it is not being used. I.e., is a check made on each I/O, or does the I/O subsystem know to completely ignore it?
Any "check" takes time. When only a very small number of checks is being done, that time is small, but as the number of checks increases, that time can increase dramatically. Similarly, HOW OS X is optimized to provide "redundancy," "fast launch," etc. comes into play.

I can easily see how something like this database check does MUCH MORE I/O per unit of time than the OS design parameters were tested for.
Testing of extremes normally yields some totally unexpected results, which usually leads to those "extremes" being modified. Frequently, in something as complex as OS X, a "normal user" range of values is tested and higher values are simply "assumed" to work the same way.

So, in short, to my way of thinking, James's comments are "spot on," as they say in the UK.

William H. Magill
# iMac11,3 Core i7 [2.93GHz - 8 GB 1067MHz] OS X 10.10
# Macmini6,1 Intel Core i5 [2.5 Ghz - 4GB 1600MHz] OS X 10.10 OSX Server (now dead)

magill at icloud.com
magill at mac.com
whmagill at gmail.com

More information about the macports-users mailing list