pbzip2 isn't faster

Eric A. Borisch eborisch at macports.org
Fri Apr 4 06:35:19 PDT 2014


I've definitely had success; the main thing to realize is that pbzip2 can
only speed up decompression of a file created by pbzip2. (pbzip2 files are
still valid bzip2 files.)
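To illustrate why only pbzip2-created files benefit, here is a minimal Python sketch. It is an illustration, not pbzip2's code. It assumes pbzip2's output is a concatenation of independent bzip2 streams, and it mimics that by compressing 900,000-byte chunks separately (the chunk size is an assumption chosen to echo bzip2's 900k default). `count_bzip2_streams` is a hypothetical helper. A file written by plain bzip2 holds a single stream, so a parallel decompressor has nothing to split. The chunked file holds several streams and still decodes with an ordinary decompressor.

```python
import bz2


def count_bzip2_streams(data: bytes) -> int:
    """Count the concatenated bzip2 streams in `data` (hypothetical helper).

    A BZ2Decompressor handles exactly one stream; whatever follows that
    stream's end marker is left in `unused_data`, so we restart on it.
    """
    streams = 0
    while data:
        decomp = bz2.BZ2Decompressor()
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        streams += 1
        data = decomp.unused_data
    return streams


payload = b"MacPorts archive contents " * 100_000  # 2,600,000 bytes of sample data

# What plain bzip2 writes: one stream covering the whole input.
single = bz2.compress(payload, 9)

# Roughly what pbzip2 writes (assumption): each chunk compressed as its own
# complete stream, with the streams concatenated in order.
CHUNK = 900_000
multi = b"".join(bz2.compress(payload[i:i + CHUNK], 9)
                 for i in range(0, len(payload), CHUNK))

print(count_bzip2_streams(single))  # 1
print(count_bzip2_streams(multi))   # 3

# Both decode with a decompressor that understands concatenated streams.
assert bz2.decompress(single) == bz2.decompress(multi) == payload
```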

Compression (at least for bzip2) is still slow relative to disk speeds, so
parallelizing it works very well. Because bzip2 is a block compressor (900k
chunks by default), it maps onto parallel processing very nicely.
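The block structure is what makes compression parallelize well. Below is a minimal Python sketch of the idea, not pbzip2's implementation. It cuts the input into fixed-size chunks (900,000 bytes here, an assumption echoing the 900k default), compresses each chunk on a worker thread, and concatenates the resulting streams in their original order. `parallel_bzip2`, `compress_chunk` and `CHUNK` are hypothetical names. The sketch relies on CPython's `bz2` module releasing the GIL while libbzip2 runs, which lets the threads occupy separate cores. A `ProcessPoolExecutor` would be the fallback if that did not hold.

```python
import bz2
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

CHUNK = 900_000  # assumption: mirrors bzip2's default 900k block size


def compress_chunk(chunk: bytes) -> bytes:
    # Each chunk becomes an independent, complete bzip2 stream.
    return bz2.compress(chunk, 9)


def parallel_bzip2(data: bytes, workers: Optional[int] = None) -> bytes:
    """Compress fixed-size chunks of `data` concurrently, pbzip2-style."""
    # Default to the number of CPUs, loosely like pbzip2's core autodetection.
    workers = workers or os.cpu_count() or 1
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    # CPython's bz2 releases the GIL inside libbzip2, so threads can
    # compress different chunks on different cores at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams are
        # concatenated back in the original byte order.
        return b"".join(pool.map(compress_chunk, chunks))


if __name__ == "__main__":
    data = b"MacPorts archive contents " * 100_000
    packed = parallel_bzip2(data, workers=4)
    # A decoder that handles concatenated streams reads it back unchanged.
    assert bz2.decompress(packed) == data
```

Passing an explicit `workers` value plays roughly the role of pbzip2's `-pN`. Leaving it unset falls back to `os.cpu_count()`.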

Here's an example using a MacPorts archive I built locally (and which was
therefore compressed with pbzip2, since that is how I have MacPorts set up).

MacPro:software$ time pbzip2 -dc clang-3.4/clang-3.4-3.4_0+analyzer+python27.darwin_10.x86_64.tbz2 > /dev/null

real 0m12.084s
user 0m45.713s
sys 0m0.534s
MacPro:software$ time bzip2 -dc clang-3.4/clang-3.4-3.4_0+analyzer+python27.darwin_10.x86_64.tbz2 > /dev/null

real 0m37.223s
user 0m36.872s
sys 0m0.343s
MacPro:software$ uname -a
Darwin <host> 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386

Note that the user time is up (no free lunch; there is some overhead), but
the real or "wall" time is down by a factor of 3. (I have a 4-core 2007
machine.) By default the software autodetects how many cores to use, but you
can set the number with -pN. Do you see all cores being used during compression?
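The figures above can be checked with a little arithmetic. This Python snippet only reuses the numbers from the two `time` runs, and `cores = 4` comes from the note about the 4-core machine. The `user` figure sums CPU time across all threads, which is why it can exceed `real`.

```python
# Timings copied from the two `time` runs above (seconds).
bzip2_real, bzip2_user = 37.223, 36.872
pbzip2_real, pbzip2_user = 12.084, 45.713
cores = 4  # the 4-core 2007 machine mentioned above

speedup = bzip2_real / pbzip2_real          # wall-clock gain from pbzip2
overhead = pbzip2_user / bzip2_user - 1     # extra total CPU time spent
efficiency = speedup / cores                # fraction of ideal linear speedup

print(f"speedup    {speedup:.2f}x")    # speedup    3.08x
print(f"overhead   {overhead:.0%}")    # overhead   24%
print(f"efficiency {efficiency:.0%}")  # efficiency 77%
```

So the "factor of 3" holds, bought with roughly a quarter more total CPU time.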

 -- Eric
[re-sent to list with correct address]


On Fri, Apr 4, 2014 at 4:49 AM, Ryan Schmidt <ryandesign at macports.org> wrote:

>
> On Apr 4, 2014, at 03:33, René J.V. Bertin wrote:
>
> > On Apr 04, 2014, at 10:19, Ryan Schmidt wrote:
> >
> >> While waiting minutes for clang-3.5 to install (compress) and then
> >> activate (decompress) a 600MB archive, I wondered why I was sitting here
> >> waiting for a single-threaded process to complete when I have a multi-core
> >> Mac.
> >
> > Is that 600MB raw, or 600MB when compressed?
>
> 610 MB compressed. clang is enormous.
>
>
> >> Has anybody successfully achieved the promised parallel operation of
> >> pbzip2 on OS X? If so, I wonder if it depends on the OS X version or the
> >> compiler used. I'm on OS X 10.9.2 with Xcode 5.1's Apple LLVM version 5.1
> >> (clang-503.0.38) (based on LLVM 3.4svn).
> >
> > The correct question to ask is for what cases pbzip2 is faster, if any
> > ... A compressed file is essentially a 1D string that's not segmented like
> > multimedia data (how common is it to use multiple threads to [de]compress
> > audio?). I may be wrong, but for now I'm not at all amazed that
> > parallelisation of uncompressing such data entails a lot of overhead, esp.
> > if it also means letting the disk seek so many times more
>
> The homepage says "PBZIP2 is a parallel implementation of the bzip2
> block-sorting file compressor that uses pthreads and achieves near-linear
> speedup on SMP machines."
>
> If this didn't actually work, there would be no reason for the program to
> exist, but it has, for over a decade.
>
> > (have you tried to compare to [de]compress from one disk to another, or
> > using an SSD?)
>
> I am using an SSD. I have also tried decompressing to /dev/null, with no
> change in speed.
>
> > Also, decompression tends to be so much cheaper than compressing that
> > the parallel overhead will count even more.
>
> _______________________________________________
> macports-users mailing list
> macports-users at lists.macosforge.org
> https://lists.macosforge.org/mailman/listinfo/macports-users
>

