how to deal with large data files

Artur Szostak aszostak at partner.eso.org
Mon Apr 1 13:00:39 UTC 2019


Our approach to this kind of use case was to add download commands to the pre-activate stage of the Portfile.
See the example below.
_____________________________________________________

# -*- coding: utf-8; mode: tcl; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- vim:fenc=utf-8:ft=tcl:et:sw=4:ts=4:sts=4
# $Id$
PortSystem      1.0

set instrument  uves
name            esopipe-${instrument}-datademo
conflicts       esopipe-${instrument}-datademo-devel
version         4.4
revision        10
categories      science
license         GPL-2+
platforms       darwin
maintainers     eso.org:usd-help
homepage        http://www.eso.org/sci/software/pipelines/
supported_archs noarch
description     ESO UVES instrument pipeline (demo data)
long_description ESO data reduction pipeline for the UVES instrument. \
                 See www.eso.org/pipelines for a description of the ESO pipeline systems. \
                 This package will download the Reflex demo data from the ESO FTP. \
                 The data can be used to test the pipeline workflows.
master_sites
distfiles

set datademodir ${prefix}/share/esopipes/datademo/${instrument}

use_configure   no
build {}
test {}

destroot {
    xinstall -m 755 -d ${destroot}${datademodir}
    system "echo Files under this directory have been downloaded by the ${name} port > ${destroot}${datademodir}/README.txt"
}

# Downloading the demo data is delayed to the pre-activate stage to prevent
# large archive packages from being generated and staged, as would normally
# happen in the destroot phase. We use pre-activate rather than post-activate
# because only under pre-activate will an aborted download actually be
# restarted by a subsequent "port install" command. Lastly, we have to use a
# temporary directory as a staging area and do a final move in the
# post-activate phase: when upgrading a port, pre-activate is called before
# post-deactivate, so otherwise we would end up downloading and then
# immediately removing all the files again (PIPE-5567).
pre-activate {
    set demoreflex ftp://ftp.eso.org/pub/dfs/pipelines/instruments/${instrument}/${instrument}-demo-reflex-${version}.tar.gz
    xinstall -m 755 -d ${datademodir}_tmp_download_area
    system "echo curl ${demoreflex}"
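    # Retry up to 99 times. The success flag is tracked explicitly because the
    # exit status of the loop itself is 0 even when every attempt has failed.
    system "cd ${datademodir}_tmp_download_area || exit 1 ; ok=1 ; for ((N=1;N<100;N++)) ; do if curl ${demoreflex} | tar -zxpo ; then ok=0 ; break ; fi ; done ; exit \$ok"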
    system "find ${datademodir}_tmp_download_area -type d -exec chmod 755 {} \\;"
    system "find ${datademodir}_tmp_download_area ! -type d -exec chmod 644 {} \\;"
}

post-activate {
    system "mv ${datademodir}_tmp_download_area/* ${datademodir}/ && rmdir ${datademodir}_tmp_download_area"
}

post-deactivate {
    delete ${datademodir}
}
_____________________________________________________
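
The retry logic in the pre-activate block can also be sketched as a standalone shell helper, which is easier to read and test than a one-liner inside a Tcl string. This is only a rough sketch and not what the port itself ships: the function names `retry` and `fetch_and_unpack` are made up for illustration, and `fetch_and_unpack` assumes a shell that supports `set -o pipefail` (bash, for example) so that a failed curl is not hidden by tar's exit status.

```shell
# Hypothetical helper: run a command up to $1 times and stop at the first
# success. Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical wrapper: stream the archive at URL $1 into directory $2.
# The subshell keeps the pipefail setting from leaking into the caller.
fetch_and_unpack() {
    (
        set -o pipefail
        curl --fail --silent --show-error "$1" | tar -zxpo -C "$2"
    )
}

# Example call (not executed here):
#   retry 5 fetch_and_unpack "$url" "$download_area" || exit 1
```

Keeping the success check inside `retry` avoids relying on the exit status of the loop itself, and an explicit attempt count makes the upper bound on download attempts obvious.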

Kind regards.

Artur

________________________________________
From: macports-dev <macports-dev-bounces at lists.macports.org> on behalf of Ryan Schmidt <ryandesign at macports.org>
Sent: 29 March 2019 09:05:07
To: Joshua Root
Cc: MacPorts Developers; Renee Otten
Subject: Re: how to deal with large data files

On Mar 27, 2019, at 23:12, Joshua Root wrote:

> As a port, even if it's not mirrored, it's still going to be taking up
> gigabytes per OS version on the builders

I would indeed want to avoid keeping it installed on the buildbot workers; see https://trac.macports.org/ticket/57464


> and in the private archives.

Disk space on the private archives server is not a problem at this point.


