how to deal with large data files
Artur Szostak
aszostak at partner.eso.org
Mon Apr 1 13:00:39 UTC 2019
Our approach to this kind of use case was to add download commands to the pre-activate stage of the Portfile.
See the example below.
_____________________________________________________
# -*- coding: utf-8; mode: tcl; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- vim:fenc=utf-8:ft=tcl:et:sw=4:ts=4:sts=4
# $Id$
PortSystem 1.0
set instrument uves
name esopipe-${instrument}-datademo
conflicts esopipe-${instrument}-datademo-devel
version 4.4
revision 10
categories science
license GPL-2+
platforms darwin
maintainers eso.org:usd-help
homepage http://www.eso.org/sci/software/pipelines/
supported_archs noarch
description ESO UVES instrument pipeline (demo data)
long_description ESO data reduction pipeline for the UVES instrument. \
    See www.eso.org/pipelines for a description of the ESO pipeline systems. \
    This package will download the Reflex demo data from the ESO FTP. \
    The data can be used to test the pipeline workflows.
master_sites
distfiles
set datademodir ${prefix}/share/esopipes/datademo/${instrument}
use_configure no
build {}
test {}
destroot {
    xinstall -m 755 -d ${destroot}${datademodir}
    system "echo Files under this directory have been downloaded by the ${name} port > ${destroot}${datademodir}/README.txt"
}
# Downloading of the demo data is delayed to the pre-activate stage to prevent
# large archive packages from being generated and staged, as would normally
# happen in the destroot phase. We also use pre-activate rather than
# post-activate because only under pre-activate will aborted downloads actually
# be restarted on a subsequent "port install" command. Lastly, we have to use a
# temporary directory as a staging area and do the final move in the
# post-activate phase, because when upgrading a port pre-activate is called
# before post-deactivate, and we would otherwise end up downloading and then
# immediately removing all the files again (PIPE-5567).
pre-activate {
    set demoreflex ftp://ftp.eso.org/pub/dfs/pipelines/instruments/${instrument}/${instrument}-demo-reflex-${version}.tar.gz
    # Stage the download in a temporary directory; it is moved into place in post-activate.
    xinstall -m 755 -d ${datademodir}_tmp_download_area
    # Log the command being run so it shows up in the port output.
    system "echo curl ${demoreflex}"
    # Retry the download and extraction up to 99 times to ride out transient network failures.
    system "cd ${datademodir}_tmp_download_area ; for ((N=1;N<100;N++)) ; do curl ${demoreflex} | tar -zxpo ; if test \$? -eq 0 ; then break ; fi ; done ; test \$? -eq 0 || exit 1"
    # Normalise permissions on the extracted files.
    system "find ${datademodir}_tmp_download_area -type d -exec chmod 755 {} \\;"
    system "find ${datademodir}_tmp_download_area ! -type d -exec chmod 644 {} \\;"
}
post-activate {
    system "mv ${datademodir}_tmp_download_area/* ${datademodir}/ && rmdir ${datademodir}_tmp_download_area"
}
post-deactivate {
    delete ${datademodir}
}
_____________________________________________________
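For reference, here is roughly how such a port behaves from the user's side (a hedged sketch only; the port and instrument names are taken from the example above, and the default ${prefix} of /opt/local is assumed):

    # Install the demo-data port; the actual download happens during activation.
    sudo port install esopipe-uves-datademo

    # If the download was interrupted, simply running the install again restarts it,
    # since the pre-activate block runs again on the next activation attempt.
    sudo port install esopipe-uves-datademo

    # The downloaded demo data then lives outside the registered destroot contents, under:
    ls /opt/local/share/esopipes/datademo/uves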
Kind regards.
Artur
________________________________________
From: macports-dev <macports-dev-bounces at lists.macports.org> on behalf of Ryan Schmidt <ryandesign at macports.org>
Sent: 29 March 2019 09:05:07
To: Joshua Root
Cc: MacPorts Developers; Renee Otten
Subject: Re: how to deal with large data files
On Mar 27, 2019, at 23:12, Joshua Root wrote:
> As a port, even if it's not mirrored, it's still going to be taking up
> gigabytes per OS version on the builders
I would indeed want to avoid keeping it installed on the buildbot workers; see https://trac.macports.org/ticket/57464
> and in the private archives.
Disk space on the private archives server is not a problem at this point.