WACZ (Web Archive Collection Zipped) software
akierig
akierig at fastmail.de
Tue Mar 28 03:29:52 UTC 2023
On 2023-03-27 (KW 13) at 16:07:45 (-0500) Eric Gallager via
macports-users wrote:
> So, the Internet Archive has recently added an "Email me a WACZ file
> with the results" option to their "Save Page Now" service in the
> Wayback Machine, so I tried that out and got some WACZ files, although
> now I don't know what to do with them. Is anyone aware of any software
> for handling WACZ files that's available in MacPorts? Or, if there
> isn't any yet, could some be added?
> More info on the format can be found here:
> https://replayweb.page/docs/wacz-format
> There are some python tools for interacting with the format, but I
> couldn't get pypi2port to generate a Portfile for me for them, and
> plus there are kind of too many python things in MacPorts anyways:
> https://github.com/webrecorder/py-wacz
> Anything else?
> Thanks,
> Eric Gallager
I’m a librarian who does a fair bit with web archives. The short
version is this:
replayweb.page will ‘play’ a web archive (WARC/WACZ). There
is a desktop application (Electron) that you can grab from GitHub. I
find it better than trying to load an archive like that into Firefox. I
don’t know what the policy is on adding an Electron app to
MacPorts, but speaking as a maintainer of an Electron app on a Linux
distro... I’d personally avoid it.
py-wacz is great for converting WARC files into WACZ; that’s its
primary function. The main difference between the two formats is that
the latter is a zipped (compressed) package.
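Because a WACZ file is an ordinary ZIP archive, you can peek inside one without any special tooling. Here is a minimal sketch using only the Python standard library; the function name `list_wacz` and the file name `example.wacz` are placeholders for illustration, not part of py-wacz.

```python
import zipfile


def list_wacz(path):
    """Return (member name, uncompressed size in bytes) for each entry in a WACZ file."""
    # A WACZ is a ZIP container, so the standard zipfile module can read it.
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Calling `list_wacz("example.wacz")` and printing the result should show the WARC data along with the index and metadata files that the package bundles with it; the exact member names depend on the tool that produced the file.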
One tool in MacPorts that can create WARC files is wget, which works
with something like:
wget -pkrm --warc-cdx --warc-file=foo -e robots=off https://foo.org
I did a write-up about it back in 2020.
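To make the short flags in that wget command easier to read, here is the same invocation assembled in Python with the long option names spelled out. It is only a sketch: `build_wget_command` and `crawl` are made-up helper names, and `https://foo.org` and `foo` are the placeholders from the command above.

```python
import subprocess


def build_wget_command(url, warc_name):
    """Build the wget argument list for mirroring `url` into a WARC file."""
    return [
        "wget",
        "--page-requisites",         # -p: also fetch images, CSS, and scripts
        "--convert-links",           # -k: rewrite links for local browsing
        "--recursive",               # -r: follow links within the site
        "--mirror",                  # -m: recursion with timestamping, no depth limit
        "--warc-cdx",                # also write a CDX index for the WARC
        f"--warc-file={warc_name}",  # base name of the WARC output file
        "-e", "robots=off",          # ignore robots.txt
        url,
    ]


def crawl(url, warc_name):
    """Run wget; raises CalledProcessError if wget exits with a non-zero status."""
    subprocess.run(build_wget_command(url, warc_name), check=True)
```

Something like `crawl("https://foo.org", "foo")` would then start the crawl. Note that wget can exit non-zero on a mirror run even when most pages were fetched (for example, if a single link returned an error), so you may prefer `check=False` and to inspect the return code yourself.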
I hope that helps a little bit.
ander