[DISCUSS] Tar + Gzip vs. Zip


[DISCUSS] Tar + Gzip vs. Zip

Andy LoPresto-2
Hi folks, 

I do not want to start a long-running argument or entrenched battle. However, having just performed the RM duties for the latest release, I believe I have identified a resource inefficiency in the fact that we generate, upload, host, and distribute two compressed archives of the binary which are functionally equivalent. For 1.7.0, both the .tar.gz and .zip files are 1.2 GB (1_224_352_000 bytes for tar.gz vs. 1_224_392_000 bytes for zip). The time to build and sign these is substantial, but the true cost comes in uploading and hosting them. While the fabled extension registry will save all of us from this burden, it isn’t arriving tomorrow, and I think we could drastically improve this before the next release. 

I have no personal preference between the two formats. In earlier days, there were platform inconsistencies and the tools weren’t available on all systems, but now they are pretty standard for all users. This [1] is an interesting article I found which had some good info on the origins, and here are some additional resources for anyone interested [2][3]. I don’t care which we pick, but I propose removing one of the options for the build going forward (toolkit as well). 

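That said, if someone has a good reason that both are necessary, I would love to hear it. I didn’t find anything in the Apache Release Policy [4] which stated we must offer both, but maybe I missed it. Thanks.

[1] https://itsfoss.com/tar-vs-zip-vs-gz/
[2] https://superuser.com/a/1257441/40003
[3] https://superuser.com/a/173995/40003
[4] https://www.apache.org/legal/release-policy.html#artifacts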



Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
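
For anyone who wants to reproduce the comparison locally, here is a minimal sketch (not the project's actual build, which goes through Maven) that packs one directory both ways and checks that the two archives hold the same files. The function names and the "dist" prefix are made up for illustration. Both formats use DEFLATE here, which is presumably why the sizes reported above differ by only about 40 KB.

```python
import os
import tarfile
import zipfile


def build_archives(src_dir, out_dir, name="dist"):
    """Pack src_dir as <name>.tar.gz and <name>.zip in out_dir; return both paths.

    out_dir must not live inside src_dir, or the archives would pack themselves.
    """
    tar_path = os.path.join(out_dir, name + ".tar.gz")
    zip_path = os.path.join(out_dir, name + ".zip")

    # tar + gzip: one DEFLATE stream over the whole tarball.
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src_dir, arcname=name)

    # zip: each file is DEFLATE-compressed on its own.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in files:
                full = os.path.join(root, fname)
                rel = os.path.relpath(full, src_dir)
                zf.write(full, arcname=os.path.join(name, rel))
    return tar_path, zip_path


def file_members(tar_path, zip_path):
    """Return the sets of regular-file names stored in each archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        tar_names = {m.name for m in tar.getmembers() if m.isfile()}
    with zipfile.ZipFile(zip_path) as zf:
        zip_names = {n for n in zf.namelist() if not n.endswith("/")}
    return tar_names, zip_names
```

Calling `build_archives(src, out)` and then comparing `os.path.getsize()` of the two results gives the same kind of numbers quoted in the message above.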



Re: [DISCUSS] Tar + Gzip vs. Zip

Mike Thomsen
I would lean toward Zip because it is the format that is supported by
Windows, macOS and Linux out of the box. I think the ease of use for
Windows users is particularly important.

On Mon, Jun 25, 2018 at 11:34 PM Andy LoPresto <[hidden email]> wrote:

> Hi folks,
>
> I do not want to start a long-running argument or entrenched battle.
> However, having just performed the RM duties for the latest release, I
> believe I have identified a resource inefficiency in the fact that we
> generate, upload, host, and distribute two compressed archives of the
> binary which are functionally equivalent. For 1.7.0, both the .tar.gz and
> .zip files are 1.2 GB (1_224_352_000 bytes for tar.gz vs. 1_224_392_000
> bytes for zip). The time to build and sign these is substantial, but the
> true cost comes in uploading and hosting them. While the fabled extension
> registry will save all of us from this burden, it isn’t arriving tomorrow,
> and I think we could drastically improve this before the next release.
>
> I have no personal preference between the two formats. In earlier days,
> there were platform inconsistencies and the tools weren’t available on all
> systems, but now they are pretty standard for all users. This [1] is an
> interesting article I found which had some good info on the origins, and
> here are some additional resources for anyone interested [2][3]. I don’t
> care which we pick, but I propose removing one of the options for the build
> going forward (toolkit as well).
>
> That said, if someone has a good reason that both are necessary, I would
> love to hear it. I didn’t find anything on the Apache Release Policy which
> stated we must offer both, but maybe I missed it. Thanks.
>
> [1] https://itsfoss.com/tar-vs-zip-vs-gz/
> [2] https://superuser.com/a/1257441/40003
> [3] https://superuser.com/a/173995/40003
> [4] https://www.apache.org/legal/release-policy.html#artifacts
>
>
> Andy LoPresto
> [hidden email]
> *[hidden email] <[hidden email]>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>

Re: [DISCUSS] Tar + Gzip vs. Zip

Jeff Zemerick
As a user I always download the zip file. Echoing Mike's reply, I work
across Linux, Windows, and OSX and my mouse always goes toward the zip.
I've never run into any file permission/attribute issues with the zip
distribution. Everything that should be executable always has been. So if
you axed one, my non-binding, FWIW vote would be to keep zip. :)

Jeff
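
On the permission point: a zip written on a Unix system records each entry's mode bits, which would explain why executables survive a round trip through tools such as unzip. The sketch below is only an illustration (the function names are hypothetical). It reads those bits back and re-applies them on extraction, since Python's own zipfile.extractall(), as far as I know, does not restore them.

```python
import os
import stat
import zipfile


def stored_modes(zip_path):
    """Map each entry name to the Unix permission bits recorded in the zip."""
    modes = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # For entries written on Unix, the high 16 bits of
            # external_attr carry the file's st_mode.
            modes[info.filename] = stat.S_IMODE(info.external_attr >> 16)
    return modes


def extract_with_modes(zip_path, dest):
    """Extract every entry and re-apply its stored permission bits."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = zf.extract(info, dest)
            mode = stat.S_IMODE(info.external_attr >> 16)
            if mode:  # zero means no Unix mode was recorded
                os.chmod(target, mode)
```

A script stored with mode 0o755 shows up as 0o755 in `stored_modes()`, and keeps its executable bit after `extract_with_modes()`.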



Re: [DISCUSS] Tar + Gzip vs. Zip

Josh Elser-2
In reply to this post by Andy LoPresto-2


On 6/25/18 11:34 PM, Andy LoPresto wrote:

> That said, if someone has a good reason that both are necessary, I would
> love to hear it. I didn’t find anything on the Apache Release Policy
> which stated we must offer both, but maybe I missed it. Thanks.

I'm not aware of any ASF policy requiring both. I think it mostly stems
from the default convention you get out of the maven-assembly-plugin.

Re: [DISCUSS] Tar + Gzip vs. Zip

Tony Kurc
My preference is zip.


Re: [DISCUSS] Tar + Gzip vs. Zip

Otto Fowler
I end up using zip all the time.  zip +1



Re: [DISCUSS] Tar + Gzip vs. Zip

James Wing
It's a great idea, Andy, I strongly support just one format.  I think Zip
is a good choice.


Re: [DISCUSS] Tar + Gzip vs. Zip

Andy LoPresto-2
Thanks for everyone’s input. There seems to be a clear consensus to eliminate .tar.gz and only provide .zip moving forward. I’d like to keep this discussion thread going for another day or two to field any objections. After that time (Friday-ish), I’ll create a Jira to do this unless things change.

I will probably keep the option of generating the .tar.gz through an inactive profile, so that people who need that offering can still use it. There will also be a subtask Jira to update the release guide.


Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: [DISCUSS] Tar + Gzip vs. Zip

Aldrin Piri
Be mindful to also update the Dockerfile used for Docker Hub, as this will
require some adjustments.  Unfortunately, the ADD instruction does not
automatically extract zip files the way it does local tarballs.  This isn't
a major inconvenience, but it will require a multi-stage build to help keep
our image size svelte.  I believe we should be safe, as we have been
publishing both tarballs and zips for prior releases, so the Dockerfile
should still work in that scenario.


Re: [DISCUSS] Tar + Gzip vs. Zip

Andy LoPresto-2
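Thanks Aldrin. I am not knowledgeable on Docker. Do either of these options [1][2] help us? We could also use a RUN to curl the Zip resource and COPY the unzipped directory?

[1] https://github.com/moby/moby/issues/15036#issuecomment-322177465
[2] https://github.com/jlhawn/dockramp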



Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: [DISCUSS] Tar + Gzip vs. Zip

Peter Wilcsinszky
Hi,

I wrote about a different solution, for which I implemented a PoC, in
https://lists.apache.org/thread.html/6122674030b8f99a63d586dcdbdaf6b31841572aed63fcc9dcfb5eea@%3Cdev.nifi.apache.org%3E
but a multistage build could be a better option, and I'm happy to create an
issue and fix it for the next release.

On Fri, Jun 29, 2018 at 3:42 AM Andy LoPresto <[hidden email]> wrote:

> Thanks Aldrin. I am not knowledgeable on Docker — do either of these
> options help us? We could also use a RUN to curl the Zip resource and COPY
> the unzipped directory?
>
> [1] https://github.com/moby/moby/issues/15036#issuecomment-322177465
> [2] https://github.com/jlhawn/dockramp
>
>
> Andy LoPresto
> [hidden email]
> *[hidden email] <[hidden email]>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>

Re: [DISCUSS] Tar + Gzip vs. Zip

Aldrin Piri
Hi Peter,

I remember seeing this, but the criterion about working only on Mac and
Windows makes it a challenge, in my opinion.

I also need to apologize, as I confused the Dockerfile used by the Maven
plugin with the one used for Docker Hub.  My prior email should have been
directed toward the Maven scenario, as that is the one using ADD.  Docker
Hub will just require updating the curl command to the .zip extension and
we should be set.  Regardless, Andy, when you make the issue for this
change, feel free to create a subtask of it to update the Dockerfiles.
Looks like Peter is up to the task, but I am also happy to help make the
adjustments and verify.  The first linked item you provided is the
multistage approach mentioned.  Multistage builds allow you to effectively
create throwaway images, selecting only specific artifacts from them to use
in a new image.
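
A minimal sketch of the multistage approach described above, not the
project's actual Dockerfile.  The base images, the ARCHIVE_URL build
argument, and the /opt paths are placeholders chosen for illustration.  The
first stage downloads and unpacks the zip; the final image copies only the
unpacked directory, so curl, unzip, and the archive itself never reach it.

```dockerfile
# Stage 1: throwaway image that fetches and unpacks the zip.
FROM alpine:3.8 AS unpack

# Placeholder build argument: pass the real archive location at build time.
ARG ARCHIVE_URL

RUN apk add --no-cache curl unzip \
    && curl -fSL "$ARCHIVE_URL" -o /tmp/app.zip \
    && unzip /tmp/app.zip -d /opt/unpacked \
    && rm /tmp/app.zip

# Stage 2: final image.  Only the unpacked directory is carried over;
# everything else from the first stage is discarded.
FROM openjdk:8-jre
COPY --from=unpack /opt/unpacked /opt/app
```

This would be built with something like
`docker build --build-arg ARCHIVE_URL=<location of the .zip> .` and needs
Docker 17.05 or later, where `COPY --from` became available.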

Thanks!
--aldrin

On Fri, Jun 29, 2018 at 7:11 AM Peter Wilcsinszky <
[hidden email]> wrote:

> Hi,
>
> I wrote about a different solution, for which I implemented a PoC, in
>
> https://lists.apache.org/thread.html/6122674030b8f99a63d586dcdbdaf6b31841572aed63fcc9dcfb5eea@%3Cdev.nifi.apache.org%3E
> but a multistage build could be a better option, and I'm happy to create an
> issue and fix it for the next release.
>
> On Fri, Jun 29, 2018 at 3:42 AM Andy LoPresto <[hidden email]>
> wrote:
>
> > Thanks Aldrin. I am not knowledgeable on Docker — do either of these
> > options help us? We could also use a RUN to curl the Zip resource and COPY
> > the unzipped directory?
> >
> > [1] https://github.com/moby/moby/issues/15036#issuecomment-322177465
> > [2] https://github.com/jlhawn/dockramp
> >
> >
> > Andy LoPresto
> > [hidden email]
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > On Jun 28, 2018, at 6:22 PM, Aldrin Piri <[hidden email]> wrote:
> >
> > Be mindful to also update the Dockerfile used for Docker Hub as this will
> > require some adjustments.  Unfortunately, the ADD instruction does not
> > support zip files.  This isn't a major inconvenience but will require a
> > multi-stage build to help keep our image size svelte.  I believe we should
> > releases, so the Dockerfile should still work in that scenario.
> >
> > On Wed, Jun 27, 2018 at 4:06 PM Andy LoPresto <[hidden email]>
> > wrote:
> >
> > Thanks for everyone’s input. It seems to be a clear consensus to eliminate
> > .tar.gz and only provide .zip moving forward. I’d like to keep this
> > discussion thread going for another day or two to field any objections.
> > After that time (Friday-ish), I’ll create a Jira to do this unless things
> > change.
> >
> > I will probably keep the possibility to generate the .tar.gz through an
> > inactive profile to allow people who need that offering to use it. There
> > will be a subtask Jira to update the release guide moving forward as well.
> >
> >
> > Andy LoPresto
> > [hidden email]
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > On Jun 26, 2018, at 7:52 PM, James Wing <[hidden email]> wrote:
> >
> > It's a great idea, Andy, I strongly support just one format.  I think Zip
> > is a good choice.
> >
> > On Tue, Jun 26, 2018 at 11:16 AM Otto Fowler <[hidden email]>
> > wrote:
> >
> > I end up using zip all the time.  zip +1
> >
> >
> > On June 26, 2018 at 13:30:33, Tony Kurc ([hidden email]) wrote:
> >
> > My preference is zip.
> >
> > On Tue, Jun 26, 2018, 9:21 AM Josh Elser <[hidden email]> wrote:
> >
> >
> >
> > On 6/25/18 11:34 PM, Andy LoPresto wrote:
> >
> > Hi folks,
> >
> > I do not want to start a long-running argument or entrenched battle.
> > However, having just performed the RM duties for the latest release, I
> > believe I have identified a resource inefficiency in the fact that we
> > generate, upload, host, and distribute two compressed archives of the
> > binary which are functionally equivalent. For 1.7.0, both the .tar.gz
> > and .zip files are 1.2 GB (1_224_352_000 bytes for tar.gz vs.
> > 1_224_392_000 bytes for zip). The time to build and sign these is
> > substantial, but the true cost comes in uploading and hosting them.
> > While the fabled extension registry will save all of us from this
> > burden, it isn’t arriving tomorrow, and I think we could drastically
> > improve this before the next release.
> >
> > I have no personal preference between the two formats. In earlier days,
> > there were platform inconsistencies and the tools weren’t available on
> > all systems, but now they are pretty standard for all users. This [1] is
> > an interesting article I found which had some good info on the origins,
> > and here are some additional resources for anyone interested [2][3]. I
> > don’t care which we pick, but I propose removing one of the options for
> > the build going forward (toolkit as well).
> >
> > That said, if someone has a good reason that both are necessary, I would
> > love to hear it. I didn’t find anything on the Apache Release Policy
> > which stated we must offer both, but maybe I missed it. Thanks.
> >
> >
> > I'm not aware of any ASF policy. I think it mostly stems from default
> > convention you get out of the maven-assembly-plugin.
> >
> > [1] https://itsfoss.com/tar-vs-zip-vs-gz/
> > [2] https://superuser.com/a/1257441/40003
> > [3] https://superuser.com/a/173995/40003
> > [4] https://www.apache.org/legal/release-policy.html#artifacts
> >
> >
> > Andy LoPresto
> > [hidden email]
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

Re: [DISCUSS] Tar + Gzip vs. Zip

Peter Wilcsinszky
Yes, I mean with this (multistage build) we cannot get rid of the two
separate modules (maven and dockerhub) but we can get rid of the ADD
instruction which I think has the benefit of making the build clearer and
more explicit as well.

On Fri, Jun 29, 2018 at 1:23 PM Aldrin Piri <[hidden email]> wrote:

> Hi Peter,
>
> I remember seeing this, but the criterion about working only on Mac and
> Windows makes it a challenge, in my opinion.
>
> I also need to apologize, as I confused the Dockerfile used by the Maven
> plugin with the one used for Docker Hub.  My prior email should have been
> directed toward the Maven scenario, as that is the one using ADD.  Docker
> Hub will just require updating the curl command to the .zip extension and
> we should be set.  Regardless, Andy, when you make the issue for this
> change, feel free to create a subtask of it to update the Dockerfiles.
> Looks like Peter is up to the task, but I am also happy to help make the
> adjustments and verify.  The first linked item you provided is the
> multistage approach mentioned.  Multistage builds allow you to effectively
> create throwaway images, selecting only specific artifacts from them to use
> in a new image.
>
> Thanks!
> --aldrin