[DISCUSS] Apache NiFi distribution has grown too large

Joe Witt
Team,

The NiFi convenience binary (tar.gz/zip) has grown to 1.1GB
in the latest release.  Apache infra expanded our allowance to 1.6GB
but has stated this is the last time.
https://issues.apache.org/jira/browse/INFRA-15816

We need to consider:
1) removing old nars, less commonly used nars, or particularly massive
nars from the assembly we distribute by default.  Folks can still use
these things if they want, just not from our convenience binary
2) collapsing nars with highly repeating deps
3) Getting the extension registry baked into the Flow Registry then
moving to separate releases for extension bundles.  The main release
then would be just the NiFi framework.
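
For option 2, one way to find candidates is to list which jars are bundled in more than one nar. The following is only a minimal sketch: it assumes a nar is a zip archive with its dependencies under META-INF/bundled-dependencies/ (worth verifying against an actual build), and the nar and jar names in the example are hypothetical.

```python
import zipfile
from collections import defaultdict

# Assumed location of bundled jars inside a nar; verify against a real build.
BUNDLED_PREFIX = "META-INF/bundled-dependencies/"


def bundled_jars(nar_path):
    """Return the set of jar file names bundled inside one nar (a zip archive)."""
    with zipfile.ZipFile(nar_path) as nar:
        return {
            name[len(BUNDLED_PREFIX):]
            for name in nar.namelist()
            if name.startswith(BUNDLED_PREFIX) and name.endswith(".jar")
        }


def repeated_jars(jars_by_nar, min_count=2):
    """Map each jar found in at least `min_count` nars to the sorted nars that bundle it."""
    owners = defaultdict(set)
    for nar, jars in jars_by_nar.items():
        for jar in jars:
            owners[jar].add(nar)
    return {jar: sorted(nars) for jar, nars in owners.items() if len(nars) >= min_count}


# Hypothetical input; in practice this would be built with bundled_jars().
example = {
    "a.nar": {"guava-18.0.jar", "commons-io-2.5.jar"},
    "b.nar": {"guava-18.0.jar"},
    "c.nar": {"jackson-core-2.9.jar"},
}
shared = repeated_jars(example)
```

Here `shared` lists only the jar that appears in both `a.nar` and `b.nar`; nars that share many large jars would be the natural candidates for collapsing.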

Any other ideas?

I'll plan to start identifying candidates for removal soon.

Thanks
Joe

Re: [DISCUSS] Apache NiFi distribution has grown too large

Joseph Niemiec
Just a random thought.

Drop-in lib packs: all the Hadoop ones in one package, for example, that
can be added to a slim NiFi install. Others might be for cloud, database
interactions, or integration (JMS, FTP, etc.). Of course, defining these groups
would be the tricky part. Or perhaps some type of installer that lets
you select which packages to download and add to the slim install?


On Fri, Jan 12, 2018 at 3:10 PM, Joe Witt <[hidden email]> wrote:

> Team,
>
> The NiFi convenience binary (tar.gz/zip) size has grown to 1.1GB now
> in the latest release.  Apache infra expanded it to 1.6GB allowance
> for us but has stated this is the last time.
> https://issues.apache.org/jira/browse/INFRA-15816
>
> We need consider:
> 1) removing old nars/less commonly used nars/or particularly massive
> nars from the assembly we distribute by default.  Folks can still use
> these things if they want just not from our convenience binary
> 2) collapsing nars with highly repeating deps
> 3) Getting the extension registry baked into the Flow Registry then
> moving to separate releases for extension bundles.  The main release
> then would be just the NiFi framework.
>
> Any other ideas ?
>
> I'll plan to start identifying candiates for removal soon.
>
> Thanks
> Joe
>



--
Joseph

Re: [DISCUSS] Apache NiFi distribution has grown too large

Bryan Bende
Long term I'd like to see the extension registry take form and have
that be the solution (#3).

In the nearer term, we could separate all of the NARs, except for
framework and maybe standard processors & services, into a separate
git repo.

In that new git repo we could organize things like Joe N just
described according to some kind of functional grouping. Each of these
functional bundles could produce its own tar/zip which we can make
available for download.

That would separate the release cycles between core NiFi and the other
NARs, and also avoid having any single binary artifact that gets too
large.



On Fri, Jan 12, 2018 at 3:43 PM, Joseph Niemiec <[hidden email]> wrote:

> just a random thought.
>
> Drop In Lib packs... All the Hadoop ones in one package for example that
> can be added to a slim Nifi install. Another may be for Cloud, or Database
> Interactions, Integration (JMS, FTP, etc) of course defining these groups
> would be the tricky part... Or perhaps some type of installer which allows
> you to elect which packages to download to add to the slim install?
>
>
> --
> Joseph

Re: [DISCUSS] Apache NiFi distribution has grown too large

Chris Herrera
I very much like the solution proposed by Bryan below. This would allow for a cleaner Docker image as well, while still providing the functionality as needed. For sure, the extension registry will be great, but in the meantime this is an adequate intermediate step.

Regards,
Chris

On Jan 12, 2018, 2:52 PM -0600, Bryan Bende <[hidden email]>, wrote:

> Long term I'd like to see the extension registry take form and have
> that be the solution (#3).
>
> In the more near term, we could separate all of the NARs, except for
> framework and maybe standard processors & services, into a separate
> git repo.
>
> In that new git repo we could organize things like Joe N just
> described according to some kind of functional grouping. Each of these
> functional bundles could produce its own tar/zip which we can make
> available for download.
>
> That would separate the release cycles between core NiFi and the other
> NARs, and also avoid having any single binary artifact that gets too
> large.

Re: [DISCUSS] Apache NiFi distribution has grown too large

Michael Moser
Long term I would also like to see #3 be the solution.  I think what Joseph
N described could be part of the capabilities of #3.

I would like to add a note of caution with respect to reorganizing and
releasing extension bundles separately:

   - the burden on the release manager expands because many more projects have
   to be released; probably not all on each release cycle, but it could still
   be many
   - the chance of accidentally forgetting to release a project in a
   release cycle becomes non-zero
   - sharing code between projects gets a bit harder because you have to
   manage releasing projects in a specific order
   - it becomes harder to find all of the projects that need to change when
   shared code is added
   - the simple act of finding code becomes harder ... which project is
   that class in? (IDEs like IntelliJ can search in 1 project, but if they
   can search across multiple projects, then I haven't learned how)

I used to maintain several nars in separate projects, and recently
reorganized them into 1 project (following NiFi's multi-module Maven build),
and life has become much easier!

-- Mike



On Fri, Jan 12, 2018 at 4:33 PM, Chris Herrera <[hidden email]>
wrote:

> I very much like the solution proposed by Bryan below. This would allow
> for a cleaner docker image as well, while still proving the functionality
> as needed. For sure, the extension registry will be great, but in the mean
> time this is an adequate mid step.
>
> Regards,
> Chris

Re: [DISCUSS] Apache NiFi distribution has grown too large

Michael Moser
And of course, as I hit <send> I thought of one more thing.

We could keep all of the code in 1 git repo (1 project) but the
nifi-assembly part of the build could be broken up to build core NiFi
separately from the tar/zip functional grouping of other NARs.

On Fri, Jan 12, 2018 at 5:01 PM, Michael Moser <[hidden email]> wrote:

> Long term I would also like to see #3 be the solution.  I think what
> Joseph N described could be part of the capabilities of #3.
>
> I would like to add a note of caution with respect to reorganizing and
> releasing extension bundles separately:
>
>    - the burden on release manager expands because many more projects
>    have to be released; probably not all on each release cycle but it could
>    still be many
>    - the chance of accidentally forgetting to release a project in a
>    release cycle becomes non-zero
>    - sharing code between projects gets a bit harder because you have to
>    manage releasing projects in a specific order
>    - it becomes harder to find all of the projects that need to change
>    when shared code is added
>    - the simple act of finding code becomes harder ... in which project
>    is that class in? (IDEs like IntelliJ can search in 1 project, but if they
>    search across multiple projects, then I haven't learned how)
>
> I used to maintain several nars in separate projects, and recently
> reorganized them into 1 project (following NiFi's multi-module maven build)
> and life has become much easier!
>
> -- Mike

Re: [DISCUSS] Apache NiFi distribution has grown too large

Tony Kurc
I was looking at nar sizes, and thought some data may be helpful. I used my recent RC1 verification as a basis for getting file sizes, and just got the file size for each file in the assembly named "*.nar". I don't know whether the images I pasted in will go through, but I made some graphs. The first is a histogram of nar file size in buckets of 10MB. The second is similar to a cumulative distribution: the x-axis is the "rank" of the nar (smallest to largest), and the y-axis is the fraction of the total size of all the nars contributed by the nars of that rank or lower. In other words, on the graph, the dot at 60 and ~27 means that the smallest 60 nars contribute only ~27% of the total. Of note, the standard and framework nars are at 83 and 84.
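
For anyone who wants to reproduce the two summaries described above, here is a rough sketch. It is not the script used for the graphs; the `nar_sizes` helper, the directory argument, and the sample sizes are all hypothetical, and a bucket is assumed to be 10 × 1024 × 1024 bytes.

```python
from collections import Counter
from pathlib import Path

MB = 1024 * 1024


def nar_sizes(lib_dir):
    """Sizes in bytes of every '*.nar' file directly under lib_dir (hypothetical path)."""
    return [p.stat().st_size for p in Path(lib_dir).glob("*.nar")]


def size_histogram(sizes_bytes, bucket_mb=10):
    """Count files per size bucket; each key is the bucket's lower bound in MB."""
    counts = Counter((size // (bucket_mb * MB)) * bucket_mb for size in sizes_bytes)
    return dict(sorted(counts.items()))


def cumulative_share(sizes_bytes):
    """Entry k-1 is the fraction of total bytes held by the k smallest files."""
    ordered = sorted(sizes_bytes)
    total = sum(ordered)
    if total == 0:
        return []
    shares = []
    running = 0
    for size in ordered:
        running += size
        shares.append(running / total)
    return shares


# Hypothetical sizes, not the real NiFi 1.5.0 numbers.
sizes = [2 * MB, 5 * MB, 12 * MB, 31 * MB, 150 * MB]
hist = size_histogram(sizes)
shares = cumulative_share(sizes)
```

With these made-up sizes, the four smallest files account for a quarter of the total, which is the same kind of reading as "the smallest 60 nars contribute only ~27%".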





On Fri, Jan 12, 2018 at 5:04 PM, Michael Moser <[hidden email]> wrote:
> And of course, as I hit <send> I thought of one more thing.
>
> We could keep all of the code in 1 git repo (1 project) but the
> nifi-assembly part of the build could be broken up to build core NiFi
> separately from the tar/zip functional grouping of other NARs.


Re: [DISCUSS] Apache NiFi distribution has grown too large

Jeremy Dyer
So my favorite option is Bryan’s option number “three” of using the extension registry. Now my thought is: do we really need to add complexity and do anything in the meantime, or should we just focus on that? We have roughly 500MB of available capacity today, so why don’t we spend the man-hours we would spend on getting the second repo up on the extension registry instead?

@Bryan do you have thoughts about the deployment of those nars in the extension registry? Since we won’t be able to build the release binary anymore, would we still need to create separate repos for the nars or not? I have used the registry a little, but I’m not 100% sure of your vision for the nars.

- Jeremy Dyer

> On Jan 12, 2018, at 10:18 PM, Tony Kurc <[hidden email]> wrote:
>
> I was looking at nar sizes, and thought some data may be helpful. I used my recent RC1 verification as a basis for getting file sizes, and just got the file size for each file in the assembly named "*.nar". I don't know whether the images I pasted in will go through, but I made some graphs.b The first is a histogram of nar file size in buckets of 10MB. The second basically is similar to a cumulative distribution, the x axis is the "rank" of the nar (smallest to largest), and the y-axis is how what fraction of the all the sizes of the nars together are that rank or lower. In other words, on the graph, the dot at 60 and ~27 means that the smallest 60 nars contribute only ~27% of the total. Of note, the standard and framework nars are at 83 and 84.

Re: [DISCUSS] Apache NiFi distribution has grown too large

trkurc
I put some of the data I was working with on the wiki:

https://cwiki.apache.org/confluence/display/NIFI/NiFi+1.5.0+nar+files

On Fri, Jan 12, 2018 at 10:28 PM, Jeremy Dyer <[hidden email]> wrote:

> So my favorite option is Bryan’s option number “three” of using the
> extension registry. Now my thought is do we really need to add complexity
> and do anything in the mean time or just focus on that? Meaning we have
> roughly 500mb of available capacity today so why don’t we spend those man
> hours we would spend on getting the second repo up on the extension
> registry instead?
>
> @Bryan do you have thoughts about the deployment of those bars in the
> extension registry? Since we won’t be able to build the release binary
> anymore would we still need to create separate repos for the nars or no?? I
> have used the registry a little but I’m not 100% sure on your vision for
> the nars
>
> - Jeremy Dyer
>
> Sent from my iPhone
> >> >> > > >
> >> >> > > > Thanks
> >> >> > > > Joe
> >> >> > > >
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > Joseph
> >> >>
> >> >
> >> >
> >
>

Re: [DISCUSS] Apache NiFi distribution has grown too large

Joe Witt
Thanks, Tony!

On Jan 12, 2018 10:48 PM, "Tony Kurc" <[hidden email]> wrote:

> I put some of the data I was working with on the wiki -
>
> https://cwiki.apache.org/confluence/display/NIFI/NiFi+1.5.0+nar+files
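
For anyone who wants to reproduce this kind of per-nar size data locally, here is a minimal sketch. It assumes the convenience binary has already been unpacked somewhere on disk; the function names are made up for illustration and are not part of NiFi or its build. It collects the size of every "*.nar" file, buckets the sizes in 10MB steps, and computes the ranked cumulative share (what percentage of the total the smallest N nars account for).

```python
from pathlib import Path


def nar_sizes(root_dir):
    """Return (file name, size in bytes) for every *.nar under root_dir, smallest first."""
    files = Path(root_dir).rglob("*.nar")
    return sorted(((p.name, p.stat().st_size) for p in files), key=lambda item: item[1])


def size_histogram(sizes, bucket_bytes=10 * 1024 * 1024):
    """Count files per bucket: bucket 0 is [0, 10MB), bucket 1 is [10MB, 20MB), and so on."""
    counts = {}
    for size in sizes:
        bucket = size // bucket_bytes
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


def cumulative_share(sorted_sizes):
    """For each rank (smallest first), return the percentage of total bytes at that rank or lower."""
    total = sum(sorted_sizes)
    shares = []
    running = 0
    for size in sorted_sizes:
        running += size
        shares.append(100.0 * running / total if total else 0.0)
    return shares
```

To use it, pass the unpacked assembly directory to nar_sizes(), pull the sizes out of the returned pairs, and feed them to size_histogram() and cumulative_share(). The entry at index 59 of the cumulative list is then the share contributed by the smallest 60 nars.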

Re: [DISCUSS] Apache NiFi distribution has grown too large

Pierre Villard
Option #3 also has my preference. But it's probably a good idea to keep only
one git repo and use the assembly and Maven profiles to vary the releases,
no? It would certainly be easier for the release management process. But
this decision could also depend on how option #3 is going to be
implemented, I guess.

2018-01-13 6:36 GMT-07:00 Joe Witt <[hidden email]>:

> Thanks, Tony!

Re: [DISCUSS] Apache NiFi distribution has grown too large

Brandon DeVries
I agree: long term, the extension registry; short term, one repo with
different assemblies (standard, slim, analytic, etc.).

Brandon

On Sat, Jan 13, 2018 at 1:35 PM Pierre Villard <[hidden email]> wrote:

> Option #3 also has my preference. But it's probably a good idea to keep only
> one git repo and use the assembly and Maven profiles to vary the releases,
> no? It would certainly be easier for the release management process. But
> this decision could also depend on how option #3 is going to be
> implemented, I guess.

Re: [DISCUSS] Apache NiFi distribution has grown too large

Joey Frazee
I tend to share Michael's feelings about a multi-repo approach. I’ve rarely seen it help and more often seen it hurt: it’s confusing (especially to newcomers), stuff gets neglected because it’s easier to ignore, and you need another master project or some such to do an entire build.

Maybe git submodules could help mitigate this, but creating independent assemblies or using different build profiles to enable building and packaging the binaries in different ways would satisfy everything except disentangling the releases.

-joey

On Jan 13, 2018, 12:40 PM -0600, Brandon DeVries <[hidden email]>, wrote:

> I agree: long term, the extension registry; short term, one repo with
> different assemblies (standard, slim, analytic, etc.).
>
> Brandon
>
> On Sat, Jan 13, 2018 at 1:35 PM Pierre Villard <[hidden email]
> wrote:
>
> > Option #3 also has my preference. But it's probably a good idea to only
> > keep one git repo and play with the assembly and Maven profiles for the
> > releases, no? It'd be certainly easier for release management process. But
> > this decision could also depend on how the option #3 is going to be
> > implemented I guess.
> >
> > 2018-01-13 6:36 GMT-07:00 Joe Witt <[hidden email]>:
> >
> > > thanks tony!
> > >
> > > On Jan 12, 2018 10:48 PM, "Tony Kurc" <[hidden email]> wrote:
> > >
> > > > I put some of the data I was working with on the wiki -
> > > >
> > > > https://cwiki.apache.org/confluence/display/NIFI/NiFi+1.5.0+nar+files
> > > >
> > > > On Fri, Jan 12, 2018 at 10:28 PM, Jeremy Dyer <[hidden email]
> > wrote:
> > > >
> > > > > So my favorite option is Bryan’s option number “three” of using the
> > > > > extension registry. Now my thought is do we really need to add
> > > complexity
> > > > > and do anything in the mean time or just focus on that? Meaning we
> > have
> > > > > roughly 500mb of available capacity today so why don’t we spend those
> > > man
> > > > > hours we would spend on getting the second repo up on the extension
> > > > > registry instead?
> > > > >
> > > > > @Bryan do you have thoughts about the deployment of those bars in the
> > > > > extension registry? Since we won’t be able to build the release
> > binary
> > > > > anymore would we still need to create separate repos for the nars or
> > > > no?? I
> > > > > have used the registry a little but I’m not 100% sure on your vision
> > > for
> > > > > the nars
> > > > >
> > > > > - Jeremy Dyer
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On Jan 12, 2018, at 10:18 PM, Tony Kurc <[hidden email]> wrote:
> > > > > >
> > > > > > I was looking at nar sizes, and thought some data may be helpful. I
> > > > used
> > > > > my recent RC1 verification as a basis for getting file sizes, and
> > just
> > > > got
> > > > > the file size for each file in the assembly named "*.nar". I don't
> > know
> > > > > whether the images I pasted in will go through, but I made some
> > > graphs.b
> > > > > The first is a histogram of nar file size in buckets of 10MB. The
> > > second
> > > > > basically is similar to a cumulative distribution, the x axis is the
> > > > "rank"
> > > > > of the nar (smallest to largest), and the y-axis is how what fraction
> > > of
> > > > > the all the sizes of the nars together are that rank or lower. In
> > other
> > > > > words, on the graph, the dot at 60 and ~27 means that the smallest 60
> > > > nars
> > > > > contribute only ~27% of the total. Of note, the standard and
> > framework
> > > > nars
> > > > > are at 83 and 84.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Fri, Jan 12, 2018 at 5:04 PM, Michael Moser <
> > [hidden email]
> > > > > wrote:
> > > > > > > And of course, as I hit <send> I thought of one more thing.
> > > > > > >
> > > > > > > We could keep all of the code in 1 git repo (1 project) but the
> > > > > > > nifi-assembly part of the build could be broken up to build core
> > > NiFi
> > > > > > > separately from the tar/zip functional grouping of other NARs.
> > > > > > >
> > > > > > > On Fri, Jan 12, 2018 at 5:01 PM, Michael Moser <
> > [hidden email]
> > > > > wrote:
> > > > > > >
> > > > > > > > Long term I would also like to see #3 be the solution. I think
> > > what
> > > > > > > > Joseph N described could be part of the capabilities of #3.
> > > > > > > >
> > > > > > > > I would like to add a note of caution with respect to
> > reorganizing
> > > > and
> > > > > > > > releasing extension bundles separately:
> > > > > > > >
> > > > > > > > - the burden on release manager expands because many more
> > > > projects
> > > > > > > > have to be released; probably not all on each release cycle
> > but
> > > > it
> > > > > could
> > > > > > > > still be many
> > > > > > > > - the chance of accidentally forgetting to release a project
> > > in a
> > > > > > > > release cycle becomes non-zero
> > > > > > > > - sharing code between projects gets a bit harder because you
> > > > have
> > > > > to
> > > > > > > > manage releasing projects in a specific order
> > > > > > > > - it becomes harder to find all of the projects that need to
> > > > change
> > > > > > > > when shared code is added
> > > > > > > > - the simple act of finding code becomes harder ... in which
> > > > > project
> > > > > > > > is that class in? (IDEs like IntelliJ can search in 1
> > project,
> > > > but
> > > > > if they
> > > > > > > > search across multiple projects, then I haven't learned how)
> > > > > > > >
> > > > > > > > I used to maintain several nars in separate projects, and
> > recently
> > > > > > > > reorganized them into 1 project (following NiFi's multi-module
> > > maven
> > > > > build)
> > > > > > > > and life has become much easier!
> > > > > > > >
> > > > > > > > -- Mike
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jan 12, 2018 at 4:33 PM, Chris Herrera <
> > > > > [hidden email]
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I very much like the solution proposed by Bryan below. This
> > would
> > > > > allow
> > > > > > > > > > for a cleaner docker image as well, while still providing the
> > > > > functionality
> > > > > > > > > as needed. For sure, the extension registry will be great, but
> > in
> > > > > the mean
> > > > > > > > > time this is an adequate mid step.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Chris
> > > > > > > > >
> > > > > > > > > On Jan 12, 2018, 2:52 PM -0600, Bryan Bende <[hidden email]
> > > ,
> > > > > wrote:
> > > > > > > > > > Long term I'd like to see the extension registry take form
> > and
> > > > have
> > > > > > > > > > that be the solution (#3).
> > > > > > > > > >
> > > > > > > > > > In the more near term, we could separate all of the NARs,
> > > except
> > > > > for
> > > > > > > > > > framework and maybe standard processors & services, into a
> > > > separate
> > > > > > > > > > git repo.
> > > > > > > > > >
> > > > > > > > > > In that new git repo we could organize things like Joe N just
> > > > > > > > > > described according to some kind of functional grouping. Each
> > > of
> > > > > these
> > > > > > > > > > functional bundles could produce its own tar/zip which we can
> > > > make
> > > > > > > > > > available for download.
> > > > > > > > > >
> > > > > > > > > > That would separate the release cycles between core NiFi and
> > > the
> > > > > other
> > > > > > > > > > NARs, and also avoid having any single binary artifact that
> > > gets
> > > > > too
> > > > > > > > > > large.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 12, 2018 at 3:43 PM, Joseph Niemiec <
> > > > > [hidden email]
> > > > > > > > > wrote:
> > > > > > > > > > > just a random thought.
> > > > > > > > > > >
> > > > > > > > > > > Drop In Lib packs... All the Hadoop ones in one package for
> > > > > example
> > > > > > > > > that
> > > > > > > > > > > can be added to a slim Nifi install. Another may be for
> > > Cloud,
> > > > or
> > > > > > > > > Database
> > > > > > > > > > > Interactions, Integration (JMS, FTP, etc) of course
> > defining
> > > > > these
> > > > > > > > > groups
> > > > > > > > > > > would be the tricky part... Or perhaps some type of
> > installer
> > > > > which
> > > > > > > > > allows
> > > > > > > > > > > you to elect which packages to download to add to the slim
> > > > > install?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 12, 2018 at 3:10 PM, Joe Witt <
> > > [hidden email]
> > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Team,
> > > > > > > > > > > >
> > > > > > > > > > > > The NiFi convenience binary (tar.gz/zip) size has grown
> > to
> > > > > 1.1GB now
> > > > > > > > > > > > in the latest release. Apache infra expanded it to 1.6GB
> > > > > allowance
> > > > > > > > > > > > for us but has stated this is the last time.
> > > > > > > > > > > > https://issues.apache.org/jira/browse/INFRA-15816
> > > > > > > > > > > >
> > > > > > > > > > > > We need consider:
> > > > > > > > > > > > 1) removing old nars/less commonly used nars/or
> > > particularly
> > > > > massive
> > > > > > > > > > > > nars from the assembly we distribute by default. Folks
> > can
> > > > > still use
> > > > > > > > > > > > these things if they want just not from our convenience
> > > > binary
> > > > > > > > > > > > 2) collapsing nars with highly repeating deps
> > > > > > > > > > > > 3) Getting the extension registry baked into the Flow
> > > > Registry
> > > > > then
> > > > > > > > > > > > moving to separate releases for extension bundles. The
> > main
> > > > > release
> > > > > > > > > > > > then would be just the NiFi framework.
> > > > > > > > > > > >
> > > > > > > > > > > > Any other ideas ?
> > > > > > > > > > > >
> > > > > > > > > > > > > > I'll plan to start identifying candidates for removal
> > soon.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > Joe
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Joseph
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] Apache NiFi distribution has grown too large

trkurc
Administrator
I added some more stats to the wiki page, trying to determine what
dependencies are included in jars. It seems like there is opportunity.

Highlights: 50 copies of what appears to be some version of bcprov-jdk15on,
for a total of 162MB, and 51 copies of jackson-databind.

total size   copies  jar
   30.97MB       65  META-INF/bundled-dependencies/commons-lang3-XXX.jar
   32.53MB       50  META-INF/bundled-dependencies/bcpkix-jdk15on-XXX.jar
   33.55MB       16  META-INF/bundled-dependencies/guava-XXX.jar
   39.62MB        1  META-INF/bundled-dependencies/jython-shaded-XXX.jar
   63.06MB       51  META-INF/bundled-dependencies/jackson-databind-XXX.jar
  162.07MB       50  META-INF/bundled-dependencies/bcprov-jdk15on-XXX.jar
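
For anyone who wants to reproduce numbers like these, here is a rough sketch of how such a tally could be computed. It is not the script used for the wiki page. It assumes the nars are ordinary zip archives with jars under META-INF/bundled-dependencies/ (as the paths above suggest), that uncompressed entry sizes are what should be summed, and that a simple regex is good enough to collapse version numbers to "XXX". The names tally_bundled_jars and artifact_name are made up for illustration.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path

PREFIX = "META-INF/bundled-dependencies/"
# Heuristic: treat everything from the first "-<digit>" onward as the version.
VERSION_RE = re.compile(r"-\d[\w.\-]*\.jar$")


def artifact_name(entry_name):
    """Return the jar name with its version collapsed to XXX."""
    base = entry_name[len(PREFIX):]
    return VERSION_RE.sub("-XXX.jar", base)


def tally_bundled_jars(nar_dir):
    """Map each bundled jar name to (copies, total uncompressed bytes)."""
    totals = defaultdict(lambda: [0, 0])  # name -> [copies, bytes]
    for nar in sorted(Path(nar_dir).glob("*.nar")):
        with zipfile.ZipFile(nar) as zf:
            for info in zf.infolist():
                name = info.filename
                if name.startswith(PREFIX) and name.endswith(".jar"):
                    entry = totals[artifact_name(name)]
                    entry[0] += 1
                    entry[1] += info.file_size
    return {name: tuple(counts) for name, counts in totals.items()}
```

Calling tally_bundled_jars on the lib directory of an unpacked assembly and sorting the result by total bytes should give a table of the same shape as the one above. As a back-of-the-envelope reading of that table: 162.07MB over 50 copies is about 3.24MB per copy of bcprov-jdk15on, so if every nar could share a single copy (which assumes they all use a compatible version), that one jar alone would account for roughly 158.8MB of savings.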


On Sat, Jan 13, 2018 at 2:09 PM, Joey Frazee <[hidden email]> wrote:

> I tend to have feelings similar to Michael about a multi-repo approach.
> I’ve rarely seen it help and more often seen it hurt — it’s confusing
> (especially to newcomers), stuff gets neglected because it’s easier to
> ignore, you need another master project or some such to do an entire build.
>
> Maybe git submodules could help mitigate this, but creating independent
> assemblies or using different build profiles to enable building and
> packaging the binaries in different ways would satisfy everything except
> disentangling the releases.
>
> -joey
>

Re: [DISCUSS] Apache NiFi distribution has grown too large

Brett Ryan
Why are core modules not listing everything as provided?

IDEs solve this problem with the use of dependency libraries. As an example, NetBeans NBMs have a single purpose, and you must export the packages to be exposed.

We do the same with Confluence modules using Felix.

Why is NiFi doing things differently just so the person who wants to install many custom nars can be lazy?

> On 14 Jan 2018, at 08:59, Tony Kurc <[hidden email]> wrote:
>
> I added some more stats to the wiki page, trying to determine what
> dependencies are included in jars. It seems like there is opportunity.
>
> Highlights, 50 copies of what appears to be some version of bcprov-jdk15
> for a total of 162M. 51 copies of jackson-databind.
>
> total size   copies  jar
>    30.97MB       65  META-INF/bundled-dependencies/commons-lang3-XXX.jar
>    32.53MB       50  META-INF/bundled-dependencies/bcpkix-jdk15on-XXX.jar
>    33.55MB       16  META-INF/bundled-dependencies/guava-XXX.jar
>    39.62MB        1  META-INF/bundled-dependencies/jython-shaded-XXX.jar
>    63.06MB       51  META-INF/bundled-dependencies/jackson-databind-XXX.jar
>   162.07MB       50  META-INF/bundled-dependencies/bcprov-jdk15on-XXX.jar
>
>

Re: [DISCUSS] Apache NiFi distribution has grown too large

Mike Thomsen
Since the limit was bumped to 1.6GB, it might be prudent not to do too much
in NiFi 1.x and instead focus on a comprehensive solution that coincides with
2.0. I think that would be a time when a lot of users might expect and be
tolerant of breaking changes on issues like this.

Also, is there a clear process for deprecating processors? If not, there
should be because it would be really helpful for doing cleanup.

On Sat, Jan 13, 2018 at 7:53 PM, Brett Ryan <[hidden email]> wrote:

> Why are core modules not listing everything as provided?
>
> IDE’s solve this problem with the use of dependency libraries. As an
> example NetBeans nbm’s have a single purpose, you must export the packages
> to be exposed.
>
> We do the same with confluence modules using felix.
>
> Why is NiFi doing things different just so the person who wants to install
> many custom nars can be lazy?
>
> > On 14 Jan 2018, at 08:59, Tony Kurc <[hidden email]> wrote:
> >
> > I added some more stats to the wiki page, trying to determine what
> > dependencies are included in jars. It seems like there is opportunity.
> >
> > Highlights, 50 copies of what appears to be some version of bcprov-jdk15
> > for a total of 162M. 51 copies of jackson-databind.
> >
> > total size   copies  jar
> >    30.97MB       65  META-INF/bundled-dependencies/commons-lang3-XXX.jar
> >    32.53MB       50  META-INF/bundled-dependencies/bcpkix-jdk15on-XXX.jar
> >    33.55MB       16  META-INF/bundled-dependencies/guava-XXX.jar
> >    39.62MB        1  META-INF/bundled-dependencies/jython-shaded-XXX.jar
> >    63.06MB       51  META-INF/bundled-dependencies/jackson-databind-XXX.jar
> >   162.07MB       50  META-INF/bundled-dependencies/bcprov-jdk15on-XXX.jar
> >
> >
> >> On Sat, Jan 13, 2018 at 2:09 PM, Joey Frazee <[hidden email]>
> wrote:
> >>
> >> I tend to have feelings similar to Michael about a multi-repo approach.
> >> I’ve rarely seen it help and more often seen it hurt — it’s confusing
> >> (especially to newcomers), stuff gets neglected because it’s easier to
> >> ignore, you need another master project or some such to do an entire
> build.
> >>
> >> Maybe git submodules could help mitigate this, but creating independent
> >> assemblies or using different build profiles to enable building and
> >> packaging the binaries in different ways would satisfy everything except
> >> disentangling the releases.
> >>
> >> -joey
> >>
> >>> On Jan 13, 2018, 12:40 PM -0600, Brandon DeVries <[hidden email]>, wrote:
> >>> I agree... Long term extension registry, short term one repo with
> >> different
> >>> assemblies (e.g. standard, slim, analytic, etc...).
> >>>
> >>> Brandon
> >>>
> >>> On Sat, Jan 13, 2018 at 1:35 PM Pierre Villard <
> >> [hidden email]
> >>> wrote:
> >>>
> >>>> Option #3 also has my preference. But it's probably a good idea to
> only
> >>>> keep one git repo and play with the assembly and Maven profiles for
> the
> >>>> releases, no? It'd be certainly easier for release management process.
> >> But
> >>>> this decision could also depend on how the option #3 is going to be
> >>>> implemented I guess.
> >>>>
> >>>> 2018-01-13 6:36 GMT-07:00 Joe Witt <[hidden email]>:
> >>>>
> >>>>> thanks tony!
> >>>>>
> >>>>>> On Jan 12, 2018 10:48 PM, "Tony Kurc" <[hidden email]> wrote:
> >>>>>>
> >>>>>> I put some of the data I was working with on the wiki -
> >>>>>>
> >>>>>> https://cwiki.apache.org/confluence/display/NIFI/NiFi+
> >> 1.5.0+nar+files
> >>>>>>
> >>>>>> On Fri, Jan 12, 2018 at 10:28 PM, Jeremy Dyer <[hidden email]
> >>>> wrote:
> >>>>>>
> >>>>>>> So my favorite option is Bryan’s option number “three” of using
> >> the
> >>>>>>> extension registry. Now my thought is do we really need to add
> >>>>> complexity
> >>>>>>> and do anything in the mean time or just focus on that? Meaning
> >> we
> >>>> have
> >>>>>>> roughly 500mb of available capacity today so why don’t we spend
> >> those
> >>>>> man
> >>>>>>> hours we would spend on getting the second repo up on the
> >> extension
> >>>>>>> registry instead?
> >>>>>>>
> >>>>>>> @Bryan do you have thoughts about the deployment of those nars
> >> in the
> >>>>>>> extension registry? Since we won’t be able to build the release
> >>>> binary
> >>>>>>> anymore would we still need to create separate repos for the
> >> nars or
> >>>>>> no?? I
> >>>>>>> have used the registry a little but I’m not 100% sure on your
> >> vision
> >>>>> for
> >>>>>>> the nars
> >>>>>>>
> >>>>>>> - Jeremy Dyer
> >>>>>>>
> >>>>>>> Sent from my iPhone
> >>>>>>>
> >>>>>>>> On Jan 12, 2018, at 10:18 PM, Tony Kurc <[hidden email]>
> >> wrote:
> >>>>>>>>
> >>>>>>>> I was looking at nar sizes, and thought some data may be
> >> helpful. I
> >>>>>> used
> >>>>>>> my recent RC1 verification as a basis for getting file sizes, and
> >>>> just
> >>>>>> got
> >>>>>>> the file size for each file in the assembly named "*.nar". I
> >> don't
> >>>> know
> >>>>>>> whether the images I pasted in will go through, but I made some
> >>>>> graphs.b
> >>>>>>> The first is a histogram of nar file size in buckets of 10MB. The
> >>>>> second
> >>>>>>> basically is similar to a cumulative distribution, the x axis is
> >> the
> >>>>>> "rank"
> >>>>>>> of the nar (smallest to largest), and the y-axis is how what
> >> fraction
> >>>>> of
> >>>>>>> the all the sizes of the nars together are that rank or lower. In
> >>>> other
> >>>>>>> words, on the graph, the dot at 60 and ~27 means that the
> >> smallest 60
> >>>>>> nars
> >>>>>>> contribute only ~27% of the total. Of note, the standard and
> >>>> framework
> >>>>>> nars
> >>>>>>> are at 83 and 84.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Fri, Jan 12, 2018 at 5:04 PM, Michael Moser <
> >>>> [hidden email]
> >>>>>>> wrote:
> >>>>>>>>> And of course, as I hit <send> I thought of one more thing.
> >>>>>>>>>
> >>>>>>>>> We could keep all of the code in 1 git repo (1 project) but
> >> the
> >>>>>>>>> nifi-assembly part of the build could be broken up to build
> >> core
> >>>>> NiFi
> >>>>>>>>> separately from the tar/zip functional grouping of other
> >> NARs.
> >>>>>>>>>
> >>>>>>>>> On Fri, Jan 12, 2018 at 5:01 PM, Michael Moser <
> >>>> [hidden email]
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Long term I would also like to see #3 be the solution. I
> >> think
> >>>>> what
> >>>>>>>>>> Joseph N described could be part of the capabilities of #3.
> >>>>>>>>>>
> >>>>>>>>>> I would like to add a note of caution with respect to
> >>>> reorganizing
> >>>>>> and
> >>>>>>>>>> releasing extension bundles separately:
> >>>>>>>>>>
> >>>>>>>>>> - the burden on release manager expands because many more
> >>>>>> projects
> >>>>>>>>>> have to be released; probably not all on each release cycle
> >>>> but
> >>>>>> it
> >>>>>>> could
> >>>>>>>>>> still be many
> >>>>>>>>>> - the chance of accidentally forgetting to release a
> >> project
> >>>>> in a
> >>>>>>>>>> release cycle becomes non-zero
> >>>>>>>>>> - sharing code between projects gets a bit harder because
> >> you
> >>>>>> have
> >>>>>>> to
> >>>>>>>>>> manage releasing projects in a specific order
> >>>>>>>>>> - it becomes harder to find all of the projects that need
> >> to
> >>>>>> change
> >>>>>>>>>> when shared code is added
> >>>>>>>>>> - the simple act of finding code becomes harder ... in
> >> which
> >>>>>>> project
> >>>>>>>>>> is that class in? (IDEs like IntelliJ can search in 1
> >>>> project,
> >>>>>> but
> >>>>>>> if they
> >>>>>>>>>> search across multiple projects, then I haven't learned
> >> how)
> >>>>>>>>>>
> >>>>>>>>>> I used to maintain several nars in separate projects, and
> >>>> recently
> >>>>>>>>>> reorganized them into 1 project (following NiFi's
> >> multi-module
> >>>>> maven
> >>>>>>> build)
> >>>>>>>>>> and life has become much easier!
> >>>>>>>>>>
> >>>>>>>>>> -- Mike
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jan 12, 2018 at 4:33 PM, Chris Herrera <
> >>>>>>> [hidden email]
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I very much like the solution proposed by Bryan below.
> >> This
> >>>> would
> >>>>>>> allow
> >>>>>>>>>>> for a cleaner docker image as well, while still proving
> >> the
> >>>>>>> functionality
> >>>>>>>>>>> as needed. For sure, the extension registry will be
> >> great, but
> >>>> in
> >>>>>>> the mean
> >>>>>>>>>>> time this is an adequate mid step.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Chris
> >>>>>>>>>>>
> >>>>>>>>>>> On Jan 12, 2018, 2:52 PM -0600, Bryan Bende <
> >> [hidden email]
> >>>>> ,
> >>>>>>> wrote:
> >>>>>>>>>>>> Long term I'd like to see the extension registry take
> >> form
> >>>> and
> >>>>>> have
> >>>>>>>>>>>> that be the solution (#3).
> >>>>>>>>>>>>
> >>>>>>>>>>>> In the more near term, we could separate all of the
> >> NARs,
> >>>>> except
> >>>>>>> for
> >>>>>>>>>>>> framework and maybe standard processors & services,
> >> into a
> >>>>>> separate
> >>>>>>>>>>>> git repo.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In that new git repo we could organize things like Joe
> >> N just
> >>>>>>>>>>>> described according to some kind of functional
> >> grouping. Each
> >>>>> of
> >>>>>>> these
> >>>>>>>>>>>> functional bundles could produce its own tar/zip which
> >> we can
> >>>>>> make
> >>>>>>>>>>>> available for download.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That would separate the release cycles between core
> >> NiFi and
> >>>>> the
> >>>>>>> other
> >>>>>>>>>>>> NARs, and also avoid having any single binary artifact
> >> that
> >>>>> gets
> >>>>>>> too
> >>>>>>>>>>>> large.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jan 12, 2018 at 3:43 PM, Joseph Niemiec <
> >>>>>>> [hidden email]
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>> just a random thought.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Drop In Lib packs... All the Hadoop ones in one
> >> package for
> >>>>>>> example
> >>>>>>>>>>> that
> >>>>>>>>>>>>> can be added to a slim Nifi install. Another may be
> >> for
> >>>>> Cloud,
> >>>>>> or
> >>>>>>>>>>> Database
> >>>>>>>>>>>>> Interactions, Integration (JMS, FTP, etc) of course
> >>>> defining
> >>>>>>> these
> >>>>>>>>>>> groups
> >>>>>>>>>>>>> would be the tricky part... Or perhaps some type of
> >>>> installer
> >>>>>>> which
> >>>>>>>>>>> allows
> >>>>>>>>>>>>> you to elect which packages to download to add to
> >> the slim
> >>>>>>> install?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Jan 12, 2018 at 3:10 PM, Joe Witt <
> >>>>> [hidden email]
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Team,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The NiFi convenience binary (tar.gz/zip) size has
> >> grown
> >>>> to
> >>>>>>> 1.1GB now
> >>>>>>>>>>>>>> in the latest release. Apache infra expanded it to
> >> 1.6GB
> >>>>>>> allowance
> >>>>>>>>>>>>>> for us but has stated this is the last time.
> >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-15816
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We need consider:
> >>>>>>>>>>>>>> 1) removing old nars/less commonly used nars/or
> >>>>> particularly
> >>>>>>> massive
> >>>>>>>>>>>>>> nars from the assembly we distribute by default.
> >> Folks
> >>>> can
> >>>>>>> still use
> >>>>>>>>>>>>>> these things if they want just not from our
> >> convenience
> >>>>>> binary
> >>>>>>>>>>>>>> 2) collapsing nars with highly repeating deps
> >>>>>>>>>>>>>> 3) Getting the extension registry baked into the
> >> Flow
> >>>>>> Registry
> >>>>>>> then
> >>>>>>>>>>>>>> moving to separate releases for extension bundles.
> >> The
> >>>> main
> >>>>>>> release
> >>>>>>>>>>>>>> then would be just the NiFi framework.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any other ideas ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'll plan to start identifying candiates for
> >> removal
> >>>> soon.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>> Joe
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Joseph
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
>

Re: [DISCUSS] Apache NiFi distribution has grown too large

James Wing
I think a reduced build is a good way forward until the extension registry
is ready.  If we can publish the remaining processors in one or more
additional artifacts, that would be ideal.  The admin burden of more git
repositories or separate releases does not appeal to me, especially since
we do not believe it to be our long-term path.

It's not going to be easy to decide on a "core" build with "extras" sold
separately. But we will have to confront the division for the registry
solution in any case, so we might as well get started on it.

On Sun, Jan 14, 2018 at 1:37 PM, Mike Thomsen <[hidden email]>
wrote:

> Since the limit was bumped to 1.6GB, it might be prudent to not do too much
> in NiFi 1.X and instead focus on a comprehensive solution that coincides with
> 2.0. I think that would be a time when a lot of users might expect and be
> tolerant of breaking changes on issues like this.
>
> Also, is there a clear process for deprecating processors? If not, there
> should be because it would be really helpful for doing cleanup.

Re: [DISCUSS] Apache NiFi distribution has grown too large

Mike Thomsen
One possibility: three "packs," such as:

1. Big Data
2. Search
3. Non-BD (non-big-data) NoSQL

Each pack would be an assembly of NARs that correspond to the category.

The core would have JDBC support and all of the data mutator processors.
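
To make the pack idea concrete, here is a rough sketch of how NAR files might be sorted into the three packs, with everything else falling through to the core build. The pack names come from the list above; the keyword sets, the function names, and the "anything unmatched is core" rule are assumptions made purely for illustration, not an agreed grouping.

```python
from collections import defaultdict

# Hypothetical keyword sets: the thread names the packs, not their contents.
PACK_KEYWORDS = {
    "big-data": {"hadoop", "hive"},
    "search": {"solr", "elasticsearch"},
    "nosql": {"mongodb", "cassandra", "couchbase"},
}


def assign_pack(nar_name):
    """Return the pack for a NAR file name, or "core" when no keyword matches."""
    # Match whole dash-separated tokens so "archive" does not match "hive".
    tokens = set(nar_name.lower().split("-"))
    for pack, keywords in PACK_KEYWORDS.items():
        if tokens & keywords:
            return pack
    return "core"


def group_nars(nar_names):
    """Group NAR file names by pack, keeping input order within each pack."""
    packs = defaultdict(list)
    for name in nar_names:
        packs[assign_pack(name)].append(name)
    return dict(packs)
```

Each resulting group would then map to one assembly. The hard part, as already noted in this thread, is agreeing on the groups, not the mechanics.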

On Mon, Jan 15, 2018 at 11:54 PM, James Wing <[hidden email]> wrote:

> I think a reduced build is a good way forward until the extension registry
> is ready.  If we can publish the remaining processors in one or more
> additional artifacts, that would be ideal.  The admin burden of more git
> repositories or separate releases does not appeal to me, especially since
> we do not believe it to be our long-term path.
>
> It's not going to be easy to decide on a "core" build with "extras" sold
> separately. But we will have to confront the division for the registry
> solution in any case, so we might as well get started on it.
> > > >>>>>>>>>>>>> On Fri, Jan 12, 2018 at 3:10 PM, Joe Witt <
> > > >>>>> [hidden email]
> > > >>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Team,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> The NiFi convenience binary (tar.gz/zip) size has
> > > >> grown
> > > >>>> to
> > > >>>>>>> 1.1GB now
> > > >>>>>>>>>>>>>> in the latest release. Apache infra expanded it to
> > > >> 1.6GB
> > > >>>>>>> allowance
> > > >>>>>>>>>>>>>> for us but has stated this is the last time.
> > > >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-15816
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> We need consider:
> > > >>>>>>>>>>>>>> 1) removing old nars/less commonly used nars/or
> > > >>>>> particularly
> > > >>>>>>> massive
> > > >>>>>>>>>>>>>> nars from the assembly we distribute by default.
> > > >> Folks
> > > >>>> can
> > > >>>>>>> still use
> > > >>>>>>>>>>>>>> these things if they want just not from our
> > > >> convenience
> > > >>>>>> binary
> > > >>>>>>>>>>>>>> 2) collapsing nars with highly repeating deps
> > > >>>>>>>>>>>>>> 3) Getting the extension registry baked into the
> > > >> Flow
> > > >>>>>> Registry
> > > >>>>>>> then
> > > >>>>>>>>>>>>>> moving to separate releases for extension bundles.
> > > >> The
> > > >>>> main
> > > >>>>>>> release
> > > >>>>>>>>>>>>>> then would be just the NiFi framework.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Any other ideas ?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I'll plan to start identifying candiates for
> > > >> removal
> > > >>>> soon.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks
> > > >>>>>>>>>>>>>> Joe
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> --
> > > >>>>>>>>>>>>> Joseph
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
>

Re: [DISCUSS] Apache NiFi distribution has grown too large

Mike Thomsen
Also maybe #4: Message Queue support (JMS, Kafka, etc.)

On Tue, Jan 16, 2018 at 5:13 AM, Mike Thomsen <[hidden email]>
wrote:

> One possibility: 3 "packs." Such as:
>
> 1. Big Data.
> 2. Search
> 3. Non-BD NoSQL.
>
> Each pack would be an assembly of NARs that correspond to the category.
>
> The core would have JDBC support and all of the data mutator processors.

Re: [DISCUSS] Apache NiFi distribution has grown too large

Bryan Bende
I still like the "NAR packs" idea even for the single repo approach. I
think if we only provide a "light" binary and then say that everything
else has to be built on your own, it creates a big barrier to entry
for a lot of users. With the NAR packs approach we could provide one
binary that is the actual application, and then multiple zips/tars
that each contain a set of NARs. So someone gets the first binary and
then adds whichever NAR packs they need to it. This solves the immediate
problem of any single binary exceeding the size limit.
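
A minimal sketch of what "adding a NAR pack" to a light install could
look like, assuming a pack is simply a zip whose entries include *.nar
files and that NARs are picked up from the lib directory of the install.
The function name install_nar_pack and the pack layout are hypothetical,
not an existing NiFi tool:

```python
import zipfile
from pathlib import Path


def install_nar_pack(pack_zip: Path, nifi_home: Path) -> list[str]:
    """Copy every *.nar inside a pack archive into <nifi_home>/lib.

    Returns the sorted names of the NAR files that were installed.
    The pack format (a plain zip of NARs) is an assumption.
    """
    lib_dir = nifi_home / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(pack_zip) as pack:
        for entry in pack.infolist():
            name = Path(entry.filename).name
            if entry.is_dir() or not name.endswith(".nar"):
                continue  # skip directories, READMEs, licenses, etc.
            # Flatten any directory structure inside the pack.
            (lib_dir / name).write_bytes(pack.read(entry))
            installed.append(name)
    return sorted(installed)
```

Something like install_nar_pack(Path("hadoop-pack.zip"), Path("/opt/nifi"))
followed by a restart would then be the whole install step; both paths
are placeholders.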

As a side effect of whatever we do, I was also hoping we could make
the build process easier for folks working on the framework. If all we
do is change our current assembly, I think you'd still incur the time
of building all the NARs, since they are listed in the modules section
of the nifi-nar-bundles pom, even though most of them wouldn't be
included in
the new "light" assembly. We'd have to consider restructuring the git
repo a little bit if this was something we wanted to do. Possibly the
top-level could be divided into "nifi-core" and "nifi-nar-bundles",
where nifi-core produced the light assembly so folks working on the
framework can build this quickly, but if you want to build everything
then you build from the root pom which also builds all the NAR packs.
Just something to think about if we are going to make changes.

Regarding the duplication of many JARs (thanks for putting the data
together Tony!)...

We could try to collapse common dependencies so that we don't end up
with so many duplicate copies of the same JAR, but I don't know
exactly how we'd set this up...
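
For anyone who wants to reproduce the duplicate counts, here is a rough
sketch. It assumes each NAR is a zip archive that keeps its bundled JARs
under META-INF/bundled-dependencies/ (the path shown in Tony's table) and
that the version is the trailing "-<digits...>.jar" part of the file
name. The function name bundled_jar_stats is made up for illustration,
and this is not necessarily how the wiki numbers were produced:

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path

BUNDLED = "META-INF/bundled-dependencies/"
# Strip a trailing "-<version>.jar" so copies with different versions
# group together. JARs without a version suffix keep their full name.
VERSION_SUFFIX = re.compile(r"-\d[\w.\-]*\.jar$")


def bundled_jar_stats(lib_dir: Path) -> dict[str, tuple[int, int]]:
    """Map artifact name -> (copies, total uncompressed bytes) over all NARs."""
    stats = defaultdict(lambda: [0, 0])
    for nar in sorted(lib_dir.glob("*.nar")):
        with zipfile.ZipFile(nar) as zf:
            for entry in zf.infolist():
                path = entry.filename
                if path.startswith(BUNDLED) and path.endswith(".jar"):
                    artifact = VERSION_SUFFIX.sub("", path[len(BUNDLED):])
                    stats[artifact][0] += 1
                    stats[artifact][1] += entry.file_size
    return {name: (copies, size) for name, (copies, size) in stats.items()}
```

Sorting the result by total bytes gives a list of the best candidates
for collapsing.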

We could promote a JAR to the lib directory, which makes it visible to
every single NAR, so it no longer needs to be bundled into each NAR.
That works great for the NARs that already use the dependency, but it
also means that a bunch of other NARs have this extra thing on the
classpath, and that we are forcing one version of that library upon
every NAR, which somewhat defeats the purpose of NARs.
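
To make that trade-off concrete, here is a small sketch that classifies
NARs by what promoting one library version to lib would do to them. The
input mapping (NAR name -> {artifact: version}) and the function name
promotion_impact are hypothetical; the data would have to come from
inspecting the real NARs:

```python
def promotion_impact(
    nar_deps: dict[str, dict[str, str]], artifact: str, version: str
) -> dict[str, list[str]]:
    """Classify NARs by how promoting artifact@version to lib/ affects them."""
    impact = {"unchanged": [], "version_forced": [], "newly_exposed": []}
    for nar, deps in sorted(nar_deps.items()):
        if artifact not in deps:
            # Never asked for the library, but would now see it anyway.
            impact["newly_exposed"].append(nar)
        elif deps[artifact] == version:
            # Already bundles exactly this version; only the copy goes away.
            impact["unchanged"].append(nar)
        else:
            # Bundles a different version and would be forced onto the shared one.
            impact["version_forced"].append(nar)
    return impact
```

A library is only a safe promotion candidate when the last two lists are
empty or the affected NARs are known to tolerate the shared version.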

We could create "lib" NARs, similar to the original intent of
nifi-hadoop-libraries-nar. For example, we could create
nifi-jackson-libraries-nar, and then any NAR that needs jackson would
have this as its parent. This gets tricky when there is more than
one library in play. For example, let's say we also had
nifi-bcprov-libraries-nar, and some other NAR needs both jackson and
bcprov: there can be only one parent NAR, so you can only pick one of
them. You could chain things together, but then how do you decide the
order of the chain... nifi-xyz-nar -> nifi-jackson-nar ->
nifi-bcprov-nar  VS. nifi-xyz-nar -> nifi-bcprov-nar ->
nifi-jackson-nar.

Right now having a NAR dependency is like single class inheritance,
and it seems like we would also need a mix-in style NAR dependency to
be able to add multiple lib NARs without getting into this chaining
issue.
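
The difference between the two models can be sketched with a toy class.
This only models which JARs are visible to a NAR, not real classloader
delegation order, and the mixins field is the hypothetical mix-in style
dependency, not something NiFi supports today:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Nar:
    name: str
    jars: set[str] = field(default_factory=set)
    parent: Optional["Nar"] = None  # current model: at most one parent NAR
    mixins: list["Nar"] = field(default_factory=list)  # hypothetical extension

    def visible_jars(self) -> set[str]:
        """JARs reachable via this NAR's bundle, its mix-ins, and its parent chain."""
        seen = set(self.jars)
        for lib in self.mixins:
            seen |= lib.visible_jars()
        if self.parent is not None:
            seen |= self.parent.visible_jars()
        return seen


jackson = Nar("nifi-jackson-libraries-nar", {"jackson-databind"})
bcprov = Nar("nifi-bcprov-libraries-nar", {"bcprov"})

# Single parent: nifi-xyz-nar can pick jackson or bcprov, not both.
single = Nar("nifi-xyz-nar", {"xyz"}, parent=jackson)

# Chaining works, but every child of the jackson NAR now also sees bcprov.
jackson_chained = Nar("nifi-jackson-libraries-nar", {"jackson-databind"}, parent=bcprov)
chained = Nar("nifi-xyz-nar", {"xyz"}, parent=jackson_chained)

# Mix-in style: both lib NARs are attached side by side, no ordering to decide.
mixed = Nar("nifi-xyz-nar", {"xyz"}, mixins=[jackson, bcprov])
```

With mix-ins the lib NARs stay independent of each other, which is the
property the chaining approach gives up.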


On Tue, Jan 16, 2018 at 5:14 AM, Mike Thomsen <[hidden email]> wrote:

> Also maybe #4: Message Queue support (JMS, Kafka, etc.)
>
> On Tue, Jan 16, 2018 at 5:13 AM, Mike Thomsen <[hidden email]>
> wrote:
>
>> One possibility: 3 "packs." Such as:
>>
>> 1. Big Data.
>> 2. Search
>> 3. Non-BD NoSQL.
>>
>> Each pack would be an assembly of NARs that correspond to the category.
>>
>> The core would have JDBC support and all of the data mutator processors.
>>
>> On Mon, Jan 15, 2018 at 11:54 PM, James Wing <[hidden email]> wrote:
>>
>>> I think a reduced build is a good way forward until the extension registry
>>> is ready.  If we can publish the remaining processors in one or more
>>> additional artifacts, that would be ideal.  The admin burden of more git
>>> repositories or separate releases does not appeal to me, especially since
>>> we do not believe it to be our long-term path.
>>>
>>> It's not going to be easy to decide on a "core" build with "extras" sold
>>> separately. But we will have to confront the division for the registry
>>> solution in any case, we might as well get started on it.
>>>
>>> On Sun, Jan 14, 2018 at 1:37 PM, Mike Thomsen <[hidden email]>
>>> wrote:
>>>
>>> > Since the limit was bumped to 1.6GB, it might be prudent to not do too
>>> much
>>> > NiFi 1.X and instead focus on a comprehensive solution that coincides
>>> with
>>> > 2.0. I think that would be a time when a lot of users might expect and
>>> be
>>> > tolerant of breaking changes on issues like this.
>>> >
>>> > Also, is there a clear process for deprecating processors? If not, there
>>> > should be because it would be really helpful for doing cleanup.
>>> >
>>> > On Sat, Jan 13, 2018 at 7:53 PM, Brett Ryan <[hidden email]>
>>> wrote:
>>> >
>>> > > Why are core modules not listing everything as provided?
>>> > >
>>> > > IDE’s solve this problem with the use of dependency libraries. As an
>>> > > example NetBeans nbm’s have a single purpose, you must export the
>>> > packages
>>> > > to be exposed.
>>> > >
>>> > > We do the same with confluence modules using felix.
>>> > >
>>> > > Why is NiFi doing things different just so the person who wants to
>>> > install
>>> > > many custom nars can be lazy?
>>> > >
>>> > > > On 14 Jan 2018, at 08:59, Tony Kurc <[hidden email]> wrote:
>>> > > >
>>> > > > I added some more stats to the wiki page, trying to determine what
>>> > > > dependencies are included in jars. It seems like there is
>>> opportunity.
>>> > > >
>>> > > > Highlights, 50 copies of what appears to be some version of
>>> > bcprov-jdk15
>>> > > > for a total of 162M. 51 copies of jackson-databind.
>>> > > >
>>> > > > total size   copies  jar
>>> > > >    30.97MB       65  META-INF/bundled-dependencies/commons-lang3-XXX.jar
>>> > > >    32.53MB       50  META-INF/bundled-dependencies/bcpkix-jdk15on-XXX.jar
>>> > > >    33.55MB       16  META-INF/bundled-dependencies/guava-XXX.jar
>>> > > >    39.62MB        1  META-INF/bundled-dependencies/jython-shaded-XXX.jar
>>> > > >    63.06MB       51  META-INF/bundled-dependencies/jackson-databind-XXX.jar
>>> > > >   162.07MB       50  META-INF/bundled-dependencies/bcprov-jdk15on-XXX.jar
>>> > > >
>>> > > >
>>> > > >> On Sat, Jan 13, 2018 at 2:09 PM, Joey Frazee <
>>> [hidden email]>
>>> > > wrote:
>>> > > >>
>>> > > >> I tend to have feelings similar to Michael about a multi-repo
>>> > approach.
>>> > > >> I’ve rarely seen it help and more often seen it hurt — it’s
>>> confusing
>>> > > >> (especially to newcomers), stuff gets neglected because it’s
>>> easier to
>>> > > >> ignore, you need another master project or some such to do an
>>> entire
>>> > > build.
>>> > > >>
>>> > > >> Maybe git submodules could help mitigate this, but creating
>>> > independent
>>> > > >> assemblies or using different build profiles to enable building and
>>> > > >> packaging the binaries in different ways would satisfy everything
>>> > except
>>> > > >> disentangling the releases.
>>> > > >>
>>> > > >> -joey
>>> > > >>
>>> > > >>> On Jan 13, 2018, 12:40 PM -0600, Brandon DeVries <[hidden email]>,
>>> > wrote:
>>> > > >>> I agree... Long term extension registry, short term one repo with
>>> > > >> different
>>> > > >>> assemblies (e.g. standard, slim, analytic, etc...).
>>> > > >>>
>>> > > >>> Brandon
>>> > > >>>
>>> > > >>> On Sat, Jan 13, 2018 at 1:35 PM Pierre Villard <
>>> > > >> [hidden email]
>>> > > >>> wrote:
>>> > > >>>
>>> > > >>>> Option #3 also has my preference. But it's probably a good idea
>>> to
>>> > > only
>>> > > >>>> keep one git repo and play with the assembly and Maven profiles
>>> for
>>> > > the
>>> > > >>>> releases, no? It'd be certainly easier for release management
>>> > process.
>>> > > >> But
>>> > > >>>> this decision could also depend on how the option #3 is going to
>>> be
>>> > > >>>> implemented I guess.
>>> > > >>>>
>>> > > >>>> 2018-01-13 6:36 GMT-07:00 Joe Witt <[hidden email]>:
>>> > > >>>>
>>> > > >>>>> thanks tony!
>>> > > >>>>>
>>> > > >>>>>> On Jan 12, 2018 10:48 PM, "Tony Kurc" <[hidden email]>
>>> wrote:
>>> > > >>>>>>
>>> > > >>>>>> I put some of the data I was working with on the wiki -
>>> > > >>>>>>
>>> > > >>>>>> https://cwiki.apache.org/confluence/display/NIFI/NiFi+
>>> > > >> 1.5.0+nar+files
>>> > > >>>>>>
>>> > > >>>>>> On Fri, Jan 12, 2018 at 10:28 PM, Jeremy Dyer <
>>> [hidden email]
>>> > > >>>> wrote:
>>> > > >>>>>>
>>> > > >>>>>>> So my favorite option is Bryan’s option number “three” of
>>> using
>>> > > >> the
>>> > > >>>>>>> extension registry. Now my thought is do we really need to add
>>> > > >>>>> complexity
>>> > > >>>>>>> and do anything in the mean time or just focus on that?
>>> Meaning
>>> > > >> we
>>> > > >>>> have
>>> > > >>>>>>> roughly 500mb of available capacity today so why don’t we
>>> spend
>>> > > >> those
>>> > > >>>>> man
>>> > > >>>>>>> hours we would spend on getting the second repo up on the
>>> > > >> extension
>>> > > >>>>>>> registry instead?
>>> > > >>>>>>>
>>> > > >>>>>>> @Bryan do you have thoughts about the deployment of those bars
>>> > > >> in the
>>> > > >>>>>>> extension registry? Since we won’t be able to build the
>>> release
>>> > > >>>> binary
>>> > > >>>>>>> anymore would we still need to create separate repos for the
>>> > > >> nars or
>>> > > >>>>>> no?? I
>>> > > >>>>>>> have used the registry a little but I’m not 100% sure on your
>>> > > >> vision
>>> > > >>>>> for
>>> > > >>>>>>> the nars
>>> > > >>>>>>>
>>> > > >>>>>>> - Jeremy Dyer
>>> > > >>>>>>>
>>> > > >>>>>>> Sent from my iPhone
>>> > > >>>>>>>
>>> > > >>>>>>>> On Jan 12, 2018, at 10:18 PM, Tony Kurc <[hidden email]>
>>> > > >> wrote:
>>> > > >>>>>>>>
>>> > > >>>>>>>> I was looking at nar sizes, and thought some data may be
>>> > > >> helpful. I
>>> > > >>>>>> used
>>> > > >>>>>>> my recent RC1 verification as a basis for getting file sizes,
>>> and
>>> > > >>>> just
>>> > > >>>>>> got
>>> > > >>>>>>> the file size for each file in the assembly named "*.nar". I
>>> > > >> don't
>>> > > >>>> know
>>> > > >>>>>>> whether the images I pasted in will go through, but I made
>>> some
>>> > > >>>>> graphs.b
>>> > > >>>>>>> The first is a histogram of nar file size in buckets of 10MB.
>>> The
>>> > > >>>>> second
>>> > > >>>>>>> basically is similar to a cumulative distribution, the x axis
>>> is
>>> > > >> the
>>> > > >>>>>> "rank"
>>> > > >>>>>>> of the nar (smallest to largest), and the y-axis is how what
>>> > > >> fraction
>>> > > >>>>> of
>>> > > >>>>>>> the all the sizes of the nars together are that rank or
>>> lower. In
>>> > > >>>> other
>>> > > >>>>>>> words, on the graph, the dot at 60 and ~27 means that the
>>> > > >> smallest 60
>>> > > >>>>>> nars
>>> > > >>>>>>> contribute only ~27% of the total. Of note, the standard and
>>> > > >>>> framework
>>> > > >>>>>> nars
>>> > > >>>>>>> are at 83 and 84.
>>> > > >>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>>>
>>> > > >>>>>>>>> On Fri, Jan 12, 2018 at 5:04 PM, Michael Moser <
>>> > > >>>> [hidden email]
>>> > > >>>>>>> wrote:
>>> > > >>>>>>>>> And of course, as I hit <send> I thought of one more thing.
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> We could keep all of the code in 1 git repo (1 project) but
>>> > > >> the
>>> > > >>>>>>>>> nifi-assembly part of the build could be broken up to build
>>> > > >> core
>>> > > >>>>> NiFi
>>> > > >>>>>>>>> separately from the tar/zip functional grouping of other
>>> > > >> NARs.
>>> > > >>>>>>>>>
>>> > > >>>>>>>>> On Fri, Jan 12, 2018 at 5:01 PM, Michael Moser <
>>> > > >>>> [hidden email]
>>> > > >>>>>>> wrote:
>>> > > >>>>>>>>>
>>> > > >>>>>>>>>> Long term I would also like to see #3 be the solution. I
>>> > > >> think
>>> > > >>>>> what
>>> > > >>>>>>>>>> Joseph N described could be part of the capabilities of #3.
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> I would like to add a note of caution with respect to
>>> > > >>>> reorganizing
>>> > > >>>>>> and
>>> > > >>>>>>>>>> releasing extension bundles separately:
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> - the burden on release manager expands because many more
>>> > > >>>>>>>>>>   projects have to be released; probably not all on each release
>>> > > >>>>>>>>>>   cycle but it could still be many
>>> > > >>>>>>>>>> - the chance of accidentally forgetting to release a project in
>>> > > >>>>>>>>>>   a release cycle becomes non-zero
>>> > > >>>>>>>>>> - sharing code between projects gets a bit harder because you
>>> > > >>>>>>>>>>   have to manage releasing projects in a specific order
>>> > > >>>>>>>>>> - it becomes harder to find all of the projects that need to
>>> > > >>>>>>>>>>   change when shared code is added
>>> > > >>>>>>>>>> - the simple act of finding code becomes harder ... in which
>>> > > >>>>>>>>>>   project is that class in? (IDEs like IntelliJ can search in 1
>>> > > >>>>>>>>>>   project, but if they search across multiple projects, then I
>>> > > >>>>>>>>>>   haven't learned how)
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> I used to maintain several nars in separate projects, and
>>> > > >>>>>>>>>> recently reorganized them into 1 project (following NiFi's
>>> > > >>>>>>>>>> multi-module maven build) and life has become much easier!
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> -- Mike
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>> On Fri, Jan 12, 2018 at 4:33 PM, Chris Herrera <[hidden email]> wrote:
>>> > > >>>>>>>>>>
>>> > > >>>>>>>>>>> I very much like the solution proposed by Bryan below. This
>>> > > >>>>>>>>>>> would allow for a cleaner docker image as well, while still
>>> > > >>>>>>>>>>> providing the functionality as needed. For sure, the extension
>>> > > >>>>>>>>>>> registry will be great, but in the meantime this is an
>>> > > >>>>>>>>>>> adequate mid step.
>>> > > >>>>>>>>>>>
>>> > > >>>>>>>>>>> Regards,
>>> > > >>>>>>>>>>> Chris
>>> > > >>>>>>>>>>>
>>> > > >>>>>>>>>>> On Jan 12, 2018, 2:52 PM -0600, Bryan Bende <[hidden email]>, wrote:
>>> > > >>>>>>>>>>>> Long term I'd like to see the extension registry take form
>>> > > >>>>>>>>>>>> and have that be the solution (#3).
>>> > > >>>>>>>>>>>>
>>> > > >>>>>>>>>>>> In the more near term, we could separate all of the NARs,
>>> > > >>>>>>>>>>>> except for framework and maybe standard processors &
>>> > > >>>>>>>>>>>> services, into a separate git repo.
>>> > > >>>>>>>>>>>>
>>> > > >>>>>>>>>>>> In that new git repo we could organize things like Joe N just
>>> > > >>>>>>>>>>>> described according to some kind of functional grouping. Each
>>> > > >>>>>>>>>>>> of these functional bundles could produce its own tar/zip
>>> > > >>>>>>>>>>>> which we can make available for download.
>>> > > >>>>>>>>>>>>
>>> > > >>>>>>>>>>>> That would separate the release cycles between core NiFi and
>>> > > >>>>>>>>>>>> the other NARs, and also avoid having any single binary
>>> > > >>>>>>>>>>>> artifact that gets too large.