Core Components


Core Components

Teresa Jackson
Hello everyone,

I'm reviewing the Apache NiFi source code and would like to put together some architecture diagrams of the framework's core components. What's the required format for submission (UML, DODAF 2.0, et al.)? Also, what's the vetting process? And are there tools/approaches/processes that the board prefers?

Thanks,

Teresa Jackson
Onyx Consulting Services, LLC
Chief Engineer

Re: Core Components

trkurc
Administrator
Teresa,
Glad you're interested in contributing. I suggest reading some of the
guides Apache has [1] on what to expect when getting involved, which should
answer some of your questions about vetting and the board (which I inferred
to mean the PPMC).

Were you planning on doing this as developer documentation to get
developers up to speed more quickly [2]? Thus far the documentation has
been developed with asciidoc [3], and I had certainly expected the
developer guide to follow this path as well. Were you expecting to build
images from UML or another tool to include in a guide? Or were you thinking
it may be useful to have UML outside the context of a developer
documentation guide?

[1] http://www.apache.org/foundation/getinvolved.html
[2] https://issues.apache.org/jira/browse/NIFI-152
[3]
http://apache-nifi-incubating-developer-list.39713.n7.nabble.com/User-Guide-td46.html

Tony

On Mon, Jan 12, 2015 at 8:05 PM, Teresa Jackson <[hidden email]>
wrote:


Re: Core Components

Teresa Jackson
Hi Tony,

Not exactly; the primary target audience for this architecture view would be business executives and system architects.  I'm targeting folks who are looking for an enterprise view or who are seeking to understand and come up to speed on how the Core Framework works.

And the docs that I have seen and reviewed to date don't seem to fit that particular target audience.

Also, what's the process to propose a design idea for the Core Framework?  In reviewing some of the source code, I didn't see any software packages that support metrics needs.

I'd like to propose that an addition or enhancement be made to the Core to support volume management and trend analysis, by databasing attributes and content so that they are queryable and available for display. This information would then be used for statistical roll-ups, metrics, trend analysis, etc.

Ideally, we'd do it by capturing running totals from copies of local provenance events.  This component would be like local provenance in that it would retain the data for some configurable period of time, based on the amount of disk space allocated to that process.  In addition, these roll-ups could be sent somewhere for even longer retention.

The goal is to provide as many hooks as possible so that other programs/services can ingest both the local provenance logs and the rolled-up summaries.  There's a growing base of people who are comfortable with NiFi graphs and local provenance, so I think it makes sense to build off that.

The issue I'm facing is that provenance is fine for tracking one file if you have a starting point, but it is not designed to do counting, summarization, or correlation of data. And it doesn't support advanced queries.

Here are some of the most immediate and pressing use cases for this design.

1. How much traffic came in yesterday (or last week)?
2. Provide statistical counts on items of interest within a flow for a given flow/date range.
3. When was the last file sent to "System X"?
4. Did anything get sent to "System Y"?
5. How much data was marked with a certain tag?
6. How much data was scanned?
7. How much data was detected?
8. How much of a particular type of data was received, in bytes?
9. How much data was processed, by file count?
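As a rough illustration of the roll-up component described above, here is a minimal in-memory sketch. This is hypothetical: it is not an existing NiFi API, and all class, record, and method names here are invented for illustration. It keeps running totals per component from provenance-style event copies, which is enough to answer use cases like 1, 3, 8, and 9.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the NiFi provenance API): an in-memory aggregator
// that receives copies of provenance-style events and keeps running totals,
// answering questions like "how much data went to System X, and when?"
public class ProvenanceRollup {

    // One simplified event: which component handled it, how many bytes, when.
    public record Event(String componentName, long bytes, long timestampMillis) {}

    private final Map<String, Long> byteTotals = new HashMap<>();
    private final Map<String, Long> fileCounts = new HashMap<>();
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Called once per incoming provenance event copy.
    public void onEvent(Event e) {
        byteTotals.merge(e.componentName(), e.bytes(), Long::sum);   // use case 8
        fileCounts.merge(e.componentName(), 1L, Long::sum);          // use case 9
        lastSeen.merge(e.componentName(), e.timestampMillis(), Long::max); // use case 3
    }

    public long totalBytes(String component) { return byteTotals.getOrDefault(component, 0L); }
    public long fileCount(String component)  { return fileCounts.getOrDefault(component, 0L); }
    public long lastSent(String component)   { return lastSeen.getOrDefault(component, 0L); }
}
```

A real version would add the configurable retention window and disk-space bound described above; this sketch only shows the counting/summarization shape that plain provenance queries don't provide.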

Another thought:

This might also be a good place for hooking streaming services where you can deal with the raw events and then summarize/aggregate when things go by.

I'm completely new to this process so I don't know if basic concept proposals of this sort should come in the form of an architecture diagram or simply in plain English.

Thanks,

Teresa Jackson
Onyx Consulting Services, LLC
Chief Engineer

________________________________________
From: Tony Kurc <[hidden email]>
Sent: Monday, January 12, 2015 10:28 PM
To: [hidden email]
Subject: Re: Core Components


Re: Core Components

Joe Witt
Teresa,

Architecture diagrams:
For the documents you suggest, I think it is one of those things where you'll
just have to show what you mean.  It isn't really clear what role UML or
other such diagrams could play.  As a general statement, for anything you
want to contribute it will be good to have in mind how you think it will
help the project.  One of the things we can do as a community is document a
set of goals for the project so that folks understand where/how they can
contribute, or where to suggest an expansion of goals.

Provenance/Reporting:
For the suggestions on provenance, I agree this is an important area of
focus.  I'll respond more completely on your ticket NIFI-252.  For this
thread, though, I'll just mention that the design does indeed already
provide for these things to occur.  We just haven't gotten to the
implementation of that vision.  We also need to be very cognizant of where
the lanes should end.  There are a lot of great solutions out there to take
in the sorts of data we expose that would do a great job of
ingesting/indexing/querying/analyzing.  We need to be complementary to
those things and build 'just enough' into NiFi for the parts that are
unique to it.  Your additional thought seems to be along these lines too,
so that is great.

Thanks
Joe

On Tue, Jan 13, 2015 at 12:41 PM, Teresa Jackson <[hidden email]>
wrote:


Re: Core Components

Teresa Jackson
Joe,

What would you think about adding a publish/subscribe interface to the provenance repo?  That way a client can subscribe to events in real time to do the necessary trending and metrics reporting.

Conceptually, here's what I'm thinking.

The local provenance repository pushes to a client every time an event occurs.

The service that sends events to this client in real time would also provide the flowfile associated with that event.

<Do note that all processing past this point falls outside the Core; I'm just adding it for contextual background.>

The client sends things to a database somewhere; then the "Aggregator" processor can get back a list of attributes that the user selected via that processor's configuration.  That processor would then generate a FlowFile whose content is the set of metrics, and that FlowFile can be sent wherever for additional analysis.  Another processor would be used to support multiple data formats (XML, JSON, proprietary formats, whatever).

I'm not too wrapped up in whether the 'Aggregator' has an output format attribute that can be set, or whether it outputs a FlowFile that the CreateFormatProcessor ingests, with that processor holding the output format attribute.

The point is that if this feature were implemented with a publish/subscribe interface, then aggregation, summarization, and correlation of data for trending could occur.

The overall flow would look something like this:

Local NiFi Provenance repo -> Client -> Aggregator -> PutProcessor
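The publish/subscribe surface proposed above can be sketched minimally as follows. This is a hypothetical illustration: none of these types exist in NiFi, and the real provenance repository API may look quite different. The repository pushes each event to every registered subscriber as it is recorded, so a downstream client can feed a database or the Aggregator in real time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed publish/subscribe interface on the
// provenance repository; all names here are invented for illustration.
public class PubSubProvenanceRepo {

    // Simplified stand-in for a provenance event (plus its FlowFile id).
    public record ProvenanceEvent(String type, String flowFileId, long bytes) {}

    private final List<Consumer<ProvenanceEvent>> subscribers = new ArrayList<>();

    // A client (the "Client" box in the flow above) registers here.
    public void subscribe(Consumer<ProvenanceEvent> subscriber) {
        subscribers.add(subscriber);
    }

    // Called by the framework each time an event is recorded:
    // publish the event to every subscriber in real time.
    public void record(ProvenanceEvent event) {
        for (Consumer<ProvenanceEvent> s : subscribers) {
            s.accept(event);
        }
    }
}
```

The subscriber here stands in for the "Client" step of the flow; in practice it would forward events (and the associated FlowFiles) on to the database and Aggregator stages rather than just receiving them.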

Thanks,

Teresa Jackson
Onyx Consulting Services, LLC
Chief Engineer

________________________________________
From: Joe Witt <[hidden email]>
Sent: Tuesday, January 13, 2015 5:31 PM
To: [hidden email]
Subject: Re: Core Components
