Proposing NiFi-Fn

Samuel Hjelmfelt
 
Hello,

I have not been very active on the NiFi mailing lists, but I have been working with NiFi for several years across dozens of companies. I have a great appreciation for NiFi’s value in real-world scenarios. Its growth over the last few years has been very impressive, and I would like to see a further expansion of NiFi’s capabilities.

 

Over the last few months, I have been working on a new NiFi run-time to address some of the limitations that I have seen in the field. Its intent is not to replace the existing NiFi engine, but rather to extend the possible applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an alternate run-time that expands NiFi’s reach to cloud scale. Given the similarities, MagNiFi might have been a better name, but it was already trademarked.

 

Here are some of the limitations that I have seen in the field. In many cases, there are entirely valid reasons for this behavior, but it also prevents NiFi from being used for certain use cases.
   
   1. NiFi flows do not succeed or fail as a unit. Part of a flow can succeed while another part fails.
      - For example, ConsumeKafka acks before downstream processing even starts.
      - Given this behavior, data delivery guarantees require writing all incoming data to local disk in order to handle node failures.
         - While this helps to accommodate non-resilient sources (e.g. TCP), it has downsides:
            - Increases cost significantly as throughput requirements rise (especially in the cloud)
            - Increases HA complexity, because the state on each node must be durable
               - e.g. content repository replication similar to Kafka is a common ask to improve this
            - Reduces flexibility, because data has to be migrated off of nodes to scale down
               - NiFi environments must be sized for the peak expected volumes given the complexity of scaling up and down.
               - Resources are wasted when use cases have periods of lower volume (such as overnight or on weekends)
               - This improved in 1.8, but it is nowhere near as fluid as DistCp or Sqoop (i.e. MapReduce)
      - Flow-specific error handling is required (such as this processor group)
         - NiFi’s content repository is now the source of truth and the flow cannot be restarted easily.
         - This is useful for multi-destination flows, because errors can be handled individually, but unnecessary in other cases (e.g. Kafka to Solr).
   2. Job/task oriented data movement use cases do not fit well with NiFi
      - For example: triggering data movement as part of a scheduler job
         - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a Spark ETL job to load it into Hive, then run a report and send it to users.
      - In every other way, NiFi fits this use case. It just needs a job-oriented interface/runtime that returns success or fail and allows for timeouts.
      - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs, but it should be a first-class runtime option
   3. NiFi does not provide resource controls for multi-tenancy, requiring organizations to have multiple clusters
      - Granular authorization policies are possible, but there are no resource usage policies such as what YARN and other container engines provide.
      - The items listed in #1 make this even more challenging to accommodate than it would be otherwise.
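To make the "job-oriented interface" idea in item 2 concrete, here is a minimal Python sketch of a wrapper that runs a flow once and reports a single success or failure, with a timeout. It is only an illustration, not NiFi-Fn code: `run_as_job`, `JobResult`, and the idea of passing the flow in as a plain callable are all hypothetical names chosen for this example.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobResult:
    succeeded: bool
    detail: str
    elapsed_seconds: float


def run_as_job(flow: Callable[[], None], timeout_seconds: float) -> JobResult:
    """Run `flow` once and reduce the outcome to one success/failure result.

    `flow` is a hypothetical stand-in for "execute this flow end to end".
    """
    started = time.monotonic()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(flow)
    # Wait for completion, but no longer than the caller's timeout.
    done, _ = concurrent.futures.wait([future], timeout=timeout_seconds)
    # Do not block on the worker; a timed-out flow keeps running in its thread.
    executor.shutdown(wait=False)
    elapsed = time.monotonic() - started
    if future not in done:
        return JobResult(False, f"timed out after {timeout_seconds}s", elapsed)
    error = future.exception()
    if error is not None:
        return JobResult(False, f"failed: {error}", elapsed)
    return JobResult(True, "completed", elapsed)
```

A scheduler could call `run_as_job(my_flow, timeout_seconds=3600)` and branch on `succeeded`. Note that this sketch only stops waiting on timeout; it does not cancel the work, which a real runtime would need to handle.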


NiFi-Fn is a library for running NiFi flows as stateless functions. It provides delivery guarantees similar to NiFi’s without the need for on-disk repositories by waiting to confirm receipt of incoming data until it has been written to the destination. This is similar to Storm’s acking mechanism and Spark’s interface for committing Kafka offsets, except that in NiFi-Fn, this is completely handled by the framework while still supporting all NiFi processors and controller services natively without change. This results in the ability to run NiFi flows as ephemeral, stateless functions and should be able to rival MirrorMaker, DistCp, and Sqoop for performance, efficiency, and scalability while leveraging the vast library of NiFi processors and the NiFi UI for building custom flows.
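The ack-after-delivery idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the NiFi-Fn implementation: `InMemorySource`, `run_once`, and the representation of processors as plain callables are hypothetical names invented for illustration.

```python
from typing import Callable, List, Optional, Sequence

Record = bytes
Step = Callable[[Record], Record]   # hypothetical stand-in for a processor
Sink = Callable[[Record], None]     # hypothetical stand-in for the destination


class InMemorySource:
    """Toy source that remembers which records have been acknowledged."""

    def __init__(self, records: Sequence[Record]) -> None:
        self._pending: List[Record] = list(records)
        self.acked: List[Record] = []

    def poll(self) -> Optional[Record]:
        # Return the next unacknowledged record without removing it.
        return self._pending[0] if self._pending else None

    def ack(self, record: Record) -> None:
        # Only now is the record considered received.
        self._pending.remove(record)
        self.acked.append(record)


def run_once(source: InMemorySource, steps: Sequence[Step], sink: Sink) -> bool:
    """Process one record; acknowledge the source only after the sink accepts it."""
    record = source.poll()
    if record is None:
        return False
    out = record
    for step in steps:
        out = step(out)
    sink(out)            # raises on failure, so the ack below is skipped
    source.ack(record)   # nothing was persisted locally in between
    return True
```

If the sink raises, the record stays pending at the source and is returned again by the next `poll()`, which is what lets the function stay stateless.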




By leveraging container engines (e.g. YARN, Kubernetes), long-running NiFi-Fn flows can be deployed that take full advantage of the platform’s scale and multi-tenancy features. By leveraging Function as a Service (FaaS) engines (e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached to event sources (or just cron) for event-driven data movement where flows only run when triggered and pricing is measured at 100 ms granularity. By combining the two, large-scale batch processing could also be performed.
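As a small worked example of what 100 ms billing granularity means for triggered flows, the Python sketch below rounds each run's duration up to the next increment. The round-up rule and the function names `billed_ms` and `total_billed_ms` are assumptions made for illustration; actual billing rules depend on the provider.

```python
import math
from typing import Iterable


def billed_ms(duration_ms: float, granularity_ms: int = 100) -> int:
    """Round one run's duration up to the next billing increment (assumed rule)."""
    if duration_ms <= 0:
        return 0
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def total_billed_ms(durations_ms: Iterable[float], granularity_ms: int = 100) -> int:
    """Sum the billed time over several triggered runs."""
    return sum(billed_ms(d, granularity_ms) for d in durations_ms)
```

Under this assumption, three runs lasting 230 ms, 40 ms, and 100 ms would be billed as 300 + 100 + 100 = 500 ms, and nothing is billed while no events arrive.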




An additional opportunity is to integrate NiFi-Fn back into NiFi. This could provide a clean solution for a NiFi jobs interface. A user could select a run-time on a per process group basis to take advantage of the NiFi-Fn efficiency and job-like execution when appropriate without requiring a container engine or FaaS platform. A new monitoring interface could then be provided in the NiFi UI for these job-oriented workloads.




Potential NiFi-Fn run-times include:
   
   - Java (done)
   - Docker (done)
   - OpenWhisk
      - Java (done)
      - Custom (done)
   - YARN (done)
   - Kubernetes (TODO)
   - AWS Lambda (TODO)
   - Azure Functions (TODO)
   - Google Cloud Functions (TODO)
   - Oracle Fn (TODO)
   - CloudFoundry (TODO)
   - NiFi custom processor (TODO)
   - NiFi jobs runtime (TODO)

 

The core of NiFi-Fn is complete, but it could use some improved testing, more run-times, and better reporting for logs, metrics, and provenance.

 

 

Sam Hjelmfelt

Principal Software Engineer

Hortonworks

Re: Proposing NiFi-Fn

Andy LoPresto-2
Hi Sam,

Thanks for writing all this up. I’m wondering if you are prepared to share the code you referenced below so people can take a look. Do you have a preferred communication mechanism (GitHub issues, direct PRs, etc.)? Once there is more discussion from the community on this, I think (if it moves forward) the standard platform choices would apply. Thanks.


Andy LoPresto
[hidden email]
[hidden email]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jan 2, 2019, at 5:04 PM, Samuel Hjelmfelt <[hidden email]> wrote:
> [...]

Re: Proposing NiFi-Fn

Samuel Hjelmfelt
Hi Andy,

I just submitted a JIRA and PR. I also put a pre-built Docker image on Docker Hub. Here are the links:
https://issues.apache.org/jira/browse/NIFI-5922
https://github.com/apache/nifi/pull/3241
https://hub.docker.com/r/samhjelmfelt/nifi-fn

I am open to communication on any platform.

Thanks,
Sam Hjelmfelt
 

    On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto <[hidden email]> wrote:  
 
[...]
 
Re: Proposing NiFi-Fn

Otto Fowler
This is really cool.
Is there a design document to reference?  Any diagrams?  I don’t remember
clearly whether NiFi requires or prefers Javadoc, but I think it would help
to have those things.



On January 2, 2019 at 20:42:02, Samuel Hjelmfelt (
[hidden email]) wrote:

[...]
Re: Proposing NiFi-Fn

Joe Witt
Sam,

This is clearly a really awesome idea.  It addresses certain use cases
for which NiFi's default model is heavy-handed yet allows the
ecosystem of components to port naturally.  It allows versioned flows
designed in one environment to also easily run in another more
functional style/environment.  Really slick stuff.

Thanks
Joe

On Thu, Jan 3, 2019 at 6:21 PM Otto Fowler <[hidden email]> wrote:

> [...]

Re: Proposing NiFi-Fn

Mark Payne
In reply to this post by Samuel Hjelmfelt
Sam,

I love this idea, and I am all for it. I can definitely see how this could be useful both within the context of NiFi
itself and outside of NiFi as well. There has been quite a bit of talk of late, in both e-mail and the Slack channel
about users needing more ability to perform integration testing of flows, and I think this could also be a great
avenue to explore for better enabling that as well.

Thanks for putting this all together! I will certainly be interested to dig in more.

Thanks
-Mark

> On Jan 2, 2019, at 8:41 PM, Samuel Hjelmfelt <[hidden email]> wrote:
>
> Hi Andy, I just submitted a JIRA and PR. I also put a pre-built Docker image on Docker Hub. Here are the links:
> https://issues.apache.org/jira/browse/NIFI-5922
> https://github.com/apache/nifi/pull/3241
> https://hub.docker.com/r/samhjelmfelt/nifi-fn
> I am open to communication on any platform.
> Thanks,
> Sam Hjelmfelt
>
>
>    On Wednesday, January 2, 2019, 6:27:02 PM MST, Andy LoPresto <[hidden email]> wrote:  
>
> Hi Sam,
>
> Thanks for writing all this up. I’m wondering if you are prepared to share the code you referenced below so people can take a look. Do you have a preferred communication mechanism (GitHub issues, direct PRs, etc.?). Once there is more discussion from the community on this, I think (if it moves forward), the standard platform choices would apply. Thanks.
>
>
> Andy LoPresto
> [hidden email]
> [hidden email]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>


Re: Proposing NiFi-Fn

Samuel Hjelmfelt
In reply to this post by Otto Fowler
Hi Otto,

Good point. There isn't much documentation right now.

Where is the best place to put it? I could create a nifi-fn/docs directory with Markdown files, or I could create an AsciiDoc document in the nifi-docs directory. I could also just expand the README if that is easiest in the short term.

-Sam

    On Thursday, January 3, 2019, 4:21:03 PM MST, Otto Fowler <[hidden email]> wrote:  
 
 This is really cool.
Is there a design document to reference?  Any diagrams?  I don’t remember
clearly if Nifi requires or prefers javadoc or not, but it would help to
have those things I think.




Re: Proposing NiFi-Fn

Pierre Villard
Hey,

First of all, thanks for the proposal and PR, that looks awesome!

For the documentation, I'd suggest adding it to nifi-docs and choosing one
of the two options below [1]:
- create a NiFi-Fn section (like General) so that the doc for this module
can be made of multiple pages
- create a single page in the 'General' section

I'm in favor of the first option because I can see us having multiple
pages around this feature, but I don't have a strong opinion.

Will definitely try to review and give it a try when I get a chance.

Pierre




Re: Proposing NiFi-Fn

Sumanth
In reply to this post by Samuel Hjelmfelt
Sam,
Great proposal.
I am thinking in terms of network communication between
processors/functions.
I came across http://rsocket.io/, which might be an ideal inter-process
communication protocol for NiFi-Fn, since it can provide back pressure
over the network.
I look forward to feedback on any alternatives that can provide back pressure
over the network.
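
To make "back pressure" concrete, here is a minimal in-process sketch of demand-based (credit-based) flow control, the general idea behind Reactive Streams-style protocols. It is only an illustration under stated assumptions: the `CreditedSender` class and its `request`/`drain` methods are hypothetical names invented for this example, not the RSocket API and not anything in NiFi-Fn, and no real network is involved.

```python
from collections import deque


class CreditedSender:
    """Hypothetical toy sender: emits an item only when the receiver granted credit."""

    def __init__(self, items):
        self._pending = deque(items)  # items waiting to be sent
        self._credits = 0             # demand granted by the receiver so far

    def request(self, n):
        """Receiver grants permission for up to n more items."""
        if n <= 0:
            raise ValueError("n must be positive")
        self._credits += n

    def drain(self):
        """Send as many pending items as the current credit allows."""
        sent = []
        while self._credits > 0 and self._pending:
            sent.append(self._pending.popleft())
            self._credits -= 1
        return sent

    @property
    def backlog(self):
        """Number of items still held back at the sender."""
        return len(self._pending)


sender = CreditedSender(range(10))
first = sender.drain()    # no credit granted yet, so nothing is sent
sender.request(3)
second = sender.drain()   # exactly three items flow
sender.request(100)
third = sender.drain()    # the remaining seven items flow
```

The point of the design is that the slow side sets the pace: the sender holds its backlog until the receiver signals it can take more, instead of the receiver buffering or dropping data.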



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/

Re: Proposing NiFi-Fn

David Kegley
Sam,

This is really exciting stuff, thanks for contributing.

I took a run at the proposed Kubernetes runtime using a custom NiFiFn
operator; the source is here: https://github.com/b23llc/nifi-fn-operator

I also have some related feature requests. Is JIRA the preferred place to
track them, or should I wait until the initial PR is merged?

Thanks,
David




Re: Proposing NiFi-Fn

Peter Wilcsinszky
Hi David!

That is a nice one as well! I can see you have a custom NiFi image, but I
cannot trace it back to a Dockerfile. Could you point to where we can find it?
(I understand it's marginal, just curious.)

Cheers,
Peter


Re: Proposing NiFi-Fn

David Kegley
Hey Peter,

That image was from a different project I was working on, and I forgot to add
it to the nififn repo.
It just has a couple of additional NiFi properties so that kubectl port-forward
can be used to expose the pods over the proxy. I'll update the repo to
include the NiFi Dockerfile as well and publish a new image.

David




Re: Proposing NiFi-Fn

Samuel Hjelmfelt
In reply to this post by David Kegley
Hi David,

Sorry for the delay. It has been a busy couple of months.
Nice work on the Kubernetes run-time! With Mark's changes brought in, the initial NiFi-Fn PR should be about ready to be merged. I mostly just need to fix the Docker image.

Once it is merged, it would be great to bring your runtime and other requests in via subsequent JIRAs.


-Sam