conditional Listen Syslog

pradeepbill
Hi there, I am trying to work out whether NiFi can handle my case. My flow is ListenSyslog processor -> data output port -> Spark application. If the Spark application is down, I would like to cut off the ListenSyslog processor as well, so that I do not lose the data being pushed to it. Is there a way? Please advise.

Thanks
Pradeep

Re: conditional Listen Syslog

David Wynne
Pradeep,

You can configure back pressure on the connection from the ListenSyslog to
the output port.
It can be configured to use data size or number of objects, by selecting the
Settings tab on the connection.
To get to the Settings tab, right click on the connection once it is
highlighted.
Once the configured threshold is met, the ListenSyslog processor will
essentially stop receiving data.
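
As a rough mental model of what those two thresholds do (a toy sketch, not NiFi's actual implementation), the connection can be pictured as a queue that reports itself full once either limit is reached, at which point the upstream processor is no longer scheduled. The class, the function names and the threshold values below are all made up for illustration.

```python
from collections import deque


class Connection:
    """Toy model of a queue between two processors with back-pressure thresholds."""

    def __init__(self, max_objects, max_bytes):
        self.max_objects = max_objects  # object-count threshold (illustrative value)
        self.max_bytes = max_bytes      # data-size threshold (illustrative value)
        self.queue = deque()
        self.queued_bytes = 0

    def is_full(self):
        # Back pressure applies once either threshold is reached.
        return (len(self.queue) >= self.max_objects
                or self.queued_bytes >= self.max_bytes)

    def enqueue(self, payload):
        self.queue.append(payload)
        self.queued_bytes += len(payload)


def run_source(connection, messages):
    """Feed messages until back pressure applies; return how many were accepted."""
    accepted = 0
    for msg in messages:
        if connection.is_full():
            break  # the upstream processor would not be scheduled any more
        connection.enqueue(msg)
        accepted += 1
    return accepted


# Object-count threshold reached first.
by_count = Connection(max_objects=3, max_bytes=1024)
accepted_by_count = run_source(by_count, [b"a", b"b", b"c", b"d", b"e"])

# Data-size threshold reached first.
by_size = Connection(max_objects=100, max_bytes=4)
accepted_by_size = run_source(by_size, [b"ab", b"cd", b"ef"])

print(accepted_by_count, accepted_by_size)
```

Note that this only models the connection itself. What happens to data arriving at the listening socket while the processor is held back is a separate question.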

Dave.

On Wed, Jul 6, 2016 at 3:05 PM, pradeepbill <[hidden email]> wrote:

> hi there, I am scratching my head to see if there is something in NiFi I
> can do, I have a ListenSyslog processor ->data output port->Spark application.
> So if my Spark application is down, I would like to cut off ListenSyslog
> processor as well so that I would not lose data being pushed, is there a
> way? Please advise.
>
> Thanks
> Pradeep
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/conditional-Listen-Syslog-tp12652.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>

Re: conditional Listen Syslog

pradeepbill
Thanks David. I have done that, but I still see a connection established between the server that is pushing data and the ListenSyslog port, which made me think NiFi is ignoring or losing data once back pressure is enabled. Am I wrong? I would like to be wrong. Please advise.

Re: conditional Listen Syslog

Joe Witt
Hello

Is this syslog via UDP or is it via TCP?  If it is UDP then there
isn't much we can do to push back on the sender.  If it is TCP then we
certainly will stop bringing in new data which in turn should
effectively push back but I'm not sure that will work well because the
sender may not be able to buffer long.  Another consideration is that
NiFi can itself serve as a rather large buffer for this data.

Alternatively, you might find that this is a good use case for Kafka
as a buffer where it is syslogsender -> nifi -> kafka and then the
spark jobs source data from kafka.  That won't offer back pressure but
achieving true back pressure in syslog based sources broadly might be
elusive anyway.

Thanks
Joe

Re: conditional Listen Syslog

pradeepbill
Thanks Joe. It's TCP, and the sender is QRadar, which they say can buffer up to 60 days of data. Still, I think using NiFi as a buffer is a good idea, as you said. Kafka is also a good idea, but we might stick with NiFi alone.

I will also confirm with the team pushing data to NiFi whether they see any data being sent after back pressure is applied.

Re: conditional Listen Syslog

Bryan Bende
In reply to this post by Joe Witt
I agree with everything Joe said, but just wanted to elaborate on one
point...

Currently the way ListenSyslog is implemented [1], UDP and TCP both share
the same logic that offers a message to an internal blocking queue with a
100 ms wait [2].
We should consider changing this for TCP so that it blocks indefinitely (or
maybe just way longer) so that the client (the syslog server) wouldn't be
able to send more data.

With the current way it works you would not want to set back-pressure
between ListenSyslog and whatever comes next, because this will cause
ListenSyslog to stop processing messages on the internal queue, but the
background thread will keep queueing them and eventually start dropping
messages (the code in EventQueue below).

[1]
https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis.html
[2]
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/event/EventQueue.java#L60
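
The drop behaviour described above can be reproduced with any bounded queue and a timed offer. The sketch below is a Python analogue and not the EventQueue code itself: the `offer` helper, the queue size and the shortened timeout are assumptions chosen so the example runs quickly.

```python
import queue


def offer(q, item, timeout_s=0.1):
    """Try to enqueue, waiting up to timeout_s; return False if the queue stayed full."""
    try:
        q.put(item, timeout=timeout_s)
        return True
    except queue.Full:
        return False  # the caller has nowhere to put the message, so it is dropped


# Small bounded queue standing in for the listener's internal buffer.
internal = queue.Queue(maxsize=2)

# Nothing consumes from the queue, as when the processor is held back by
# back pressure while the background thread keeps reading from the socket.
results = [offer(internal, f"msg-{i}", timeout_s=0.01) for i in range(5)]
dropped = results.count(False)

print(results, dropped)
```

With a blocking `q.put(item)` and no timeout, the producer would instead wait until space frees up, which is the behaviour suggested above for TCP.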

Re: conditional Listen Syslog

pradeepbill
Thanks Bryan. Then for the time being I am going to set it up as ListenSyslog -> ControlRate -> data output port -> Spark application, with a very large back-pressure threshold between ListenSyslog and ControlRate and a very small one between ControlRate and the data output port. That way, when the Spark application comes back, data flows at a fixed rate and won't overwhelm it. Decent idea?

Re: conditional Listen Syslog

Bryan Bende
That probably makes sense. You may not even need back-pressure at all
between ListenSyslog -> ControlRate; that way the data will just queue up
there.
I guess it depends on how fast your syslog data is coming in and how long
you think Spark will be down.
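
To put rough numbers on that, here is a back-of-the-envelope sketch. The function names and every figure (500 messages per second, a two hour outage, a release rate of 1500 per second) are made-up assumptions, not measurements from this thread.

```python
def backlog_after_outage(inbound_per_s, outage_s):
    """Messages queued while the downstream consumer is unavailable."""
    return inbound_per_s * outage_s


def drain_time_s(backlog, inbound_per_s, release_per_s):
    """Seconds needed to clear the backlog once the consumer is back.

    Returns None if the release rate cannot keep up with the inbound rate.
    """
    if release_per_s <= inbound_per_s:
        return None
    return backlog / (release_per_s - inbound_per_s)


# Assumed figures: 500 msg/s arriving, consumer down for 2 hours,
# rate limiter releasing 1500 msg/s after recovery.
backlog = backlog_after_outage(inbound_per_s=500, outage_s=2 * 3600)
drain = drain_time_s(backlog, inbound_per_s=500, release_per_s=1500)

print(backlog, drain)
```

The point of the second function is that the release rate has to exceed the inbound rate, otherwise the queue never empties.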

NiFi will swap out flow files to disk when the queue gets very large and
should be able to hold a significant number of flow files in a queue
(millions).

The swapping is controlled by the following properties in nifi.properties:

nifi.queue.swap.threshold=20000
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4

Re: conditional Listen Syslog

pradeepbill
Thanks Bryan, will do.

Re: conditional Listen Syslog

David Wynne
Pradeep,

While NiFi is able to hold millions of files, remember, your actual limit
is determined by how much disk space you have available on the partition
where the content_repository is located.
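
A quick way to sanity-check that limit is to divide the free space by the expected write rate. This is only a sketch: the helper name and the figures (100 GB free, 500 messages per second, 400 bytes per message) are assumptions, and it ignores repository overhead and anything else sharing the partition.

```python
def buffer_hours(free_bytes, inbound_per_s, avg_message_bytes):
    """Approximate hours of data that fit in the free space at a steady inbound rate."""
    bytes_per_s = inbound_per_s * avg_message_bytes
    return free_bytes / bytes_per_s / 3600


# Assumed figures: 100 GB free, 500 msg/s, 400 bytes per message on average.
hours = buffer_hours(free_bytes=100 * 10**9, inbound_per_s=500, avg_message_bytes=400)

print(round(hours, 1))
```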

Dave.