Status of "event-driven" scheduling

Joe Percivall-2
Hey everyone,

The dataflow I'm running has one main flow and a couple other disjoint
process groups. Within that main flow, there are sections which aren't used
very often. In trying to optimize things, I looked into the guidance we
have on the "event-driven" scheduling type. There doesn't appear to be much
concrete guidance other than "it's experimental", which has been the go-to
answer basically since NiFi was open-sourced.

So with that, I'm curious about a couple things:
1: With the recent improvements to the controller and timer-based
scheduling, what should be our guidance on when to use event-based over
timer-based?
2: Is anyone actually using it in production?
3: Given it's been 3+ years of "it's experimental", we should start
thinking about either declaring it good to go or deprecating it.
4: Are there any lessons learned on optimizing disjoint/sparse flows?

Cheers,
Joe
--
*Joe Percivall*
linkedin.com/in/Percivall
e: [hidden email]

Re: Status of "event-driven" scheduling

Michael Moser-2
Hi Joe,

I'm guessing here, but I think the Event Driven scheduling was intended to
be more efficient than Timer Driven scheduling, in the way that push
notifications should be more efficient than polling.  In practice, I'm not
sure anyone has measured the difference.

I have seen folks use Event Driven scheduling to get access to a thread
pool that is separate from the Timer Driven pool.  For example, if you are
running on an 8 core system but you want a Timer Driven pool with 50 threads
to do lots of I/O bound tasks, you might create an Event Driven pool with 4
threads and assign your CPU heavy processing to that pool.  This limit can
keep you from having way more than 8 CPU heavy threads (from the Timer Driven
pool) bogging down your 8 core system.
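
As a rough illustration of the pool-splitting idea above (this is not NiFi
code), the Python sketch below runs the same batch of tasks on a 50-thread
pool and on a 4-thread pool and records the peak number running at once. The
function name peak_concurrency and the task counts are made up for the
example, and Python threads only stand in for JVM threads. The point is
simply that the pool size caps how many tasks of a given kind can run
concurrently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(pool_size, task_count, task_seconds=0.01):
    """Run task_count short tasks on a pool; return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task():
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(task_seconds)  # stand-in for real work
        with lock:
            running -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(task) for _ in range(task_count)]
        for future in futures:
            future.result()  # re-raise any exception from the task
    return peak


# Hypothetical sizes taken from the example in the message above.
io_peak = peak_concurrency(pool_size=50, task_count=200)   # "I/O bound" pool
cpu_peak = peak_concurrency(pool_size=4, task_count=200)   # "CPU heavy" pool
print(io_peak, cpu_peak)  # cpu_peak can never exceed 4
```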

Regards,
-- Mike


(In reply to Joe Percivall's message of Thu, Sep 6, 2018 at 3:11 PM.)


Re: Status of "event-driven" scheduling

Mark Payne
Joe,

Mike is right in that it was intended to be a more efficient scheduling strategy. With Timer-Driven,
the processors used to constantly be checking if they had work to do and if not would switch contexts
and check again. And again. This was pretty expensive, so we added the Event-Driven strategy.

In principle, implementing the Event-Driven strategy should be fairly simple and straightforward: when
a FlowFile lands in a queue, just call the onTrigger method of the queue's destination. However, it got
a lot more complicated once we needed to consider backpressure and limiting the number of concurrent tasks.
So much more complicated, in fact, that testing showed that the Event-Driven strategy was noticeably
slower than Timer-Driven. In response, we added the "nifi.bored.yield.duration" property to nifi.properties
and updated the framework so that if there is no work for the Processor to do (due to its queues being empty
or backpressure being applied) we don't schedule that processor thread for the configured amount of time.
Implementing this showed a significant drop in CPU usage while still providing great throughput. So, truth
be told, we pretty much abandoned using Event-Driven.
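
To make the "bored yield" idea above concrete, here is a small simulation in
Python. It is only a sketch under simplifying assumptions: time is a discrete
tick counter, there is a single processor with a single input queue, and the
names run_timer_driven and bored_yield_ticks are invented for the example
(NiFi's real scheduler is Java and far more involved). It counts how often
the processor is scheduled only to find an empty queue, with and without a
yield period.

```python
from collections import deque


def run_timer_driven(arrivals, total_ticks, bored_yield_ticks):
    """Simulate one timer-driven processor over discrete clock ticks.

    arrivals: set of ticks at which one FlowFile lands in the input queue.
    bored_yield_ticks: how many extra ticks to leave the processor
        unscheduled after it found no work (0 means poll on every tick).
    Returns (processed, empty_polls).
    """
    queue = deque()
    processed = 0
    empty_polls = 0
    next_eligible = 0  # first tick at which the processor may be scheduled

    for tick in range(total_ticks):
        if tick in arrivals:
            queue.append(tick)
        if tick < next_eligible:
            continue  # still "bored": no thread is scheduled at all
        if queue:
            queue.popleft()  # stand-in for onTrigger doing real work
            processed += 1
        else:
            empty_polls += 1  # wasted scheduling: nothing to do
            next_eligible = tick + 1 + bored_yield_ticks
    return processed, empty_polls


print(run_timer_driven({5, 50}, 100, 0))  # (2, 98): polls on every tick
print(run_timer_driven({5, 50}, 100, 9))  # (2, 10): same work, far fewer polls
```

In this toy model the same two FlowFiles get processed either way; the yield
trades a little latency (the FlowFile arriving at tick 5 waits until tick 10)
for far fewer wasted scheduling attempts.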

I do also remember, several years back, running into an issue where under high load we would occasionally
see a Processor "freeze up" when using Event-Driven scheduling. I think that was the main reason we marked it
experimental. It was unclear what the cause was, but given how well the Timer-Driven scheduling strategy has
worked for us, I've just never revisited it.

That being said, I do believe that an Event-Driven approach is a good idea. But given how much more mature NiFi
is now than it was at the point that it was implemented, I would probably approach the idea entirely differently.
To answer your questions directly:

1. I would never recommend using event-driven over timer-driven processors.
2. Not sure who is using it in production, but I would recommend against it.
3. My vote would be to mark it as deprecated.
4. To be honest, I'm not sure that I fully understand this question, as it is somewhat vague. Are you referring specifically
to scheduling, obtaining the best performance, minimizing resource utilization, or did you intend for this to be vague and
are just asking for any general guidance in whatever form?

Thanks
-Mark


(In reply to Michael Moser's message of Sep 12, 2018 at 5:11 PM.)


Re: Status of "event-driven" scheduling

Joe Percivall-2
Mike/Mark,

Thanks for the responses! Given the background and current state, I'd vote
to mark as deprecated too. If in agreement, I'll create the ticket for
further discussion/planning.

Mike, that's a great point around IO vs CPU driven tasking buckets. With
some of the modern reactive frameworks, that's how they create thread pools
to divvy up work. If we could somehow apply that to our scheduling that'd
be awesome! Maybe we can leverage the CPU & IO annotations some processors
already have?

Mark, yeah, #4 was a bit more vague than intended. Primarily it was about
scheduling, but it was also open-ended for general guidance (I always like to
learn from others' experiences). As for our use case, the core of our data
plane flow branches into ~20 different PGs based on how to handle different
data types. Each one does some transformation (it is/was just EvaluateJsonPath
+ AttributesToJSON, but we're moving to records) and then hits internal
microservices. The processors hitting the microservices are the slowest.
Depending on the deployment/time, 95% of the FFs would be across ~4 data
types, but there could be variance, with spikes and/or some types never being
used. In addition to that main data pipeline, there are other disjoint PGs in
the data plane instance.

Similar to Mike's point, one key lesson learned is how important run
duration is when there's a mixture of fast (UpdateAttribute) and slow (any
IO-bound) processors. If you allocate many concurrent tasks to
the slow processors and don't increase the tasks or run duration on the
fast ones, the fast ones will lag behind. This is due to the fact that they're
only being scheduled at an X:1 rate. Along those lines, one thing I've thought
about but regrettably not brought up before is adding run duration for
internal ports + funnels. They are still run using the old version of
batching (grabbing 100 FFs off the queue) and only run with one concurrent
task.
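
A toy model of the X:1 effect described above, in Python. It is a deliberate
oversimplification and not how NiFi's scheduler works: it assumes that in each
scheduling round every one of the slow processor's concurrent tasks hands off
one FlowFile, while the downstream fast processor is triggered once and drains
at most fast_batch FlowFiles (a stand-in for a longer run duration or more
tasks). The names backlog_after, slow_tasks and fast_batch are invented for
the example.

```python
def backlog_after(rounds, slow_tasks, fast_batch):
    """Return the queue depth in front of the fast processor after `rounds`."""
    queue_depth = 0
    for _ in range(rounds):
        # Each concurrent task of the slow processor emits one FlowFile.
        queue_depth += slow_tasks
        # The fast processor gets a single trigger and drains up to fast_batch.
        queue_depth -= min(queue_depth, fast_batch)
    return queue_depth


print(backlog_after(100, slow_tasks=8, fast_batch=1))  # 700: falls behind 7 per round
print(backlog_after(100, slow_tasks=8, fast_batch=8))  # 0: keeps up
```

Under these assumptions the fast processor only keeps up once the work it can
do per trigger matches the number of upstream tasks feeding it.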

So aside from setting the run duration for any processor which supports it,
increasing the concurrent threads for slow processors and
changing/increasing the bored yield, are there any other options for tuning
the scheduling of disjoint/sparse flows?

Joe


(In reply to Mark Payne's message of Thu, Sep 13, 2018 at 9:50 AM.)
