Operational Deployment/Garbage Collection


Mike Drob
Are there operational guidelines somewhere on heap sizing and garbage
collection when deploying NiFi?

There's a lot of common wisdom about how to avoid full GCs (which I assume
are as bad for NiFi as they are for any Java application) but I was curious
what people had experience running with.

CMS? G1? C4? Are there recommended options to enable/disable based on how
NiFi runs for a smoother experience?

Mike

Re: Operational Deployment/Garbage Collection

Joe Witt
Mike,

We hope to discuss this in much more detail as we make progress toward the
administration guide.  But we are certainly susceptible to GC behaviors
which can impact performance.  That is particularly true because of the
extension points which folks can build to (processors, controller tasks,
etc.).  We've taken great care to be as memory efficient as possible in
all of our internal framework components and the existing standard
processors.  In short, everything is designed to handle arbitrarily large
objects without ever loading more than some finite and relatively small
amount of memory at once.

Where this breaks down as we currently have it is the FlowFile objects
themselves.  For each flow file that is active in the flow we have the
entire Map of attribute key/value String pairs loaded with the FlowFile
object.  So while we do not have the actual content of the flowfile in
memory we do have those Maps and a few small values with each.  If there
are dozens or hundreds of large keys/values across hundreds of thousands,
if not many millions, of flow files then that can start to eat into heap
usage considerably.  We do combat this fairly well with a concept called
'flowfile swapping'.  If a queue backlogs beyond a configurable threshold
we actually serialize the excess flowfiles out to storage (off heap).  This
allows massive backlogs to be handled gracefully.  But this mechanism is
still arguably crude, as it is purely based on number of flow files when in
reality the 'heap cost' of any flow file can vary greatly depending on the
number and size of its attributes.
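As a rough illustration of the heap arithmetic involved (the per-object
overheads below are generic 64-bit JVM assumptions, not NiFi internals):

```java
// Back-of-the-envelope estimate of attribute heap cost per flow file.
// The per-object overheads are illustrative assumptions for a 64-bit JVM
// with compressed oops; they are not measured NiFi figures.
public class AttributeHeapEstimate {
    // ~2 bytes per char plus object and array headers, rounded up.
    static long stringBytes(int chars) {
        return 40 + 2L * chars;
    }

    // One map entry: key String + value String + ~48 bytes entry overhead.
    static long entryBytes(int keyChars, int valueChars) {
        return stringBytes(keyChars) + stringBytes(valueChars) + 48;
    }

    public static void main(String[] args) {
        // Assume 50 attributes with ~20-char keys and ~100-char values.
        long perFlowFile = 50 * entryBytes(20, 100);
        // Multiply across one million queued flow files.
        long total = perFlowFile * 1_000_000L;
        System.out.println("per flow file: ~" + perFlowFile / 1024 + " KiB");
        System.out.println("total:         ~" + total / (1024 * 1024) + " MiB");
    }
}
```

Even with modest per-entry costs, attribute maps multiplied across a
million queued flow files can reach into the tens of gigabytes, which is
why swapping matters.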

The key stressors of the heap:
- Is it large enough for all the normal goings-on in a flow?
-- If yes, great.  If no, then no matter what, things will be no fun.  The
size needed depends on how many things are in the flow, how many flow files
can be around at once, and the sophistication of the processors in the flow.

-- Are most objects created of a relatively short life span?  If yes,
great.  If not, it creates a different kind of tension on garbage
collection.  G1 tends to handle even this fairly well, but folks should
still strive to keep objects as short lived as possible.

-- Are all operations against content (which could be arbitrarily large)
done in a manner which only ever has some finite amount in the heap at a
time?  This is by far the single biggest gotcha we see related to garbage
collection issues.  It is imperative that anyone who wants their JVM to
stay performant be very cognizant of being 'stream' friendly rather than
using byte[] to hold large objects.
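In plain Java terms (generic streams here, not the NiFi processor API),
the difference looks like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamFriendly {
    // Anti-pattern: materializes the entire content on the heap, so a
    // 10 GB input needs a 10 GB byte[] (plus copies while growing).
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Stream-friendly: only the 8 KiB buffer is ever on the heap,
    // regardless of how large the content is.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        long copied = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            copied += n;
        }
        return copied;
    }
}
```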

I've run with G1 very successfully for a long time, and if I write the
documentation for this I would recommend its use.
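For reference, GC choice is set through the JVM arguments in NiFi's
conf/bootstrap.conf; a G1 setup might look like the following (the
property indices follow the stock file, and the heap sizes are
placeholders to tune per deployment):

```
# conf/bootstrap.conf -- JVM arguments (example values, tune per deployment)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
java.arg.13=-XX:+UseG1GC
```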

I've put together a couple of 'Stress Test' style templates that people can
run on their configured system to get a sense of memory load for well
behaved processors and framework components.  Hopefully that will help put
some real information behind such a discussion.  We can also update the
GenerateFlowFile processor to have what would be considered bad behaviors
so folks can plainly see the effects of bad memory practices.

Was this rambling even close to what you were looking for?

Thanks
Joe

On Wed, Jan 7, 2015 at 11:38 AM, Mike Drob <[hidden email]> wrote:

> Are there operational guidelines somewhere on heap sizing and garbage
> collection when deploying NiFi?
>
> There's a lot of common wisdom about how to avoid full GCs (which I assume
> are as bad for NiFi as they are for any Java application) but I was curious
> what people had experience running with.
>
> CMS? G1? C4? Are there recommended options to enable/disable based on how
> NiFi runs for a smoother experience?
>
> Mike
>

Re: Operational Deployment/Garbage Collection

Mike Drob
On Wed, Jan 7, 2015 at 10:54 AM, Joe Witt <[hidden email]> wrote:

> Mike,
>
> We hope to discuss this in much more detail as we make progress toward the
> administration guide.  But we are certainly susceptible to GC behaviors
> which can impact performance.  That is particularly true because of the
> extension points which folks can build to (processors, controller tasks,
> etc.).  We've taken great care to be as memory efficient as possible in
> all of our internal framework components and the existing standard
> processors.  In short, everything is designed to handle arbitrarily large
> objects without ever loading more than some finite and relatively small
> amount of memory at once.
>
Yeah, capturing all of this in user/operator-facing documentation is
probably the best end goal. I can file a JIRA if one does not already exist.


> Where this breaks down as we currently have it is the FlowFile objects
> themselves.  For each flow file that is active in the flow we have the
> entire Map of attribute key/value String pairs loaded with the FlowFile
> object.  So while we do not have the actual content of the flowfile in
> memory we do have those Maps and a few small values with each.  If there
> are dozens or hundreds of large keys/values across hundreds of thousands,
> if not many millions, of flow files then that can start to eat into heap
> usage considerably.  We do combat this fairly well with a concept called
> 'flowfile swapping'.  If a queue backlogs beyond a configurable threshold
> we actually serialize the excess flowfiles out to storage (off heap).  This
> allows massive backlogs to be handled gracefully.  But this mechanism is
> still arguably crude, as it is purely based on number of flow files when in
> reality the 'heap cost' of any flow file can vary greatly depending on the
> number and size of its attributes.
>

Are there metrics kept on flow file metadata? I recall seeing # of flow
files, but it would be cool to see summary statistics on number of
attributes, memory footprint per flow file, etc. Apologies if this already
exists; I haven't gone looking yet. Maybe JMX is a good place for these.
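For whole-heap numbers at least, the standard platform MBeans already
expose usage without anything NiFi-specific; a minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
    public static void main(String[] args) {
        // Platform MBean for the JVM's own heap accounting.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.println("used:      " + heap.getUsed() / (1024 * 1024) + " MiB");
        System.out.println("committed: " + heap.getCommitted() / (1024 * 1024) + " MiB");
        // getMax() may report -1 when no maximum is defined.
        System.out.println("max:       " + heap.getMax() / (1024 * 1024) + " MiB");
    }
}
```

Per-flow-file attribute footprints would of course need instrumentation
inside NiFi itself; this only covers the aggregate view.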

>
> The key stressors of the heap:
> - Is it large enough for all the normal goings-on in a flow?
> -- If yes, great.  If no, then no matter what, things will be no fun.  The
> size needed depends on how many things are in the flow, how many flow files
> can be around at once, and the sophistication of the processors in the flow.
>
> -- Are most objects created of a relatively short life span?  If yes,
> great.  If not, it creates a different kind of tension on garbage
> collection.  G1 tends to handle even this fairly well, but folks should
> still strive to keep objects as short lived as possible.
>

> -- Are all operations against content (which could be arbitrarily large)
> done in a manner which only ever has some finite amount in the heap at a
> time?  This is by far the single biggest gotcha we see related to garbage
> collection issues.  It is imperative that anyone who wants their JVM to
> stay performant be very cognizant of being 'stream' friendly rather than
> using byte[] to hold large objects.
>

I could come up with several scenarios (i.e. do this or that) to ask about,
but I think I'll be better served by just looking at existing processors as
exemplars. I'll come back with more questions after I've read the source.

>
> I've run with G1 very successfully for a long time, and if I write the
> documentation for this I would recommend its use.


Good to know.

>
> I've put together a couple of 'Stress Test' style templates that people can
> run on their configured system to get a sense of memory load for well
> behaved processors and framework components.  Hopefully that will help put
> some real information behind such a discussion.  We can also update the
> GenerateFlowFile processor to have what would be considered bad behaviors
> so folks can plainly see the effects of bad memory practices.
>

This is very cool. I would make the bad behaviours optional, but otherwise
that is an incredibly clever idea. I love it.

>
> Was this rambling even close to what you were looking for?
>

Yes, very informative. Thank you.

>
> Thanks
> Joe
>
> On Wed, Jan 7, 2015 at 11:38 AM, Mike Drob <[hidden email]> wrote:
>
> > Are there operational guidelines somewhere on heap sizing and garbage
> > collection when deploying NiFi?
> >
> > There's a lot of common wisdom about how to avoid full GCs (which I
> assume
> > are as bad for NiFi as they are for any Java application) but I was
> curious
> > what people had experience running with.
> >
> > CMS? G1? C4? Are there recommended options to enable/disable based on how
> > NiFi runs for a smoother experience?
> >
> > Mike
> >
>

Mike