NiFi 1.8.0 LoadBalance Strategy Issue for Connection between Funnel and FetchSFTP


Josefz

Hi guys

We have an 8-node NiFi cluster and run ListSFTP on the primary node. After ListSFTP we add some attributes and send the FlowFiles through a funnel to FetchSFTP. On the connection between the funnel and FetchSFTP we have configured an “Object Threshold” of 100, some “Prioritizer”, and round-robin load balancing, to get the files in sorted order. Right after start we had about 800 files in the queue between the funnel and FetchSFTP (the expected value with 8 nodes), but after a few hours (we get about 200k-250k files from each ListSFTP processor) the number of files decreased to the number below. However, it seems that all nodes get load, because after FetchSFTP we see a more or less evenly distributed load.

The next issue, or maybe misunderstanding, is that we would like to have all the ListSFTP files from the four folders in sorted order. So we added the priority attribute, to which we assign the epoch time in seconds extracted from the filename. However, there seems to be no understandable logic to how the files get sorted in the queue between the funnel and FetchSFTP: after a few hours I see files with nearly the oldest and the newest possible timestamps in our DB, which shouldn’t be possible since we have the priority attribute with epoch time. Is there a flaw in our understanding of how NiFi works here? Should we remove the funnel and connect the UpdateAttribute processor directly to FetchSFTP? Or how can we overcome the ordering issue?

Thanks in advance,

Josef

Re: NiFi 1.8.0 LoadBalance Strategy Issue for Connection between Funnel and FetchSFTP

Mark Payne
Hi Josef,

The prioritizers provide a weak ordering to the data, not an absolute sorting. What I mean by that is that
if you are prioritizing a FlowFile with attribute A = 123 over a FlowFile with attribute A = 125, then the first
one will likely go first but it's not guaranteed. For example, when you have Load Balanced connections,
that Connection between your Funnel and FetchSFTP actually consists of 8 different queues: one for each
node in your cluster. Within each of those queues, the FlowFiles in the queue are prioritized according to
your configured Prioritizers. So you're not guaranteed to process everything sequentially according to the
Prioritizer. Data that is swapped out can also change the 'absolute ordering' of FlowFiles.
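
To make the per-node queue point concrete, here is a minimal Python sketch. It is a toy model, not NiFi's actual implementation: the function name `simulate`, the three-node cluster, and the per-node `speeds` are all assumptions made up for illustration, and lower numbers are assumed to mean higher priority. Each node keeps its own priority queue, FlowFiles are distributed round robin, and one node happens to work faster than the others.

```python
import heapq

def simulate(priorities, num_nodes, speeds):
    """Toy model: round-robin FlowFiles onto per-node priority queues,
    then let each node pull `speeds[node]` FlowFiles per tick."""
    queues = [[] for _ in range(num_nodes)]
    for i, prio in enumerate(priorities):
        # Round-robin load balancing: FlowFile i goes to node i % num_nodes.
        heapq.heappush(queues[i % num_nodes], prio)

    processed = []
    while any(queues):
        for node, queue in enumerate(queues):
            for _ in range(speeds[node]):
                if queue:
                    # Each node only prioritizes within its OWN queue.
                    processed.append(heapq.heappop(queue))
    return processed

# Hypothetical input: 12 FlowFiles with priorities 100..111, 3 nodes,
# node 0 processing twice as fast as the other two.
priorities = list(range(100, 112))
order = simulate(priorities, num_nodes=3, speeds=[2, 1, 1])
print(order)
```

In this model every node drains its own queue in priority order, yet the combined order across the cluster is not globally sorted, which is the 'weak ordering' described above.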

Now, that being said, you should get a 'rough ordering' close to what you would expect. From what you
have shown here, though, I think only the Connection between the Funnel and FetchSFTP is
using Prioritizers. This means that it will sort the data that it has according to your Prioritizer, but the Funnel
is feeding in the data from its Connections and those are not Prioritized. So you'll want to ensure that
the Connections between UpdateAttribute and the Funnel are also configured with Prioritizers.
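
The effect of an unprioritized upstream connection can be sketched the same way. Again this is a simplified model under stated assumptions, not NiFi code: `drain`, the capacity of 3 (standing in for the Object Threshold), and the worst-case arrival order are hypothetical, and lower numbers are assumed to go first. The downstream queue can only sort the few FlowFiles it currently holds, so the order in which the upstream connection releases data matters.

```python
import heapq
from collections import deque

def drain(upstream_next, capacity):
    """Toy model of a prioritized downstream queue with back pressure.
    `upstream_next()` returns the next priority, or None when exhausted."""
    downstream, out = [], []
    while True:
        # Fill the downstream queue up to its threshold.
        while len(downstream) < capacity:
            item = upstream_next()
            if item is None:
                break
            heapq.heappush(downstream, item)
        if not downstream:
            return out
        # Process the best FlowFile among those currently queued.
        out.append(heapq.heappop(downstream))

arrivals = list(range(20, 0, -1))  # hypothetical worst case: newest listed first

# Upstream connection without a Prioritizer: releases in arrival (FIFO) order.
fifo = deque(arrivals)
fifo_out = drain(lambda: fifo.popleft() if fifo else None, capacity=3)

# Upstream connection with a Prioritizer: releases its lowest value first.
heap = list(arrivals)
heapq.heapify(heap)
prio_out = drain(lambda: heapq.heappop(heap) if heap else None, capacity=3)

print(fifo_out)
print(prio_out)
```

With the FIFO upstream the output is out of order even though the downstream queue is prioritized, while prioritizing the upstream connection as well restores the expected order in this single-node model.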

Sorry for the wordiness. Hopefully this makes sense. If not, please let us know.

Thanks
-Mark



In reply to the message from [hidden email] of Nov 8, 2018, 2:55 AM (quoted with attachments image001.png and image002.png).