Cluster Warnings

Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

We're running a 4-node cluster on NiFi 1.7.1. The fourth node was added recently, and as soon as we added it we started seeing the warnings below:

Response time from NODE2 was slow for each of the last 3 requests made. To see more information about timing, enable DEBUG logging for org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator

Initially we thought the problem was with the recently added node, so we cross-checked all the configs on that box, and everything seemed fine. After enabling DEBUG logging for the cluster, we noticed that the warning is not specific to any one node: every time we see a warning like the one above, there is one slow node that takes far longer than the others to respond, as shown below (in this case the slow node is NIFI04). Sometimes this leads to node disconnects that need manual intervention.

DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID e4726027-27c7-4bbb-8ab6-d02bb41f1920):
NIFI04:8443: 1053 millis
NIFI02:8443: 3 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
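
For anyone wanting to check whether the slow responder really changes from request to request, the DEBUG output above can be tallied with a small script. This is only a sketch: it assumes the log lines look exactly like the ones quoted here, and the function names `parse_node_responses` and `slowest_nodes` are made up for illustration, not part of NiFi.

```python
import re
from collections import defaultdict

# Sample lines copied from the DEBUG output quoted above.
SAMPLE_LOG = """\
DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
"""

# Assumed line formats, based only on the excerpt in this post.
HEADER = re.compile(r"Node Responses for (\w+) (\S+) \(Request ID ([0-9a-f-]+)\):")
TIMING = re.compile(r"^\s*(\S+):(\d+): (\d+) millis\s*$")


def parse_node_responses(lines):
    """Return {request_id: {node_name: millis}} for each replicated request."""
    requests = {}
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = requests.setdefault(header.group(3), {})
            continue
        timing = TIMING.match(line)
        if timing and current is not None:
            current[timing.group(1)] = int(timing.group(3))
    return requests


def slowest_nodes(requests):
    """Count how often each node was the slowest responder for a request."""
    counts = defaultdict(int)
    for timings in requests.values():
        if timings:
            counts[max(timings, key=timings.get)] += 1
    return dict(counts)


requests = parse_node_responses(SAMPLE_LOG.splitlines())
print(slowest_nodes(requests))
```

Running this over a longer stretch of the log would show whether one host dominates the count or whether the slow node rotates, as described above.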

We tried changing settings in nifi.properties, such as increasing "nifi.cluster.node.protocol.max.threads", but none of the changes helped, and we're still stuck with slow communication between the nodes. We use an external ZooKeeper, as this is our production server.
Below are some of our configs:

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.node.max.concurrent.requests=1000
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=
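
To put the observed response times next to the configured timeouts, the settings above can be read programmatically. This is a rough sketch, assuming the properties are available as plain text; `parse_properties` and `period_to_millis` are hypothetical helper names, and the unit table handles only a few spellings rather than NiFi's full time-period syntax.

```python
import re

# Subset of the cluster settings quoted above.
SAMPLE = """\
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.firewall.file=
"""

# Assumption: only these unit spellings are supported by this sketch.
UNIT_TO_MILLIS = {
    "ms": 1, "millis": 1,
    "sec": 1000, "secs": 1000,
    "min": 60_000, "mins": 60_000,
}


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def period_to_millis(value):
    """Convert a value such as '90 sec' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if match is None or match.group(2).lower() not in UNIT_TO_MILLIS:
        raise ValueError(f"unsupported time period: {value!r}")
    return int(match.group(1)) * UNIT_TO_MILLIS[match.group(2).lower()]


props = parse_properties(SAMPLE)
read_timeout_ms = period_to_millis(props["nifi.cluster.node.read.timeout"])
slowest_observed_ms = 1386  # largest value in the DEBUG output quoted in this post
print(f"read timeout: {read_timeout_ms} ms, "
      f"slowest observed response: {slowest_observed_ms} ms")
```

The comparison only shows that the slow responses in the excerpt are far below the configured read timeout; it says nothing about what threshold produces the warning itself.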

Any thoughts on why this is happening?


-Karthik

Re: Cluster Warnings

Joe Witt
Please keep this thread on the users list.

On Sun, Oct 14, 2018, 11:53 PM Karthik Kothareddy (karthikk) [CONT - Type
2] <[hidden email]> wrote: