Fixing unstable nifi cluster.

ashwin.konale@gmail.com
Hi,
We have a 3-node NiFi cluster (with separate ZooKeeper instances running on
the same machines) that pulls data from MySQL and writes it to HDFS. I am
frequently running into problems with the cluster: nodes keep disconnecting
from each other, the primary node keeps switching, and sometimes the cluster
goes into a zombie state in which I cannot access the UI at all. I have
followed the best practices guide and tweaked parameters in nifi.properties,
and I have switched provenanceRepositoryImplementation to volatile because
the cluster was not able to keep up with incoming traffic. Data traffic is
not high at all (4 Mbps). These are the messages I frequently see in the logs:

INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: LOST
INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@56ebedec Connection State changed to LOST
INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1b0e2055 Connection State changed to LOST
INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED

Am I doing something wrong with the cluster setup? Can someone give me some
guidance on how to debug this issue, for example which system metrics to
look at?

Ashwin
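
One way to start on the debugging question above is to measure how often the
ZooKeeper connection drops. The following is a minimal sketch, not a NiFi
tool: it counts the Curator "State change" messages shown in the post. The
function name count_state_changes is hypothetical, and the sample lines are
the ones from this thread joined onto single lines, on the assumption that
each entry occupies one line in the actual log file.

```python
import re
from collections import Counter

# Matches the ConnectionStateManager messages quoted in the post,
# e.g. "... State change: LOST" or "... State change: RECONNECTED".
STATE_CHANGE = re.compile(r"ConnectionStateManager State change: (\w+)")


def count_state_changes(lines):
    """Count ZooKeeper connection state changes in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: the two ConnectionStateManager lines from the post, plus one
# leader-election line that the pattern deliberately ignores.
sample_lines = [
    "INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: LOST",
    "INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager Connection State changed to LOST",
    "INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED",
]

sample_counts = count_state_changes(sample_lines)
for state, count in sorted(sample_counts.items()):
    print(f"{state}: {count}")
```

To run it against a real log, pass an open file object instead of the sample
list, for example open("logs/nifi-app.log") if the log is in that location on
your install. A high LOST count relative to uptime would suggest looking at
the ZooKeeper timeouts or at resource contention on the shared hosts.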

Re: Fixing unstable nifi cluster.

Pierre Villard
Hi,

Can you try increasing the parameters below? That's usually what I
recommend, since our default values are probably a bit too low.

nifi.zookeeper.connect.timeout=15 secs
nifi.zookeeper.session.timeout=15 secs
nifi.cluster.node.read.timeout=30 sec
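
To see which of these settings are currently below the values above, a small
script can read nifi.properties and compare. This is only a rough sketch: the
helper names (parse_properties, to_seconds, below_suggested) are hypothetical,
the duration parser handles only simple "<number> <unit>" strings, and the
sample values are made up for illustration.

```python
import re

# Values suggested in this thread.
SUGGESTED = {
    "nifi.zookeeper.connect.timeout": "15 secs",
    "nifi.zookeeper.session.timeout": "15 secs",
    "nifi.cluster.node.read.timeout": "30 sec",
}

# Only a handful of unit spellings are handled in this sketch.
UNITS = {
    "ms": 0.001, "millis": 0.001,
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
}


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def to_seconds(value):
    """Convert a string such as '15 secs' or '30 sec' to seconds."""
    match = re.match(r"^\s*(\d+)\s*([A-Za-z]+)\s*$", value)
    if not match or match.group(2).lower() not in UNITS:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2).lower()]


def below_suggested(props):
    """Return (key, current, suggested) for settings missing or below the suggestion."""
    findings = []
    for key, suggested in SUGGESTED.items():
        current = props.get(key)
        if current is None or to_seconds(current) < to_seconds(suggested):
            findings.append((key, current, suggested))
    return findings


# Made-up sample content standing in for a real nifi.properties file.
sample = """\
# cluster settings (illustrative values)
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=15 secs
nifi.cluster.node.read.timeout=5 sec
"""

for key, current, suggested in below_suggested(parse_properties(sample)):
    print(f"{key}: {current} -> suggest {suggested}")
```

For a real check, replace the sample string with the contents of the
nifi.properties file on each node, since the values need to be raised on
every node in the cluster.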

Pierre

On Tue, Oct 16, 2018 at 13:02, ashwin konale <[hidden email]> wrote:

Re: Fixing unstable nifi cluster.

Jeff
Hello,

Pierre's suggestions should help, since you are running ZooKeeper on the
same nodes as NiFi. If you can run ZooKeeper on hosts separate from NiFi,
so that ZK and NiFi are not colocated, you should see better results.

- Jeff

On Tue, Oct 16, 2018 at 8:03 AM Pierre Villard <[hidden email]>
wrote:
