The mystery of ListSFTP that stops working


Andre
Folks,

Has anyone ever seen a situation where ListSFTP simply stops working?

I have observed a few occurrences of what seems to be a weird bug.

Symptoms are:

- Processor looks healthy (i.e. no bulletins)
- State is stalled (i.e. no changes to new files or timestamps)
- No signs of a stuck thread (i.e. processor doesn't display a thread count)
- UI displays processor "Task count = 0/0"
- Neither stopping > starting nor stopping > disabling > enabling >
starting the processor seems to make a difference.

Has anyone seen this?

Cheers

Re: The mystery of ListSFTP that stops working

ruurd.schoonheim
I have seen the same behavior with the PutHBase processor.
A NiFi restart was the only fix.
I would like to know the root cause, and a way to regain control of the processor without restarting NiFi.


> On 15 Nov 2017, at 10:58, Andre <[hidden email]> wrote:

Re: The mystery of ListSFTP that stops working

Mark Payne
In reply to this post by Andre
Hey Andre,

I have not seen this personally. Can you share how you have the Scheduling Tab configured?
Is it set to Timer-Driven with a period of "0 secs"? Also, what is the Yield Duration set to?
Is there any backpressure configured on any of the outbound connections?
Additionally, what is the value of the "nifi.administrative.yield.duration" property in nifi.properties?

Sorry - I know that's a barrage of questions. Hopefully it's something easy that's just being overlooked, though :)

-Mark


Re: The mystery of ListSFTP that stops working

Andre
Mark,

Timer driven, Primary Node, 5 min
Yield is set to 1 sec
Backpressure = 10k flows or 1GB
nifi.administrative.yield.duration=30 sec

Cheers


On Thu, Nov 16, 2017 at 1:32 AM, Mark Payne <[hidden email]> wrote:

Re: The mystery of ListSFTP that stops working

Mark Payne
Andre,

So I am guessing that backpressure is not an issue then if you're not seeing it run :)
Have you tried reducing the scheduling period from 5 mins to something like 5 seconds?
Of course, you may not want to actually be running it every 5 seconds in a production environment,
but I am curious if it would cause it to start running or not...
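Mark's guess rests on how back pressure interacts with scheduling: once an outbound connection's queue reaches its object-count or data-size threshold, NiFi stops scheduling the source processor. The Python sketch below is a simplified model of that rule, not NiFi's code. `Connection`, `is_full` and `can_be_scheduled` are hypothetical names; a threshold of 0 is assumed to mean "disabled", and "1 GB" is assumed to mean 1024^3 bytes. It models the default case where any full outbound connection blocks the processor (processors that trigger when any destination is available behave differently).

```python
from dataclasses import dataclass
from typing import Iterable

# Simplified model of NiFi back pressure (not NiFi's actual code).
# Assumptions: a threshold of 0 disables that check, and "1 GB" means 1024**3 bytes.
GIB = 1024 ** 3


@dataclass
class Connection:
    queued_count: int
    queued_bytes: int
    object_threshold: int = 10_000   # Andre's "10k flows"
    size_threshold_bytes: int = GIB  # Andre's "1GB"

    def is_full(self) -> bool:
        """A queue counts as full once it reaches either threshold."""
        if self.object_threshold > 0 and self.queued_count >= self.object_threshold:
            return True
        if self.size_threshold_bytes > 0 and self.queued_bytes >= self.size_threshold_bytes:
            return True
        return False


def can_be_scheduled(outbound: Iterable[Connection]) -> bool:
    """Default case: any full outbound connection blocks the source processor."""
    return not any(conn.is_full() for conn in outbound)


if __name__ == "__main__":
    empty = Connection(queued_count=0, queued_bytes=0)
    full = Connection(queued_count=10_000, queued_bytes=5 * 1024 ** 2)
    print(can_be_scheduled([empty]))        # True
    print(can_be_scheduled([empty, full]))  # False
```

Under this model, a processor whose outbound queues are empty or nearly empty is never blocked by back pressure, which is why a stall with "Task count = 0/0" points elsewhere.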

> On Nov 15, 2017, at 10:02 AM, Andre <[hidden email]> wrote:

Re: The mystery of ListSFTP that stops working

Andre
Hey Mark,

Changing the scheduling doesn't seem to make a difference.

I have noticed some instances where running nifi.sh dump causes the primary
node to switch, and that unleashes the stuck processor, but it is still not
clear why this happens.

I took a thread dump of the primary node while this was happening and will
upload it to a gist soon.

Cheers



On Thu, Nov 16, 2017 at 2:07 AM, Mark Payne <[hidden email]> wrote:

Re: The mystery of ListSFTP that stops working

kapkbr
I am seeing a similar problem. The processor shows both Tasks/Duration as 0
(screenshot 1). Its status is Started, but it never appears to spawn a task.
The processor is configured as follows:

Timer driven
Run Schedule 0
Primary node only
Yield 1 sec
penalty 30 sec

I have tried changing the yield duration and other settings, but nothing seems to work.

Here are some more strange things:
1) There is no message or error in the log except the following:
[StandardProcessScheduler Thread-6] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled ListFile[id=cafa1842-015f-1000-ffff-ffffee80ecf2] to run with 1 threads
2) If I change it to run on all nodes, it works. But that's not what I want,
because it would create duplicates.
3) It is not only this processor: all processors configured to run on the
primary node only behave the same way :(
4) I have another environment with the same configuration, and there it works fine.
5) I thought this could be caused by stuck threads, but the problem remains
after a cluster restart.

Any idea where to look for this kind of problem?
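Item 1 suggests one place to look: the processor is reported as scheduled, yet it never runs. The Python sketch below pulls such scheduling messages out of a log so they can be compared with the processors that actually show task counts in the UI. It assumes only that the message text matches the line quoted in item 1 (the timestamp and level prefix in nifi-app.log depends on the logback configuration); the `SCHEDULED` pattern and `scheduled_processors` are hypothetical, not a NiFi tool.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: only the message text is matched; any timestamp/level prefix
# (which depends on logback.xml) is ignored. Component ids are lowercase UUIDs.
SCHEDULED = re.compile(
    r"Scheduled (?P<type>\w+)\[id=(?P<id>[0-9a-f-]+)\] to run with (?P<threads>\d+) threads"
)


def scheduled_processors(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (component type, component id, thread count) for each scheduling message."""
    for line in lines:
        match = SCHEDULED.search(line)
        if match:
            yield match.group("type"), match.group("id"), int(match.group("threads"))


if __name__ == "__main__":
    # The line quoted in item 1; with a real log, pass an open file object instead.
    sample = (
        "[StandardProcessScheduler Thread-6] o.a.n.c.s.TimerDrivenSchedulingAgent "
        "Scheduled ListFile[id=cafa1842-015f-1000-ffff-ffffee80ecf2] to run with 1 threads"
    )
    for proc_type, proc_id, threads in scheduled_processors([sample]):
        print(proc_type, proc_id, threads)
```

Because `scheduled_processors` accepts any iterable of lines, an open log file can be passed directly without loading it into memory.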

<http://apache-nifi-developer-list.39713.n7.nabble.com/file/t841/screenshot1.png>

-Kap




--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/

Re: The mystery of ListSFTP that stops working

kapkbr
After some more debugging, I observed that NiFi messes up its ZooKeeper (ZK)
nodes after many stops and starts. I pointed NiFi to a newly created path in
ZK and everything started working.
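The post does not say which ZooKeeper path was changed. One likely candidate, labelled here as an assumption, is `nifi.zookeeper.root.node` in nifi.properties (default `/nifi`), under which NiFi keeps its cluster coordination data such as primary node election; the ZooKeeper state provider in state-management.xml has its own Root Node setting. The hypothetical `set_property` helper below rewrites (or appends) one `key=value` entry in the text of a properties file; `zk1:2181` and `/nifi-fresh` are invented example values.

```python
import re

# Hypothetical helper, not a NiFi tool. Assumption: the changed "path in ZK"
# is nifi.zookeeper.root.node in nifi.properties; the post does not say.


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key=value, replacing the first existing
    `key=...` line or appending a new one. Only the `key=value` form is handled."""
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    replacement = f"{key}={value}"
    if pattern.search(text):
        # A function replacement keeps backslashes in `value` from being interpreted.
        return pattern.sub(lambda _match: replacement, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + replacement + "\n"


if __name__ == "__main__":
    # Invented example values for illustration only.
    original = "nifi.zookeeper.connect.string=zk1:2181\nnifi.zookeeper.root.node=/nifi\n"
    print(set_property(original, "nifi.zookeeper.root.node", "/nifi-fresh"), end="")
```

Since nifi.properties is read at startup, every node in the cluster would need the same new value before being restarted. If the state provider's Root Node is changed instead, cluster-scoped state kept under the old path (such as a listing processor's last-seen timestamps) may no longer be found, so files could be listed again.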


