Data recovery with full disk


Data recovery with full disk

bmichaud
NiFi 1.0.0

I have a cluster where one of the nodes ran out of space and left the cluster. Is there a way to delete archived content to free up space? Can the unprocessed data be recovered somehow, or do I have to delete the repositories and replay the data from our Kafka topic?

I looked through the admin and user guides, but could not find anything on this topic.

Here is the error I am getting:

2017-01-31 14:00:00,002 ERROR [Timer-Driven Process Thread-2] o.a.n.p.standard.RouteOnAttribute RouteOnAttribute[id=464e0e04-5acd-498d-1fc5-af23c1b86ccd] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
2017-01-31 14:00:00,002 ERROR [Timer-Driven Process Thread-9] o.a.n.p.standard.EvaluateJsonPath
org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
        at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:363) ~[nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:299) ~[nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:28) ~[nifi-api-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1064) [nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.0.0.jar:1.0.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_65]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_65]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_65]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.io.IOException: All Partitions have been blacklisted due to failures when attempting to update. If the Write-Ahead Log is able to perform a checkpoint, this issue may resolve itself. Otherwise, manual intervention will be required.
        at org.wali.MinimalLockingWriteAheadLog.update(MinimalLockingWriteAheadLog.java:212) ~[nifi-write-ahead-log-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:219) ~[nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:187) ~[nifi-framework-core-1.0.0.jar:1.0.0]
        at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:357) ~[nifi-framework-core-1.0.0.jar:1.0.0]
        ... 13 common frames omitted

Re: Data recovery with full disk

Mark Payne
Ben,

You can certainly delete archived content from the Content Repository. Assuming that you are
running Linux (or OS X, Cygwin, etc.), you could do this by running something like:

find content_repository/ -type f -amin +600 | grep archive | xargs rm -f

That would delete anything in the Content Repository's archive that has not been accessed in the last 10 hours (600 minutes, per -amin +600).
Once you've cleaned that up, you should be able to restart NiFi.

Note that this assumes your content repository and FlowFile repository are on the same partition; otherwise this won't help, since it is the FlowFile repository that is complaining about being out of disk space.
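
A quick way to verify that assumption, and to preview the cleanup before deleting anything (a sketch; it assumes you run it from the NiFi home directory with the default repository locations, so adjust the paths if nifi.properties points elsewhere):

# show which filesystem each repository lives on, and how full it is
df -h flowfile_repository content_repository

# dry run: list what the cleanup would remove, without deleting anything
find content_repository/ -type f -amin +600 | grep archive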

Thanks
-Mark

> On Jan 31, 2017, at 8:54 PM, bmichaud <[hidden email]> wrote:
>
> NiFi 1.0.0
>
> I have a cluster where one of the nodes ran out of space and left the
> cluster. Is there a way to delete archived content to free up space? Can the
> unprocessed data be recovered somehow, or do I have to delete the
> repositories and replay the data from our Kafka topic?


Re: Data recovery with full disk

bmichaud
Thanks for the help. Is it safe to do this while NiFi is running, or should it be shut down first?

Additionally, I am still finding many content files from over a month ago that are not in an archive directory but are clearly no longer needed on disk. How do I reduce the retention time?

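For reference on the retention question: how long archived content is kept is controlled by properties in conf/nifi.properties. A minimal sketch, using what I believe are the stock NiFi 1.x defaults; lowering these values reclaims disk space sooner:

# keep archived content at most this long
nifi.content.repository.archive.max.retention.period=12 hours
# start aging off archive once the content repo partition is this full
nifi.content.repository.archive.max.usage.percentage=50%
# set to false to disable archiving entirely
nifi.content.repository.archive.enabled=true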

Re: Data recovery with full disk

Mark Payne
It probably would be safe, but I'd recommend shutting down first if possible, because I can't
guarantee it.
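
If you do shut down first, the sequence would look something like this (a sketch assuming a standard tarball install, run from the NiFi home directory):

# stop NiFi so the repositories are quiescent
./bin/nifi.sh stop

# remove archived content not accessed in the last 10 hours
find content_repository/ -type f -amin +600 | grep archive | xargs rm -f

# restart once space has been freed
./bin/nifi.sh start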

Thanks
-Mark

On Feb 1, 2017, at 9:18 AM, bmichaud <[hidden email]> wrote:

Thanks for the help. Is it safe to do this while NiFi is running, or should
it be shut down first?