Recovering corrupt flowfile repo

Joe Gresock
Hi,

I'm trying to restore a corrupt flowfile repo using these instructions:
https://community.hortonworks.com/content/supportkb/149943/errorjavaioioexception-expected-to-read-a-sentinel.html

I get basically the same error noted on that post.  However, the
instructions reference a jar called

nifi-toolkit-flowfile-repo-1.2.0-SNAPSHOT-jar-with-dependencies.jar

I downloaded nifi-toolkit 1.6.0 (since that's the nifi version I'm
using), and I see a nifi-toolkit-flowanalyzer-1.6.0.jar,
but nothing called nifi-toolkit-flowfile-repo.

Does anyone know where this tool lives now?

Thanks,

Joe


--
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*

Re: Recovering corrupt flowfile repo

Joe Gresock
Ok, I finally found the jar and got it working using java -cp instead of
java -jar.  However, I suspect this procedure might not fit my particular
case, because it's looking for partition-* directories, and I don't have
those.  Perhaps the flowfile repo implementation I'm using is different
from the one to which that tool (
https://github.com/apache/nifi/blob/rel/nifi-1.6.0/nifi-toolkit/nifi-toolkit-flowfile-repo/src/main/java/org/apache/nifi/toolkit/repos/flowfile/RepairCorruptedFileEndings.java)
applies:

# My configuration
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog

So, here is the actual error I'm getting, in case anyone can help me work
through it.  I don't want to give up on the flowfile repo, because this was
production data.

2018-11-20 20:07:53,901 INFO [main] o.a.n.wali.SequentialAccessWriteAheadLog Successfully recovered 249151 records and 10 swap files from Snapshot at /data/nifi/flowfile_repository/checkpoint with Max Transaction ID of 4233134189 in 8593 milliseconds. Now recovering records from 1 journal files
2018-11-20 20:07:53,907 INFO [main] o.a.nifi.wali.LengthDelimitedJournal Recovering records from journal /data/nifi/flowfile_repository/journals/4233134190.journal
2018-11-20 20:07:54,592 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 6.78% of the way finished recovering journal /data/nifi/flowfile_repository/journals/4233134190.journal, having recovered 15845 updates

... skipping some lines

2018-11-20 20:08:05,363 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 88.09% of the way finished recovering journal /data/nifi/flowfile_repository/journals/4233134190.journal, having recovered 322778 updates
2018-11-20 20:08:06,054 INFO [main] o.a.nifi.wali.LengthDelimitedJournal 94.89% of the way finished recovering journal /data/nifi/flowfile_repository/journals/4233134190.journal, having recovered 352084 updates
2018-11-20 20:08:06,576 ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
        at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:946)
        at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:516)
        at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:872)
        at org.apache.nifi.NiFi.<init>(NiFi.java:157)
        at org.apache.nifi.NiFi.<init>(NiFi.java:71)
        at org.apache.nifi.NiFi.main(NiFi.java:292)
Caused by: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead
        at org.apache.nifi.repository.schema.SchemaRecordReader.readRecord(SchemaRecordReader.java:65)
        at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeRecord(SchemaRepositoryRecordSerde.java:124)
        at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeEdit(SchemaRepositoryRecordSerde.java:109)
        at org.apache.nifi.controller.repository.SchemaRepositoryRecordSerde.deserializeEdit(SchemaRepositoryRecordSerde.java:46)
        at org.apache.nifi.wali.LengthDelimitedJournal.recoverRecords(LengthDelimitedJournal.java:335)
        at org.apache.nifi.wali.SequentialAccessWriteAheadLog.recoverRecords(SequentialAccessWriteAheadLog.java:198)
        at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.loadFlowFiles(WriteAheadFlowFileRepository.java:545)
        at org.apache.nifi.controller.FlowController.initializeFlow(FlowController.java:746)
        at org.apache.nifi.controller.StandardFlowService.initializeController(StandardFlowService.java:956)
        at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:928)
        ... 5 common frames omitted
2018-11-20 20:08:06,576 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator ip-172-31-55-35.ec2.internal:8443 requested disconnection from cluster due to org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.io.IOException: Expected to read a Sentinel Byte of '1' but got a value of '64' instead

Thanks,
Joe



Re: Recovering corrupt flowfile repo

Joe Gresock
Ok... I have an interesting update.

I was able to remotely debug the server with the corrupt ff repo.  I set a
breakpoint at the line where the above exception was thrown, and then I
used Eclipse to manually "introspect" the following code:

while(sentinelByte != 1) { sentinelByte = in.read(); }

I had to do this several times, but after each "fast forwarding" of the
input stream, it read some actual WALI update records and was able to
successfully recover the repo.  At this point, the repo seems to be fixed,
but I have no idea what effect my shenanigans had on it.
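
For anyone wanting to try the same "fast forward" outside a debugger, the loop above can be sketched as a standalone method. This is an illustration only, not NiFi code: the class and method names are hypothetical, and unlike the one-liner it stops at end of stream, since in.read() returns -1 there and the original loop would otherwise never exit. A data byte that happens to equal 1 would presumably be taken for a sentinel too, so a resync like this may drop or misread records.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SentinelSkip {

    // Sentinel value taken from the error message ("Expected to read a Sentinel Byte of '1'").
    static final int SENTINEL = 1;

    /**
     * Reads and discards bytes until a sentinel byte has been consumed.
     * Returns how many bytes were discarded before the sentinel,
     * or -1 if the stream ended without one.
     */
    public static long skipToSentinel(InputStream in) throws IOException {
        long skipped = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == SENTINEL) {
                return skipped;
            }
            skipped++;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // 64 is the unexpected value from the error message; the other bytes are made up.
        byte[] data = {64, 7, 9, 1, 42};
        InputStream in = new ByteArrayInputStream(data);
        long skipped = skipToSentinel(in);
        // prints "skipped=3, next=42"
        System.out.println("skipped=" + skipped + ", next=" + in.read());
    }
}
```

Returning the skipped count at least makes it possible to log how much of the journal was thrown away at each resync.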


Re: Recovering corrupt flowfile repo

Joe Witt
Joe,

Unfortunately, I think the only real way to assist in debugging what
happened, or what this is, would be for the actual repo/bytes to be
shared.  I'm not sure whether that would be feasible here, or whether you
can reproduce it (assuming this is a straight Apache NiFi version).

Thanks