AvroRecordSetWriter seems unable to deal with certain schemas

Daniel Solow
Hi,

I'm trying to manipulate record-based Avro data in NiFi, and I'm getting
consistent errors. Here's a simple schema that illustrates the problem for
me: https://gist.github.com/dmsolow/13992482534eb0b23de94a385fe999e8

The schema has a root record with a single field. The field's type is an
array whose items are a union type. The union contains two different record
types as branches, each with different fields. My understanding is that this
is perfectly acceptable in an Avro schema, and that unions can support any
number of named members.
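
For readers who cannot open the gist, here is a minimal sketch of what a schema with this shape might look like, written as a Python dict and printed as JSON. The names (root, children, left, f1, right, f2) and the string type of f1 come from the error message and JSON record in this post; the "int" type for f2 is an assumption, and the gist remains the authoritative schema.

```python
import json

# Hypothetical reconstruction of the schema described above.
# The gist linked in this post is authoritative; this is only a sketch.
SCHEMA = {
    "type": "record",
    "name": "root",
    "fields": [
        {
            "name": "children",
            "type": {
                "type": "array",
                # A JSON array in the "items" position denotes a union.
                "items": [
                    {"type": "record", "name": "left",
                     "fields": [{"name": "f1", "type": "string"}]},
                    # "int" is an assumption; the post only shows the value 2.
                    {"type": "record", "name": "right",
                     "fields": [{"name": "f2", "type": "int"}]},
                ],
            },
        }
    ],
}


def union_branch_names(schema):
    """Return the names of the record types inside the array's item union."""
    items = schema["fields"][0]["type"]["items"]
    return [branch["name"] for branch in items]


print(json.dumps(SCHEMA, indent=2))
print(union_branch_names(SCHEMA))
```

Both union branches are named record types, which is what lets two records coexist in one union.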

However, when I try to process a record written using this schema with
AvroReader and AvroRecordSetWriter, I see an error on the write side.
Here's the log message:

> 2018-10-20 16:26:07,091 ERROR [Timer-Driven Process Thread-3] o.a.n.processors.standard.ValidateRecord ValidateRecord[id=01651155-d17b-1b8a-56a7-d4dc64b64499] Failed to process StandardFlowFileRecord[uuid=9b7e7b3b-286a-4f91-b6f1-da3ca355ffcd,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1540052765488-169441, container=default, section=481], offset=277006, length=345],offset=0,name=2513241609759835,size=345]; will route to failure: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of string in field f1 of left of union of array in field children of root
> org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of string in field f1 of left of union of array in field children of root
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
>     at org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:61)
>     at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
>     at org.apache.nifi.processors.standard.ValidateRecord.onTrigger(ValidateRecord.java:344)
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
>     at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null of string in field f1 of left of union of array in field children of root
>     at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
>     ... 14 common frames omitted
> Caused by: java.lang.NullPointerException: null
>     at org.apache.avro.io.Encoder.writeString(Encoder.java:121)
>     at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:254)
>     at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:249)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:115)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
>     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:112)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:179)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:107)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
>     at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
>     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
>     at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
>     ... 17 common frames omitted


Here's the JSON representation of the record that's failing:

{
  "children" : [ {
    "left" : {
      "f1" : "a"
    }
  }, {
    "right" : {
      "f2" : 2
    }
  } ]
}

   - The schema access method does not matter (I've tried several different
   methods and they all result in the same error).
   - Notably, I'm able to convert the record to JSON using a
   JsonRecordSetWriter. I'm also able to read it using third-party tools (like
   avro-cli), so I think it's valid Avro.
   - I realize that this is likely an issue with the Apache Avro library,
   but I thought I'd check with the NiFi community first since I'm using NiFi.
   - In the GitHub gist at the top I've included a Base64-encoded Avro
   record (with embedded schema) if anyone wants to try to reproduce the
   problem.

I realize from the stack trace that this is probably caused by the writer
being unable to resolve which record type is being written (it seems like
it's trying to write "left" when it should be writing "right"). My
understanding, however, is that the generic Avro classes should support this
kind of resolution (the generic Avro record object has a method to get the
schema).
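
To make the suspected failure mode concrete, here is a toy model in plain Python. It is not the Avro or NiFi code; the function names (resolve_first_record, resolve_by_name, write_record, try_write) and the dict-based datum layout are hypothetical, and the "int" type of f2 is an assumption. It only illustrates how choosing the first record branch of a union, instead of the branch whose name matches the datum's schema, could produce an error shaped like the one in the log.

```python
# Toy stand-ins for the two union branches; these are not Avro schema objects.
LEFT = {"type": "record", "name": "left",
        "fields": [{"name": "f1", "type": "string"}]}
RIGHT = {"type": "record", "name": "right",
         "fields": [{"name": "f2", "type": "int"}]}  # "int" is an assumption
UNION = [LEFT, RIGHT]


def resolve_first_record(union, datum):
    """Naive resolution: any record-shaped datum gets the first record branch."""
    for branch in union:
        if branch["type"] == "record":
            return branch
    raise ValueError("union has no record branch")


def resolve_by_name(union, datum):
    """Name-aware resolution: match the schema name carried by the datum."""
    for branch in union:
        if branch["name"] == datum["schema_name"]:
            return branch
    raise ValueError(f"no branch named {datum['schema_name']!r}")


def write_record(schema, datum):
    """Toy writer: rejects a missing field, loosely mimicking the NPE above."""
    written = []
    for field in schema["fields"]:
        value = datum["values"].get(field["name"])
        if value is None:
            raise ValueError(
                f"null of {field['type']} in field {field['name']} "
                f"of {schema['name']}")
        written.append((field["name"], value))
    return written


def try_write(resolver, union, datum):
    """Resolve the union branch, write the datum, and report any failure."""
    try:
        return write_record(resolver(union, datum), datum)
    except ValueError as exc:
        return f"error: {exc}"


# The second array element from the failing record, tagged with its schema name.
right_datum = {"schema_name": "right", "values": {"f2": 2}}
print(try_write(resolve_first_record, UNION, right_datum))
print(try_write(resolve_by_name, UNION, right_datum))
```

In this model the naive resolver writes the "right" datum against the "left" schema and fails on the missing f1, while the name-aware resolver succeeds. Whether the real writer behaves this way is exactly the open question in this post.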

Any guidance is appreciated.

Thanks,
Daniel
Re: AvroRecordSetWriter seems unable to deal with certain schemas

Mike Thomsen
That doesn't look like a properly formed Avro union type. See example:
https://avro.apache.org/docs/current/spec.html#Unions

You may want to take this over to the Avro mailing list because it's a
syntax issue for Avro, not a record API issue.

Mike

On Sat, Oct 20, 2018 at 12:39 PM Daniel Solow <[hidden email]> wrote:
