JSON / Avro issues

Cerulean Blue
I built a simple flow that reads a tab-separated file and attempts to convert it to Avro.

ConvertCSVtoAvro just says that the conversion failed.  

Where can I find more information on what the failure was?

Using the same sample tab-separated file, I created a JSON file.

The JSON to Avro processor also fails with very little explanation.


With regard to the ConvertCSVtoAvro processor
        Since my file is tab-delimited, do I simply open the "CSV delimiter" property, delete the comma, and hit the Tab key, or is there a special syntax like ^t?
        My data has no CSV quote character, so do I leave this as ", delete it, or check the empty box?

With regard to the ConvertJSONtoAvro
        What is the JSON source file expected to look like?
                [
                 {fields values … },
                 {fields values …}
                ]
        Or
                 {fields values … }
                 {fields values …}
        or something else.

Thanks,

Sorry for sending this to two lists.

Re: JSON / Avro issues

Bryan Bende
Jeff,

Are you using the 0.3.0 release?

I think this is the issue you ran into which is resolved for the next
release:
https://issues.apache.org/jira/browse/NIFI-944

With regard to ConvertJSONtoAvro, I believe it expects one JSON document per
line, with a newline at the end of each line (your second example).

-Bryan


Re: JSON / Avro issues

Cerulean Blue
I'm using a snapshot built yesterday.  

Thanks



> On Nov 5, 2015, at 4:19 PM, Bryan Bende <[hidden email]> wrote:
>
> Jeff,
>
> Are you using the 0.3.0 release?
>
> I think this is the issue you ran into which is resolved for the next release:
> https://issues.apache.org/jira/browse/NIFI-944
>
> With regard to ConvertJSONtoAvro, I believe it expects one JSON document per line, with a newline at the end of each line (your second example).
>
> -Bryan

Re: JSON / Avro issues

trkurc
Administrator
Presuming it is built off a recent commit, you should be able to read a
tab-delimited file using "\t" as the delimiter. There should be a dropdown
that lets you choose ARRAY or NONE as a JSON container option, which would
toggle between the two JSON representations you described.

On Thu, Nov 5, 2015 at 10:24 PM, Cerulean Blue <[hidden email]> wrote:

> I'm using a snapshot built yesterday.
>
> Thanks

Re: JSON / Avro issues

ianwork
trkurc, am I missing something? I do not see any way to toggle between the two JSON representations in the latest build of the jsontoavro processor.

Ian

Re: JSON / Avro issues

trkurc
Administrator
Ian,
Excellent catch. I was referring to the ConvertAvroToJSON processor, which
can EMIT JSON in either representation. In retrospect, that is obviously
*not* what was being asked:

http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.avro.ConvertAvroToJSON/index.html

On Thu, Dec 17, 2015 at 11:25 AM, ianwork <[hidden email]> wrote:

> trkurc, Am i missing something, I do not see the functionality to toggle
> between the two json representations in the latest build of jsontoavro
> processor?
>
> Ian
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/JSON-Avro-issues-tp3923p5828.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>

Re: JSON / Avro issues

Ryan Blue
In reply to this post by Cerulean Blue
Jeff, I've answered inline. Thanks for using the processor, sorry it
isn't clear what's happening.

rb

On 11/05/2015 01:59 PM, Jeff wrote:
> I built a simple flow that reads a tab separated file and attempts to convert to Avro.
>
> ConvertCSVtoAvro just says that the conversion failed.
>
> Where can I find more information on what the failure was?

Information about failures is added to the "errors" attribute on files
emitted to the failure relationship. Unfortunately, right now the files
aren't filtered to just the failed rows. That's something we need to
fix, but it does accumulate error messages so you get something like:

   "NumberFormatException: 'turkey' is not an integer (1,234 similar errors)"

> Using the same sample tab separated file, I create a JSON file out of it.
>
> The JSON to Avro processor also fails with very little explanation.

These processors are basically the same on the inside. :)

The errors are reported in the same place. I think the likely problem is
that some of the values are failing to convert to the Avro type you've selected.
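
One way to find the offending values before sending the file through the processor is a quick check outside NiFi. The following is a minimal Python sketch, not the processor's own logic: the column names, the sample data, and the `EXPECTED_TYPES` mapping are hypothetical, and it only imitates the idea of tallying similar conversion errors.

```python
import csv
import io
from collections import Counter

# Hypothetical mapping from column name to the Python type that stands in
# for the Avro type chosen in the schema (an assumption for illustration).
EXPECTED_TYPES = {"id": int, "weight": float, "name": str}


def find_conversion_errors(text, delimiter="\t"):
    """Return a Counter of (column, type name) -> number of bad values."""
    errors = Counter()
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter,
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        for column, expected in EXPECTED_TYPES.items():
            try:
                expected(row[column])
            except (TypeError, ValueError, KeyError):
                # Missing column, missing field, or unconvertible value.
                errors[(column, expected.__name__)] += 1
    return errors


# Made-up sample: two bad "id" values and one bad "weight" value.
sample = "id\tweight\tname\n1\t2.5\tgoose\nturkey\t3.0\tduck\nchicken\tx\then\n"
for (column, type_name), count in find_conversion_errors(sample).items():
    print(f"{column}: {count} value(s) could not be read as {type_name}")
```

A column with a large error count usually points at a wrong type in the schema rather than at bad data.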

>
> With regard to the ConvertCSVtoAvro processor
> Since my file is tab-delimited, do I simply open the "CSV delimiter" property, delete the comma, and hit the Tab key, or is there a special syntax like ^t?
> My data has no CSV quote character, so do I leave this as ", delete it, or check the empty box?

This could definitely be a problem. The delimiter property is the one you
want to change. It works with both a literal tab character (I usually paste
it in, since the browser uses the Tab key as a movement key) and with \t,
though I think there's a bug in the validation where you can't have
2-character delimiters. I should fix that.
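
To sanity-check the input file outside NiFi, the same two settings (tab delimiter, no quote character) can be reproduced with Python's standard `csv` module. This is only a sketch with made-up sample data; it says nothing about how the processor parses internally.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-separated text with quote handling switched off."""
    reader = csv.reader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [row for row in reader]


def rows_with_wrong_width(rows):
    """Return the 1-based numbers of rows whose field count differs from the header."""
    width = len(rows[0])
    return [number for number, row in enumerate(rows, start=1)
            if len(row) != width]


# Made-up sample: the quote characters are ordinary data, not CSV quoting.
sample = 'id\tname\n1\tsays "hi"\n2\n'
rows = read_tab_delimited(sample)
print(rows)
print(rows_with_wrong_width(rows))
```

Rows reported by `rows_with_wrong_width` are worth inspecting first, since a stray or missing tab shifts every later field.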

> With regard to the ConvertJSONtoAvro
> What is the expected JSON source file to look like?
> [
> {fields values … },
> {fields values …}
> ]
> Or
> {fields values … }
> {fields values …}
> or something else.

This should be the second case. The JSON to Avro processor can't handle
JSON lists as the root just yet. You should simply concatenate the JSON
objects; the whitespace between them doesn't matter.
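
If the source data is already a root-level JSON array, it can be rewritten as concatenated objects before it reaches the processor. Here is a minimal Python sketch of that conversion; the function name and the sample records are made up for illustration.

```python
import json


def array_to_concatenated(json_text):
    """Turn a root-level JSON array of objects into one object per line."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array at the root")
    return "".join(json.dumps(record) + "\n" for record in records)


# Made-up sample in the array form (the first layout in the question).
source = '[{"id": 1, "name": "goose"}, {"id": 2, "name": "duck"}]'
print(array_to_concatenated(source), end="")
```

Writing one object per line is just a convenient choice here, since any whitespace between the objects should be acceptable.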

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.