"Flatten" JSON

Nicholas Hughes-2
Is there an easy way to "flatten" arbitrary JSON within NiFi?

For input data like that shown below from Yahoo [1]:

{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}


...I'd like to end up with output something like this:

{
  "query.count": 1,
  "query.created": "2017-09-15T11:20:26Z",
  "query.lang": "en-US",
  "query.results.channel.item.condition.code": "33",
  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
  "query.results.channel.item.condition.temp": "63",
  "query.results.channel.item.condition.text": "Mostly Clear"
}
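
The mapping from the nested input to the dotted output above can be sketched in a few lines of plain Python. This is only an illustration of the transformation being asked for, not an existing NiFi feature; the function name `flatten` and the `sep` parameter are made up for this example, and arrays are deliberately left unexpanded.

```python
import json


def flatten(value, prefix="", sep="."):
    """Collapse nested JSON objects into one level with dotted keys."""
    flat = {}
    for key, child in value.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(child, dict):
            # Recurse into nested objects, carrying the accumulated path.
            flat.update(flatten(child, path, sep))
        else:
            # Scalars (and arrays, which this sketch does not expand) are kept as-is.
            flat[path] = child
    return flat


nested = json.loads("""
{"query": {"count": 1, "created": "2017-09-15T11:20:26Z", "lang": "en-US",
 "results": {"channel": {"item": {"condition": {
   "code": "33", "date": "Fri, 15 Sep 2017 06:00 AM EDT",
   "temp": "63", "text": "Mostly Clear"}}}}}}
""")
print(json.dumps(flatten(nested), indent=2))
```

No schema is needed because the key names are discovered while walking the document. Note that an empty nested object contributes no keys in this sketch.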


I checked out the JoltTransformJSON processor and some examples, such as
the nested data to "prefix soup" demo [2], but it seems as though I need to
enter information about the schema for the incoming data in order to
transform it. Ideally, I'd like to have a processor "just figure it out"
without explicit entry of a schema.

Is there any way to accomplish this in a generic way with JoltTransformJSON
(or another native processor)?

If not, would a ticket requesting a "Field Flattener" processor much like
the one included in StreamSets Data Collector [3] be worthwhile?

Thanks in advance!

-Nick


[1]
https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

[2] http://jolt-demo.appspot.com/#bucketToPrefixSoup

[3]
https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener

Re: "Flatten" JSON

Mark Payne
Nick,

I do believe that there's a way to do what you're asking with Jolt, without knowing any kind of schema.
That said, Jolt can get complex pretty quickly and I don't know it well :)  Personally, I have no problem with having a
FlattenRecord processor. I guess the question here, though, is: are you using Record-oriented processors,
or are you using JSON-specific processors?

Personally, I'd like to see a FlattenRecord processor, rather than FlattenJSON, because that would allow
the transformation to apply to Avro as well (and as soon as we get an XML reader built, XML also). However,
the Record-oriented processors would expect that a schema be given (though it could also be inferred using
another existing processor).
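
As a rough illustration of what inferring a schema for an already-flattened record might involve, here is a small Python sketch. It is not how any NiFi processor works; `infer_flat_schema`, the record name `FlatRecord`, and the type mapping are all assumptions made for this example. One detail worth noting is that Avro names may contain only letters, digits and underscores, so dotted keys would need a different separator; the sketch simply swaps dots for underscores and does not handle other invalid characters.

```python
def infer_flat_schema(flat_record, name="FlatRecord"):
    """Build an Avro-style record schema (as a dict) from a flat record."""
    def avro_type(value):
        if value is None:
            return "null"
        # bool must be tested before int: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "double"
        # Everything else is treated as a string in this sketch.
        return "string"

    fields = [
        # Dots are not legal in Avro names, so they are replaced here.
        {"name": key.replace(".", "_"), "type": avro_type(value)}
        for key, value in flat_record.items()
    ]
    return {"type": "record", "name": name, "fields": fields}


schema = infer_flat_schema({"query.count": 1, "query.lang": "en-US"})
print(schema)
```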

-Mark



> On Sep 15, 2017, at 7:43 AM, Nicholas Hughes <[hidden email]> wrote:

Re: "Flatten" JSON

Nicholas Hughes-2
Mark,

I'm definitely for making the processor as generic as possible. I don't
mind chaining together a few simple processors to get a job done (such as
convert JSON to Avro > infer schema > flatten records)... I just don't want
the steps to get super complex... and the Jolt Transform processor does seem
very powerful and very complex.

If there's some support for a "FlattenRecord" processor, I can submit the
Jira containing the meat of this thread.

-Nick


On Fri, Sep 15, 2017 at 9:01 AM, Mark Payne <[hidden email]> wrote:


Re: "Flatten" JSON

Kevin Doran
+1 for adding a FlattenRecord processor. I can think of a few scenarios in which it would be quite useful, and it would be convenient if it could be accomplished without JOLT.

Thanks,
Kevin

On 9/15/17, 09:16, "Nicholas Hughes" <[hidden email] on behalf of [hidden email]> wrote:


Re: "Flatten" JSON

Matt Burgess-2
+1 for FlattenRecord as well. In the meantime, you can use
ExecuteScript or InvokeScriptedProcessor; I have a Groovy script
(albeit for a different product) that does the flattening [1].

Regards,
Matt

[1] http://funpdi.blogspot.com/2014/10/flatten-json-to-key-value-pairs-in-pdi.html
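
For the scripting route, the content-level work a script body would have to do is roughly: parse the JSON text, flatten it, and write JSON text back out. The Python sketch below shows only that step; it is not the Groovy script linked above, and the NiFi wiring (reading and writing flow file content inside ExecuteScript) is intentionally left out. The name `flatten_json_text` and the choice to address array elements by index are assumptions for illustration.

```python
import json


def flatten_json_text(text, sep="."):
    """Parse JSON text, flatten objects and arrays, and return JSON text."""
    flat = {}
    # Explicit stack of (path, value) pairs instead of recursion.
    stack = [("", json.loads(text))]
    while stack:
        path, value = stack.pop()
        if isinstance(value, dict):
            children = list(value.items())
        elif isinstance(value, list):
            # Array elements are addressed by their index.
            children = list(enumerate(value))
        else:
            flat[path] = value
            continue
        # Push in reverse so children are popped in their original order.
        for key, child in reversed(children):
            stack.append((f"{path}{sep}{key}" if path else str(key), child))
    return json.dumps(flat)


print(flatten_json_text('{"a": {"b": [10, {"c": true}]}, "d": null}'))
```

Using an explicit stack avoids recursion limits on very deeply nested input. As written, empty objects and empty arrays produce no keys in the output.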

On Fri, Sep 15, 2017 at 9:33 AM, Kevin Doran <[hidden email]> wrote:


Re: "Flatten" JSON

Nicholas Hughes-2
Created an issue for this functionality [1]. Please change issue properties
and comment as necessary.

-Nick

[1] https://issues.apache.org/jira/browse/NIFI-4398


On Sat, Sep 16, 2017 at 4:55 PM, Matt Burgess <[hidden email]> wrote:
