"One to many" provenance

Phil H
Hi team,

If I have a processor that takes one input flow file, and then generates many flow file outputs as a result (say, one output per line from a multi-line input file), how do I indicate the provenance of the new flow files? I would like to see where they have come from for errors/analysis.

I couldn't see a method in the Provenance Reporter that seemed like it would do that (essentially link a new flow file to an old one).

Cheers,
Phil

Re: "One to many" provenance

Matt Burgess-2
Phil,

There should be a FORK event generated by a processor that generates
multiple flow files from an incoming one.  If a flow file is an exact
copy of an incoming one, I believe it will have a CLONE event
associated with it. Also, I think the session may handle this for you
in the general case, although there are other processors like
UnpackContent and QueryRecord that explicitly call
ProvenanceReporter.fork().

Regards,
Matt

On Mon, Nov 20, 2017 at 4:32 PM, Phil H <[hidden email]> wrote:
> Hi team,
>
> If I have a processor that takes one input flow file, and then generates many flow file outputs as a result (say, one output per line from a multi-line input file), how do I indicate the provenance of the new flow files? I would like to see where they have come from for errors/analysis.
>
> I couldn't see a method in the Provenance Reporter that seemed like it would do that (essentially link a new flow file to an old one)
>
> Cheers,
> Phil

Re: "One to many" provenance

Phil H
Thanks Matt!

Cheers,
Phil

> On 21 Nov 2017, at 08:46, Matt Burgess <[hidden email]> wrote:
>
> Phil,
>
> There should be a FORK event generated by a processor that generates
> multiple flow files from an incoming one.  If a flow file is an exact
> copy of an incoming one, I believe it will have a CLONE event
> associated with it. Also, I think the session may handle this for you
> in the general case, although there are other processors like
> UnpackContent and QueryRecord that explicitly call
> ProvenanceReporter.fork().
>
> Regards,
> Matt

Re: "One to many" provenance

Mark Payne
In reply to this post by Matt Burgess-2
Phil,

Just to expand on what Matt mentioned here, the session will automatically generate
the appropriate FORK event for you if you call ProcessSession.create(FlowFile) and
pass the 'parent' FlowFile to the session. If you just call ProcessSession.create() without
providing the parent, it will not be able to generate the FORK event for you. Calling
ProcessSession.create(FlowFile) is also important because it will copy the attributes from
the parent to the newly created FlowFile.

Thanks
-Mark
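
A minimal sketch of the difference Mark describes, for readers who want to see it in code. It does not use the NiFi API: ToyFlowFile, ToySession and SplitLinesSketch are hypothetical stand-ins written for this illustration, and the event log is simplified to one line per child. In a real processor the calls would go to org.apache.nifi.processor.ProcessSession, and the session would record the FORK event itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration only; they are NOT the NiFi classes.
final class ToyFlowFile {
    final int id;
    final Map<String, String> attributes;
    String content = "";

    ToyFlowFile(int id, Map<String, String> attributes) {
        this.id = id;
        this.attributes = new HashMap<>(attributes); // defensive copy
    }
}

final class ToySession {
    private int nextId = 1;
    // Simplified provenance log (assumption): one "FORK parent -> child" line per child.
    final List<String> events = new ArrayList<>();

    // Models create(): no parent, so nothing is inherited and no lineage is recorded.
    ToyFlowFile create() {
        return new ToyFlowFile(nextId++, new HashMap<>());
    }

    // Models create(parent): the child inherits the parent's attributes
    // and a FORK entry links it back to the parent.
    ToyFlowFile create(ToyFlowFile parent) {
        ToyFlowFile child = new ToyFlowFile(nextId++, parent.attributes);
        events.add("FORK " + parent.id + " -> " + child.id);
        return child;
    }
}

final class SplitLinesSketch {
    // One output flow file per input line, each created from the parent.
    static List<ToyFlowFile> splitLines(ToySession session, ToyFlowFile input) {
        List<ToyFlowFile> children = new ArrayList<>();
        for (String line : input.content.split("\n")) {
            ToyFlowFile child = session.create(input);
            child.content = line;
            children.add(child);
        }
        return children;
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        ToyFlowFile input = session.create();
        input.attributes.put("filename", "input.txt");
        input.content = "alpha\nbeta\ngamma";

        List<ToyFlowFile> children = splitLines(session, input);
        System.out.println(children.size() + " children");
        session.events.forEach(System.out::println);
    }
}
```

Swapping session.create(input) for session.create() inside splitLines would leave the event log empty and the children without the "filename" attribute, which is the situation Phil wanted to avoid.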

> On Nov 20, 2017, at 4:46 PM, Matt Burgess <[hidden email]> wrote:
>
> Phil,
>
> There should be a FORK event generated by a processor that generates
> multiple flow files from an incoming one.  If a flow file is an exact
> copy of an incoming one, I believe it will have a CLONE event
> associated with it. Also, I think the session may handle this for you
> in the general case, although there are other processors like
> UnpackContent and QueryRecord that explicitly call
> ProvenanceReporter.fork().
>
> Regards,
> Matt


Re: "One to many" provenance

Phil H
And thank you Mark!

Cheers,
Phil

> On 21 Nov 2017, at 09:00, Mark Payne <[hidden email]> wrote:
>
> Phil,
>
> Just to expand on what Matt mentioned here, the session will automatically generate
> the appropriate FORK event for you if you call ProcessSession.create(FlowFile) and
> pass the 'parent' FlowFile to the session. If you just call ProcessSession.create() without
> providing the parent, it will not be able to generate the FORK event for you. Calling
> ProcessSession.create(FlowFile) is also important because it will copy the attributes from
> the parent to the newly created FlowFile.
>
> Thanks
> -Mark