New Standard Pattern - Put Exception that caused failure in an attribute

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.

I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?

--Peter
Reply | Threaded
Open this post in threaded view
|

Re: New Standard Pattern - Put Exception that caused failure in an attribute

Bryan Bende
I think processors should really have well defined relationships for
the error scenarios that need to be handled. Having the exception
message is ok for a human who wants to see it, but in order to do
anything with it in the flow you will have to have a bunch of
parsing/interpreting of the message with a bunch of routing
processors, which seems more brittle than just having the appropriate
relationships.
On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
>
> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
>
> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
>
> --Peter
Reply | Threaded
Open this post in threaded view
|

Re: New Standard Pattern - Put Exception that caused failure in an attribute

Mark Payne
I agree - the notion of adding a "failure.reason" attribute is, in my opinion, an anti-pattern
that should be avoided. Relationships are not a workaround but rather the preferred approach
in this scenario - an attribute I would consider a workaround. This is due to the fact that not only
is it brittle and complex to add processors that route on such things, but there's no reason at all
to assume that from release to release (even bug fix/increment releases) that the Exception type
or message will be the same, so the flow could stop working at any time after upgrading nifi.
Relationships offer a well-defined way to explicitly indicate "these are the possible outcomes,"
similar IMO to Java Exception classes vs. throwing Strings in C.


> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
>
> I think processors should really have well defined relationships for
> the error scenarios that need to be handled. Having the exception
> message is ok for a human who wants to see it, but in order to do
> anything with it in the flow you will have to have a bunch of
> parsing/interpreting of the message with a bunch of routing
> processors, which seems more brittle than just having the appropriate
> relationships.
> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
>>
>> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
>> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
>>
>> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
>>
>> --Peter

Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
Mark,

I agree with you that this is the best option in general terms. After thinking about it some more I think the biggest use case is for troubleshooting. If a file routes to failure, you need to be watching the UI to see what the exception was. An admin may have access to the NiFi log files and could grep the error out, but a normal user who checks in on the flow and sees a FlowFile in the error queue will not know what the cause was; this is especially frustrating if retrying the file works without failure the second time... Capturing the error message in an attribute makes this easy to find.

One thing I worry about too is adding new relationships to core processors. After an upgrade, won't users need to go to each instance of that processor and handle the new relationship? Right now I'd swagger we have at least five thousand ExecuteSQL processors in our environment; and while we have strong scripting skills in my NiFi team, I would not want to encounter this without that.

Thanks,
  Peter

-----Original Message-----
From: Mark Payne [mailto:[hidden email]]
Sent: Thursday, October 25, 2018 10:38 PM
To: [hidden email]
Subject: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

I agree - the notion of adding a "failure.reason" attribute is, in my opinion, an anti-pattern that should be avoided. Relationships are not a workaround but rather the preferred approach in this scenario - an attribute I would consider a workaround. This is due to the fact that not only is it brittle and complex to add processors that route on such things, but there's no reason at all to assume that from release to release (even bug fix/increment releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
Relationships offer a well-defined way to explicitly indicate "these are the possible outcomes,"
similar IMO to Java Exception classes vs. throwing Strings in C.


> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
>
> I think processors should really have well defined relationships for
> the error scenarios that need to be handled. Having the exception
> message is ok for a human who wants to see it, but in order to do
> anything with it in the flow you will have to have a bunch of
> parsing/interpreting of the message with a bunch of routing
> processors, which seems more brittle than just having the appropriate
> relationships.
> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
>>
>> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
>> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
>>
>> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
>>
>> --Peter

Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Matt Burgess-2
NiFi (as of the last couple releases I think) has the ability to set
auto-terminating relationships; this IMO is one of those use cases
(for NiFi 1.x). If new relationships are added, they could default to
auto-terminate; then the existing processors should remain valid.
However we might want an "omnibus Jira" to capture those relationships
we'd like to remove the auto-termination from in NiFi 2.0.

Regards,
Matt
On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Mark,
>
> I agree with you that this is the best option in general terms. After thinking about it some more I think the biggest use case is for troubleshooting. If a file routes to failure, you need to be watching the UI to see what the exception was. An admin may have access to the NiFi log files and could grep the error out, but a normal user who checks in on the flow and sees a FlowFile in the error queue will not know what the cause was; this is especially frustrating if retrying the file works without failure the second time... Capturing the error message in an attribute makes this easy to find.
>
> One thing I worry about too is adding new relationships to core processors. After an upgrade, won't users need to go to each instance of that processor and handle the new relationship? Right now I'd swagger we have at least five thousand ExecuteSQL processors in our environment; and while we have strong scripting skills in my NiFi team, I would not want to encounter this without that.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: Mark Payne [mailto:[hidden email]]
> Sent: Thursday, October 25, 2018 10:38 PM
> To: [hidden email]
> Subject: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
>
> I agree - the notion of adding a "failure.reason" attribute is, in my opinion, an anti-pattern that should be avoided. Relationships are not a workaround but rather the preferred approach in this scenario - an attribute I would consider a workaround. This is due to the fact that not only is it brittle and complex to add processors that route on such things, but there's no reason at all to assume that from release to release (even bug fix/increment releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> Relationships offer a well-defined way to explicitly indicate "these are the possible outcomes,"
> similar IMO to Java Exception classes vs. throwing Strings in C.
>
>
> > On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> >
> > I think processors should really have well defined relationships for
> > the error scenarios that need to be handled. Having the exception
> > message is ok for a human who wants to see it, but in order to do
> > anything with it in the flow you will have to have a bunch of
> > parsing/interpreting of the message with a bunch of routing
> > processors, which seems more brittle than just having the appropriate
> > relationships.
> > On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
> >>
> >> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
> >> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
> >>
> >> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
> >>
> >> --Peter
>
Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
Matt,

If I were to split an existing failure relationship into several relationships, I do not think I would want to auto-terminate in most cases. Specifically, I'm interested in a failure relationship for a database disconnect during SQL execution (database was online when the connection was verified in the DBCP pool, but went down during execution). If I were to find a way to separate this into its own relationship, I do not think most users would appreciate it being a condition silently not handled by the normal failure path.

Thanks,
  Peter

-----Original Message-----
From: Matt Burgess [mailto:[hidden email]]
Sent: Friday, October 26, 2018 10:18 AM
To: [hidden email]
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

NiFi (as of the last couple releases I think) has the ability to set auto-terminating relationships; this IMO is one of those use cases (for NiFi 1.x). If new relationships are added, they could default to auto-terminate; then the existing processors should remain valid.
However we might want an "omnibus Jira" to capture those relationships we'd like to remove the auto-termination from in NiFi 2.0.

Regards,
Matt
On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Mark,
>
> I agree with you that this is the best option in general terms. After thinking about it some more I think the biggest use case is for troubleshooting. If a file routes to failure, you need to be watching the UI to see what the exception was. An admin may have access to the NiFi log files and could grep the error out, but a normal user who checks in on the flow and sees a FlowFile in the error queue will not know what the cause was; this is especially frustrating if retrying the file works without failure the second time... Capturing the error message in an attribute makes this easy to find.
>
> One thing I worry about too is adding new relationships to core processors. After an upgrade, won't users need to go to each instance of that processor and handle the new relationship? Right now I'd swagger we have at least five thousand ExecuteSQL processors in our environment; and while we have strong scripting skills in my NiFi team, I would not want to encounter this without that.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: Mark Payne [mailto:[hidden email]]
> Sent: Thursday, October 25, 2018 10:38 PM
> To: [hidden email]
> Subject: [EXT] Re: New Standard Pattern - Put Exception that caused
> failure in an attribute
>
> I agree - the notion of adding a "failure.reason" attribute is, in my opinion, an anti-pattern that should be avoided. Relationships are not a workaround but rather the preferred approach in this scenario - an attribute I would consider a workaround. This is due to the fact that not only is it brittle and complex to add processors that route on such things, but there's no reason at all to assume that from release to release (even bug fix/increment releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> Relationships offer a well-defined way to explicitly indicate "these are the possible outcomes,"
> similar IMO to Java Exception classes vs. throwing Strings in C.
>
>
> > On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> >
> > I think processors should really have well defined relationships for
> > the error scenarios that need to be handled. Having the exception
> > message is ok for a human who wants to see it, but in order to do
> > anything with it in the flow you will have to have a bunch of
> > parsing/interpreting of the message with a bunch of routing
> > processors, which seems more brittle than just having the
> > appropriate relationships.
> > On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
> >>
> >> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
> >> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
> >>
> >> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
> >>
> >> --Peter
>
Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Matt Burgess-2
Peter,

Totally agree, RDBMS/JDBC is in a weird class as always, there is a
teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
seems like we need to pick from a set of less-than-ideal solutions:

1) Add new relationships, but then your (possibly hundreds of)
processors are invalid
2) Add new auto-terminated relationships, but then your
previously-handled errors are "lost"
3) Add an attribute, but then each NiFi instance/release/flow is
responsible for parsing the error and handling it as desired.

We could mitigate 1-2 with a tool that updates your flow/template by
sending all new failure relationships to the same target as the
existing one, but then the tool itself suffers from maintainability
issues (as does option #3). If we could recognize that the new
relationships are self-terminated and then send the errors out to the
original failure relationship, that could be quite confusing to the
user, especially as time goes on (how to suppress the "new" errors,
e.g.).

IMHO I think we're between a rock and a hard place here, I guess with
great entropy comes great responsibility :P

P.S. For your use case, is the workaround to just keep retrying? Or
are there other constraints at play?

Regards,
Matt

On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Matt,
>
> If I were to split an existing failure relationship into several relationships, I do not think I would want to auto-terminate in most cases. Specifically, I'm interested in a failure relationship for a database disconnect during SQL execution (database was online when the connection was verified in the DBCP pool, but went down during execution). If I were to find a way to separate this into its own relationship, I do not think most users would appreciate it being a condition silently not handled by the normal failure path.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: Matt Burgess [mailto:[hidden email]]
> Sent: Friday, October 26, 2018 10:18 AM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
>
> NiFi (as of the last couple releases I think) has the ability to set auto-terminating relationships; this IMO is one of those use cases (for NiFi 1.x). If new relationships are added, they could default to auto-terminate; then the existing processors should remain valid.
> However we might want an "omnibus Jira" to capture those relationships we'd like to remove the auto-termination from in NiFi 2.0.
>
> Regards,
> Matt
> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:
> >
> > Mark,
> >
> > I agree with you that this is the best option in general terms. After thinking about it some more I think the biggest use case is for troubleshooting. If a file routes to failure, you need to be watching the UI to see what the exception was. An admin may have access to the NiFi log files and could grep the error out, but a normal user who checks in on the flow and sees a FlowFile in the error queue will not know what the cause was; this is especially frustrating if retrying the file works without failure the second time... Capturing the error message in an attribute makes this easy to find.
> >
> > One thing I worry about too is adding new relationships to core processors. After an upgrade, won't users need to go to each instance of that processor and handle the new relationship? Right now I'd swagger we have at least five thousand ExecuteSQL processors in our environment; and while we have strong scripting skills in my NiFi team, I would not want to encounter this without that.
> >
> > Thanks,
> >   Peter
> >
> > -----Original Message-----
> > From: Mark Payne [mailto:[hidden email]]
> > Sent: Thursday, October 25, 2018 10:38 PM
> > To: [hidden email]
> > Subject: [EXT] Re: New Standard Pattern - Put Exception that caused
> > failure in an attribute
> >
> > I agree - the notion of adding a "failure.reason" attribute is, in my opinion, an anti-pattern that should be avoided. Relationships are not a workaround but rather the preferred approach in this scenario - an attribute I would consider a workaround. This is due to the fact that not only is it brittle and complex to add processors that route on such things, but there's no reason at all to assume that from release to release (even bug fix/increment releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> > Relationships offer a well-defined way to explicitly indicate "these are the possible outcomes,"
> > similar IMO to Java Exception classes vs. throwing Strings in C.
> >
> >
> > > On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > >
> > > I think processors should really have well defined relationships for
> > > the error scenarios that need to be handled. Having the exception
> > > message is ok for a human who wants to see it, but in order to do
> > > anything with it in the flow you will have to have a bunch of
> > > parsing/interpreting of the message with a bunch of routing
> > > processors, which seems more brittle than just having the
> > > appropriate relationships.
> > > On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
> > >>
> > >> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
> > >> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
> > >>
> > >> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
> > >>
> > >> --Peter
> >
Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Koji Kawamura-2
Hi all,

I'd like to add another option to Matt's list of solutions:

4) Add a processor property, 'Enable detailed error handling'
(defaults to false), then toggle available list of relationships. This
way, existing flows such as Peter's don't have to change, while he can
opt-in new relationships. RouteOnAttribute can be a reference
implementation.

I like the idea of thinking relationships as potential exceptions. It
can be better if relationships have hierarchy.
Some users need more granular relationships while others don't.
For NiFi 2.0 or later, supporting relationship hierarchy at framework
can mitigate having additional property at each processor.

Thanks,
Koji
On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess <[hidden email]> wrote:

>
> Peter,
>
> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
> seems like we need to pick from a set of less-than-ideal solutions:
>
> 1) Add new relationships, but then your (possibly hundreds of)
> processors are invalid
> 2) Add new auto-terminated relationships, but then your
> previously-handled errors are "lost"
> 3) Add an attribute, but then each NiFi instance/release/flow is
> responsible for parsing the error and handling it as desired.
>
> We could mitigate 1-2 with a tool that updates your flow/template by
> sending all new failure relationships to the same target as the
> existing one, but then the tool itself suffers from maintainability
> issues (as does option #3). If we could recognize that the new
> relationships are self-terminated and then send the errors out to the
> original failure relationship, that could be quite confusing to the
> user, especially as time goes on (how to suppress the "new" errors,
> e.g.).
>
> IMHO I think we're between a rock and a hard place here, I guess with
> great entropy comes great responsibility :P
>
> P.S. For your use case, is the workaround to just keep retrying? Or
> are there other constraints at play?
>
> Regards,
> Matt
>
> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks) <[hidden email]> wrote:
> >
> > Matt,
> >
> > If I were to split an existing failure relationship into several relationships, I do not think I would want to auto-terminate in most cases. Specifically, I'm interested in a failure relationship for a database disconnect during SQL execution (database was online when the connection was verified in the DBCP pool, but went down during execution). If I were to find a way to separate this into its own relationship, I do not think most users would appreciate it being a condition silently not handled by the normal failure path.
> >
> > Thanks,
> >   Peter
> >
> > -----Original Message-----
> > From: Matt Burgess [mailto:[hidden email]]
> > Sent: Friday, October 26, 2018 10:18 AM
> > To: [hidden email]
> > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
> >
> > NiFi (as of the last couple releases I think) has the ability to set auto-terminating relationships; this IMO is one of those use cases (for NiFi 1.x). If new relationships are added, they could default to auto-terminate; then the existing processors should remain valid.
> > However we might want an "omnibus Jira" to capture those relationships we'd like to remove the auto-termination from in NiFi 2.0.
> >
> > Regards,
> > Matt
> > On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:
> > >
> > > Mark,
> > >
> > > I agree with you that this is the best option in general terms. After thinking about it some more I think the biggest use case is for troubleshooting. If a file routes to failure, you need to be watching the UI to see what the exception was. An admin may have access to the NiFi log files and could grep the error out, but a normal user who checks in on the flow and sees a FlowFile in the error queue will not know what the cause was; this is especially frustrating if retrying the file works without failure the second time... Capturing the error message in an attribute makes this easy to find.
> > >
> > > One thing I worry about too is adding new relationships to core processors. After an upgrade, won't users need to go to each instance of that processor and handle the new relationship? Right now I'd swagger we have at least five thousand ExecuteSQL processors in our environment; and while we have strong scripting skills in my NiFi team, I would not want to encounter this without that.
> > >
> > > Thanks,
> > >   Peter
> > >
> > > -----Original Message-----
> > > From: Mark Payne [mailto:[hidden email]]
> > > Sent: Thursday, October 25, 2018 10:38 PM
> > > To: [hidden email]
> > > Subject: [EXT] Re: New Standard Pattern - Put Exception that caused
> > > failure in an attribute
> > >
> > > I agree - the notion of adding a "failure.reason" attribute is, in my opinion, an anti-pattern that should be avoided. Relationships are not a workaround but rather the preferred approach in this scenario - an attribute I would consider a workaround. This is due to the fact that not only is it brittle and complex to add processors that route on such things, but there's no reason at all to assume that from release to release (even bug fix/increment releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> > > Relationships offer a well-defined way to explicitly indicate "these are the possible outcomes,"
> > > similar IMO to Java Exception classes vs. throwing Strings in C.
> > >
> > >
> > > > On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > > >
> > > > I think processors should really have well defined relationships for
> > > > the error scenarios that need to be handled. Having the exception
> > > > message is ok for a human who wants to see it, but in order to do
> > > > anything with it in the flow you will have to have a bunch of
> > > > parsing/interpreting of the message with a bunch of routing
> > > > processors, which seems more brittle than just having the
> > > > appropriate relationships.
> > > > On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <[hidden email]> wrote:
> > > >>
> > > >> When a FlowFile is routed to failure, frequently there is no clear reason without looking into the actual error message.
> > > >> Some processors work around this by creating many different relationships, but even then frequently the generic Failure relationship also provides little guidance.
> > > >>
> > > >> I've seen a few cases recently where processors are including the exception message as an attribute on the FlowFile when routing to failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard pattern so that it's easier for users to route failures?
> > > >>
> > > >> --Peter
> > >
Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Pierre Villard
Adding another option to the list.

Peter - if I understand correctly and based on my own experience, the idea
is not to have an 'exception' attribute to perform custom routing after the
failure relationship but rather have a more user friendly way to see what
happened without going through all the logs for a given flow file.

If that's correct, then could we add this information somehow to the
provenance event generated by the processor? Ideally adding a new field to
a provenance event or using the existing 'details' field?

Pierre


Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]> a
écrit :

> Hi all,
>
> I'd like to add another option to Matt's list of solutions:
>
> 4) Add a processor property, 'Enable detailed error handling'
> (defaults to false), then toggle available list of relationships. This
> way, existing flows such as Peter's don't have to change, while he can
> opt-in new relationships. RouteOnAttribute can be a reference
> implementation.
>
> I like the idea of thinking relationships as potential exceptions. It
> can be better if relationships have hierarchy.
> Some users need more granular relationships while others don't.
> For NiFi 2.0 or later, supporting relationship hierarchy at framework
> can mitigate having additional property at each processor.
>
> Thanks,
> Koji
> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess <[hidden email]>
> wrote:
> >
> > Peter,
> >
> > Totally agree, RDBMS/JDBC is in a weird class as always, there is a
> > teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
> > seems like we need to pick from a set of less-than-ideal solutions:
> >
> > 1) Add new relationships, but then your (possibly hundreds of)
> > processors are invalid
> > 2) Add new auto-terminated relationships, but then your
> > previously-handled errors are "lost"
> > 3) Add an attribute, but then each NiFi instance/release/flow is
> > responsible for parsing the error and handling it as desired.
> >
> > We could mitigate 1-2 with a tool that updates your flow/template by
> > sending all new failure relationships to the same target as the
> > existing one, but then the tool itself suffers from maintainability
> > issues (as does option #3). If we could recognize that the new
> > relationships are self-terminated and then send the errors out to the
> > original failure relationship, that could be quite confusing to the
> > user, especially as time goes on (how to suppress the "new" errors,
> > e.g.).
> >
> > IMHO I think we're between a rock and a hard place here, I guess with
> > great entropy comes great responsibility :P
> >
> > P.S. For your use case, is the workaround to just keep retrying? Or
> > are there other constraints at play?
> >
> > Regards,
> > Matt
> >
> > On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks) <[hidden email]>
> wrote:
> > >
> > > Matt,
> > >
> > > If I were to split an existing failure relationship into several
> relationships, I do not think I would want to auto-terminate in most cases.
> Specifically, I'm interested in a failure relationship for a database
> disconnect during SQL execution (database was online when the connection
> was verified in the DBCP pool, but went down during execution). If I were
> to find a way to separate this into its own relationship, I do not think
> most users would appreciate it being a condition silently not handled by
> the normal failure path.
> > >
> > > Thanks,
> > >   Peter
> > >
> > > -----Original Message-----
> > > From: Matt Burgess [mailto:[hidden email]]
> > > Sent: Friday, October 26, 2018 10:18 AM
> > > To: [hidden email]
> > > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> caused failure in an attribute
> > >
> > > NiFi (as of the last couple releases I think) has the ability to set
> auto-terminating relationships; this IMO is one of those use cases (for
> NiFi 1.x). If new relationships are added, they could default to
> auto-terminate; then the existing processors should remain valid.
> > > However we might want an "omnibus Jira" to capture those relationships
> we'd like to remove the auto-termination from in NiFi 2.0.
> > >
> > > Regards,
> > > Matt
> > > On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> [hidden email]> wrote:
> > > >
> > > > Mark,
> > > >
> > > > I agree with you that this is the best option in general terms.
> After thinking about it some more I think the biggest use case is for
> troubleshooting. If a file routes to failure, you need to be watching the
> UI to see what the exception was. An admin may have access to the NiFi log
> files and could grep the error out, but a normal user who checks in on the
> flow and sees a FlowFile in the error queue will not know what the cause
> was; this is especially frustrating if retrying the file works without
> failure the second time... Capturing the error message in an attribute
> makes this easy to find.
> > > >
> > > > One thing I worry about too is adding new relationships to core
> processors. After an upgrade, won't users need to go to each instance of
> that processor and handle the new relationship? Right now I'd swagger we
> have at least five thousand ExecuteSQL processors in our environment; and
> while we have strong scripting skills in my NiFi team, I would not want to
> encounter this without that.
> > > >
> > > > Thanks,
> > > >   Peter
> > > >
> > > > -----Original Message-----
> > > > From: Mark Payne [mailto:[hidden email]]
> > > > Sent: Thursday, October 25, 2018 10:38 PM
> > > > To: [hidden email]
> > > > Subject: [EXT] Re: New Standard Pattern - Put Exception that caused
> > > > failure in an attribute
> > > >
> > > > I agree - the notion of adding a "failure.reason" attribute is, in
> my opinion, an anti-pattern that should be avoided. Relationships are not a
> workaround but rather the preferred approach in this scenario - an
> attribute I would consider a workaround. This is due to the fact that not
> only is it brittle and complex to add processors that route on such things,
> but there's no reason at all to assume that from release to release (even
> bug fix/increment releases) that the Exception type or message will be the
> same, so the flow could stop working at any time after upgrading nifi.
> > > > Relationships offer a well-defined way to explicitly indicate "these
> are the possible outcomes,"
> > > > similar IMO to Java Exception classes vs. throwing Strings in C.
> > > >
> > > >
> > > > > On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > > > >
> > > > > I think processors should really have well defined relationships
> for
> > > > > the error scenarios that need to be handled. Having the exception
> > > > > message is ok for a human who wants to see it, but in order to do
> > > > > anything with it in the flow you will have to have a bunch of
> > > > > parsing/interpreting of the message with a bunch of routing
> > > > > processors, which seems more brittle than just having the
> > > > > appropriate relationships.
> > > > > On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> [hidden email]> wrote:
> > > > >>
> > > > >> When a FlowFile is routed to failure, frequently there is no
> clear reason without looking into the actual error message.
> > > > >> Some processors work around this by creating many different
> relationships, but even then frequently the generic Failure relationship
> also provides little guidance.
> > > > >>
> > > > >> I've seen a few cases recently where processors are including the
> exception message as an attribute on the FlowFile when routing to failure
> (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard
> pattern so that it's easier for users to route failures?
> > > > >>
> > > > >> --Peter
> > > >
>
Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Uwe@Moosheimer.com
Do you really want to mix provenance and data lineage with logging/error information?

Writing exception information/logging information within an attribute is not a bad idea in my opinion.
If a user wants to use this for routing, why not ... or whatever the user wants to do.

I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.

Regards,
Uwe

> Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
>
> Adding another option to the list.
>
> Peter - if I understand correctly and based on my own experience, the idea
> is not to have an 'exception' attribute to perform custom routing after the
> failure relationship but rather have a more user friendly way to see what
> happened without going through all the logs for a given flow file.
>
> If that's correct, then could we add this information somehow to the
> provenance event generated by the processor? Ideally adding a new field to
> a provenance event or using the existing 'details' field?
>
> Pierre
>
>
> Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]> a
> écrit :
>
>> Hi all,
>>
>> I'd like to add another option to Matt's list of solutions:
>>
>> 4) Add a processor property, 'Enable detailed error handling'
>> (defaults to false), then toggle available list of relationships. This
>> way, existing flows such as Peter's don't have to change, while he can
>> opt-in new relationships. RouteOnAttribute can be a reference
>> implementation.
>>
>> I like the idea of thinking relationships as potential exceptions. It
>> can be better if relationships have hierarchy.
>> Some users need more granular relationships while others don't.
>> For NiFi 2.0 or later, supporting relationship hierarchy at framework
>> can mitigate having additional property at each processor.
>>
>> Thanks,
>> Koji
>> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess <[hidden email]>
>> wrote:
>>>
>>> Peter,
>>>
>>> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
>>> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
>>> seems like we need to pick from a set of less-than-ideal solutions:
>>>
>>> 1) Add new relationships, but then your (possibly hundreds of)
>>> processors are invalid
>>> 2) Add new auto-terminated relationships, but then your
>>> previously-handled errors are "lost"
>>> 3) Add an attribute, but then each NiFi instance/release/flow is
>>> responsible for parsing the error and handling it as desired.
>>>
>>> We could mitigate 1-2 with a tool that updates your flow/template by
>>> sending all new failure relationships to the same target as the
>>> existing one, but then the tool itself suffers from maintainability
>>> issues (as does option #3). If we could recognize that the new
>>> relationships are self-terminated and then send the errors out to the
>>> original failure relationship, that could be quite confusing to the
>>> user, especially as time goes on (how to suppress the "new" errors,
>>> e.g.).
>>>
>>> IMHO I think we're between a rock and a hard place here, I guess with
>>> great entropy comes great responsibility :P
>>>
>>> P.S. For your use case, is the workaround to just keep retrying? Or
>>> are there other constraints at play?
>>>
>>> Regards,
>>> Matt
>>>
>>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks) <[hidden email]>
>> wrote:
>>>>
>>>> Matt,
>>>>
>>>> If I were to split an existing failure relationship into several
>> relationships, I do not think I would want to auto-terminate in most cases.
>> Specifically, I'm interested in a failure relationship for a database
>> disconnect during SQL execution (database was online when the connection
>> was verified in the DBCP pool, but went down during execution). If I were
>> to find a way to separate this into its own relationship, I do not think
>> most users would appreciate it being a condition silently not handled by
>> the normal failure path.
>>>>
>>>> Thanks,
>>>>  Peter
>>>>
>>>> -----Original Message-----
>>>> From: Matt Burgess [mailto:[hidden email]]
>>>> Sent: Friday, October 26, 2018 10:18 AM
>>>> To: [hidden email]
>>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
>> caused failure in an attribute
>>>>
>>>> NiFi (as of the last couple releases I think) has the ability to set
>> auto-terminating relationships; this IMO is one of those use cases (for
>> NiFi 1.x). If new relationships are added, they could default to
>> auto-terminate; then the existing processors should remain valid.
>>>> However we might want an "omnibus Jira" to capture those relationships
>> we'd like to remove the auto-termination from in NiFi 2.0.
>>>>
>>>> Regards,
>>>> Matt
>>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
>> [hidden email]> wrote:
>>>>>
>>>>> Mark,
>>>>>
>>>>> I agree with you that this is the best option in general terms.
>> After thinking about it some more I think the biggest use case is for
>> troubleshooting. If a file routes to failure, you need to be watching the
>> UI to see what the exception was. An admin may have access to the NiFi log
>> files and could grep the error out, but a normal user who checks in on the
>> flow and sees a FlowFile in the error queue will not know what the cause
>> was; this is especially frustrating if retrying the file works without
>> failure the second time... Capturing the error message in an attribute
>> makes this easy to find.
>>>>>
>>>>> One thing I worry about too is adding new relationships to core
>> processors. After an upgrade, won't users need to go to each instance of
>> that processor and handle the new relationship? Right now I'd swagger we
>> have at least five thousand ExecuteSQL processors in our environment; and
>> while we have strong scripting skills in my NiFi team, I would not want to
>> encounter this without that.
>>>>>
>>>>> Thanks,
>>>>>  Peter
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mark Payne [mailto:[hidden email]]
>>>>> Sent: Thursday, October 25, 2018 10:38 PM
>>>>> To: [hidden email]
>>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that caused
>>>>> failure in an attribute
>>>>>
>>>>> I agree - the notion of adding a "failure.reason" attribute is, in
>> my opinion, an anti-pattern that should be avoided. Relationships are not a
>> workaround but rather the preferred approach in this scenario - an
>> attribute I would consider a workaround. This is due to the fact that not
>> only is it brittle and complex to add processors that route on such things,
>> but there's no reason at all to assume that from release to release (even
>> bug fix/increment releases) that the Exception type or message will be the
>> same, so the flow could stop working at any time after upgrading nifi.
>>>>> Relationships offer a well-defined way to explicitly indicate "these
>> are the possible outcomes,"
>>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
>>>>>
>>>>>
>>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
>>>>>>
>>>>>> I think processors should really have well defined relationships
>> for
>>>>>> the error scenarios that need to be handled. Having the exception
>>>>>> message is ok for a human who wants to see it, but in order to do
>>>>>> anything with it in the flow you will have to have a bunch of
>>>>>> parsing/interpreting of the message with a bunch of routing
>>>>>> processors, which seems more brittle than just having the
>>>>>> appropriate relationships.
>>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
>> [hidden email]> wrote:
>>>>>>>
>>>>>>> When a FlowFile is routed to failure, frequently there is no
>> clear reason without looking into the actual error message.
>>>>>>> Some processors work around this by creating many different
>> relationships, but even then frequently the generic Failure relationship
>> also provides little guidance.
>>>>>>>
>>>>>>> I've seen a few cases recently where processors are including the
>> exception message as an attribute on the FlowFile when routing to failure
>> (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be a standard
>> pattern so that it's easier for users to route failures?
>>>>>>>
>>>>>>> --Peter
>>>>>
>>

Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
Sorry for the delayed response, I've been traveling.

Responses in order:

Matt,
Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.

Koji,
Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.

Pierre,
My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.

I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.

Uwe,
That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
 
But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.

Thanks,
  Peter

-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
Sent: Saturday, October 27, 2018 10:46 AM
To: [hidden email]
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Do you really want to mix provenance and data lineage with logging/error information?

Writing exception information/logging information within an attribute is not a bad idea in my opinion.
If a user wants to use this for routing, why not ... or whatever the user wants to do.

I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.

Regards,
Uwe

> Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
>
> Adding another option to the list.
>
> Peter - if I understand correctly and based on my own experience, the
> idea is not to have an 'exception' attribute to perform custom routing
> after the failure relationship but rather have a more user friendly
> way to see what happened without going through all the logs for a given flow file.
>
> If that's correct, then could we add this information somehow to the
> provenance event generated by the processor? Ideally adding a new
> field to a provenance event or using the existing 'details' field?
>
> Pierre
>
>
> Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]> a
> écrit :
>
>> Hi all,
>>
>> I'd like to add another option to Matt's list of solutions:
>>
>> 4) Add a processor property, 'Enable detailed error handling'
>> (defaults to false), then toggle available list of relationships.
>> This way, existing flows such as Peter's don't have to change, while
>> he can opt-in new relationships. RouteOnAttribute can be a reference
>> implementation.
>>
>> I like the idea of thinking relationships as potential exceptions. It
>> can be better if relationships have hierarchy.
>> Some users need more granular relationships while others don't.
>> For NiFi 2.0 or later, supporting relationship hierarchy at framework
>> can mitigate having additional property at each processor.
>>
>> Thanks,
>> Koji
>> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess <[hidden email]>
>> wrote:
>>>
>>> Peter,
>>>
>>> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
>>> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
>>> seems like we need to pick from a set of less-than-ideal solutions:
>>>
>>> 1) Add new relationships, but then your (possibly hundreds of)
>>> processors are invalid
>>> 2) Add new auto-terminated relationships, but then your
>>> previously-handled errors are "lost"
>>> 3) Add an attribute, but then each NiFi instance/release/flow is
>>> responsible for parsing the error and handling it as desired.
>>>
>>> We could mitigate 1-2 with a tool that updates your flow/template by
>>> sending all new failure relationships to the same target as the
>>> existing one, but then the tool itself suffers from maintainability
>>> issues (as does option #3). If we could recognize that the new
>>> relationships are self-terminated and then send the errors out to
>>> the original failure relationship, that could be quite confusing to
>>> the user, especially as time goes on (how to suppress the "new"
>>> errors, e.g.).
>>>
>>> IMHO I think we're between a rock and a hard place here, I guess
>>> with great entropy comes great responsibility :P
>>>
>>> P.S. For your use case, is the workaround to just keep retrying? Or
>>> are there other constraints at play?
>>>
>>> Regards,
>>> Matt
>>>
>>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
>>> <[hidden email]>
>> wrote:
>>>>
>>>> Matt,
>>>>
>>>> If I were to split an existing failure relationship into several
>> relationships, I do not think I would want to auto-terminate in most cases.
>> Specifically, I'm interested in a failure relationship for a database
>> disconnect during SQL execution (database was online when the
>> connection was verified in the DBCP pool, but went down during
>> execution). If I were to find a way to separate this into its own
>> relationship, I do not think most users would appreciate it being a
>> condition silently not handled by the normal failure path.
>>>>
>>>> Thanks,
>>>>  Peter
>>>>
>>>> -----Original Message-----
>>>> From: Matt Burgess [mailto:[hidden email]]
>>>> Sent: Friday, October 26, 2018 10:18 AM
>>>> To: [hidden email]
>>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
>> caused failure in an attribute
>>>>
>>>> NiFi (as of the last couple releases I think) has the ability to
>>>> set
>> auto-terminating relationships; this IMO is one of those use cases
>> (for NiFi 1.x). If new relationships are added, they could default to
>> auto-terminate; then the existing processors should remain valid.
>>>> However we might want an "omnibus Jira" to capture those
>>>> relationships
>> we'd like to remove the auto-termination from in NiFi 2.0.
>>>>
>>>> Regards,
>>>> Matt
>>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
>> [hidden email]> wrote:
>>>>>
>>>>> Mark,
>>>>>
>>>>> I agree with you that this is the best option in general terms.
>> After thinking about it some more I think the biggest use case is for
>> troubleshooting. If a file routes to failure, you need to be watching
>> the UI to see what the exception was. An admin may have access to the
>> NiFi log files and could grep the error out, but a normal user who
>> checks in on the flow and sees a FlowFile in the error queue will not
>> know what the cause was; this is especially frustrating if retrying
>> the file works without failure the second time... Capturing the error
>> message in an attribute makes this easy to find.
>>>>>
>>>>> One thing I worry about too is adding new relationships to core
>> processors. After an upgrade, won't users need to go to each instance
>> of that processor and handle the new relationship? Right now I'd
>> swagger we have at least five thousand ExecuteSQL processors in our
>> environment; and while we have strong scripting skills in my NiFi
>> team, I would not want to encounter this without that.
>>>>>
>>>>> Thanks,
>>>>>  Peter
>>>>>
>>>>> -----Original Message-----
>>>>> From: Mark Payne [mailto:[hidden email]]
>>>>> Sent: Thursday, October 25, 2018 10:38 PM
>>>>> To: [hidden email]
>>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
>>>>> caused failure in an attribute
>>>>>
>>>>> I agree - the notion of adding a "failure.reason" attribute is, in
>> my opinion, an anti-pattern that should be avoided. Relationships are
>> not a workaround but rather the preferred approach in this scenario -
>> an attribute I would consider a workaround. This is due to the fact
>> that not only is it brittle and complex to add processors that route
>> on such things, but there's no reason at all to assume that from
>> release to release (even bug fix/increment releases) that the
>> Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
>>>>> Relationships offer a well-defined way to explicitly indicate
>>>>> "these
>> are the possible outcomes,"
>>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
>>>>>
>>>>>
>>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
>>>>>>
>>>>>> I think processors should really have well defined relationships
>> for
>>>>>> the error scenarios that need to be handled. Having the exception
>>>>>> message is ok for a human who wants to see it, but in order to do
>>>>>> anything with it in the flow you will have to have a bunch of
>>>>>> parsing/interpreting of the message with a bunch of routing
>>>>>> processors, which seems more brittle than just having the
>>>>>> appropriate relationships.
>>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
>> [hidden email]> wrote:
>>>>>>>
>>>>>>> When a FlowFile is routed to failure, frequently there is no
>> clear reason without looking into the actual error message.
>>>>>>> Some processors work around this by creating many different
>> relationships, but even then frequently the generic Failure
>> relationship also provides little guidance.
>>>>>>>
>>>>>>> I've seen a few cases recently where processors are including
>>>>>>> the
>> exception message as an attribute on the FlowFile when routing to
>> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be
>> a standard pattern so that it's easier for users to route failures?
>>>>>>>
>>>>>>> --Peter
>>>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

James Srinivasan
Apologies if I've missed this in the discussion so far - we use the
InvokeHTTP processor a lot, and the invokehttp.java.exception.message
attribute is really handy diving into why things have failed without
having to match up logs with flow files (from a system with hundreds
of processors making thousands of requests). We also route on
invokehttp.status.code (e.g. to retry 403s due to a race hazard in an
external system) but I don't imagine we'd route on
invokehttp.java.exception.* since (as others have mentioned) it looks
pretty fragile.

--
James
On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Sorry for the delayed response, I've been traveling.
>
> Responses in order:
>
> Matt,
> Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
>
> Koji,
> Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
>
> Pierre,
> My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
>
> I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
>
> Uwe,
> That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
>
> But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Saturday, October 27, 2018 10:46 AM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
>
> Do you really want to mix provenance and data lineage with logging/error information?
>
> Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> If a user wants to use this for routing, why not ... or whatever the user wants to do.
>
> I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
>
> Regards,
> Uwe
>
> > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> >
> > Adding another option to the list.
> >
> > Peter - if I understand correctly and based on my own experience, the
> > idea is not to have an 'exception' attribute to perform custom routing
> > after the failure relationship but rather have a more user friendly
> > way to see what happened without going through all the logs for a given flow file.
> >
> > If that's correct, then could we add this information somehow to the
> > provenance event generated by the processor? Ideally adding a new
> > field to a provenance event or using the existing 'details' field?
> >
> > Pierre
> >
> >
> > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]> a
> > écrit :
> >
> >> Hi all,
> >>
> >> I'd like to add another option to Matt's list of solutions:
> >>
> >> 4) Add a processor property, 'Enable detailed error handling'
> >> (defaults to false), then toggle available list of relationships.
> >> This way, existing flows such as Peter's don't have to change, while
> >> he can opt-in new relationships. RouteOnAttribute can be a reference
> >> implementation.
> >>
> >> I like the idea of thinking relationships as potential exceptions. It
> >> can be better if relationships have hierarchy.
> >> Some users need more granular relationships while others don't.
> >> For NiFi 2.0 or later, supporting relationship hierarchy at framework
> >> can mitigate having additional property at each processor.
> >>
> >> Thanks,
> >> Koji
> >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess <[hidden email]>
> >> wrote:
> >>>
> >>> Peter,
> >>>
> >>> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
> >>> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
> >>> seems like we need to pick from a set of less-than-ideal solutions:
> >>>
> >>> 1) Add new relationships, but then your (possibly hundreds of)
> >>> processors are invalid
> >>> 2) Add new auto-terminated relationships, but then your
> >>> previously-handled errors are "lost"
> >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> >>> responsible for parsing the error and handling it as desired.
> >>>
> >>> We could mitigate 1-2 with a tool that updates your flow/template by
> >>> sending all new failure relationships to the same target as the
> >>> existing one, but then the tool itself suffers from maintainability
> >>> issues (as does option #3). If we could recognize that the new
> >>> relationships are self-terminated and then send the errors out to
> >>> the original failure relationship, that could be quite confusing to
> >>> the user, especially as time goes on (how to suppress the "new"
> >>> errors, e.g.).
> >>>
> >>> IMHO I think we're between a rock and a hard place here, I guess
> >>> with great entropy comes great responsibility :P
> >>>
> >>> P.S. For your use case, is the workaround to just keep retrying? Or
> >>> are there other constraints at play?
> >>>
> >>> Regards,
> >>> Matt
> >>>
> >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> >>> <[hidden email]>
> >> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> If I were to split an existing failure relationship into several
> >> relationships, I do not think I would want to auto-terminate in most cases.
> >> Specifically, I'm interested in a failure relationship for a database
> >> disconnect during SQL execution (database was online when the
> >> connection was verified in the DBCP pool, but went down during
> >> execution). If I were to find a way to separate this into its own
> >> relationship, I do not think most users would appreciate it being a
> >> condition silently not handled by the normal failure path.
> >>>>
> >>>> Thanks,
> >>>>  Peter
> >>>>
> >>>> -----Original Message-----
> >>>> From: Matt Burgess [mailto:[hidden email]]
> >>>> Sent: Friday, October 26, 2018 10:18 AM
> >>>> To: [hidden email]
> >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> >> caused failure in an attribute
> >>>>
> >>>> NiFi (as of the last couple releases I think) has the ability to
> >>>> set
> >> auto-terminating relationships; this IMO is one of those use cases
> >> (for NiFi 1.x). If new relationships are added, they could default to
> >> auto-terminate; then the existing processors should remain valid.
> >>>> However we might want an "omnibus Jira" to capture those
> >>>> relationships
> >> we'd like to remove the auto-termination from in NiFi 2.0.
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> >> [hidden email]> wrote:
> >>>>>
> >>>>> Mark,
> >>>>>
> >>>>> I agree with you that this is the best option in general terms.
> >> After thinking about it some more I think the biggest use case is for
> >> troubleshooting. If a file routes to failure, you need to be watching
> >> the UI to see what the exception was. An admin may have access to the
> >> NiFi log files and could grep the error out, but a normal user who
> >> checks in on the flow and sees a FlowFile in the error queue will not
> >> know what the cause was; this is especially frustrating if retrying
> >> the file works without failure the second time... Capturing the error
> >> message in an attribute makes this easy to find.
> >>>>>
> >>>>> One thing I worry about too is adding new relationships to core
> >> processors. After an upgrade, won't users need to go to each instance
> >> of that processor and handle the new relationship? Right now I'd
> >> swagger we have at least five thousand ExecuteSQL processors in our
> >> environment; and while we have strong scripting skills in my NiFi
> >> team, I would not want to encounter this without that.
> >>>>>
> >>>>> Thanks,
> >>>>>  Peter
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Mark Payne [mailto:[hidden email]]
> >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> >>>>> To: [hidden email]
> >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> >>>>> caused failure in an attribute
> >>>>>
> >>>>> I agree - the notion of adding a "failure.reason" attribute is, in
> >> my opinion, an anti-pattern that should be avoided. Relationships are
> >> not a workaround but rather the preferred approach in this scenario -
> >> an attribute I would consider a workaround. This is due to the fact
> >> that not only is it brittle and complex to add processors that route
> >> on such things, but there's no reason at all to assume that from
> >> release to release (even bug fix/increment releases) that the
> >> Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> >>>>> Relationships offer a well-defined way to explicitly indicate
> >>>>> "these
> >> are the possible outcomes,"
> >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> >>>>>
> >>>>>
> >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> >>>>>>
> >>>>>> I think processors should really have well defined relationships
> >> for
> >>>>>> the error scenarios that need to be handled. Having the exception
> >>>>>> message is ok for a human who wants to see it, but in order to do
> >>>>>> anything with it in the flow you will have to have a bunch of
> >>>>>> parsing/interpreting of the message with a bunch of routing
> >>>>>> processors, which seems more brittle than just having the
> >>>>>> appropriate relationships.
> >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> >> [hidden email]> wrote:
> >>>>>>>
> >>>>>>> When a FlowFile is routed to failure, frequently there is no
> >> clear reason without looking into the actual error message.
> >>>>>>> Some processors work around this by creating many different
> >> relationships, but even then frequently the generic Failure
> >> relationship also provides little guidance.
> >>>>>>>
> >>>>>>> I've seen a few cases recently where processors are including
> >>>>>>> the
> >> exception message as an attribute on the FlowFile when routing to
> >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this be
> >> a standard pattern so that it's easier for users to route failures?
> >>>>>>>
> >>>>>>> --Peter
> >>>>>
> >>
>
Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
Dev Team,

I don’t think we've reached a conclusion on this discussion, but would like too. I had not done enough research when I originally suggested this as a, "New Pattern". Having done a bit more research now, I'd say this is already a well established pattern.

Examples using this pattern already, with exception types/text written in FlowFile Attributes:
 - GenerateTableFetch (Matt added back in 2017) does this for incoming FlowFiles that cause a SQL exception
 - PutDatabaseRecord (Matt added also in 2017 with the original version of the processor)
 - ValidateCSV and ValidateXML puts that validation cause as an Attribute (maybe not exactly the same, but feels similar)
 - InvokeHTTP, InvokeGRPC Exception class name and Exception message
 - Couchbase Processors (Put/Get) provides the exception class name
 - PutLambda (six different exception fields get written to Attributes)
 - Other AWS processors are similar in how they handle this, such as the Dynamo processor.
 - ExecuteStreamCommand provides the error message from a script execution as an attribute.
 - DeleteHDFS puts the error message as an Attribute
 - ScanHBase puts scanning errors as an Attribute
 - DeleteElasticSearch (both versions) put deletion failure messages as an attribute
 - InfluxDB processors do this also (influxdb.error.message)
 - ConvertExcelToCSV tracks conversion errors in an attribute
 - RethinkDB processors do this too

Thanks,
  Peter

-----Original Message-----
From: James Srinivasan [mailto:[hidden email]]
Sent: Tuesday, October 30, 2018 3:00 PM
To: [hidden email]
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Apologies if I've missed this in the discussion so far - we use the InvokeHTTP processor a lot, and the invokehttp.java.exception.message attribute is really handy diving into why things have failed without having to match up logs with flow files (from a system with hundreds of processors making thousands of requests). We also route on invokehttp.status.code (e.g. to retry 403s due to a race hazard in an external system) but I don't imagine we'd route on
invokehttp.java.exception.* since (as others have mentioned) it looks pretty fragile.

--
James
On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Sorry for the delayed response, I've been traveling.
>
> Responses in order:
>
> Matt,
> Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
>
> Koji,
> Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
>
> Pierre,
> My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
>
> I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
>
> Uwe,
> That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
>
> But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Saturday, October 27, 2018 10:46 AM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> caused failure in an attribute
>
> Do you really want to mix provenance and data lineage with logging/error information?
>
> Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> If a user wants to use this for routing, why not ... or whatever the user wants to do.
>
> I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
>
> Regards,
> Uwe
>
> > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> >
> > Adding another option to the list.
> >
> > Peter - if I understand correctly and based on my own experience,
> > the idea is not to have an 'exception' attribute to perform custom
> > routing after the failure relationship but rather have a more user
> > friendly way to see what happened without going through all the logs for a given flow file.
> >
> > If that's correct, then could we add this information somehow to the
> > provenance event generated by the processor? Ideally adding a new
> > field to a provenance event or using the existing 'details' field?
> >
> > Pierre
> >
> >
> > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]>
> > a écrit :
> >
> >> Hi all,
> >>
> >> I'd like to add another option to Matt's list of solutions:
> >>
> >> 4) Add a processor property, 'Enable detailed error handling'
> >> (defaults to false), then toggle available list of relationships.
> >> This way, existing flows such as Peter's don't have to change,
> >> while he can opt-in new relationships. RouteOnAttribute can be a
> >> reference implementation.
> >>
> >> I like the idea of thinking relationships as potential exceptions.
> >> It can be better if relationships have hierarchy.
> >> Some users need more granular relationships while others don't.
> >> For NiFi 2.0 or later, supporting relationship hierarchy at
> >> framework can mitigate having additional property at each processor.
> >>
> >> Thanks,
> >> Koji
> >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess
> >> <[hidden email]>
> >> wrote:
> >>>
> >>> Peter,
> >>>
> >>> Totally agree, RDBMS/JDBC is in a weird class as always, there is
> >>> a teaspoon of exception types for an ocean of causes. For NiFi
> >>> 1.x, it seems like we need to pick from a set of less-than-ideal solutions:
> >>>
> >>> 1) Add new relationships, but then your (possibly hundreds of)
> >>> processors are invalid
> >>> 2) Add new auto-terminated relationships, but then your
> >>> previously-handled errors are "lost"
> >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> >>> responsible for parsing the error and handling it as desired.
> >>>
> >>> We could mitigate 1-2 with a tool that updates your flow/template
> >>> by sending all new failure relationships to the same target as the
> >>> existing one, but then the tool itself suffers from
> >>> maintainability issues (as does option #3). If we could recognize
> >>> that the new relationships are self-terminated and then send the
> >>> errors out to the original failure relationship, that could be
> >>> quite confusing to the user, especially as time goes on (how to suppress the "new"
> >>> errors, e.g.).
> >>>
> >>> IMHO I think we're between a rock and a hard place here, I guess
> >>> with great entropy comes great responsibility :P
> >>>
> >>> P.S. For your use case, is the workaround to just keep retrying?
> >>> Or are there other constraints at play?
> >>>
> >>> Regards,
> >>> Matt
> >>>
> >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> >>> <[hidden email]>
> >> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> If I were to split an existing failure relationship into several
> >> relationships, I do not think I would want to auto-terminate in most cases.
> >> Specifically, I'm interested in a failure relationship for a
> >> database disconnect during SQL execution (database was online when
> >> the connection was verified in the DBCP pool, but went down during
> >> execution). If I were to find a way to separate this into its own
> >> relationship, I do not think most users would appreciate it being a
> >> condition silently not handled by the normal failure path.
> >>>>
> >>>> Thanks,
> >>>>  Peter
> >>>>
> >>>> -----Original Message-----
> >>>> From: Matt Burgess [mailto:[hidden email]]
> >>>> Sent: Friday, October 26, 2018 10:18 AM
> >>>> To: [hidden email]
> >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> >> caused failure in an attribute
> >>>>
> >>>> NiFi (as of the last couple releases I think) has the ability to
> >>>> set
> >> auto-terminating relationships; this IMO is one of those use cases
> >> (for NiFi 1.x). If new relationships are added, they could default
> >> to auto-terminate; then the existing processors should remain valid.
> >>>> However we might want an "omnibus Jira" to capture those
> >>>> relationships
> >> we'd like to remove the auto-termination from in NiFi 2.0.
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> >> [hidden email]> wrote:
> >>>>>
> >>>>> Mark,
> >>>>>
> >>>>> I agree with you that this is the best option in general terms.
> >> After thinking about it some more I think the biggest use case is
> >> for troubleshooting. If a file routes to failure, you need to be
> >> watching the UI to see what the exception was. An admin may have
> >> access to the NiFi log files and could grep the error out, but a
> >> normal user who checks in on the flow and sees a FlowFile in the
> >> error queue will not know what the cause was; this is especially
> >> frustrating if retrying the file works without failure the second
> >> time... Capturing the error message in an attribute makes this easy to find.
> >>>>>
> >>>>> One thing I worry about too is adding new relationships to core
> >> processors. After an upgrade, won't users need to go to each
> >> instance of that processor and handle the new relationship? Right
> >> now I'd swagger we have at least five thousand ExecuteSQL
> >> processors in our environment; and while we have strong scripting
> >> skills in my NiFi team, I would not want to encounter this without that.
> >>>>>
> >>>>> Thanks,
> >>>>>  Peter
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Mark Payne [mailto:[hidden email]]
> >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> >>>>> To: [hidden email]
> >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> >>>>> caused failure in an attribute
> >>>>>
> >>>>> I agree - the notion of adding a "failure.reason" attribute is,
> >>>>> in
> >> my opinion, an anti-pattern that should be avoided. Relationships
> >> are not a workaround but rather the preferred approach in this
> >> scenario - an attribute I would consider a workaround. This is due
> >> to the fact that not only is it brittle and complex to add
> >> processors that route on such things, but there's no reason at all
> >> to assume that from release to release (even bug fix/increment
> >> releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> >>>>> Relationships offer a well-defined way to explicitly indicate
> >>>>> "these
> >> are the possible outcomes,"
> >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> >>>>>
> >>>>>
> >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> >>>>>>
> >>>>>> I think processors should really have well defined
> >>>>>> relationships
> >> for
> >>>>>> the error scenarios that need to be handled. Having the
> >>>>>> exception message is ok for a human who wants to see it, but in
> >>>>>> order to do anything with it in the flow you will have to have
> >>>>>> a bunch of parsing/interpreting of the message with a bunch of
> >>>>>> routing processors, which seems more brittle than just having
> >>>>>> the appropriate relationships.
> >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> >> [hidden email]> wrote:
> >>>>>>>
> >>>>>>> When a FlowFile is routed to failure, frequently there is no
> >> clear reason without looking into the actual error message.
> >>>>>>> Some processors work around this by creating many different
> >> relationships, but even then frequently the generic Failure
> >> relationship also provides little guidance.
> >>>>>>>
> >>>>>>> I've seen a few cases recently where processors are including
> >>>>>>> the
> >> exception message as an attribute on the FlowFile when routing to
> >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this
> >> be a standard pattern so that it's easier for users to route failures?
> >>>>>>>
> >>>>>>> --Peter
> >>>>>
> >>
>
Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
A one week bump on this thread. --Peter

-----Original Message-----
From: Peter Wicks (pwicks)
Sent: Friday, November 2, 2018 11:54 AM
To: [hidden email]
Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Dev Team,

I don’t think we've reached a conclusion on this discussion, but would like too. I had not done enough research when I originally suggested this as a, "New Pattern". Having done a bit more research now, I'd say this is already a well established pattern.

Examples using this pattern already, with exception types/text written in FlowFile Attributes:
 - GenerateTableFetch (Matt added back in 2017) does this for incoming FlowFiles that cause a SQL exception
 - PutDatabaseRecord (Matt added also in 2017 with the original version of the processor)
 - ValidateCSV and ValidateXML puts that validation cause as an Attribute (maybe not exactly the same, but feels similar)
 - InvokeHTTP, InvokeGRPC Exception class name and Exception message
 - Couchbase Processors (Put/Get) provides the exception class name
 - PutLambda (six different exception fields get written to Attributes)
 - Other AWS processors are similar in how they handle this, such as the Dynamo processor.
 - ExecuteStreamCommand provides the error message from a script execution as an attribute.
 - DeleteHDFS puts the error message as an Attribute
 - ScanHBase puts scanning errors as an Attribute
 - DeleteElasticSearch (both versions) put deletion failure messages as an attribute
 - InfluxDB processors do this also (influxdb.error.message)
 - ConvertExcelToCSV tracks conversion errors in an attribute
 - RethinkDB processors do this too

Thanks,
  Peter

-----Original Message-----
From: James Srinivasan [mailto:[hidden email]]
Sent: Tuesday, October 30, 2018 3:00 PM
To: [hidden email]
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Apologies if I've missed this in the discussion so far - we use the InvokeHTTP processor a lot, and the invokehttp.java.exception.message attribute is really handy diving into why things have failed without having to match up logs with flow files (from a system with hundreds of processors making thousands of requests). We also route on invokehttp.status.code (e.g. to retry 403s due to a race hazard in an external system) but I don't imagine we'd route on
invokehttp.java.exception.* since (as others have mentioned) it looks pretty fragile.

--
James
On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Sorry for the delayed response, I've been traveling.
>
> Responses in order:
>
> Matt,
> Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
>
> Koji,
> Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
>
> Pierre,
> My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
>
> I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
>
> Uwe,
> That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
>
> But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: Saturday, October 27, 2018 10:46 AM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> caused failure in an attribute
>
> Do you really want to mix provenance and data lineage with logging/error information?
>
> Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> If a user wants to use this for routing, why not ... or whatever the user wants to do.
>
> I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
>
> Regards,
> Uwe
>
> > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> >
> > Adding another option to the list.
> >
> > Peter - if I understand correctly and based on my own experience,
> > the idea is not to have an 'exception' attribute to perform custom
> > routing after the failure relationship but rather have a more user
> > friendly way to see what happened without going through all the logs for a given flow file.
> >
> > If that's correct, then could we add this information somehow to the
> > provenance event generated by the processor? Ideally adding a new
> > field to a provenance event or using the existing 'details' field?
> >
> > Pierre
> >
> >
> > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]>
> > a écrit :
> >
> >> Hi all,
> >>
> >> I'd like to add another option to Matt's list of solutions:
> >>
> >> 4) Add a processor property, 'Enable detailed error handling'
> >> (defaults to false), then toggle available list of relationships.
> >> This way, existing flows such as Peter's don't have to change,
> >> while he can opt-in new relationships. RouteOnAttribute can be a
> >> reference implementation.
> >>
> >> I like the idea of thinking relationships as potential exceptions.
> >> It can be better if relationships have hierarchy.
> >> Some users need more granular relationships while others don't.
> >> For NiFi 2.0 or later, supporting relationship hierarchy at
> >> framework can mitigate having additional property at each processor.
> >>
> >> Thanks,
> >> Koji
> >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess
> >> <[hidden email]>
> >> wrote:
> >>>
> >>> Peter,
> >>>
> >>> Totally agree, RDBMS/JDBC is in a weird class as always, there is
> >>> a teaspoon of exception types for an ocean of causes. For NiFi
> >>> 1.x, it seems like we need to pick from a set of less-than-ideal solutions:
> >>>
> >>> 1) Add new relationships, but then your (possibly hundreds of)
> >>> processors are invalid
> >>> 2) Add new auto-terminated relationships, but then your
> >>> previously-handled errors are "lost"
> >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> >>> responsible for parsing the error and handling it as desired.
> >>>
> >>> We could mitigate 1-2 with a tool that updates your flow/template
> >>> by sending all new failure relationships to the same target as the
> >>> existing one, but then the tool itself suffers from
> >>> maintainability issues (as does option #3). If we could recognize
> >>> that the new relationships are self-terminated and then send the
> >>> errors out to the original failure relationship, that could be
> >>> quite confusing to the user, especially as time goes on (how to suppress the "new"
> >>> errors, e.g.).
> >>>
> >>> IMHO I think we're between a rock and a hard place here, I guess
> >>> with great entropy comes great responsibility :P
> >>>
> >>> P.S. For your use case, is the workaround to just keep retrying?
> >>> Or are there other constraints at play?
> >>>
> >>> Regards,
> >>> Matt
> >>>
> >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> >>> <[hidden email]>
> >> wrote:
> >>>>
> >>>> Matt,
> >>>>
> >>>> If I were to split an existing failure relationship into several
> >> relationships, I do not think I would want to auto-terminate in most cases.
> >> Specifically, I'm interested in a failure relationship for a
> >> database disconnect during SQL execution (database was online when
> >> the connection was verified in the DBCP pool, but went down during
> >> execution). If I were to find a way to separate this into its own
> >> relationship, I do not think most users would appreciate it being a
> >> condition silently not handled by the normal failure path.
> >>>>
> >>>> Thanks,
> >>>>  Peter
> >>>>
> >>>> -----Original Message-----
> >>>> From: Matt Burgess [mailto:[hidden email]]
> >>>> Sent: Friday, October 26, 2018 10:18 AM
> >>>> To: [hidden email]
> >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> >> caused failure in an attribute
> >>>>
> >>>> NiFi (as of the last couple releases I think) has the ability to
> >>>> set
> >> auto-terminating relationships; this IMO is one of those use cases
> >> (for NiFi 1.x). If new relationships are added, they could default
> >> to auto-terminate; then the existing processors should remain valid.
> >>>> However we might want an "omnibus Jira" to capture those
> >>>> relationships
> >> we'd like to remove the auto-termination from in NiFi 2.0.
> >>>>
> >>>> Regards,
> >>>> Matt
> >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> >> [hidden email]> wrote:
> >>>>>
> >>>>> Mark,
> >>>>>
> >>>>> I agree with you that this is the best option in general terms.
> >> After thinking about it some more I think the biggest use case is
> >> for troubleshooting. If a file routes to failure, you need to be
> >> watching the UI to see what the exception was. An admin may have
> >> access to the NiFi log files and could grep the error out, but a
> >> normal user who checks in on the flow and sees a FlowFile in the
> >> error queue will not know what the cause was; this is especially
> >> frustrating if retrying the file works without failure the second
> >> time... Capturing the error message in an attribute makes this easy to find.
> >>>>>
> >>>>> One thing I worry about too is adding new relationships to core
> >> processors. After an upgrade, won't users need to go to each
> >> instance of that processor and handle the new relationship? Right
> >> now I'd swagger we have at least five thousand ExecuteSQL
> >> processors in our environment; and while we have strong scripting
> >> skills in my NiFi team, I would not want to encounter this without that.
> >>>>>
> >>>>> Thanks,
> >>>>>  Peter
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Mark Payne [mailto:[hidden email]]
> >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> >>>>> To: [hidden email]
> >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> >>>>> caused failure in an attribute
> >>>>>
> >>>>> I agree - the notion of adding a "failure.reason" attribute is,
> >>>>> in
> >> my opinion, an anti-pattern that should be avoided. Relationships
> >> are not a workaround but rather the preferred approach in this
> >> scenario - an attribute I would consider a workaround. This is due
> >> to the fact that not only is it brittle and complex to add
> >> processors that route on such things, but there's no reason at all
> >> to assume that from release to release (even bug fix/increment
> >> releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> >>>>> Relationships offer a well-defined way to explicitly indicate
> >>>>> "these
> >> are the possible outcomes,"
> >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> >>>>>
> >>>>>
> >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> >>>>>>
> >>>>>> I think processors should really have well defined
> >>>>>> relationships
> >> for
> >>>>>> the error scenarios that need to be handled. Having the
> >>>>>> exception message is ok for a human who wants to see it, but in
> >>>>>> order to do anything with it in the flow you will have to have
> >>>>>> a bunch of parsing/interpreting of the message with a bunch of
> >>>>>> routing processors, which seems more brittle than just having
> >>>>>> the appropriate relationships.
> >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> >> [hidden email]> wrote:
> >>>>>>>
> >>>>>>> When a FlowFile is routed to failure, frequently there is no
> >> clear reason without looking into the actual error message.
> >>>>>>> Some processors work around this by creating many different
> >> relationships, but even then frequently the generic Failure
> >> relationship also provides little guidance.
> >>>>>>>
> >>>>>>> I've seen a few cases recently where processors are including
> >>>>>>> the
> >> exception message as an attribute on the FlowFile when routing to
> >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this
> >> be a standard pattern so that it's easier for users to route failures?
> >>>>>>>
> >>>>>>> --Peter
> >>>>>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Joe Witt
Peter,

I'm not clear on what you're are asking be done precisely and how far
it be carried.

You're right there are many processors which store exception/log
details as attributes on flowfiles before they route them
(success,failure, etc..).  This is fine and can be documented with the
WritesAttribute annotations to be helpful.

Where that model breaks down though and should never be used is when
someone wants to use the text of that String to safely/reliable
automate something.  If the 'failure' reason for a given situation is
precisely knowable enough and could reasonably be valuable for routing
it should be an explicit relationship.  Attributes for exceptions/log
values are useful provided they are 'advisory' only meaning largely
just intended for users/general awareness.  But not for automation or
to define an explicit interface.

So, with the above said can you clarify what you are precisely
requesting and for 'who' - who is the actor.

Thanks
On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> A one week bump on this thread. --Peter
>
> -----Original Message-----
> From: Peter Wicks (pwicks)
> Sent: Friday, November 2, 2018 11:54 AM
> To: [hidden email]
> Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
>
> Dev Team,
>
> I don’t think we've reached a conclusion on this discussion, but would like too. I had not done enough research when I originally suggested this as a, "New Pattern". Having done a bit more research now, I'd say this is already a well established pattern.
>
> Examples using this pattern already, with exception types/text written in FlowFile Attributes:
>  - GenerateTableFetch (Matt added back in 2017) does this for incoming FlowFiles that cause a SQL exception
>  - PutDatabaseRecord (Matt added also in 2017 with the original version of the processor)
>  - ValidateCSV and ValidateXML puts that validation cause as an Attribute (maybe not exactly the same, but feels similar)
>  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
>  - Couchbase Processors (Put/Get) provides the exception class name
>  - PutLambda (six different exception fields get written to Attributes)
>  - Other AWS processors are similar in how they handle this, such as the Dynamo processor.
>  - ExecuteStreamCommand provides the error message from a script execution as an attribute.
>  - DeleteHDFS puts the error message as an Attribute
>  - ScanHBase puts scanning errors as an Attribute
>  - DeleteElasticSearch (both versions) put deletion failure messages as an attribute
>  - InfluxDB processors do this also (influxdb.error.message)
>  - ConvertExcelToCSV tracks conversion errors in an attribute
>  - RethinkDB processors do this too
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: James Srinivasan [mailto:[hidden email]]
> Sent: Tuesday, October 30, 2018 3:00 PM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
>
> Apologies if I've missed this in the discussion so far - we use the InvokeHTTP processor a lot, and the invokehttp.java.exception.message attribute is really handy diving into why things have failed without having to match up logs with flow files (from a system with hundreds of processors making thousands of requests). We also route on invokehttp.status.code (e.g. to retry 403s due to a race hazard in an external system) but I don't imagine we'd route on
> invokehttp.java.exception.* since (as others have mentioned) it looks pretty fragile.
>
> --
> James
> On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:
> >
> > Sorry for the delayed response, I've been traveling.
> >
> > Responses in order:
> >
> > Matt,
> > Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
> >
> > Koji,
> > Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
> >
> > Pierre,
> > My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
> >
> > I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
> >
> > Uwe,
> > That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
> >
> > But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
> >
> > Thanks,
> >   Peter
> >
> > -----Original Message-----
> > From: [hidden email] [mailto:[hidden email]]
> > Sent: Saturday, October 27, 2018 10:46 AM
> > To: [hidden email]
> > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Do you really want to mix provenance and data lineage with logging/error information?
> >
> > Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> > If a user wants to use this for routing, why not ... or whatever the user wants to do.
> >
> > I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
> >
> > Regards,
> > Uwe
> >
> > > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> > >
> > > Adding another option to the list.
> > >
> > > Peter - if I understand correctly and based on my own experience,
> > > the idea is not to have an 'exception' attribute to perform custom
> > > routing after the failure relationship but rather have a more user
> > > friendly way to see what happened without going through all the logs for a given flow file.
> > >
> > > If that's correct, then could we add this information somehow to the
> > > provenance event generated by the processor? Ideally adding a new
> > > field to a provenance event or using the existing 'details' field?
> > >
> > > Pierre
> > >
> > >
> > > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura <[hidden email]>
> > > a écrit :
> > >
> > >> Hi all,
> > >>
> > >> I'd like to add another option to Matt's list of solutions:
> > >>
> > >> 4) Add a processor property, 'Enable detailed error handling'
> > >> (defaults to false), then toggle available list of relationships.
> > >> This way, existing flows such as Peter's don't have to change,
> > >> while he can opt-in new relationships. RouteOnAttribute can be a
> > >> reference implementation.
> > >>
> > >> I like the idea of thinking relationships as potential exceptions.
> > >> It can be better if relationships have hierarchy.
> > >> Some users need more granular relationships while others don't.
> > >> For NiFi 2.0 or later, supporting relationship hierarchy at
> > >> framework can mitigate having additional property at each processor.
> > >>
> > >> Thanks,
> > >> Koji
> > >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess
> > >> <[hidden email]>
> > >> wrote:
> > >>>
> > >>> Peter,
> > >>>
> > >>> Totally agree, RDBMS/JDBC is in a weird class as always, there is
> > >>> a teaspoon of exception types for an ocean of causes. For NiFi
> > >>> 1.x, it seems like we need to pick from a set of less-than-ideal solutions:
> > >>>
> > >>> 1) Add new relationships, but then your (possibly hundreds of)
> > >>> processors are invalid
> > >>> 2) Add new auto-terminated relationships, but then your
> > >>> previously-handled errors are "lost"
> > >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> > >>> responsible for parsing the error and handling it as desired.
> > >>>
> > >>> We could mitigate 1-2 with a tool that updates your flow/template
> > >>> by sending all new failure relationships to the same target as the
> > >>> existing one, but then the tool itself suffers from
> > >>> maintainability issues (as does option #3). If we could recognize
> > >>> that the new relationships are self-terminated and then send the
> > >>> errors out to the original failure relationship, that could be
> > >>> quite confusing to the user, especially as time goes on (how to suppress the "new"
> > >>> errors, e.g.).
> > >>>
> > >>> IMHO I think we're between a rock and a hard place here, I guess
> > >>> with great entropy comes great responsibility :P
> > >>>
> > >>> P.S. For your use case, is the workaround to just keep retrying?
> > >>> Or are there other constraints at play?
> > >>>
> > >>> Regards,
> > >>> Matt
> > >>>
> > >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> > >>> <[hidden email]>
> > >> wrote:
> > >>>>
> > >>>> Matt,
> > >>>>
> > >>>> If I were to split an existing failure relationship into several
> > >> relationships, I do not think I would want to auto-terminate in most cases.
> > >> Specifically, I'm interested in a failure relationship for a
> > >> database disconnect during SQL execution (database was online when
> > >> the connection was verified in the DBCP pool, but went down during
> > >> execution). If I were to find a way to separate this into its own
> > >> relationship, I do not think most users would appreciate it being a
> > >> condition silently not handled by the normal failure path.
> > >>>>
> > >>>> Thanks,
> > >>>>  Peter
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Matt Burgess [mailto:[hidden email]]
> > >>>> Sent: Friday, October 26, 2018 10:18 AM
> > >>>> To: [hidden email]
> > >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > >> caused failure in an attribute
> > >>>>
> > >>>> NiFi (as of the last couple releases I think) has the ability to
> > >>>> set
> > >> auto-terminating relationships; this IMO is one of those use cases
> > >> (for NiFi 1.x). If new relationships are added, they could default
> > >> to auto-terminate; then the existing processors should remain valid.
> > >>>> However we might want an "omnibus Jira" to capture those
> > >>>> relationships
> > >> we'd like to remove the auto-termination from in NiFi 2.0.
> > >>>>
> > >>>> Regards,
> > >>>> Matt
> > >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> > >> [hidden email]> wrote:
> > >>>>>
> > >>>>> Mark,
> > >>>>>
> > >>>>> I agree with you that this is the best option in general terms.
> > >> After thinking about it some more I think the biggest use case is
> > >> for troubleshooting. If a file routes to failure, you need to be
> > >> watching the UI to see what the exception was. An admin may have
> > >> access to the NiFi log files and could grep the error out, but a
> > >> normal user who checks in on the flow and sees a FlowFile in the
> > >> error queue will not know what the cause was; this is especially
> > >> frustrating if retrying the file works without failure the second
> > >> time... Capturing the error message in an attribute makes this easy to find.
> > >>>>>
> > >>>>> One thing I worry about too is adding new relationships to core
> > >> processors. After an upgrade, won't users need to go to each
> > >> instance of that processor and handle the new relationship? Right
> > >> now I'd swagger we have at least five thousand ExecuteSQL
> > >> processors in our environment; and while we have strong scripting
> > >> skills in my NiFi team, I would not want to encounter this without that.
> > >>>>>
> > >>>>> Thanks,
> > >>>>>  Peter
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Mark Payne [mailto:[hidden email]]
> > >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> > >>>>> To: [hidden email]
> > >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> > >>>>> caused failure in an attribute
> > >>>>>
> > >>>>> I agree - the notion of adding a "failure.reason" attribute is,
> > >>>>> in
> > >> my opinion, an anti-pattern that should be avoided. Relationships
> > >> are not a workaround but rather the preferred approach in this
> > >> scenario - an attribute I would consider a workaround. This is due
> > >> to the fact that not only is it brittle and complex to add
> > >> processors that route on such things, but there's no reason at all
> > >> to assume that from release to release (even bug fix/increment
> > >> releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> > >>>>> Relationships offer a well-defined way to explicitly indicate
> > >>>>> "these
> > >> are the possible outcomes,"
> > >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> > >>>>>
> > >>>>>
> > >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > >>>>>>
> > >>>>>> I think processors should really have well defined
> > >>>>>> relationships
> > >> for
> > >>>>>> the error scenarios that need to be handled. Having the
> > >>>>>> exception message is ok for a human who wants to see it, but in
> > >>>>>> order to do anything with it in the flow you will have to have
> > >>>>>> a bunch of parsing/interpreting of the message with a bunch of
> > >>>>>> routing processors, which seems more brittle than just having
> > >>>>>> the appropriate relationships.
> > >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> > >> [hidden email]> wrote:
> > >>>>>>>
> > >>>>>>> When a FlowFile is routed to failure, frequently there is no
> > >> clear reason without looking into the actual error message.
> > >>>>>>> Some processors work around this by creating many different
> > >> relationships, but even then frequently the generic Failure
> > >> relationship also provides little guidance.
> > >>>>>>>
> > >>>>>>> I've seen a few cases recently where processors are including
> > >>>>>>> the
> > >> exception message as an attribute on the FlowFile when routing to
> > >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should this
> > >> be a standard pattern so that it's easier for users to route failures?
> > >>>>>>>
> > >>>>>>> --Peter
> > >>>>>
> > >>
> >
Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
Joe,

Several different opinions have been expressed by Matt Burgess, Mark Payne and Bryan Bende about whether we should be storing exception information in attributes, and the pros and cons, in this thread. Those opinions generally matched yours, which is that a well-defined relationship is the best approach. I don't disagree in anyway with the consensus, I agree that the best solution is to use a well-defined relationship.

The pattern I see in the Processor list I provided below is that almost all of the processors work with external systems (outside of NiFi), and in many cases the number of distinct exception classes that can occur is low, but the variety of exceptions is high (JDBC/Stream Command). Matt did a good job of discussing this for JDBC type processors in his reply.

My users are desperate for these error details, especially on ExecuteSQL; and I won't lie, users are absolutely going to parse the exceptions and use RouteOnAttribute. And yep, it's going to be fragile and break sometimes. (I don't know that this will be the primary use case, as troubleshooting using this information will also be a major facet of its usefulness) The problem, especially with JDBC, is that I don't see a reasonable alternative. There are so many different JDBC drivers, and NiFi will only see a SQLException type with differing text. Even if we went down the route of putting exception parsers into the DBAdapters to provide per driver failure handling, all we've really done is move the fragileness into NiFi, where it's hard coded until the next release.

Thank you,
  Peter

-----Original Message-----
From: Joe Witt [mailto:[hidden email]]
Sent: Friday, November 9, 2018 12:23 PM
To: [hidden email]
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter,

I'm not clear on what you're are asking be done precisely and how far it be carried.

You're right there are many processors which store exception/log details as attributes on flowfiles before they route them (success,failure, etc..).  This is fine and can be documented with the WritesAttribute annotations to be helpful.

Where that model breaks down though and should never be used is when someone wants to use the text of that String to safely/reliable automate something.  If the 'failure' reason for a given situation is precisely knowable enough and could reasonably be valuable for routing it should be an explicit relationship.  Attributes for exceptions/log values are useful provided they are 'advisory' only meaning largely just intended for users/general awareness.  But not for automation or to define an explicit interface.

So, with the above said can you clarify what you are precisely requesting and for 'who' - who is the actor.

Thanks
On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> A one week bump on this thread. --Peter
>
> -----Original Message-----
> From: Peter Wicks (pwicks)
> Sent: Friday, November 2, 2018 11:54 AM
> To: [hidden email]
> Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that
> caused failure in an attribute
>
> Dev Team,
>
> I don’t think we've reached a conclusion on this discussion, but would like too. I had not done enough research when I originally suggested this as a, "New Pattern". Having done a bit more research now, I'd say this is already a well established pattern.
>
> Examples using this pattern already, with exception types/text written in FlowFile Attributes:
>  - GenerateTableFetch (Matt added back in 2017) does this for incoming
> FlowFiles that cause a SQL exception
>  - PutDatabaseRecord (Matt added also in 2017 with the original
> version of the processor)
>  - ValidateCSV and ValidateXML puts that validation cause as an
> Attribute (maybe not exactly the same, but feels similar)
>  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
>  - Couchbase Processors (Put/Get) provides the exception class name
>  - PutLambda (six different exception fields get written to
> Attributes)
>  - Other AWS processors are similar in how they handle this, such as the Dynamo processor.
>  - ExecuteStreamCommand provides the error message from a script execution as an attribute.
>  - DeleteHDFS puts the error message as an Attribute
>  - ScanHBase puts scanning errors as an Attribute
>  - DeleteElasticSearch (both versions) put deletion failure messages
> as an attribute
>  - InfluxDB processors do this also (influxdb.error.message)
>  - ConvertExcelToCSV tracks conversion errors in an attribute
>  - RethinkDB processors do this too
>
> Thanks,
>   Peter
>
> -----Original Message-----
> From: James Srinivasan [mailto:[hidden email]]
> Sent: Tuesday, October 30, 2018 3:00 PM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> caused failure in an attribute
>
> Apologies if I've missed this in the discussion so far - we use the
> InvokeHTTP processor a lot, and the invokehttp.java.exception.message
> attribute is really handy diving into why things have failed without
> having to match up logs with flow files (from a system with hundreds
> of processors making thousands of requests). We also route on
> invokehttp.status.code (e.g. to retry 403s due to a race hazard in an
> external system) but I don't imagine we'd route on
> invokehttp.java.exception.* since (as others have mentioned) it looks pretty fragile.
>
> --
> James
> On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:
> >
> > Sorry for the delayed response, I've been traveling.
> >
> > Responses in order:
> >
> > Matt,
> > Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
> >
> > Koji,
> > Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
> >
> > Pierre,
> > My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
> >
> > I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
> >
> > Uwe,
> > That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
> >
> > But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
> >
> > Thanks,
> >   Peter
> >
> > -----Original Message-----
> > From: [hidden email] [mailto:[hidden email]]
> > Sent: Saturday, October 27, 2018 10:46 AM
> > To: [hidden email]
> > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Do you really want to mix provenance and data lineage with logging/error information?
> >
> > Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> > If a user wants to use this for routing, why not ... or whatever the user wants to do.
> >
> > I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
> >
> > Regards,
> > Uwe
> >
> > > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> > >
> > > Adding another option to the list.
> > >
> > > Peter - if I understand correctly and based on my own experience,
> > > the idea is not to have an 'exception' attribute to perform custom
> > > routing after the failure relationship but rather have a more user
> > > friendly way to see what happened without going through all the logs for a given flow file.
> > >
> > > If that's correct, then could we add this information somehow to
> > > the provenance event generated by the processor? Ideally adding a
> > > new field to a provenance event or using the existing 'details' field?
> > >
> > > Pierre
> > >
> > >
> > > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura
> > > <[hidden email]> a écrit :
> > >
> > >> Hi all,
> > >>
> > >> I'd like to add another option to Matt's list of solutions:
> > >>
> > >> 4) Add a processor property, 'Enable detailed error handling'
> > >> (defaults to false), then toggle available list of relationships.
> > >> This way, existing flows such as Peter's don't have to change,
> > >> while he can opt-in new relationships. RouteOnAttribute can be a
> > >> reference implementation.
> > >>
> > >> I like the idea of thinking relationships as potential exceptions.
> > >> It can be better if relationships have hierarchy.
> > >> Some users need more granular relationships while others don't.
> > >> For NiFi 2.0 or later, supporting relationship hierarchy at
> > >> framework can mitigate having additional property at each processor.
> > >>
> > >> Thanks,
> > >> Koji
> > >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess
> > >> <[hidden email]>
> > >> wrote:
> > >>>
> > >>> Peter,
> > >>>
> > >>> Totally agree, RDBMS/JDBC is in a weird class as always, there
> > >>> is a teaspoon of exception types for an ocean of causes. For
> > >>> NiFi 1.x, it seems like we need to pick from a set of less-than-ideal solutions:
> > >>>
> > >>> 1) Add new relationships, but then your (possibly hundreds of)
> > >>> processors are invalid
> > >>> 2) Add new auto-terminated relationships, but then your
> > >>> previously-handled errors are "lost"
> > >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> > >>> responsible for parsing the error and handling it as desired.
> > >>>
> > >>> We could mitigate 1-2 with a tool that updates your
> > >>> flow/template by sending all new failure relationships to the
> > >>> same target as the existing one, but then the tool itself
> > >>> suffers from maintainability issues (as does option #3). If we
> > >>> could recognize that the new relationships are self-terminated
> > >>> and then send the errors out to the original failure
> > >>> relationship, that could be quite confusing to the user, especially as time goes on (how to suppress the "new"
> > >>> errors, e.g.).
> > >>>
> > >>> IMHO I think we're between a rock and a hard place here, I guess
> > >>> with great entropy comes great responsibility :P
> > >>>
> > >>> P.S. For your use case, is the workaround to just keep retrying?
> > >>> Or are there other constraints at play?
> > >>>
> > >>> Regards,
> > >>> Matt
> > >>>
> > >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> > >>> <[hidden email]>
> > >> wrote:
> > >>>>
> > >>>> Matt,
> > >>>>
> > >>>> If I were to split an existing failure relationship into
> > >>>> several
> > >> relationships, I do not think I would want to auto-terminate in most cases.
> > >> Specifically, I'm interested in a failure relationship for a
> > >> database disconnect during SQL execution (database was online
> > >> when the connection was verified in the DBCP pool, but went down
> > >> during execution). If I were to find a way to separate this into
> > >> its own relationship, I do not think most users would appreciate
> > >> it being a condition silently not handled by the normal failure path.
> > >>>>
> > >>>> Thanks,
> > >>>>  Peter
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Matt Burgess [mailto:[hidden email]]
> > >>>> Sent: Friday, October 26, 2018 10:18 AM
> > >>>> To: [hidden email]
> > >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception
> > >>>> that
> > >> caused failure in an attribute
> > >>>>
> > >>>> NiFi (as of the last couple releases I think) has the ability
> > >>>> to set
> > >> auto-terminating relationships; this IMO is one of those use
> > >> cases (for NiFi 1.x). If new relationships are added, they could
> > >> default to auto-terminate; then the existing processors should remain valid.
> > >>>> However we might want an "omnibus Jira" to capture those
> > >>>> relationships
> > >> we'd like to remove the auto-termination from in NiFi 2.0.
> > >>>>
> > >>>> Regards,
> > >>>> Matt
> > >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> > >> [hidden email]> wrote:
> > >>>>>
> > >>>>> Mark,
> > >>>>>
> > >>>>> I agree with you that this is the best option in general terms.
> > >> After thinking about it some more I think the biggest use case is
> > >> for troubleshooting. If a file routes to failure, you need to be
> > >> watching the UI to see what the exception was. An admin may have
> > >> access to the NiFi log files and could grep the error out, but a
> > >> normal user who checks in on the flow and sees a FlowFile in the
> > >> error queue will not know what the cause was; this is especially
> > >> frustrating if retrying the file works without failure the second
> > >> time... Capturing the error message in an attribute makes this easy to find.
> > >>>>>
> > >>>>> One thing I worry about too is adding new relationships to
> > >>>>> core
> > >> processors. After an upgrade, won't users need to go to each
> > >> instance of that processor and handle the new relationship? Right
> > >> now I'd swagger we have at least five thousand ExecuteSQL
> > >> processors in our environment; and while we have strong scripting
> > >> skills in my NiFi team, I would not want to encounter this without that.
> > >>>>>
> > >>>>> Thanks,
> > >>>>>  Peter
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Mark Payne [mailto:[hidden email]]
> > >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> > >>>>> To: [hidden email]
> > >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> > >>>>> caused failure in an attribute
> > >>>>>
> > >>>>> I agree - the notion of adding a "failure.reason" attribute
> > >>>>> is, in
> > >> my opinion, an anti-pattern that should be avoided. Relationships
> > >> are not a workaround but rather the preferred approach in this
> > >> scenario - an attribute I would consider a workaround. This is
> > >> due to the fact that not only is it brittle and complex to add
> > >> processors that route on such things, but there's no reason at
> > >> all to assume that from release to release (even bug
> > >> fix/increment
> > >> releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> > >>>>> Relationships offer a well-defined way to explicitly indicate
> > >>>>> "these
> > >> are the possible outcomes,"
> > >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> > >>>>>
> > >>>>>
> > >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > >>>>>>
> > >>>>>> I think processors should really have well defined
> > >>>>>> relationships
> > >> for
> > >>>>>> the error scenarios that need to be handled. Having the
> > >>>>>> exception message is ok for a human who wants to see it, but
> > >>>>>> in order to do anything with it in the flow you will have to
> > >>>>>> have a bunch of parsing/interpreting of the message with a
> > >>>>>> bunch of routing processors, which seems more brittle than
> > >>>>>> just having the appropriate relationships.
> > >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> > >> [hidden email]> wrote:
> > >>>>>>>
> > >>>>>>> When a FlowFile is routed to failure, frequently there is no
> > >> clear reason without looking into the actual error message.
> > >>>>>>> Some processors work around this by creating many different
> > >> relationships, but even then frequently the generic Failure
> > >> relationship also provides little guidance.
> > >>>>>>>
> > >>>>>>> I've seen a few cases recently where processors are
> > >>>>>>> including the
> > >> exception message as an attribute on the FlowFile when routing to
> > >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should
> > >> this be a standard pattern so that it's easier for users to route failures?
> > >>>>>>>
> > >>>>>>> --Peter
> > >>>>>
> > >>
> >
Reply | Threaded
Open this post in threaded view
|

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Joe Witt
Peter

Ok cool.  So i think we agree on the state of things.  And for
processors you want to add more details in failure cases to you can do
so (provided we're not just bloating attributes all over).  And we'll
recognize that this model is basically to help users and will likely
be abused and be brittle.  But I think i'm saying 'there is nothing
new to do now' then right?

Do you agree?

Thanks
On Fri, Nov 9, 2018 at 3:59 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Joe,
>
> Several different opinions have been expressed by Matt Burgess, Mark Payne and Bryan Bende about whether we should be storing exception information in attributes, and the pros and cons, in this thread. Those opinions generally matched yours, which is that a well-defined relationship is the best approach. I don't disagree in anyway with the consensus, I agree that the best solution is to use a well-defined relationship.
>
> The pattern I see in the Processor list I provided below is that almost all of the processors work with external systems (outside of NiFi), and in many cases the number of distinct exception classes that can occur is low, but the variety of exceptions is high (JDBC/Stream Command). Matt did a good job of discussing this for JDBC type processors in his reply.
>
> My users are desperate for these error details, especially on ExecuteSQL; and I won't lie, users are absolutely going to parse the exceptions and use RouteOnAttribute. And yep, it's going to be fragile and break sometimes. (I don't know that this will be the primary use case, as troubleshooting using this information will also be a major facet of its usefulness) The problem, especially with JDBC, is that I don't see a reasonable alternative. There are so many different JDBC drivers, and NiFi will only see a SQLException type with differing text. Even if we went down the route of putting exception parsers into the DBAdapters to provide per driver failure handling, all we've really done is move the fragileness into NiFi, where it's hard coded until the next release.
>
> Thank you,
>   Peter
>
> -----Original Message-----
> From: Joe Witt [mailto:[hidden email]]
> Sent: Friday, November 9, 2018 12:23 PM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute
>
> Peter,
>
> I'm not clear on what you're are asking be done precisely and how far it be carried.
>
> You're right there are many processors which store exception/log details as attributes on flowfiles before they route them (success,failure, etc..).  This is fine and can be documented with the WritesAttribute annotations to be helpful.
>
> Where that model breaks down though and should never be used is when someone wants to use the text of that String to safely/reliable automate something.  If the 'failure' reason for a given situation is precisely knowable enough and could reasonably be valuable for routing it should be an explicit relationship.  Attributes for exceptions/log values are useful provided they are 'advisory' only meaning largely just intended for users/general awareness.  But not for automation or to define an explicit interface.
>
> So, with the above said can you clarify what you are precisely requesting and for 'who' - who is the actor.
>
> Thanks
> On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:
> >
> > A one week bump on this thread. --Peter
> >
> > -----Original Message-----
> > From: Peter Wicks (pwicks)
> > Sent: Friday, November 2, 2018 11:54 AM
> > To: [hidden email]
> > Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Dev Team,
> >
> > I don’t think we've reached a conclusion on this discussion, but would like too. I had not done enough research when I originally suggested this as a, "New Pattern". Having done a bit more research now, I'd say this is already a well established pattern.
> >
> > Examples using this pattern already, with exception types/text written in FlowFile Attributes:
> >  - GenerateTableFetch (Matt added back in 2017) does this for incoming
> > FlowFiles that cause a SQL exception
> >  - PutDatabaseRecord (Matt added also in 2017 with the original
> > version of the processor)
> >  - ValidateCSV and ValidateXML puts that validation cause as an
> > Attribute (maybe not exactly the same, but feels similar)
> >  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
> >  - Couchbase Processors (Put/Get) provides the exception class name
> >  - PutLambda (six different exception fields get written to
> > Attributes)
> >  - Other AWS processors are similar in how they handle this, such as the Dynamo processor.
> >  - ExecuteStreamCommand provides the error message from a script execution as an attribute.
> >  - DeleteHDFS puts the error message as an Attribute
> >  - ScanHBase puts scanning errors as an Attribute
> >  - DeleteElasticSearch (both versions) put deletion failure messages
> > as an attribute
> >  - InfluxDB processors do this also (influxdb.error.message)
> >  - ConvertExcelToCSV tracks conversion errors in an attribute
> >  - RethinkDB processors do this too
> >
> > Thanks,
> >   Peter
> >
> > -----Original Message-----
> > From: James Srinivasan [mailto:[hidden email]]
> > Sent: Tuesday, October 30, 2018 3:00 PM
> > To: [hidden email]
> > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Apologies if I've missed this in the discussion so far - we use the
> > InvokeHTTP processor a lot, and the invokehttp.java.exception.message
> > attribute is really handy diving into why things have failed without
> > having to match up logs with flow files (from a system with hundreds
> > of processors making thousands of requests). We also route on
> > invokehttp.status.code (e.g. to retry 403s due to a race hazard in an
> > external system) but I don't imagine we'd route on
> > invokehttp.java.exception.* since (as others have mentioned) it looks pretty fragile.
> >
> > --
> > James
> > On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:
> > >
> > > Sorry for the delayed response, I've been traveling.
> > >
> > > Responses in order:
> > >
> > > Matt,
> > > Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
> > >
> > > Koji,
> > > Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
> > >
> > > Pierre,
> > > My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
> > >
> > > I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
> > >
> > > Uwe,
> > > That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
> > >
> > > But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
> > >
> > > Thanks,
> > >   Peter
> > >
> > > -----Original Message-----
> > > From: [hidden email] [mailto:[hidden email]]
> > > Sent: Saturday, October 27, 2018 10:46 AM
> > > To: [hidden email]
> > > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > > caused failure in an attribute
> > >
> > > Do you really want to mix provenance and data lineage with logging/error information?
> > >
> > > Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> > > If a user wants to use this for routing, why not ... or whatever the user wants to do.
> > >
> > > I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
> > >
> > > Regards,
> > > Uwe
> > >
> > > > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> > > >
> > > > Adding another option to the list.
> > > >
> > > > Peter - if I understand correctly and based on my own experience,
> > > > the idea is not to have an 'exception' attribute to perform custom
> > > > routing after the failure relationship but rather have a more user
> > > > friendly way to see what happened without going through all the logs for a given flow file.
> > > >
> > > > If that's correct, then could we add this information somehow to
> > > > the provenance event generated by the processor? Ideally adding a
> > > > new field to a provenance event or using the existing 'details' field?
> > > >
> > > > Pierre
> > > >
> > > >
> > > > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura
> > > > <[hidden email]> a écrit :
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'd like to add another option to Matt's list of solutions:
> > > >>
> > > >> 4) Add a processor property, 'Enable detailed error handling'
> > > >> (defaults to false), then toggle available list of relationships.
> > > >> This way, existing flows such as Peter's don't have to change,
> > > >> while he can opt-in new relationships. RouteOnAttribute can be a
> > > >> reference implementation.
> > > >>
> > > >> I like the idea of thinking relationships as potential exceptions.
> > > >> It can be better if relationships have hierarchy.
> > > >> Some users need more granular relationships while others don't.
> > > >> For NiFi 2.0 or later, supporting relationship hierarchy at
> > > >> framework can mitigate having additional property at each processor.
> > > >>
> > > >> Thanks,
> > > >> Koji
> > > >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess
> > > >> <[hidden email]>
> > > >> wrote:
> > > >>>
> > > >>> Peter,
> > > >>>
> > > >>> Totally agree, RDBMS/JDBC is in a weird class as always, there
> > > >>> is a teaspoon of exception types for an ocean of causes. For
> > > >>> NiFi 1.x, it seems like we need to pick from a set of less-than-ideal solutions:
> > > >>>
> > > >>> 1) Add new relationships, but then your (possibly hundreds of)
> > > >>> processors are invalid
> > > >>> 2) Add new auto-terminated relationships, but then your
> > > >>> previously-handled errors are "lost"
> > > >>> 3) Add an attribute, but then each NiFi instance/release/flow is
> > > >>> responsible for parsing the error and handling it as desired.
> > > >>>
> > > >>> We could mitigate 1-2 with a tool that updates your
> > > >>> flow/template by sending all new failure relationships to the
> > > >>> same target as the existing one, but then the tool itself
> > > >>> suffers from maintainability issues (as does option #3). If we
> > > >>> could recognize that the new relationships are self-terminated
> > > >>> and then send the errors out to the original failure
> > > >>> relationship, that could be quite confusing to the user, especially as time goes on (how to suppress the "new"
> > > >>> errors, e.g.).
> > > >>>
> > > >>> IMHO I think we're between a rock and a hard place here, I guess
> > > >>> with great entropy comes great responsibility :P
> > > >>>
> > > >>> P.S. For your use case, is the workaround to just keep retrying?
> > > >>> Or are there other constraints at play?
> > > >>>
> > > >>> Regards,
> > > >>> Matt
> > > >>>
> > > >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> > > >>> <[hidden email]>
> > > >> wrote:
> > > >>>>
> > > >>>> Matt,
> > > >>>>
> > > >>>> If I were to split an existing failure relationship into
> > > >>>> several
> > > >> relationships, I do not think I would want to auto-terminate in most cases.
> > > >> Specifically, I'm interested in a failure relationship for a
> > > >> database disconnect during SQL execution (database was online
> > > >> when the connection was verified in the DBCP pool, but went down
> > > >> during execution). If I were to find a way to separate this into
> > > >> its own relationship, I do not think most users would appreciate
> > > >> it being a condition silently not handled by the normal failure path.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>  Peter
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: Matt Burgess [mailto:[hidden email]]
> > > >>>> Sent: Friday, October 26, 2018 10:18 AM
> > > >>>> To: [hidden email]
> > > >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception
> > > >>>> that
> > > >> caused failure in an attribute
> > > >>>>
> > > >>>> NiFi (as of the last couple releases I think) has the ability
> > > >>>> to set
> > > >> auto-terminating relationships; this IMO is one of those use
> > > >> cases (for NiFi 1.x). If new relationships are added, they could
> > > >> default to auto-terminate; then the existing processors should remain valid.
> > > >>>> However we might want an "omnibus Jira" to capture those
> > > >>>> relationships
> > > >> we'd like to remove the auto-termination from in NiFi 2.0.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Matt
> > > >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> > > >> [hidden email]> wrote:
> > > >>>>>
> > > >>>>> Mark,
> > > >>>>>
> > > >>>>> I agree with you that this is the best option in general terms.
> > > >> After thinking about it some more I think the biggest use case is
> > > >> for troubleshooting. If a file routes to failure, you need to be
> > > >> watching the UI to see what the exception was. An admin may have
> > > >> access to the NiFi log files and could grep the error out, but a
> > > >> normal user who checks in on the flow and sees a FlowFile in the
> > > >> error queue will not know what the cause was; this is especially
> > > >> frustrating if retrying the file works without failure the second
> > > >> time... Capturing the error message in an attribute makes this easy to find.
> > > >>>>>
> > > >>>>> One thing I worry about too is adding new relationships to
> > > >>>>> core
> > > >> processors. After an upgrade, won't users need to go to each
> > > >> instance of that processor and handle the new relationship? Right
> > > >> now I'd swagger we have at least five thousand ExecuteSQL
> > > >> processors in our environment; and while we have strong scripting
> > > >> skills in my NiFi team, I would not want to encounter this without that.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>>  Peter
> > > >>>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Mark Payne [mailto:[hidden email]]
> > > >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> > > >>>>> To: [hidden email]
> > > >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> > > >>>>> caused failure in an attribute
> > > >>>>>
> > > >>>>> I agree - the notion of adding a "failure.reason" attribute
> > > >>>>> is, in
> > > >> my opinion, an anti-pattern that should be avoided. Relationships
> > > >> are not a workaround but rather the preferred approach in this
> > > >> scenario - an attribute I would consider a workaround. This is
> > > >> due to the fact that not only is it brittle and complex to add
> > > >> processors that route on such things, but there's no reason at
> > > >> all to assume that from release to release (even bug
> > > >> fix/increment
> > > >> releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> > > >>>>> Relationships offer a well-defined way to explicitly indicate
> > > >>>>> "these
> > > >> are the possible outcomes,"
> > > >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > > >>>>>>
> > > >>>>>> I think processors should really have well defined
> > > >>>>>> relationships
> > > >> for
> > > >>>>>> the error scenarios that need to be handled. Having the
> > > >>>>>> exception message is ok for a human who wants to see it, but
> > > >>>>>> in order to do anything with it in the flow you will have to
> > > >>>>>> have a bunch of parsing/interpreting of the message with a
> > > >>>>>> bunch of routing processors, which seems more brittle than
> > > >>>>>> just having the appropriate relationships.
> > > >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> > > >> [hidden email]> wrote:
> > > >>>>>>>
> > > >>>>>>> When a FlowFile is routed to failure, frequently there is no
> > > >> clear reason without looking into the actual error message.
> > > >>>>>>> Some processors work around this by creating many different
> > > >> relationships, but even then frequently the generic Failure
> > > >> relationship also provides little guidance.
> > > >>>>>>>
> > > >>>>>>> I've seen a few cases recently where processors are
> > > >>>>>>> including the
> > > >> exception message as an attribute on the FlowFile when routing to
> > > >> failure (ExecuteStreamCommand, new PR for ExecuteSQL). Should
> > > >> this be a standard pattern so that it's easier for users to route failures?
> > > >>>>>>>
> > > >>>>>>> --Peter
> > > >>>>>
> > > >>
> > >
Reply | Threaded
Open this post in threaded view
|

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter Wicks (pwicks)
Joe,

The only new thing we can do is Matt can finish reviewing the PR for NIFI-5744, which adds this as a feature to ExecuteSQL, as he was waiting for this discussion to come to a close. https://github.com/apache/nifi/pull/3107#issuecomment-433260483
But apart from that, nothing new to do now.

Thanks,
  Peter

-----Original Message-----
From: Joe Witt [mailto:[hidden email]]
Sent: Friday, November 9, 2018 2:36 PM
To: [hidden email]
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

Peter

Ok cool.  So i think we agree on the state of things.  And for processors you want to add more details in failure cases to you can do so (provided we're not just bloating attributes all over).  And we'll recognize that this model is basically to help users and will likely be abused and be brittle.  But I think i'm saying 'there is nothing new to do now' then right?

Do you agree?

Thanks
On Fri, Nov 9, 2018 at 3:59 PM Peter Wicks (pwicks) <[hidden email]> wrote:

>
> Joe,
>
> Several different opinions have been expressed by Matt Burgess, Mark Payne and Bryan Bende about whether we should be storing exception information in attributes, and the pros and cons, in this thread. Those opinions generally matched yours, which is that a well-defined relationship is the best approach. I don't disagree in anyway with the consensus, I agree that the best solution is to use a well-defined relationship.
>
> The pattern I see in the Processor list I provided below is that almost all of the processors work with external systems (outside of NiFi), and in many cases the number of distinct exception classes that can occur is low, but the variety of exceptions is high (JDBC/Stream Command). Matt did a good job of discussing this for JDBC type processors in his reply.
>
> My users are desperate for these error details, especially on ExecuteSQL; and I won't lie, users are absolutely going to parse the exceptions and use RouteOnAttribute. And yep, it's going to be fragile and break sometimes. (I don't know that this will be the primary use case, as troubleshooting using this information will also be a major facet of its usefulness) The problem, especially with JDBC, is that I don't see a reasonable alternative. There are so many different JDBC drivers, and NiFi will only see a SQLException type with differing text. Even if we went down the route of putting exception parsers into the DBAdapters to provide per driver failure handling, all we've really done is move the fragileness into NiFi, where it's hard coded until the next release.
>
> Thank you,
>   Peter
>
> -----Original Message-----
> From: Joe Witt [mailto:[hidden email]]
> Sent: Friday, November 9, 2018 12:23 PM
> To: [hidden email]
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> caused failure in an attribute
>
> Peter,
>
> I'm not clear on what you're are asking be done precisely and how far it be carried.
>
> You're right there are many processors which store exception/log details as attributes on flowfiles before they route them (success,failure, etc..).  This is fine and can be documented with the WritesAttribute annotations to be helpful.
>
> Where that model breaks down though and should never be used is when someone wants to use the text of that String to safely/reliable automate something.  If the 'failure' reason for a given situation is precisely knowable enough and could reasonably be valuable for routing it should be an explicit relationship.  Attributes for exceptions/log values are useful provided they are 'advisory' only meaning largely just intended for users/general awareness.  But not for automation or to define an explicit interface.
>
> So, with the above said can you clarify what you are precisely requesting and for 'who' - who is the actor.
>
> Thanks
> On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks) <[hidden email]> wrote:
> >
> > A one week bump on this thread. --Peter
> >
> > -----Original Message-----
> > From: Peter Wicks (pwicks)
> > Sent: Friday, November 2, 2018 11:54 AM
> > To: [hidden email]
> > Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Dev Team,
> >
> > I don’t think we've reached a conclusion on this discussion, but would like too. I had not done enough research when I originally suggested this as a, "New Pattern". Having done a bit more research now, I'd say this is already a well established pattern.
> >
> > Examples using this pattern already, with exception types/text written in FlowFile Attributes:
> >  - GenerateTableFetch (Matt added back in 2017) does this for
> > incoming FlowFiles that cause a SQL exception
> >  - PutDatabaseRecord (Matt added also in 2017 with the original
> > version of the processor)
> >  - ValidateCSV and ValidateXML puts that validation cause as an
> > Attribute (maybe not exactly the same, but feels similar)
> >  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
> >  - Couchbase Processors (Put/Get) provides the exception class name
> >  - PutLambda (six different exception fields get written to
> > Attributes)
> >  - Other AWS processors are similar in how they handle this, such as the Dynamo processor.
> >  - ExecuteStreamCommand provides the error message from a script execution as an attribute.
> >  - DeleteHDFS puts the error message as an Attribute
> >  - ScanHBase puts scanning errors as an Attribute
> >  - DeleteElasticSearch (both versions) put deletion failure messages
> > as an attribute
> >  - InfluxDB processors do this also (influxdb.error.message)
> >  - ConvertExcelToCSV tracks conversion errors in an attribute
> >  - RethinkDB processors do this too
> >
> > Thanks,
> >   Peter
> >
> > -----Original Message-----
> > From: James Srinivasan [mailto:[hidden email]]
> > Sent: Tuesday, October 30, 2018 3:00 PM
> > To: [hidden email]
> > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Apologies if I've missed this in the discussion so far - we use the
> > InvokeHTTP processor a lot, and the
> > invokehttp.java.exception.message attribute is really handy diving
> > into why things have failed without having to match up logs with
> > flow files (from a system with hundreds of processors making
> > thousands of requests). We also route on invokehttp.status.code
> > (e.g. to retry 403s due to a race hazard in an external system) but
> > I don't imagine we'd route on
> > invokehttp.java.exception.* since (as others have mentioned) it looks pretty fragile.
> >
> > --
> > James
> > On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks) <[hidden email]> wrote:
> > >
> > > Sorry for the delayed response, I've been traveling.
> > >
> > > Responses in order:
> > >
> > > Matt,
> > > Right now our work around is to keep retrying errors, usually with a penalty or control rate processor. The problem is that we don't know why it failed, and thus don't know if retry is the correct option. I have not found a way, without code change, to be able to determine if retrying is the correct option or not.
> > >
> > > Koji,
> > > Detailed error handling would indeed be a good workaround to the problems raised by myself and Matt. I have not see this on other processors, but that does not mean we can't do it of course.  I agree that having some kind of hierarchy system for errors would be a much better solution.
> > >
> > > Pierre,
> > > My primary use case is as you described, a user friendly way to see what actually happened without going through the log files. But I while I know it's fragile, routing on exception text stored in an attribute still feels like a very legitimate use case. I know in many systems there are good exception types that can be used to route FlowFile's to appropriate failure relationships, but as Matt mentioned, JDBC has just a handful of exception types for a very large number of possible error types.
> > >
> > > I think this is probably the same rational that was used to justify this feature for Execute Stream Command's inclusion of this feature in the past. To many possible failure conditions to handle with just a few failure conditions.
> > >
> > > Uwe,
> > > That is a fair question, but it doesn't feel like such a bad fit to me. It's like extra metadata on the lineage, "We followed this path through the flow because we had exception " .... " which caused the FlowFile to follow the failure route".
> > >
> > > But I still prefer the attribute, it could be another option for Detailed error handling; instead of, or in addition to, additional relationships for failures, the exception text could be included in an attribute.
> > >
> > > Thanks,
> > >   Peter
> > >
> > > -----Original Message-----
> > > From: [hidden email] [mailto:[hidden email]]
> > > Sent: Saturday, October 27, 2018 10:46 AM
> > > To: [hidden email]
> > > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
> > > caused failure in an attribute
> > >
> > > Do you really want to mix provenance and data lineage with logging/error information?
> > >
> > > Writing exception information/logging information within an attribute is not a bad idea in my opinion.
> > > If a user wants to use this for routing, why not ... or whatever the user wants to do.
> > >
> > > I could imagine that this can be switched on and off by a property via config. E.g. in development on and on production off.
> > >
> > > Regards,
> > > Uwe
> > >
> > > > Am 26.10.2018 um 09:26 schrieb Pierre Villard <[hidden email]>:
> > > >
> > > > Adding another option to the list.
> > > >
> > > > Peter - if I understand correctly and based on my own
> > > > experience, the idea is not to have an 'exception' attribute to
> > > > perform custom routing after the failure relationship but rather
> > > > have a more user friendly way to see what happened without going through all the logs for a given flow file.
> > > >
> > > > If that's correct, then could we add this information somehow to
> > > > the provenance event generated by the processor? Ideally adding
> > > > a new field to a provenance event or using the existing 'details' field?
> > > >
> > > > Pierre
> > > >
> > > >
> > > > Le ven. 26 oct. 2018 à 08:40, Koji Kawamura
> > > > <[hidden email]> a écrit :
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'd like to add another option to Matt's list of solutions:
> > > >>
> > > >> 4) Add a processor property, 'Enable detailed error handling'
> > > >> (defaults to false), then toggle available list of relationships.
> > > >> This way, existing flows such as Peter's don't have to change,
> > > >> while he can opt-in new relationships. RouteOnAttribute can be
> > > >> a reference implementation.
> > > >>
> > > >> I like the idea of thinking relationships as potential exceptions.
> > > >> It can be better if relationships have hierarchy.
> > > >> Some users need more granular relationships while others don't.
> > > >> For NiFi 2.0 or later, supporting relationship hierarchy at
> > > >> framework can mitigate having additional property at each processor.
> > > >>
> > > >> Thanks,
> > > >> Koji
> > > >> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess
> > > >> <[hidden email]>
> > > >> wrote:
> > > >>>
> > > >>> Peter,
> > > >>>
> > > >>> Totally agree, RDBMS/JDBC is in a weird class as always, there
> > > >>> is a teaspoon of exception types for an ocean of causes. For
> > > >>> NiFi 1.x, it seems like we need to pick from a set of less-than-ideal solutions:
> > > >>>
> > > >>> 1) Add new relationships, but then your (possibly hundreds of)
> > > >>> processors are invalid
> > > >>> 2) Add new auto-terminated relationships, but then your
> > > >>> previously-handled errors are "lost"
> > > >>> 3) Add an attribute, but then each NiFi instance/release/flow
> > > >>> is responsible for parsing the error and handling it as desired.
> > > >>>
> > > >>> We could mitigate 1-2 with a tool that updates your
> > > >>> flow/template by sending all new failure relationships to the
> > > >>> same target as the existing one, but then the tool itself
> > > >>> suffers from maintainability issues (as does option #3). If we
> > > >>> could recognize that the new relationships are self-terminated
> > > >>> and then send the errors out to the original failure
> > > >>> relationship, that could be quite confusing to the user, especially as time goes on (how to suppress the "new"
> > > >>> errors, e.g.).
> > > >>>
> > > >>> IMHO I think we're between a rock and a hard place here, I
> > > >>> guess with great entropy comes great responsibility :P
> > > >>>
> > > >>> P.S. For your use case, is the workaround to just keep retrying?
> > > >>> Or are there other constraints at play?
> > > >>>
> > > >>> Regards,
> > > >>> Matt
> > > >>>
> > > >>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)
> > > >>> <[hidden email]>
> > > >> wrote:
> > > >>>>
> > > >>>> Matt,
> > > >>>>
> > > >>>> If I were to split an existing failure relationship into
> > > >>>> several
> > > >> relationships, I do not think I would want to auto-terminate in most cases.
> > > >> Specifically, I'm interested in a failure relationship for a
> > > >> database disconnect during SQL execution (database was online
> > > >> when the connection was verified in the DBCP pool, but went
> > > >> down during execution). If I were to find a way to separate
> > > >> this into its own relationship, I do not think most users would
> > > >> appreciate it being a condition silently not handled by the normal failure path.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>  Peter
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: Matt Burgess [mailto:[hidden email]]
> > > >>>> Sent: Friday, October 26, 2018 10:18 AM
> > > >>>> To: [hidden email]
> > > >>>> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception
> > > >>>> that
> > > >> caused failure in an attribute
> > > >>>>
> > > >>>> NiFi (as of the last couple releases I think) has the ability
> > > >>>> to set
> > > >> auto-terminating relationships; this IMO is one of those use
> > > >> cases (for NiFi 1.x). If new relationships are added, they
> > > >> could default to auto-terminate; then the existing processors should remain valid.
> > > >>>> However we might want an "omnibus Jira" to capture those
> > > >>>> relationships
> > > >> we'd like to remove the auto-termination from in NiFi 2.0.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Matt
> > > >>>> On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
> > > >> [hidden email]> wrote:
> > > >>>>>
> > > >>>>> Mark,
> > > >>>>>
> > > >>>>> I agree with you that this is the best option in general terms.
> > > >> After thinking about it some more I think the biggest use case
> > > >> is for troubleshooting. If a file routes to failure, you need
> > > >> to be watching the UI to see what the exception was. An admin
> > > >> may have access to the NiFi log files and could grep the error
> > > >> out, but a normal user who checks in on the flow and sees a
> > > >> FlowFile in the error queue will not know what the cause was;
> > > >> this is especially frustrating if retrying the file works
> > > >> without failure the second time... Capturing the error message in an attribute makes this easy to find.
> > > >>>>>
> > > >>>>> One thing I worry about too is adding new relationships to
> > > >>>>> core
> > > >> processors. After an upgrade, won't users need to go to each
> > > >> instance of that processor and handle the new relationship?
> > > >> Right now I'd swagger we have at least five thousand ExecuteSQL
> > > >> processors in our environment; and while we have strong
> > > >> scripting skills in my NiFi team, I would not want to encounter this without that.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>>  Peter
> > > >>>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Mark Payne [mailto:[hidden email]]
> > > >>>>> Sent: Thursday, October 25, 2018 10:38 PM
> > > >>>>> To: [hidden email]
> > > >>>>> Subject: [EXT] Re: New Standard Pattern - Put Exception that
> > > >>>>> caused failure in an attribute
> > > >>>>>
> > > >>>>> I agree - the notion of adding a "failure.reason" attribute
> > > >>>>> is, in
> > > >> my opinion, an anti-pattern that should be avoided.
> > > >> Relationships are not a workaround but rather the preferred
> > > >> approach in this scenario - an attribute I would consider a
> > > >> workaround. This is due to the fact that not only is it brittle
> > > >> and complex to add processors that route on such things, but
> > > >> there's no reason at all to assume that from release to release
> > > >> (even bug fix/increment
> > > >> releases) that the Exception type or message will be the same, so the flow could stop working at any time after upgrading nifi.
> > > >>>>> Relationships offer a well-defined way to explicitly
> > > >>>>> indicate "these
> > > >> are the possible outcomes,"
> > > >>>>> similar IMO to Java Exception classes vs. throwing Strings in C.
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Oct 25, 2018, at 9:47 AM, Bryan Bende <[hidden email]> wrote:
> > > >>>>>>
> > > >>>>>> I think processors should really have well defined
> > > >>>>>> relationships
> > > >> for
> > > >>>>>> the error scenarios that need to be handled. Having the
> > > >>>>>> exception message is ok for a human who wants to see it,
> > > >>>>>> but in order to do anything with it in the flow you will
> > > >>>>>> have to have a bunch of parsing/interpreting of the message
> > > >>>>>> with a bunch of routing processors, which seems more
> > > >>>>>> brittle than just having the appropriate relationships.
> > > >>>>>> On Thu, Oct 25, 2018 at 1:36 AM Peter Wicks (pwicks) <
> > > >> [hidden email]> wrote:
> > > >>>>>>>
> > > >>>>>>> When a FlowFile is routed to failure, frequently there is
> > > >>>>>>> no
> > > >> clear reason without looking into the actual error message.
> > > >>>>>>> Some processors work around this by creating many
> > > >>>>>>> different
> > > >> relationships, but even then frequently the generic Failure
> > > >> relationship also provides little guidance.
> > > >>>>>>>
> > > >>>>>>> I've seen a few cases recently where processors are
> > > >>>>>>> including the
> > > >> exception message as an attribute on the FlowFile when routing
> > > >> to failure (ExecuteStreamCommand, new PR for ExecuteSQL).
> > > >> Should this be a standard pattern so that it's easier for users to route failures?
> > > >>>>>>>
> > > >>>>>>> --Peter
> > > >>>>>
> > > >>
> > >