Purpose of Disallowing Attribute Expression

Purpose of Disallowing Attribute Expression

dale.chang13
What is the purpose of not allowing a Processor property to support expression language? Not allowing a property such as "Character set" in the ExtractText Processor is proving to be a hindrance. Would it affect NiFi under the hood if it were otherwise?

Re: Purpose of Disallowing Attribute Expression

Joe Witt
Hello

It is generally quite easy to enable for Property Descriptors which
accept user supplied strings.  And this is one that does seem like a
candidate.  Were you wanting it to look at a flowfile attribute to be
the way of indicating the character set?

Thinking through this example the challenges that come to mind are:
- What to do if the flow file doesn't have the charset indicated as an
attribute?
- What to do if the charset indicated by the flowfile attribute isn't supported?

There are various cases to consider is all and your idea is a good one
to pursue in my view.  We had wanted to make it be an enumerated value
at one point so users could only select from known/valid charsets.
But your idea is good too.
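
The two cases above can be sketched in plain Java, outside of NiFi. This is only an illustration under stated assumptions: the class name, the `text.charset` attribute name, and the UTF-8 fallback are all hypothetical, and the map stands in for a flowfile's attributes.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of NiFi: resolves a charset from a map
// that stands in for a flowfile's attributes.
class CharsetAttributeResolver {

    // Assumed fallback for the "attribute is missing" case.
    static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;

    /**
     * Returns the charset named by the attribute, the default when the
     * attribute is missing or blank, or Optional.empty() when the name is
     * malformed or not supported by this JVM.
     */
    static Optional<Charset> resolve(Map<String, String> attributes, String attributeName) {
        String name = attributes.get(attributeName);
        if (name == null || name.trim().isEmpty()) {
            return Optional.of(DEFAULT_CHARSET);
        }
        String trimmed = name.trim();
        try {
            // isSupported throws IllegalCharsetNameException for illegal names.
            return Charset.isSupported(trimmed)
                    ? Optional.of(Charset.forName(trimmed))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            return Optional.empty();
        }
    }
}
```

An empty result leaves the caller free to decide what "unsupported" should mean, for example failing the flowfile or falling back to a configured value.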

Thanks
Joe

On Thu, May 12, 2016 at 2:58 PM, dale.chang13 <[hidden email]> wrote:

> What is the purpose of not allowing a Processor property to support
> expression language? Not allowing a property such as "Character set" in the
> ExtractText Processor is proving to be a hindrance. Would it affect NiFi
> under the hood if it were otherwise?

Re: Purpose of Disallowing Attribute Expression

dale.chang13
Joe Witt wrote:
> It is generally quite easy to enable for Property Descriptors which
> accept user supplied strings.  And this is one that does seem like a
> candidate.  Were you wanting it to look at a flowfile attribute to be
> the way of indicating the character set?
>
> Thinking through this example the challenges that come to mind are:
> - What to do if the flow file doesn't have the charset indicated as an
> attribute?
> - What to do if the charset indicated by the flowfile attribute isn't supported?
>
> There are various cases to consider is all and your idea is a good one
> to pursue in my view.  We had wanted to make it be an enumerated value
> at one point so users could only select from known/valid charsets.
> But your idea is good too.
Yes, setting the character set or other properties from a flowfile attribute would be helpful. I have already tweaked ExtractText to support expression language, to default the character set to UTF-8, and to make specifying it optional.

I suppose the ExtractText processor could route to an "invalid character set" relationship if there is a conflict. That would require a character set detection service at the least though.
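
One way to picture the "conflict" check without a full detection service is a strict decode: try the declared character set and treat any decoding error as a mismatch. The sketch below is plain Java, not NiFi code; the class name and the two route names are hypothetical stand-ins for processor relationships.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical sketch, not NiFi code: decides where content would be
// routed based on whether it decodes cleanly in the declared charset.
class CharsetConflictRouter {

    // Assumed names standing in for processor relationships.
    enum Route { MATCHED, INVALID_CHARACTER_SET }

    static Route route(byte[] content, Charset declared) {
        try {
            // REPORT makes the decoder throw instead of silently
            // substituting replacement characters.
            declared.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return Route.MATCHED;
        } catch (CharacterCodingException e) {
            return Route.INVALID_CHARACTER_SET;
        }
    }
}
```

This only catches content that is invalid in the declared character set. A single-byte encoding such as ISO-8859-1 accepts every byte sequence, so a real conflict check would still need some form of detection.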

I only asked because our constraint is to use as much out-of-the-box functionality and as few custom processors as possible, for maintenance's sake.

Would it be possible to implement this change (more properties supporting expression language) in future releases? I know it would warrant an in-depth discussion on the goals that NiFi would like to achieve.

Re: Purpose of Disallowing Attribute Expression

Michael Moser
Hi,

NIFI-1077 [1] has discussed this a bit in the past, when
ConvertCharacterSet was improved to support expression language.  A JIRA
ticket is needed to spur action on these requests.

An interesting case to help this would be to improve the IdentifyMimeType
processor to detect character encodings on text data.  Apache Tika can do
it with an EncodingDetector [2], so why not take advantage since it's
already part of IdentifyMimeType?  I think this would be cool so I wrote
NIFI-1874 [3].
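
To give a rough feel for what encoding detection involves, here is a minimal plain-Java sketch that only looks for a Unicode byte order mark (BOM). It is a simplified stand-in, not Tika's EncodingDetector and not how Tika detects encodings; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch: BOM sniffing only. Real detectors also use
// statistical analysis of the content, which this does not attempt.
class BomSniffer {

    /** Returns the charset implied by a leading BOM, or empty if none is found. */
    static Optional<Charset> detect(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        // Limitation: a UTF-32LE BOM (FF FE 00 00) also matches this prefix.
        if (startsWith(data, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned byte values.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Most text carries no BOM at all, which is why an empty result is the common case here and why a content-based detector is the more useful tool.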

-- Mike

[1] - https://issues.apache.org/jira/browse/NIFI-1077
[2] - https://tika.apache.org/1.12/api/org/apache/tika/detect/EncodingDetector.html
[3] - https://issues.apache.org/jira/browse/NIFI-1874



On Thu, May 12, 2016 at 3:52 PM, dale.chang13 <[hidden email]> wrote:

> Joe Witt wrote
> > It is generally quite easy to enable for Property Descriptors which
> > accept user supplied strings.  And this is one that does seem like a
> > candidate.  Were you wanting it to look at a flowfile attribute to be
> > the way of indicating the character set?
> >
> > Thinking through this example the challenges that come to mind are:
> > - What to do if the flow file doesn't have the charset indicated as an
> > attribute?
> > - What to do if the charset indicated by the flowfile attribute isn't
> > supported?
> >
> > There are various cases to consider is all and your idea is a good one
> > to pursue in my view.  We had wanted to make it be an enumerated value
> > at one point so users could only select from known/valid charsets.
> > But your idea is good too.
>
> Yes, setting the character set or other properties as a flowfile attribute
> would be helpful. I have already tweaked Extract Text in order to support
> expression language as well as providing UTF-8 as the default character set
> and remove its mandatory specification
>
> I suppose the ExtractText processor could route to an "invalid character
> set" relationship if there is a conflict. That would require a character
> set detection service at the least though.
>
> I only asked because our limitation was to use as much out-of-the-box
> functionality and as little custom processors as possible for maintenance's
> sake.
>
> Would it be possible to implement this change (more properties supporting
> expression language) in future releases? I know it would warrant an
> in-depth discussion on the goals that NiFi would like to achieve

Re: Purpose of Disallowing Attribute Expression

dale.chang13
Michael Moser wrote:
> NIFI-1077 [1] has discussed this a bit in the past, when
> ConvertCharacterSet was improved to support expression language.  A JIRA
> ticket is needed to spur action on these requests.
>
> An interesting case to help this would be to improve the IdentifyMimeType
> processor to detect character encodings on text data.  Apache Tika can do
> it with an EncodingDetector [2], so why not take advantage since it's
> already part of IdentifyMimeType?  I think this would be cool so I wrote
> NIFI-1874 [3].
>
> [1] - https://issues.apache.org/jira/browse/NIFI-1077
> [2] - https://tika.apache.org/1.12/api/org/apache/tika/detect/EncodingDetector.html
> [3] - https://issues.apache.org/jira/browse/NIFI-1874
Funnily enough, my company's backlog says that we need a character set detection processor. However, I just got assigned a bunch of tasks for our next sprint. I'd love to have either my colleague or me take up NIFI-1874, but we'll have to wait until the sprint after.