I have attribute called X. But X.0 and X.1 also got created. Why?

srini
I have an attribute called original_flowfile, but I noticed that these two extra attributes were also created. Why?

original_flowfile.0
original_flowfile.1

thanks
Srini
Re: I have attribute called X. But X.0 and X.1 also got created. Why?

Andy LoPresto-2
If this was in an ExtractText processor, your regular expression may have included groups (segments enclosed in “()”). If so, each group is extracted and captured as an attribute as well.

From the ExtractText documentation:

The first capture group, if any found, will be placed into that attribute name. But all capture groups, including the matching string sequence itself will also be provided at that attribute name with an index value provided, with the exception of a capturing group that is optional and does not match - for example, given the attribute name "regex" and expression "abc(def)?(g)" we would add an attribute "regex.1" with a value of "def" if the "def" matched. If the "def" did not match, no attribute named "regex.1" would be added but an attribute named "regex.2" with a value of "g" will be added regardless.
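
The naming rule in the documentation excerpt can be illustrated with a small sketch. It uses Python's `re` module rather than NiFi itself (NiFi uses Java regular expressions, but group numbering behaves the same way for this pattern). The helper `extract_attributes` is hypothetical and only mimics the behaviour described in the excerpt; it is not the processor's implementation.

```python
import re

def extract_attributes(name, pattern, content):
    """Hypothetical helper mimicking the documented ExtractText naming rule."""
    attributes = {}
    match = re.search(pattern, content)
    if match is None:
        return attributes
    # Group 0 is the whole match; groups 1..n are the explicit capture groups.
    for index in range(0, match.re.groups + 1):
        value = match.group(index)
        if value is None:
            # An optional group that did not participate adds no attribute.
            continue
        attributes[f"{name}.{index}"] = value
    # The first capture group, if it matched, is also stored under the bare name.
    if match.re.groups >= 1 and match.group(1) is not None:
        attributes[name] = match.group(1)
    return attributes

with_def = extract_attributes("regex", "abc(def)?(g)", "abcdefg")
without_def = extract_attributes("regex", "abc(def)?(g)", "abcg")
print(with_def)
print(without_def)
```

In the first call "regex.1" holds "def"; in the second call there is no "regex.1" at all, while "regex.2" is still "g" in both.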

Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Mar 6, 2017, at 4:26 PM, srini <[hidden email]> wrote:

I have an attribute called original_flowfile, but I noticed that these two extra
attributes were also created. Why?

original_flowfile.0
original_flowfile.1

thanks
Srini



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/I-have-attribute-called-X-But-X-0-and-X-1-also-got-created-Why-tp15062.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.



Re: I have attribute called X. But X.0 and X.1 also got created. Why?

srini
Hi Andy,
Thanks for your reply. This is the regex I used: (.*)
I want just one attribute; I don't want X.0 and X.1. Because the value in the attribute is huge, I get an OutOfMemory error, so I want to limit it to one attribute.

Is there any change I can make to (.*) to get just one attribute?

thanks
Srini
Re: I have attribute called X. But X.0 and X.1 also got created. Why?

Joe Witt
If the value is huge, it should certainly not be stored as a FlowFile
attribute.  The design of NiFi allows you to keep the content where it
belongs so that you can obtain high performance and efficient memory
usage throughout the flow on tiny objects as well as huge objects.
Can you tell us more about the overall flow so we can suggest better
alternative approaches?

On Tue, Mar 7, 2017 at 10:30 AM, srini <[hidden email]> wrote:

> Hi Andy,
> Thanks for your reply. This is the regex I used: (.*)
> I want just one attribute; I don't want X.0 and X.1. Because the value in the
> attribute is huge, I get an OutOfMemory error, so I want to limit it to one
> attribute.
>
> Is there any change I can make to (.*) to get just one attribute?
>
> thanks
> Srini

Re: I have attribute called X. But X.0 and X.1 also got created. Why?

Andy LoPresto
Srini,

Joe is right that there may be other ways to improve your flow. An immediate fix is to remove the parentheses from your regex; .* will still match the entire contents, but as it has no explicit group, you'll save half the heap usage immediately. Is there a reason you are extracting the entire content into an attribute?

> On Mar 7, 2017, at 07:44, Joe Witt <[hidden email]> wrote:
>
> If the value is huge, it should certainly not be stored as a FlowFile
> attribute.  The design of NiFi allows you to keep the content where it
> belongs so that you can obtain high performance and efficient memory
> usage throughout the flow on tiny objects as well as huge objects.
> Can you tell us more about the overall flow so we can suggest better
> alternative approaches?

Re: I have attribute called X. But X.0 and X.1 also got created. Why?

srini
Hi Andy,
I dropped the idea of saving the flowfile to an attribute, so I am fine on that front.

You said "An immediate fix is to remove the parentheses from your regex; .*"
But the processor does not accept the regex if I remove the parentheses.

thanks
Srini
Re: I have attribute called X. But X.0 and X.1 also got created. Why?

Andy LoPresto-2
Yes, I evaluated this locally, and apparently the ExtractText regex validation requires “1 to 40 capturing groups”. You can set “Include Capture Group 0” to false to reduce the duplication of the captured attribute (you’ll go from 3*n to 2*n, where n is the size of the captured value). I am unaware of a technical reason why the provided regex is required to have at least one capture group. I would recommend you open a Jira to reduce the minimum capture group count to 0 during validation if “Include Capture Group 0” is set to true.
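
To make the 3*n versus 2*n figure concrete, here is a rough sketch in Python (not NiFi code). The function `attribute_sizes` and its `include_group_zero` parameter are hypothetical stand-ins for the processor and its “Include Capture Group 0” property, and the model assumes that groups which did not match, or matched nothing, produce no attribute.

```python
import re

def attribute_sizes(name, pattern, content, include_group_zero=True):
    """Hypothetical model: return {attribute name: length of stored value}."""
    sizes = {}
    match = re.search(pattern, content)
    if match is None:
        return sizes
    # Start at group 0 (whole match) only when the option is enabled.
    first = 0 if include_group_zero else 1
    for index in range(first, match.re.groups + 1):
        value = match.group(index)
        if value:  # assumption: unmatched or empty groups are not stored
            sizes[f"{name}.{index}"] = len(value)
            if index == 1:
                # The first capture group is also stored under the bare name.
                sizes[name] = len(value)
    return sizes

content = "x" * 1000  # stand-in for a large FlowFile content
with_zero = attribute_sizes("X", "(.*)", content, include_group_zero=True)
without_zero = attribute_sizes("X", "(.*)", content, include_group_zero=False)
print(sorted(with_zero), sum(with_zero.values()))
print(sorted(without_zero), sum(without_zero.values()))
```

With group 0 included, the model stores X, X.0 and X.1 (three copies of the content); with it excluded, only X and X.1 remain (two copies).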



 

On Mar 13, 2017, at 3:25 PM, srini <[hidden email]> wrote:

Hi Andy,
I dropped the idea of saving the flowfile to an attribute, so I am fine on
that front.

You said "An immediate fix is to remove the parentheses from your regex;
.*"
But the processor does not accept the regex if I remove the parentheses.

thanks
Srini




Re: I have attribute called X. But X.0 and X.1 also got created. Why?

Andy LoPresto-2
Here is the specific source code for reference: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L262-L262


On Mar 13, 2017, at 4:56 PM, Andy LoPresto <[hidden email]> wrote:

Yes, I evaluated locally and apparently the ExtractText regex validation requires “1 to 40 capturing groups”. You can set “Include Capture Group 0” to false to reduce the duplication of the captured attribute (you’ll go from 3*n to 2*n). I am unaware of a technical reason the provided regex is required to have at least one capture group. I would recommend you open a Jira to reduce the minimum capture group count to 0 during validation if “Include Capture Group 0” is set to true. 


Re: I have attribute called X. But X.0 and X.1 also got created. Why?

Andy LoPresto-2
Srini,

I thought about it a little bit more and I think I have a temporary solution that will actually work for you. I still recommend you open the Jira but the following regex should work for you:

^.*(.??)$


I’ll break down the regex:

^     - Match at the start of the content
.*    - Match any character any number of times
(.??) - Capture group to match any character 0 or 1 times, lazy (i.e. will prefer 0 over 1)
$     - Match the end of the content

This results in the following LogAttribute output:

--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Mon Mar 13 17:38:03 PDT 2017'
Key: 'lineageStartDate'
Value: 'Mon Mar 13 17:38:03 PDT 2017'
Key: 'fileSize'
Value: '29'
FlowFile Attribute Map Content
Key: 'entire_match.0'
Value: 'This is a plaintext message. '
Key: 'filename'
Value: '1343455595942828'
Key: 'path'
Value: './'
Key: 'uuid'
Value: '9382e5f0-782d-4c71-963f-1004c2a50275'
--------------------------------------------------

Now your expression passes validation (because it has 1 explicit capture group), but won’t waste space on duplicate attributes. You just have to reference “attribute.0” instead of “attribute” in your follow-on processors (or use UpdateAttribute to copy and delete the original attribute, but this also wastes space). 
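
The behaviour of the workaround regex can be checked outside NiFi with a short sketch. It uses Python's `re` module as a stand-in for Java's regex engine (the two agree for this pattern), and the sample content is taken from the LogAttribute output above. That the empty group produces no attribute is inferred from that output, which shows only 'entire_match.0'; the sketch itself only demonstrates what the regex captures.

```python
import re

content = "This is a plaintext message. "
match = re.search(r"^.*(.??)$", content)

whole_match = match.group(0)  # the value that appears as 'entire_match.0'
lazy_group = match.group(1)   # the explicit capture group, matched lazily

print(match.re.groups)    # number of explicit capture groups in the pattern
print(repr(whole_match))
print(repr(lazy_group))
```

The pattern contains exactly one explicit capture group, so it satisfies the “1 to 40 capturing groups” validation, yet that group captures an empty string while group 0 still holds the full content.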

Hope this helps until we can provide the improved UX. 


On Mar 13, 2017, at 5:00 PM, Andy LoPresto <[hidden email]> wrote:

Here is the specific source code for reference: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ExtractText.java#L262-L262
