Help with ReplaceTextWithMapping processor: multi-column mappings


idioma
I have been trying to understand how to use ReplaceTextWithMapping in a specific case where I would like to substitute the same input value with different mapping values depending on the matching group.

My input file looks like this (this is only an excerpt):

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

The mapping file looks like this:

 Header1;Header2;Header3
 A;some text;2

My expected result would be as follows:

   {"field1" : "some text",
    "field2": "A",
    "field3": "A2"
    }


The Regular Expression set is simply as follows:

[A-Z0-9]+

It matches the key in the first column of the mapping file (we expect either a capital letter or a capital letter plus a digit). However, I am not sure how to choose which column (Header2 or Header3) supplies the replacement for a given input value. Specifically, although the value of field2 matches the regex, I do not want it substituted; I need to preserve the original value from the input file (I am not sure if this is possible). At the moment, I am getting something like this:

  {"field1" : "some text A2",
    "field2": "some text A2",
    "field3": "some text A2"
    }

I guess my main questions are: can you map the same value in your input file to different values coming from different columns of your mapping file? Can you skip the mapping for some fields (for example, field2) even though the regex matches the input value? Can you have multiple values in the Matching Group property?

Thank you so much for your help. This has been troubling me for a while and I cannot seem to find a solution.

Re: Help with ReplaceTextWithMapping processor: multi-column mappings

Joe Percivall
Hello,

I dove into ReplaceTextWithMapping to try to get it to solve your use-case, but I just don't think it is powerful enough to do what you want. Currently it is designed almost solely for one purpose: match a simple regex and map one group of non-whitespace characters to another group of characters (which can contain whitespace and back references).
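
As a rough illustration of that limitation, here is a minimal Groovy sketch of a value-keyed lookup. It is not the processor's actual code; the in-memory `mapping` is a hypothetical stand-in for a two-column mapping file. Because the lookup only sees the matched text, every "A" gets the same replacement regardless of which field it sits in:

```groovy
// Illustrative only: a value-keyed lookup, NOT ReplaceTextWithMapping's implementation.
// 'mapping' is a hypothetical stand-in for a two-column mapping file.
def mapping = ['A': 'some text']
def input = '{"field1" : "A", "field2" : "A", "field3": "A"}'

// Replace every regex match that has a mapping entry; leave all other matches untouched.
// The closure receives only the matched text, so it cannot tell field1 from field3.
def output = input.replaceAll(/[A-Z0-9]+/) { String match ->
    mapping.containsKey(match) ? mapping[match] : match
}

println output
```

Note that the regex also matches the digits inside the field names, but those have no mapping entry and are left as they are.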

Looked at as pure text, your use-case is to change the value of one capture group based on the value of another capture group and a mapping file. Looked at in terms of JSON, your use-case is much simpler: you want to change the value of a key/value pair based on what the key is and a mapping file. Side note: if you didn't need the mapping file, I believe the new JSON-to-JSON processor coming in 0.7.0 [1] would work.

As for a solution, both ways of looking at your problem are valid. ReplaceTextWithMapping could certainly use expanded functionality to allow for advanced use-cases, but that may make it too complicated (though it could be more confusing now due to the unclear scope of its functionality). A new processor, along the lines of "ReplaceJsonWithMapping", could certainly be added as well, but it would need to clearly define its scope and purpose.

Does anyone have thoughts on this use-case? Also if anyone has more experience with ReplaceTextWithMapping, please feel free to correct me.

[1] https://issues.apache.org/jira/browse/NIFI-1850

Joe

- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [hidden email]



On Sunday, May 15, 2016 3:09 PM, idioma <[hidden email]> wrote:



I have been trying to understand how to use ReplaceTextWithMapping in a
specific case where I would like to substitute the same input value with
different mapping values depending on the matching group.


Re: Help with ReplaceTextWithMapping processor: multi-column mappings

idioma
Joe, thanks for your reply. I wonder whether a typo in my example might have made what I would like to achieve slightly confusing. My mapping file actually looks like this (note the value "A2" in the third column, as opposed to the "2" in my first post):

Header1;Header2;Header3
A;some text;A2

Am I correct in saying that ReplaceTextWithMapping does not allow multiple matching groups? The main problem also seems to come from the fact that my input JSON has different keys but the same values:

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

I am not sure how to go about it, if ReplaceTextWithMapping is not the ideal solution.

Any further suggestion/recommendation is much appreciated.

Thank you,

I.  

Re: Help with ReplaceTextWithMapping processor: multi-column mappings

Andy LoPresto-2
Hi,

Because this seems to be a more complex mapping, using ExecuteScript with Groovy (or another supported scripting language if you’re more comfortable with one) to do the translation may make sense. Matt Burgess has provided detailed tutorials in a variety of languages on his blog [1]. Performing JSON translation and regex operations in the scripting language should be pretty straightforward. This might be a faster solution than trying to modify ReplaceTextWithMapping or writing a new processor from scratch. 

If you need help writing the script to do the translation, we can try to put something together, but may need more sample data to ensure we understand your expectations correctly. Hope this helps. 
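
To make the idea concrete, here is a minimal standalone Groovy sketch of the translation logic, using the sample data from this thread. It is only one possible approach under stated assumptions: `parseMapping`, `translate`, and the `fieldToColumn` rule table are hypothetical names introduced for illustration, and the rule "field1 takes Header2, field3 takes Header3, field2 is left alone" is inferred from the expected result posted above.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Parse a semicolon-delimited mapping with a header row into
// [firstColumnValue: [header: cell, ...]].
def parseMapping(String text) {
    def lines = text.readLines()*.trim().findAll { it }
    def headers = lines.head().split(';')*.trim()
    lines.tail().collectEntries { line ->
        def cells = line.split(';')*.trim()
        [(cells[0]): [headers, cells].transpose().collectEntries { h, c -> [(h): c] }]
    }
}

// fieldToColumn is a hypothetical rule table: which mapping column feeds each JSON field.
// Fields without a rule, or values without a mapping row, keep their original value.
def translate(String json, Map mapping, Map fieldToColumn) {
    def doc = new JsonSlurper().parseText(json)
    fieldToColumn.each { field, column ->
        def row = mapping[doc[field]]
        if (row?.containsKey(column)) {
            doc[field] = row[column]
        }
    }
    JsonOutput.toJson(doc)
}

def mapping = parseMapping('''Header1;Header2;Header3
A;some text;A2''')
def result = translate('{"field1": "A", "field2": "A", "field3": "A"}',
        mapping, [field1: 'Header2', field3: 'Header3'])
println result
```

Keying the rules by field name rather than by value is what lets the same input value "A" map to different columns, and leaving field2 out of the rule table is what preserves its original value.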



Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On May 16, 2016, at 1:42 PM, idioma <[hidden email]> wrote:

Joe, thanks for your reply. I wonder whether a typo in my example might have
made what I would like to achieve slightly confusing. My mapping file actually
looks like this (note the value "A2" in the third column, as opposed to the "2"
in my first post):

Header1;Header2;Header3
A;some text;A2


Re: Help with ReplaceTextWithMapping processor: multi-column mappings

idioma
Andy, all,
thank you so much for your help. I have already started using ExecuteScript quite heavily, in particular for data cleaning and wrangling between processors. I was only concerned that, being experimental, it was not particularly recommended for use in a complex dataflow like the one I am working on. I will be happy to share the final version of the dataflow as a template. Can you advise how you would do it?

Thank you again,

I.

Re: Help with ReplaceTextWithMapping processor: multi-column mappings

Andy LoPresto-2
Glad to hear ExecuteScript is helping you. It is becoming very heavily used as a “glue” processor to adapt edge cases, and (without speaking for Matt), I would consider it no longer “experimental” when used with certain scripting languages (Groovy, primarily). It still has some rough edges with other languages. There is actually a PR open right now [1] to add pooled execution to allow multithreading as well. This should greatly improve performance. 

We would definitely appreciate you sharing the template for our template gallery [2]. You can export the flow as a template, then upload the XML file, and add a short description on the wiki. 

Just to be clear, were you asking for advice on that process, or on the code to do the regex processing in the ExecuteScript processor?


Andy LoPresto
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On May 17, 2016, at 7:41 AM, idioma <[hidden email]> wrote:

Andy, all,
thank you so much for your help. I have already started using ExecuteScript
quite heavily, in particular for data cleaning and wrangling between
processors. I was only concerned that, being experimental, it was not
particularly recommended for use in a complex dataflow like the one I am
working on. I will be happy to share the final version of the dataflow as a
template. Can you advise how you would do it?

Thank you again,

I.




Re: Help with ReplaceTextWithMapping processor: multi-column mappings

idioma
Hi Andy, apologies for the delay in replying. I have come back to the task and tried to write a simple Groovy script to make the substitutions coming from the mapping files. In order to have something working, I have modified the mapping files so that each of the incoming JSON values initially translates into "some text". I am hardcoding the substitution in the script below:


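import groovy.json.JsonBuilder
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

// Fetch the next flowfile from the session; nothing to do if none is queued.
def flowFile = session.get()
if (flowFile == null) {
    return
}

flowFile = session.write(flowFile,
        { inputStream, outputStream ->

            // Hardcoded stand-in for the incoming JSON (inputStream is not read yet).
            def content = """
{
  "field2": "some text",
  "field3": "some text"

}"""

            def slurped = new JsonSlurper().parseText(content)
            def builder = new JsonBuilder(slurped)
            // Hardcoded substitutions that should eventually come from the mapping file.
            builder.content.field2 = "A"
            builder.content.field3 = "A*"
            outputStream.write(builder.toPrettyString().getBytes(StandardCharsets.UTF_8))
        } as StreamCallback)
// REL_SUCCESS is the success relationship that ExecuteScript binds into the script.
session.transfer(flowFile, REL_SUCCESS)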

The substitutions work fine, but obviously the approach is very much hardcoded. First, how do I populate 'content' from my flowfile instead of a hardcoded string? The preceding processor is UpdateAttribute, where I simply set the filename attribute of the incoming flowfile to myResultingJson.json. Second, is it possible to load the mapping file with the replacements in the Groovy script and have Groovy work out the mapping in a more generic way?

Thank you so much for your help,

Ilaria
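
One possible direction for both questions, sketched in plain Groovy so it can be run outside NiFi: read the JSON from the input stream rather than from a hardcoded string, and load the replacements from a mapping file. The helper names `loadMapping` and `rewrite`, the temporary file, and the `fieldToColumnIndex` rule table are all assumptions made for illustration; the column layout follows the corrected mapping example from earlier in the thread.

```groovy
import groovy.json.JsonBuilder
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets

// Load "key;col2;col3" rows (header row skipped) into [key: [cells...]].
// The file location and column layout are assumptions based on this thread's example.
def loadMapping(File file) {
    file.readLines('UTF-8').drop(1).findAll { it.trim() }.collectEntries { line ->
        def cells = line.split(';')*.trim()
        [(cells[0]): cells]
    }
}

// Read the JSON from the input stream instead of a hardcoded string,
// apply the per-field column rules, and write the result to the output stream.
def rewrite(InputStream inputStream, OutputStream outputStream, Map mapping, Map fieldToColumnIndex) {
    def content = inputStream.getText(StandardCharsets.UTF_8.name())
    def builder = new JsonBuilder(new JsonSlurper().parseText(content))
    fieldToColumnIndex.each { field, index ->
        def row = mapping[builder.content[field]]
        if (row != null && index < row.size()) {
            builder.content[field] = row[index]
        }
    }
    outputStream.write(builder.toPrettyString().getBytes(StandardCharsets.UTF_8))
}

// Standalone demonstration with in-memory streams and a temporary mapping file.
def mappingFile = File.createTempFile('mapping', '.csv')
mappingFile.deleteOnExit()
mappingFile.setText('Header1;Header2;Header3\nA;some text;A2\n', 'UTF-8')

def input = new ByteArrayInputStream(
        '{"field1": "A", "field2": "A", "field3": "A"}'.getBytes(StandardCharsets.UTF_8))
def output = new ByteArrayOutputStream()
rewrite(input, output, loadMapping(mappingFile), [field1: 1, field3: 2])
println output.toString('UTF-8')

// Inside ExecuteScript, the same function could be called from the existing callback:
// flowFile = session.write(flowFile, { inputStream, outputStream ->
//     rewrite(inputStream, outputStream, mapping, [field1: 1, field3: 2])
// } as StreamCallback)
```

In this sketch the flowfile content arrives through the `inputStream` argument of the callback, so no attribute is needed to carry it; the filename attribute set by UpdateAttribute is not involved in reading the content.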