send JSON format to kafka and avoid duplication

send JSON format to kafka and avoid duplication

anup s
I have a set of files, and for each one I need to send its filename and another property to Kafka in a JSON format (shown below).

{
     filename=${filename},
     property=${property}
}


I tried to replace the entire content with the above JSON content and send it to Kafka. But instead of the values of ${filename} and ${property}, I got the JSON text exactly as written above (and duplicated).


Then I tried using only ${filename} as the replacement text (shown below). This time I got the values, but the duplication was still present.

Property used in replaceText:

Regular Expression = [\S\s]*
Replacement Value = ${filename}
Character Set = UTF-8
Maximum Buffer Size = 1 MB
Evaluation Mode = Entire text


Output at the Kafka consumer:

file_21.txtfile_21.txt
file_15.txtfile_15.txt
file_19.txtfile_19.txt
file.txtfile.txt


So how do I
- avoid repetition?
- send it as a JSON string?



RE: send JSON format to kafka and avoid duplication

Mark Payne
Anup,

The Regular Expression that you are using, "[\S\s]*", can be read as "a non-whitespace character or a whitespace character, 0 or more times." So it's really "anything, 0 or more times." When you evaluate that, it first matches the entire content that it's evaluated against, and then matches again with 0 characters at the end, so the regular expression matches two times and the replacement is written twice. If you instead change it to "[\S\s]+" so that it matches 1 or more characters, this should avoid the duplication.

Does this make sense?
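
The double match is easy to reproduce outside NiFi. The following is a minimal sketch using plain java.util.regex, assuming ReplaceText behaves like a replace-all over the whole content. The class name, method name, and sample content are made up for illustration and are not part of NiFi.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDuplicationDemo {

    // Replaces every match of the regex in content with the literal replacement text.
    public static String replaceEveryMatch(String regex, String content, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being treated specially.
        return Pattern.compile(regex)
                .matcher(content)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String content = "original file content"; // hypothetical FlowFile content
        String filename = "file_21.txt";

        // '*' matches the whole content and then the empty string at the end:
        // two matches, so the replacement is written twice.
        System.out.println(replaceEveryMatch("[\\S\\s]*", content, filename));

        // '+' needs at least one character, so the empty match at the end is not possible.
        System.out.println(replaceEveryMatch("[\\S\\s]+", content, filename));
    }
}
```

The first call prints the filename twice on one line, which is the same shape as the Kafka consumer output in the original question; the second prints it once.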

Thanks
-Mark

Re: send JSON format to kafka and avoid duplication

anup s
That works. But what if the replacement text has to be in the JSON format? How do I get the values sent?

With the replacement below, the whole JSON text goes through as it is, without the values being substituted!

{
     filename=${filename},
     property=${property}
}

RE: send JSON format to kafka and avoid duplication

Mark Payne
Anup,

You can escape the ${ by using an extra $:

{
filename=$${filename},
property=$${property}
}

That should evaluate literally to:

{
filename=${filename},
property=${property}
}

Thanks
-Mark

RE: send JSON format to kafka and avoid duplication

anup s
Mark,
That doesn't help. I'm getting the same results with the extra "$" too..

Observation: if I put curly braces around it, the actual value is not substituted (for both of these):
- {filename:${filename}}
- {filename:$${filename}}

But if I omit the curly braces, the evaluation works fine:
filename:${filename}

Is there something like an 'eval' function or a backtick ` that I could use to evaluate and obtain the real filename inside a JSON like the one below?

{filename:eval(${filename})}
RE: send JSON format to kafka and avoid duplication

Mark Payne
Anup,

I think I'm understanding now. Looks like you're getting bitten by a bug: https://issues.apache.org/jira/browse/NIFI-625

It should be addressed in the next version of NiFi.

So I realize this is a pain, but a workaround that you can use in the meantime is to chain together two ReplaceText processors.

The first one, like you outlined, would replace ".+" with:
filename: ${filename}

This should perform the replacement correctly because there's no enclosing { }.

The second would then be configured with the Regular Expression set to "(.+)" and the Replacement Value set to:
{
$1
}

This will replace the "$1" with whatever value matches Capturing Group 1 of the regular expression - in this case the entire content of the FlowFile.
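
The two-step workaround can be sketched with plain Java regex calls standing in for the two ReplaceText processors. This is an illustration only, assuming each processor does a replace-all over single-line content. The class and method names are hypothetical, and the filename argument stands for the value NiFi would substitute for ${filename}.

```java
import java.util.regex.Matcher;

public class ChainedReplaceDemo {

    // Step 1: replace the whole (single-line) content with "filename: <value>".
    // No braces are involved here, so nothing interferes with the substitution.
    public static String stepOne(String content, String filename) {
        return content.replaceAll(".+", Matcher.quoteReplacement("filename: " + filename));
    }

    // Step 2: wrap whatever step 1 produced in braces, using capturing group 1.
    public static String stepTwo(String content) {
        return content.replaceAll("(.+)", "{\n$1\n}");
    }

    public static void main(String[] args) {
        // "original file content" is a made-up stand-in for the FlowFile content.
        String result = stepTwo(stepOne("original file content", "file_21.txt"));
        System.out.println(result);
    }
}
```

Note that the result is wrapped in braces but is not yet strict JSON: strict JSON would also need double quotes around the key and around the string value.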

Please let me know if this helps!

Thanks
-Mark


RE: send JSON format to kafka and avoid duplication

anup s
Hi Mark,
  Great! That works for now until the next release.

Thanks,
anup