SplitText Usage - how to output my individual files?

SplitText Usage - how to output my individual files?

idioma
Hi,
I am not sure whether I am terribly missing the point, but I have a simple dataflow (CSV2JSON) that does the following:

GetFile (reads a 1000-line CSV file)
SplitText (splits into one line per FlowFile)
ExtractText and ReplaceText to extract the content and construct the JSON structure
UpdateAttribute, which creates a new attribute called filename and assigns it the value myOutput.json, so the output is easy to identify
PutFile

My understanding is that for large files (probably not my case right now, but I wanted to test it for future reference), it is recommended to use SplitText, so that NiFi will create a JSON file for each line. My question is: how do I actually prove that SplitText is doing the job? How do you test that the file has been successfully split into multiple JSON files?
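For reference, the per-line split can be pictured like this. This is only a minimal Python sketch of what SplitText (with Line Split Count = 1) does conceptually, not NiFi code; the fragment.* attribute names follow what SplitText writes on its splits, per the processor documentation:

```python
def split_text(content: str, original_filename: str):
    """Sketch of SplitText: one input FlowFile becomes N single-line
    FlowFiles, each carrying fragment.* attributes that record its
    position within the original file."""
    lines = [line for line in content.splitlines() if line.strip()]
    splits = []
    for i, line in enumerate(lines):
        splits.append({
            "content": line,
            "attributes": {
                "fragment.index": str(i),           # position of this split
                "fragment.count": str(len(lines)),  # total number of splits
                "segment.original.filename": original_filename,
            },
        })
    return splits

csv = "id,name\n1,alice\n2,bob\n"
flowfiles = split_text(csv, "input.csv")
print(len(flowfiles))  # one FlowFile per non-empty line
```

In the real flow, these attributes give you a first handle on verification: each split should carry a fragment.index and a fragment.count matching the input's line count.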

Thank you for your help, I am rather stuck with this,

I.

Re: SplitText Usage - how to output my individual files?

Mark Payne
Hi Idioma,

There are two different ways that I can recommend you verify this. The first is to right-click on a connection
and then choose "List Queue". From here, you can see the FlowFiles that are queued up in the connection.
You can then click the 'info' icon on the left to see the attributes and content of the FlowFile, so you can download
the content and inspect it.

Second, and more importantly, there is NiFi's notion of Data Provenance. The User Guide explains how to use this
feature [1]. As data traverses NiFi, every event that happens to every piece of data is recorded in the
Provenance Repository. This lets you see exactly what happened to each piece of data as it flowed through
your system, which (among many other things) is useful for debugging workflows. It shows you the attributes
as well as the content at every step along the way, so you can understand exactly how the data looked at each
stage, and replay it from any point if it wasn't handled correctly.
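Outside the UI, you can also sanity-check the result on disk after PutFile runs. Here is a minimal sketch (the function name and paths are hypothetical, and it assumes each split is written to its own uniquely named .json file; with the flow above, the UpdateAttribute step would need to make the filename unique per split, since assigning the same myOutput.json to every split would cause PutFile naming conflicts):

```python
import json
from pathlib import Path

def verify_split(input_csv: Path, output_dir: Path) -> bool:
    """Check that the number of non-empty input lines matches the number
    of JSON files written, and that each output file parses as JSON."""
    expected = sum(
        1 for line in input_csv.read_text().splitlines() if line.strip()
    )
    json_files = list(output_dir.glob("*.json"))
    for f in json_files:
        json.loads(f.read_text())  # raises if a split produced invalid JSON
    return len(json_files) == expected
```

If the count matches and every file parses, the split-and-convert pipeline did its job end to end.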

Does all of this make sense and give you what you need? Let us know if you have any further questions!

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance


> On Jul 18, 2016, at 8:26 AM, idioma <[hidden email]> wrote:


Re: SplitText Usage - how to output my individual files?

idioma
In reply to this post by idioma
Mark, thank you so much for your reply. It is much clearer now. I had suspected the management toolbar, and in particular the Data Provenance area, would be informative for this issue.

Thank you again!