NIFI Usage for Data Transformation

Ameer Mawia
We have a use case where we take data from a source (text data in CSV
format), transform and manipulate the textual records, and output the
data in another (CSV) format. This is currently done by a Java-based custom
framework written specifically for this *transformation* piece.

Recently, as Apache NiFi is being adopted at the enterprise level by the
organisation, we have been asked to try *Apache NiFi* and see whether we can
use it as a replacement for this custom tool.

*My question is*:

   - How much leverage does *Apache NiFi* provide for flowfile *content*
   manipulation?

I understand *NiFi* is good for creating data flow pipelines, but is it good
for *extensive TEXT transformation* as well? So far I have not found an
obvious way to achieve that.

Appreciate the feedback.

Thanks,

--
http://ca.linkedin.com/in/ameermawia
Toronto, ON

Re: NIFI Usage for Data Transformation

Ameer Mawia
Thanks for the input, folks.

I had the impression that, for the actual processing of the data:

   - we may have to put in place a custom processor containing the
   transformation framework logic, or
   - we can use the ExecuteProcess processor to trigger an external
   process (which will be this transformation logic) and route the output
   back into NiFi.
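For the external-process option, note that ExecuteProcess runs a command without any incoming FlowFile; the processor that streams a FlowFile's content to a command's standard input and turns the command's standard output into new FlowFile content is ExecuteStreamCommand. A minimal sketch of such a command in Java, assuming one CSV record per line and no quoted fields containing commas. The class name and the trim-every-field rule are hypothetical placeholders for the existing framework.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical external command for ExecuteStreamCommand:
// reads CSV lines from stdin and writes transformed lines to stdout.
class StdinCsvTransform {

    // Placeholder rule: trim every comma-separated field.
    // A real version would call into the existing transformation framework.
    static String transformLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return String.join(",", fields);
    }

    // Processes one line at a time, so memory use does not grow with input size.
    static void run(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(transformLine(line));
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) {
        Reader in = new InputStreamReader(System.in, StandardCharsets.UTF_8);
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        try {
            run(in, out);
        } catch (IOException e) {
            System.err.println("transform failed: " + e.getMessage());
            System.exit(1); // a non-zero exit code signals failure to the caller
        }
    }
}
```

Starting a JVM for every FlowFile has a cost, so this route fits better when whole files, rather than single lines, travel as one FlowFile.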

Our flow inside the framework generally looks like this:


   - Split the CSV file line by line.
   - Split each line into an array of strings.
   - For each record in the array, determine and invoke its transformation
   method.
   - The transformation method contains the transformation logic, which
   can be fairly intensive, e.g.:
      - searching for hundreds of different patterns,
      - lookups against hundreds of configured string constants,
      - appending/prepending/trimming/padding...
   - Finally, map each record into the output CSV format.
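The per-line flow above can be sketched in plain Java. Everything in it is hypothetical: the constant table, the patterns, the "prefix MATCH: when any pattern is found" rule and the padding width stand in for the framework's real rules, and fields are split on bare commas with no quoted-field handling. Two choices target the "hundreds of" cases: the constants sit in a Map, so each lookup is a hash lookup rather than a scan, and the patterns are compiled once into a single alternation (each wrapped in a non-capturing group), so each field is scanned once instead of once per pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of the framework's flow:
// split content into lines -> split fields -> transform each field -> join into output CSV.
class LinePipelineSketch {

    private static final int WIDTH = 8; // hypothetical fixed output width

    private final Map<String, String> constants; // configured string constants
    private final Pattern anyPattern;            // all search patterns, compiled once

    LinePipelineSketch(Map<String, String> constants, List<String> patterns) {
        this.constants = constants;
        // Non-capturing groups keep "|" from changing how each pattern is parsed.
        this.anyPattern = Pattern.compile(
                patterns.stream().map(p -> "(?:" + p + ")").collect(Collectors.joining("|")));
    }

    String transformField(String field) {
        String value = field.trim();                     // trimming
        value = constants.getOrDefault(value, value);    // constant lookup
        if (anyPattern.matcher(value).find()) {          // pattern search
            value = "MATCH:" + value;                    // prepending
        }
        return String.format("%-" + WIDTH + "s", value); // right-padding
    }

    String transformLine(String line) {
        String[] fields = line.split(",", -1);
        List<String> out = new ArrayList<>(fields.length);
        for (String field : fields) {
            out.add(transformField(field));
        }
        return String.join(",", out);
    }

    // Splits the content into lines and maps each line to the output format.
    List<String> transformAll(String content) {
        return content.lines().map(this::transformLine).collect(Collectors.toList());
    }
}
```

A single alternation only answers "does any pattern match"; if each pattern drives different logic, named groups or a per-pattern dispatch would be needed instead.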

So far we have been trying to see whether SplitRecord, UpdateRecord,
ExtractText, etc. can come in handy.

Thanks,

On Thu, Nov 1, 2018 at 12:39 PM Mike Thomsen <[hidden email]> wrote:

> Ameer,
>
> Depending on how you implemented the custom framework, you may be able to
> drop it straight into a custom NiFi processor. Without knowing much
> about your implementation details, if it can act on Java streams, Strings,
> byte arrays and things like that, it will probably be very straightforward
> to drop in place.
>
> This is a really simple example of how you could bring it in, depending on
> how encapsulated your business logic is:
>
> @Override
> public void onTrigger(ProcessContext context, ProcessSession session)
>         throws ProcessException {
>     FlowFile input = session.get();
>     if (input == null) {
>         return;
>     }
>
>     // Child FlowFile that will receive the transformed content
>     FlowFile output = session.create(input);
>     try {
>         // write() returns the updated FlowFile, so keep that reference;
>         // the nested read() streams the input content into the transformer.
>         // Assumes transformerPojo.transform(InputStream, OutputStream)
>         // throws at most IOException.
>         output = session.write(output, out ->
>                 session.read(input, in -> transformerPojo.transform(in, out)));
>
>         session.transfer(input, REL_ORIGINAL); // If you created an "original" relationship
>         session.transfer(output, REL_SUCCESS);
>     } catch (Exception ex) {
>         session.remove(output);
>         session.transfer(input, REL_FAILURE);
>     }
> }
>
> That's the general idea, and that approach can scale to your disk space
> limits. Hope that helps put it into perspective.
>
> Mike
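The thread does not define `transformerPojo`; a minimal sketch of what it could look like, assuming it takes the FlowFile's input and output streams as in Mike's snippet. It has no NiFi dependency, so it can be unit-tested on byte arrays, and it handles one line at a time, which keeps memory bounded by the buffers and the current line rather than by the content size. The class name and the pluggable per-line rule are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for transformerPojo: plain Java, no NiFi types.
class StreamingCsvTransformer {

    private final UnaryOperator<String> lineRule; // hypothetical per-line rule

    StreamingCsvTransformer(UnaryOperator<String> lineRule) {
        this.lineRule = lineRule;
    }

    // Reads and writes one line at a time instead of loading the whole content.
    void transform(InputStream is, OutputStream os) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        Writer writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(lineRule.apply(line));
            writer.write('\n');
        }
        // Flush but do not close: the caller (e.g. the NiFi session) owns the streams.
        writer.flush();
    }
}
```

Inside the processor, an instance such as `new StreamingCsvTransformer(String::trim)` would then satisfy the `transformerPojo.transform(in, out)` call.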
>
> On Thu, Nov 1, 2018 at 10:16 AM Nathan Gough <[hidden email]> wrote:
>
>> Hi Ameer,
>>
>> This blog by Mark Payne describes how to manipulate record-based data
>> like CSV using schemas:
>> https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi. This
>> would probably be the most efficient method. Another one is here:
>> https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries
>>
>> An alternative option would be to port your custom Java code into your
>> own NiFi processor:
>>
>> https://medium.com/hashmapinc/creating-custom-processors-and-controllers-in-apache-nifi-e14148740ea
>> under 'Steps for Creating a Custom Apache NiFi Processor'
>> https://nifi.apache.org/developer-guide.html
>>
>> Nathan
>>
>> On 10/31/18, 5:02 PM, "Ameer Mawia" <[hidden email]> wrote:
>>
