Conflict Resolution Strategy

Kartik Veerepalli
Hello all,

I have a use case where I copy files from one server to another and then use those files for further processing. I am using the GetFile processor (with the Keep Source File property set to true) and the PutFile processor. However, when I move the files out of the destination directory for processing, Apache NiFi copies them from the source server to the destination server all over again. Is there a way I can work around this? The Conflict Resolution Strategy property offers only three options (replace, ignore, and fail), none of which fits my use case. Any suggestions?

Thanks
Kartik

Re: Conflict Resolution Strategy

Joe Witt
Kartik

I would recommend running something like rsync for that case.  NiFi is
meant to automate the continuous flow of data from producers to consumers.

Happy to help you think through this further if I have misunderstood.

Thanks
Joe

Re: Conflict Resolution Strategy

Corey Flowers
Hey Kartik!

I am not sure I fully understand your problem. Let me see if I have this right.

1) You want to keep the files on the source but have a copy picked up and sent to the destination.
2) You then want the copies to be processed on the destination without keeping a copy of them.

Is that right?


Re: Conflict Resolution Strategy

Kartik Veerepalli
Corey,


My apologies for not making myself clear. The points you listed are exactly what I meant.

Joe: I did check out rsync, but we are planning to establish a continuous data flow pipeline from a wide range of servers, message buses, etc. to HDFS. We think Apache NiFi can be integrated/used as a data flow system with the Analytics as a Service platform that we are building. Thanks for the help.


Kartik

Re: Conflict Resolution Strategy

Joe Witt
Kartik

OK, yes, so what you describe in your reply is definitely in the NiFi
wheelhouse.

For your original case, where you want to copy but retain the original
object, there are a few ways to do it.  One is to actually pull the data
from its original location, send a copy to your analytic system, and also
give a copy back to the original system.

If you truly must keep the original where it was, then there are really only
'ok' options.  NiFi then needs to act as an idempotent receiver, which
means it keeps state about what it has already grabbed a copy of and avoids
sending it through more than once.  That sounds like no big deal, but it
means maintaining some database, constantly re-checking the same things, and
added tension on clustering.  In many ways it isn't conducive to healthy
dataflow.  It can be done, but it isn't fun.
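
To make the idempotent-receiver idea concrete, here is a minimal Python sketch. It is not NiFi code and not how any NiFi processor is implemented. It assumes a local SQLite table as the state store and treats path, size, and modification time as a file's identity; `SeenStore` and `poll_once` are hypothetical names introduced only for illustration.

```python
import os
import shutil
import sqlite3


class SeenStore:
    """Remembers which files were already copied (hypothetical helper)."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (file_key TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def seen(self, file_key):
        row = self.conn.execute(
            "SELECT 1 FROM seen WHERE file_key = ?", (file_key,)
        ).fetchone()
        return row is not None

    def mark(self, file_key):
        self.conn.execute(
            "INSERT OR IGNORE INTO seen (file_key) VALUES (?)", (file_key,)
        )
        self.conn.commit()


def poll_once(source_dir, dest_dir, store):
    """Copy files not seen before; the originals stay in source_dir."""
    copied = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue
        info = os.stat(src)
        # Assumption: path + size + mtime is a good enough identity.
        file_key = f"{src}|{info.st_size}|{info.st_mtime_ns}"
        if store.seen(file_key):
            continue
        shutil.copy2(src, os.path.join(dest_dir, name))
        store.mark(file_key)  # mark only after the copy succeeded
        copied.append(name)
    return copied
```

Every poll still lists the whole directory and queries the store once per file, which is the "constantly re-checking the same things" cost described above. In a cluster, the store would also have to be shared between nodes.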

So before walking that path: is putting a copy of the data back in the
original system, but not in a directory you are polling, an option?

Please feel free to subscribe to the mailing list so your notes will get
through without delay.

Thanks
Joe

RE: Conflict Resolution Strategy

Ralph.Spangler
We did something similar, but kept a simple flat file recording where we left off: basically the date or a sequence number, used together with a custom flow processor. We also had the system holding the data send it to a directory that NiFi monitored with the GetFile processor. That approach requires something on the sending system to keep track of what has been sent.
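
A flat-file bookmark of this kind could look roughly like the Python sketch below. It is only an illustration, not the custom processor described above. It assumes that each file name contains a sequence number (the first run of digits in the name); the function names and the bookmark file layout are hypothetical.

```python
import os
import re

# Assumption: the first run of digits in a file name is its sequence number.
SEQ_PATTERN = re.compile(r"(\d+)")


def read_bookmark(bookmark_path):
    """Return the last processed sequence number, or -1 if none is recorded."""
    try:
        with open(bookmark_path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return -1


def write_bookmark(bookmark_path, seq):
    """Write to a temp file, then rename, so a crash leaves no partial bookmark."""
    tmp_path = bookmark_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(seq))
    os.replace(tmp_path, bookmark_path)


def new_files(source_dir, bookmark_path):
    """List (sequence, name) pairs above the bookmark, lowest sequence first."""
    last = read_bookmark(bookmark_path)
    pending = []
    for name in os.listdir(source_dir):
        match = SEQ_PATTERN.search(name)
        if match and int(match.group(1)) > last:
            pending.append((int(match.group(1)), name))
    return sorted(pending)
```

A caller would process the returned files in order and call `write_bookmark` after each one, so a restart resumes from the last file that was fully handled.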

Ralph Spangler


Re: Conflict Resolution Strategy

Corey Flowers
Good morning Kartik,

Joe is correct: the real question is what puts the original file in place. If you have a "temporary landing folder" to run GetFile on, then the problem is very simple. Just have GetFile pick the files up and send copies to the permanent location and to the outside systems. If the location where the files are created or delivered can't be changed, then there are really only "ok" solutions, because NiFi would have to know the state of each file. You could use file renaming, temp files, and a few other ideas, but really the best approach would be to use a neutral landing folder for the original files and then have NiFi place them where you need them.
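
The landing-folder pattern can be sketched in a few lines of Python. This is an illustration of the idea only, not a NiFi flow; the function name and the three directory parameters are hypothetical.

```python
import os
import shutil


def drain_landing_folder(landing_dir, permanent_dir, outbound_dir):
    """Send a copy of each landed file downstream, then move the original
    to its permanent location so it is never picked up twice."""
    handled = []
    for name in sorted(os.listdir(landing_dir)):
        src = os.path.join(landing_dir, name)
        if not os.path.isfile(src):
            continue
        # Copy for the outside system.
        shutil.copy2(src, os.path.join(outbound_dir, name))
        # The original leaves the polled folder, so no state is needed.
        shutil.move(src, os.path.join(permanent_dir, name))
        handled.append(name)
    return handled
```

Because the landing folder is emptied on every pass, no record of previously seen files has to be kept. The sketch assumes producers only place complete files in the landing folder.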

Sorry, I hope that isn't too confusing.
Corey


Re: Conflict Resolution Strategy

Mark Payne
Kartik,

Thanks for your interest in NiFi!

I know you've gotten a few responses to this already, but you're right -
this is something we should address. The basic idea is that many
people just pick files up from a temp directory and push them back to a
permanent directory.

But if that doesn't work for you, we could update the processor to do
something a bit smarter. One idea that might make sense is to pick up
the oldest files first. Then we can keep track of the "last modified
date" of the last file that was picked up. This way, we keep
minimal state about what has been pulled in, but still pull in only new
data and avoid deleting the source files.
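
As a rough illustration of this proposal (not the eventual NiFi implementation), the Python sketch below lists files oldest first and keeps a single "last modified" timestamp as its only state. The function name and the nanosecond bookmark are assumptions made for the example.

```python
import os


def pick_up_new_files(source_dir, last_seen_mtime_ns):
    """Return the names of files modified after the bookmark, oldest first,
    together with the updated bookmark. Nothing in source_dir is deleted."""
    candidates = []
    for name in os.listdir(source_dir):
        path = os.path.join(source_dir, name)
        if os.path.isfile(path):
            mtime_ns = os.stat(path).st_mtime_ns
            if mtime_ns > last_seen_mtime_ns:
                candidates.append((mtime_ns, name))
    candidates.sort()  # oldest modification time first
    new_bookmark = candidates[-1][0] if candidates else last_seen_mtime_ns
    return [name for _, name in candidates], new_bookmark
```

One caveat of keeping only a timestamp: a file that arrives later with a modification time equal to or older than the bookmark would be skipped, so this sketch relies on files arriving in modification-time order.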

Do you think this solution would help you?

Thanks
-Mark




Re: Conflict Resolution Strategy

Kartik Veerepalli
Mark,

It would definitely be helpful to maintain some bookmark information and pull only the new files, rather than having to keep a backup copy in some other folder location.

Thanks
Kartik

Re: Conflict Resolution Strategy

Mark Payne
Kartik,

I have created a ticket for this,
https://issues.apache.org/jira/browse/NIFI-512

Hopefully we will be able to get to this pretty quickly, as there have
been several people requesting this functionality.

