Keep Files

plj
Is there a way for GetFile to not delete a file but still read it only once?  I have a directory with files in it, and I only want the new files that are added to the directory to be processed.  It seems that if I set GetFile to not delete the files, the same files get read over and over.

Thoughts?

Re: Keep Files

Matthew Clarke
The behavior you describe is exactly how GetFile was designed to work.
We do understand your use case, and new processors are in the
works to cover it. Keep a lookout for the ListFile and FetchFile processors. I
am not developing these processors myself and cannot say which
release they will become available in.

Matt
On Nov 14, 2015 9:33 AM, "plj" <[hidden email]> wrote:

> Is there a way for GetFile to not delete a file but only read it once?  I
> have a directory with files in it.  I only want the new files that are
> added to the directory to be processed.  It seems that if I set GetFile to
> not delete the files, the same files get read over and over.
>
> Thoughts?
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Keep-Files-tp4864.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>

Re: Keep Files

Mark Petronic
Keep: yes, there is a parameter to configure that. Read once: no. But there
is a set of processors in the works to address that: ListFile and
FetchFile. ListFile will return the list of files that have changed since
the last time the files were read; it is stateful. FetchFile can then take
that list and fetch the files, and I would assume it will have a parameter for
keep=<yes|no> like GetFile. I am not sure of the status of the changes, as I have
not checked recently, but see: https://issues.apache.org/jira/browse/NIFI-631
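
To make the "stateful listing" idea concrete, here is a minimal Python sketch. It is not NiFi code and not how ListFile is implemented; the function name `list_new_files` and the `last_seen_mtime` watermark are assumptions made up for illustration. The caller keeps the returned timestamp as its state and passes it back in on the next run, so only files modified since then are reported.

```python
import os
from pathlib import Path


def list_new_files(directory, last_seen_mtime):
    """Return (paths, newest_mtime) for regular files modified after last_seen_mtime.

    Hypothetical helper: the caller stores newest_mtime and passes it back
    in on the next call, which is the only state this sketch keeps.
    """
    new_files = []
    newest = last_seen_mtime
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue  # skip subdirectories and other non-file entries
        mtime = entry.stat().st_mtime
        if mtime > last_seen_mtime:
            new_files.append(Path(entry.path))
            newest = max(newest, mtime)
    return sorted(new_files), newest
```

A caller would start with a watermark of 0.0, process whatever comes back, and save the returned timestamp for the next poll. Note that a file whose modification time exactly equals the saved watermark is skipped by this sketch, so a production version would need extra bookkeeping for that case.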

Mark


Re: Keep Files

Adam Taft
Also, as a potential workaround, you can use GetFile in
"delete" mode and then, somewhere in your flow, use PutFile to place the
file back down into a "complete" directory.  For example:

/path/incoming  <- use GetFile to pick up files here
/path/complete  <- use PutFile to place files here after processing
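
Outside of NiFi, the same pick-up-then-put-back pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not a NiFi flow: `process_and_archive` and the `handle` callback are hypothetical names, and the move stands in for GetFile removing the file plus PutFile writing it to the "complete" directory.

```python
import shutil
from pathlib import Path


def process_and_archive(incoming, complete, handle):
    """Process every file in `incoming`, then move it to `complete`.

    Hypothetical helper: `handle` is any callable that receives the file
    contents as bytes. Returns the new locations of the moved files.
    """
    incoming, complete = Path(incoming), Path(complete)
    complete.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(incoming.iterdir()):
        if not path.is_file():
            continue
        handle(path.read_bytes())            # the "processing" step
        target = complete / path.name
        shutil.move(str(path), str(target))  # file leaves the incoming directory
        moved.append(target)
    return moved
```

Because each file leaves the incoming directory once it has been handled, a second run only sees files that arrived in the meantime.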

As a variation of the above, if you need the files to stay in the same
directory, you could configure GetFile to only pick up certain file name
patterns.  That way, you can rename a file after it has been processed:

/path/incoming  <- use GetFile to pick up files named $filename.new
/path/incoming  <- rename the file to $filename.complete (using
                   UpdateAttribute) and use PutFile to place it back here
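
The rename variation can be sketched the same way. Again, this is plain Python for illustration rather than a NiFi configuration; `process_new_files` and `handle` are hypothetical names, and the `.new` / `.complete` suffixes are taken from the example above.

```python
from pathlib import Path


def process_new_files(directory, handle):
    """Process files ending in .new and rename them to .complete in place.

    Hypothetical helper: `handle` receives each file's contents as bytes.
    Files that already end in .complete never match the pattern again.
    """
    processed = []
    for path in sorted(Path(directory).glob("*.new")):
        if not path.is_file():
            continue
        handle(path.read_bytes())
        done = path.with_suffix(".complete")  # report.new -> report.complete
        path.rename(done)
        processed.append(done)
    return processed
```

The suffix acts as the "already read" marker, so no separate state has to be stored anywhere else.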

Hope that gives you some possible alternatives.

Adam




Re: Keep Files

Salvatore Papa
If you're on a Linux system, an alternative I've used in the past is to
create another directory full of symlinks pointing to the files in the
original directory.

As an example, assuming you have a directory /data/input_files/ full of
files, create a directory /data/input_links/ and, from inside that new
directory, run: "ln -s ../input_files/* ./"

Now in NiFi, use the original GetFile processor, configured with
/data/input_links/, and set Keep Source File to false. When the GetFile
processor picks up a file, it will follow the symlink to read the contents
and create a flowfile, then delete the symlink; the original file will
remain in /data/input_files/.
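
As a rough illustration of why this works, the Python sketch below creates relative symlinks like the "ln -s" command above and then reads and deletes one link the way a delete-mode reader would. The names `link_files` and `consume` are hypothetical, and `consume` only imitates the behavior described here; it is not GetFile itself.

```python
import os
from pathlib import Path


def link_files(source_dir, link_dir):
    """Create a relative symlink in link_dir for each regular file in source_dir."""
    source_dir, link_dir = Path(source_dir), Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(source_dir.iterdir()):
        link = link_dir / src.name
        if src.is_file() and not os.path.lexists(link):
            # e.g. ../input_files/a.txt, resolved relative to link_dir
            os.symlink(os.path.relpath(src, link_dir), link)
            created.append(link)
    return created


def consume(link):
    """Read through the symlink, then delete only the link itself."""
    data = Path(link).read_bytes()  # reading follows the symlink
    os.unlink(link)                 # removes the link, not the target file
    return data
```

One thing to watch: once a link has been consumed, running the linking step again over the whole directory would re-create it and the file would be read a second time. In practice each file should be linked only once, for example at the moment it arrives.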


Re: Keep Files

Adam Taft
Oooh, neat idea, Salvatore.  +1 for creativity.  Really interesting.

Adam
