ExecuteStreamCommand with output directory


mbwagne
I have an ExecuteStreamCommand processor that runs an application that generates an output directory with several files and subdirectories (there is no option to specify an output stream). How would I pick up the output of that command from another processor while ensuring the command has completely finished? (As it stands, GetFile would start right away.)

Re: ExecuteStreamCommand with output directory

Joe Witt
Mike,

Consider pointing GetFile at a parent directory of the
ExecuteStreamCommand output and telling it to recurse for files.

Keep in mind, though, that no matter what, if the process doesn't write
files with some sort of completion flag, you have a race condition. GetFile
lets you reduce the risk of that inherent race condition, though. For
instance, you can tell it to only pick up data that is at least a certain
age, as indicated by its last-modified date.
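
To illustrate the idea behind the age check (this is not GetFile's actual code), here is a minimal Python sketch that walks a directory recursively and keeps only files whose last-modified time is at least a given number of seconds old. The function name `files_older_than` and its parameters are hypothetical.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def files_older_than(root: str, min_age_seconds: float,
                     now: Optional[float] = None) -> List[Path]:
    """Return files under root last modified at least min_age_seconds ago."""
    # Allow the caller to pin "now" so the check is repeatable.
    now = time.time() if now is None else now
    matches = []
    # os.walk recurses into subdirectories, like a recursive pickup would.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # A file still being written keeps getting a fresh mtime,
            # so it stays below the age threshold and is skipped.
            if now - path.stat().st_mtime >= min_age_seconds:
                matches.append(path)
    return sorted(matches)
```

This only narrows the race window: a writer that pauses for longer than the threshold would still be picked up early.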

Does this sound like it would take care of it?

Thanks
Joe


Re: ExecuteStreamCommand with output directory

Brandon DeVries
Mike,

You could also create a wrapper script for your application and call that
from ExecuteStreamCommand. The wrapper would call your application, tar up
the output, and stream that out. Follow that with UnpackContent, and
continue from there.
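
A minimal sketch of such a wrapper, written in Python rather than shell purely for illustration. The function name `run_and_stream` and its arguments are hypothetical, and it assumes the application's own standard output can be discarded so that it does not mix with the tar stream.

```python
import os
import subprocess
import tarfile
from typing import BinaryIO, Sequence


def run_and_stream(command: Sequence[str], output_dir: str,
                   sink: BinaryIO) -> int:
    """Run command, then write output_dir to sink as an uncompressed tar."""
    # Discard the application's stdout so only tar bytes reach the sink.
    result = subprocess.run(command, stdout=subprocess.DEVNULL)
    if result.returncode != 0:
        # Emit nothing on failure; just pass the exit code along.
        return result.returncode
    # Name entries after the directory itself, e.g. "1234/resources/...".
    arcname = os.path.basename(os.path.normpath(output_dir))
    # "w|" writes a non-seekable stream, which suits a pipe such as stdout.
    with tarfile.open(fileobj=sink, mode="w|") as archive:
        archive.add(output_dir, arcname=arcname)
    return 0
```

A small script could call `run_and_stream(cmd, out_dir, sys.stdout.buffer)` so that the tar bytes become the output that ExecuteStreamCommand captures. Because the tar is only written after the application exits, the downstream processor never sees a half-written directory.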

Brandon


Re: ExecuteStreamCommand with output directory

mbwagne
In reply to this post by Joe Witt
Joe,

The Minimum File Age should solve most cases. It's not currently possible to wait for a "done" file to know when a directory is complete, is it? For example, in the layout below, GetFile has an Input Directory of "output"; "1234" and "3456" are complete, but "2345" is not.


output/1234/
    resources/
        test1.txt
        test2.txt
    .done
output/2345/
    resources/
        test3.txt
output/3456/
    resources/
        test3.txt
        test4.txt
    .done


Thanks,
Mike

Re: ExecuteStreamCommand with output directory

mbwagne
In reply to this post by Brandon DeVries
Good idea, Brandon! I'll look at that approach. I was just hoping to NiFi it all, but maybe that's not the correct way to think about it.

Re: ExecuteStreamCommand with output directory

Joe Witt
In reply to this post by mbwagne
Mike,

There is presently no support for such a mechanism (waiting for '.done').
That is a really specific model. I think for such a case what Brandon
mentions is the best and safest way.

Alternatively, it would be pretty easy to build a custom processor for that.
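
The completeness check that such a processor would need can be sketched outside NiFi. The Python snippet below is only an illustration of the logic, not a NiFi processor; the function name `completed_directories` and the `marker` parameter are hypothetical.

```python
from pathlib import Path
from typing import List


def completed_directories(root: str, marker: str = ".done") -> List[Path]:
    """Return immediate subdirectories of root that contain the marker file."""
    return sorted(
        child
        for child in Path(root).iterdir()
        # Only directories holding the marker count as complete.
        if child.is_dir() and (child / marker).is_file()
    )
```

For the example layout discussed in this thread, this would select "output/1234" and "output/3456" and skip "output/2345".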

Thanks
Joe


Re: ExecuteStreamCommand with output directory

mbwagne
Brandon's approach did work. Thanks Brandon!

My concern is that I'm tarring up a directory in a shell script just to unpack it in a NiFi processor. That seems like a lot of unnecessary I/O. I wish I could trigger GetFile on the completion of the ExecuteStreamCommand via the success route.

Re: ExecuteStreamCommand with output directory

Mark Payne
Mike,

Any chance you can control where the data is written to by the script?

If you write it, for instance, to ".myDir", your "wrapper" script could
just rename it from ".myDir" to "myDir" once the application finishes.

GetFile by default does not pick up any file whose name begins with a ".".
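
A minimal sketch of that rename step, in Python for illustration. The function name `run_then_publish` is hypothetical, and it assumes the application can be told to write into the hidden directory and that both paths are on the same filesystem, so the rename happens in one step.

```python
import os
import subprocess
from typing import Sequence


def run_then_publish(command: Sequence[str], hidden_dir: str,
                     final_dir: str) -> int:
    """Run command, then expose hidden_dir as final_dir if it succeeded."""
    result = subprocess.run(command)
    if result.returncode == 0:
        # The directory only becomes visible under its final name
        # after the application has exited successfully.
        os.rename(hidden_dir, final_dir)
    return result.returncode
```

On failure the output stays under the hidden name, so a partial result is never picked up.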


Re: ExecuteStreamCommand with output directory

Brandon DeVries
In reply to this post by mbwagne
Mike,

Yes, it's a bit of a compromise. But honestly, when you use
ExecuteStreamCommand you're stepping a bit outside of NiFi, and it's
probably going to be less than ideal. As Joe said, writing your own
processor would allow you to handle things a bit more directly and
efficiently. However, ExecuteStreamCommand allows you to put together a
proof of concept, and from there you can decide if the gains from writing
your own processor are worth it for your case. Let us know if there's
anything else we can do to help.

Brandon
