PutParquet fails - "Could not rename file"


VinShar
Hi,

I am exploring NiFi and was trying to use it to save data to HDFS in Parquet
format, using the PutParquet processor. I can write a new file to HDFS, but
if I try to overwrite an existing one I get the exception below (the
processor's "Overwrite Files" property is set to true). I have also attached
a screenshot of the flow. I used an UpdateAttribute processor to rename the
flow file to a constant name so that the flow always overwrites the existing
file. I can see the dot file being created in HDFS, but it gets deleted
after the failure, which I think is correct. File permissions are not the
issue: the file was created by NiFi, and a GetHDFS processor is able to get
the file and delete it from HDFS.

I would appreciate any pointers to resolve this issue.

2017-11-30 19:49:12,900 ERROR [Timer-Driven Process Thread-5]
o.a.nifi.processors.parquet.PutParquet
PutParquet[id=46f35988-1e6a-36ec-89ff-6400608bee87] Failed to write due to
org.apache.nifi.processors.hadoop.exception.FailureException: Could not
rename file /user/nifi/.gg_usr_test to its final filename: {}
org.apache.nifi.processors.hadoop.exception.FailureException: Could not
rename file /user/nifi/.gg_usr_test to its final filename
        at
org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.rename(AbstractPutHDFSRecord.java:420)
        at
org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:345)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)


<http://apache-nifi-developer-list.39713.n7.nabble.com/file/t849/Capture.png>




--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/

Re: PutParquet fails - "Could not rename file"

Bryan Bende
Hello,

I haven't verified this against HDFS yet, but this may be a bug in the
processor...

The value of "Overwrite Files" is passed to the Parquet writer to put
it in "overwrite" mode, but since we first write to a temp file, this
only helps overwrite the temp file if it is already there.

Then we try to rename the temp file to the final name, and at this
point it fails because a file with the final name already exists. When
"Overwrite Files" is true, we should be deleting the existing file
before the rename.
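To make the sequence above concrete, here is a small local-filesystem sketch of it. It is an illustration under assumptions, not NiFi's code: java.nio.file stands in for Hadoop's FileSystem, and the class and method names are hypothetical. Writing the dot-prefixed temp file in "overwrite" mode only ever replaces a leftover temp file; the final Files.move, called without REPLACE_EXISTING, refuses to replace an existing target, which is roughly analogous to the failed rename in the log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch of the failure described above, with java.nio.file
 * standing in for Hadoop's FileSystem (an assumption for illustration).
 */
public class OverwriteOnlyAffectsTemp {

    /**
     * Writes content to a dot-prefixed temp file in "overwrite" mode, then renames it.
     * Returns true if the rename succeeded, false if the destination already existed.
     */
    static boolean writeThenRename(Path destFile, String content) throws IOException {
        Path tempFile = destFile.resolveSibling("." + destFile.getFileName());
        // "Overwrite" mode on the writer: truncates a leftover temp file, never touches destFile.
        Files.write(tempFile, content.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
        try {
            // No REPLACE_EXISTING option: fails when destFile already exists.
            Files.move(tempFile, destFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Remove the temp file on failure, as observed in the original post.
            Files.delete(tempFile);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempDirectory("sketch").resolve("gg_usr_test");
        System.out.println(writeThenRename(dest, "first"));  // true: nothing to collide with
        System.out.println(writeThenRename(dest, "second")); // false: final name already exists
    }
}
```

The second call leaves the original file untouched and removes the temp file, matching the behaviour reported at the start of the thread.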

There is a unit test that covers this and passes, but it runs against
the local filesystem, so it may behave differently than HDFS would.

-Bryan


In reply to VinShar, Thu, Nov 30, 2017 at 5:08 PM.

Re: PutParquet fails - "Could not rename file"

VinShar
Thanks for the reply. I had seen some posts where people wanted their files
not to be overwritten by PutParquet, so I thought I might have something
wrong in the configuration of my flow.

I know PutParquet internally renames the file on HDFS, but is there a
processor I can use in my flow to rename a file on HDFS? I see processors to
get, fetch, delete, and put files on HDFS, but couldn't find a way to rename
one. If this is a defect, I can save the file under a different name, delete
the existing file with DeleteHDFS, and then rename the new file to the name
of the one I deleted. If there is no processor to rename files, I will try
to modify the code or create a new processor by extending an existing one.

Re: PutParquet fails - "Could not rename file"

Bryan Bende
I think there is an open PR for a "MoveHDFS" processor that might do
what you are describing, but currently I think you'd have to use
ExecuteScript to issue an hdfs mv command.

If you are interested in trying to fix the code for PutParquet, then I
would suggest trying to add an overwrite parameter to this method:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-hadoop-record-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractPutHDFSRecord.java#L408

Inside that method, if overwrite is true then we just need to delete
destFile before calling rename.

The value to pass in for the overwrite parameter is already available:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-hadoop-record-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractPutHDFSRecord.java#L288
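A minimal sketch of the suggested change, again with java.nio.file standing in for Hadoop's FileSystem so it can run locally. The class is hypothetical and only mirrors the shape of the proposed fix: a rename method that takes an overwrite flag and deletes destFile first when it is true. In the real processor the delete would presumably go through Hadoop's FileSystem.delete(Path, boolean), but that is an assumption, not the actual NiFi code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of the proposed fix, with java.nio.file standing in
 * for Hadoop's FileSystem (an assumption for illustration).
 */
public class RenameWithOverwrite {

    /**
     * Renames tempFile to destFile. When overwrite is true, an existing destFile
     * is deleted first; otherwise Files.move throws FileAlreadyExistsException.
     */
    static void rename(Path tempFile, Path destFile, boolean overwrite) throws IOException {
        if (overwrite) {
            // Proposed fix: clear the final name before renaming onto it.
            Files.deleteIfExists(destFile);
        }
        Files.move(tempFile, destFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch");
        Path dest = dir.resolve("gg_usr_test");
        Path temp = dir.resolve(".gg_usr_test");
        Files.write(dest, "old".getBytes(StandardCharsets.UTF_8));
        Files.write(temp, "new".getBytes(StandardCharsets.UTF_8));
        rename(temp, dest, true);
        System.out.println(new String(Files.readAllBytes(dest), StandardCharsets.UTF_8)); // new
    }
}
```

Note that delete-then-rename is not atomic: between the two calls the destination briefly does not exist, so a reader polling that path could miss it.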



In reply to VinShar, Thu, Nov 30, 2017 at 10:14 PM.