Sending Parquet files to S3

Sending Parquet files to S3

shitij
Hi,
For sending Parquet files to S3, can I use the PutParquet processor directly, giving it an S3 path, or do I first write to HDFS and then use PutS3Object?

Re: Sending Parquet files to S3

Bryan Bende
Hello,

The PutParquet processor uses the Hadoop client to write to a filesystem.

For example, to write to HDFS you would have a core-site.xml with a
filesystem like:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://yourhost</value>
</property>

And to write to a local filesystem you could have a core-site.xml with:

<property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
</property>

If there is a way to declare S3 as a filesystem, then I would expect
it to work, but I am not familiar with doing that.
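
That said, if Hadoop's s3a filesystem can be used here, a core-site.xml
might look something like this (an untested sketch; the bucket name and
credentials are placeholders, and the hadoop-aws client jars and their
AWS SDK dependencies would need to be on the processor's classpath):

<!-- Point the default filesystem at an S3 bucket via s3a
     (bucket name and credentials below are placeholders) -->
<property>
    <name>fs.defaultFS</name>
    <value>s3a://your-bucket</value>
</property>
<property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
</property>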

The alternative would be what you suggested: write to HDFS first, and
then use ListHDFS -> FetchHDFS -> PutS3Object.

Thanks,

Bryan


On Fri, Jul 28, 2017 at 2:17 PM, shitij <[hidden email]> wrote:

> Hi,
> For sending Parquet files to S3, can I use the PutParquet processor
> directly, giving it an S3 path, or do I first write to HDFS and then use
> PutS3Object?