Merging CSV files based on criteria

Yaismel Miranda Pons
Hi all,

I am testing NiFi and wondering whether it is possible to merge two CSV
files based on specific criteria.
For example, given these csv files:

business.csv
businessid,businessname,categoryid
1,McDonalds,1
2,Burger King,1
3,Walmart,2
4,Publix,2

categories.csv
categoryid,categoryname
1,Fast food chain
2,Super market

I would like to know if there is an effective way in NiFi to combine both
CSV files using categoryid as the matching criterion, as in this case:

generated.csv
businessid,businessname,categoryid,categoryname
1,McDonalds,1,Fast food chain
2,Burger King,1,Fast food chain
3,Walmart,2,Super market
4,Publix,2,Super market
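For reference, the merge described above (each business row looked up by categoryid in the categories table) is a classic hash join. Below is a minimal Python sketch of just the merge logic, using only the standard library. It is not a NiFi processor, and it assumes both inputs fit in memory and have the headers shown. The function name join_csv is illustrative only.

```python
import csv
import io


def join_csv(left_text: str, right_text: str, key: str) -> str:
    """Inner hash join: append the right-hand columns to each left row with a matching key."""
    right = csv.DictReader(io.StringIO(right_text))
    extra_fields = [f for f in right.fieldnames if f != key]
    # Build phase: index the smaller file (categories) by the join key.
    lookup = {row[key]: row for row in right}

    left = csv.DictReader(io.StringIO(left_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=left.fieldnames + extra_fields, lineterminator="\n")
    writer.writeheader()
    # Probe phase: look up each business row's key; rows without a match are dropped.
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            writer.writerow({**row, **{f: match[f] for f in extra_fields}})
    return out.getvalue()


business = """businessid,businessname,categoryid
1,McDonalds,1
2,Burger King,1
3,Walmart,2
4,Publix,2
"""

categories = """categoryid,categoryname
1,Fast food chain
2,Super market
"""

print(join_csv(business, categories, "categoryid"), end="")
```

Run on the two sample files, this prints the same rows as generated.csv.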

Thank you for your time.

Re: Merging CSV files based on criteria

Joe Witt
Hello

There are certainly ways to slice this, but it would help to understand
your use case a bit more in the context of an automated flow of data.
Can you describe how this applies in that context?

If you had two streams of data feeding in and could pair data from
one stream with data from the other stream and run them through a sort
of combiner function, then this could be fairly straightforward, but it
does require building a processor that doesn't exist as of now (as far
as I know).
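To make the "combiner function" idea concrete, here is a hypothetical Python sketch. It is not an existing NiFi processor or API, and the names StreamCombiner, offer_left and offer_right are illustrative only. The combiner receives records from two streams in any order and buffers business rows by key. It emits a merged record as soon as the matching category record is known, assuming one category record per key and many business rows per key.

```python
from collections import defaultdict
from typing import Dict, List


class StreamCombiner:
    """Hypothetical combiner: pairs left records (many per key) with one right record per key."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.right: Dict[str, dict] = {}                          # reference records seen so far
        self.pending: Dict[str, List[dict]] = defaultdict(list)   # left records waiting for a match

    def offer_right(self, record: dict) -> List[dict]:
        """Store a reference record and release any left records that were waiting for it."""
        k = record[self.key]
        self.right[k] = record
        waiting = self.pending.pop(k, [])
        return [{**left, **record} for left in waiting]

    def offer_left(self, record: dict) -> List[dict]:
        """Merge immediately if the reference record is known; otherwise buffer the record."""
        match = self.right.get(record[self.key])
        if match is None:
            self.pending[record[self.key]].append(record)
            return []
        return [{**record, **match}]


combiner = StreamCombiner("categoryid")
out: List[dict] = []
# A business row arrives before its category, so it is buffered.
out += combiner.offer_left({"businessid": "1", "businessname": "McDonalds", "categoryid": "1"})
# The category arrives and releases the buffered row.
out += combiner.offer_right({"categoryid": "1", "categoryname": "Fast food chain"})
# Later rows for the same key are merged immediately.
out += combiner.offer_left({"businessid": "2", "businessname": "Burger King", "categoryid": "1"})
print(len(out))  # 2
```

The pending buffer grows without bound if a category never arrives. A real design would need a policy for such misses, for example a timeout that routes stale rows elsewhere.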

But let's understand the context of your use case a bit more to see if
helping with a NiFi answer is the right thing or not.

Thanks
Joe

On Thu, Nov 12, 2015 at 10:02 AM, Yaismel Miranda Pons
<[hidden email]> wrote:

> [quoted text hidden]

Re: Merging CSV files based on criteria

Yaismel Miranda Pons
Hi Joe, thanks for taking the time to answer.

This is the scenario I'm trying to accomplish with NiFi: I want to create a simple dataflow that automates ingesting CSV data found in some datasets. A dataset could come either from an HTTP endpoint or from CSV files in a directory, and it has to be ingested every month.
I was able to implement this scenario in NiFi when the data is a single CSV file, but in some cases there are two or more related CSV files. I would like to know if there is an effective way in NiFi to combine these CSV files into a single one based on specific criteria. Each file contains around 15 million records.
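At around 15 million records per file, holding a whole file in memory may not be practical. If the lookup side is small (a handful of categories), an in-memory lookup of that side is enough. If both sides are large, one standard alternative is a sort-merge join. This is a swapped-in technique, not something proposed in the thread. Once both files are sorted by the join key (for example with an external sort), they can be joined in a single streaming pass with constant memory.

The minimal Python sketch below makes several assumptions:
- keys are compared as plain strings, and both inputs are sorted with that same ordering;
- categoryid is unique in the categories file;
- io.StringIO stands in for real file handles, which would be opened with newline="".

The name merge_join is illustrative only.

```python
import csv
import io
from typing import Iterator, List


def merge_join(left: Iterator[List[str]], right: Iterator[List[str]],
               left_key: int, right_key: int) -> Iterator[List[str]]:
    """Inner sort-merge join of two row streams already sorted by their key column.

    Assumes right-side keys are unique, so memory use stays constant
    regardless of how many rows stream through.
    """
    r = next(right, None)
    for row in left:
        # Advance the right stream until its key catches up with the left key.
        while r is not None and r[right_key] < row[left_key]:
            r = next(right, None)
        if r is not None and r[right_key] == row[left_key]:
            # Append the right row's columns, minus its copy of the key.
            yield row + [v for i, v in enumerate(r) if i != right_key]


# Both samples happen to be sorted by categoryid already.
business = io.StringIO("businessid,businessname,categoryid\n"
                       "1,McDonalds,1\n2,Burger King,1\n3,Walmart,2\n4,Publix,2\n")
categories = io.StringIO("categoryid,categoryname\n1,Fast food chain\n2,Super market\n")

left_rows, right_rows = csv.reader(business), csv.reader(categories)
header = next(left_rows) + next(right_rows)[1:]   # drop the key column (index 0) of the right header
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(merge_join(left_rows, right_rows, left_key=2, right_key=0))
print(out.getvalue(), end="")
```

The merge step never holds more than one row from each side at a time. The cost moves into the sort, which for files this size would itself need to be an external (disk-based) sort.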

Thanks
Yaismel

Re: Merging CSV files based on criteria

Joe Witt
Yaismel,

My best guess is that accomplishing this would require custom code
to handle the merging logic. As described so far, I believe I
understand the use case, but I still have a lot of questions about the
frequency of arrival for each dataset, how to handle misses (where one
side has no matching row for a row on the other side), what to do
with out-of-order arrival, etc.
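As an illustration of the "misses" question, one possible policy is to keep unmatched rows in a separate output rather than silently dropping them. They can then be inspected or re-joined once the other dataset arrives. Below is a minimal Python sketch of that policy. The function name split_join is illustrative only, and the row with categoryid 3 is a hypothetical example of a business whose category is missing from categories.csv. Another common policy is a left outer join that writes the row with an empty categoryname.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def split_join(left_rows: Iterable[Row], lookup: Dict[str, Row],
               key: str) -> Tuple[List[Row], List[Row]]:
    """Join left rows against a lookup table, separating matched rows from misses."""
    matched: List[Row] = []
    unmatched: List[Row] = []
    for row in left_rows:
        ref = lookup.get(row[key])
        if ref is None:
            unmatched.append(row)          # a miss: no reference row for this key
        else:
            matched.append({**row, **ref})
    return matched, unmatched


categories = {"1": {"categoryid": "1", "categoryname": "Fast food chain"},
              "2": {"categoryid": "2", "categoryname": "Super market"}}
# Hypothetical input: the second row's categoryid (3) has no entry in categories.
business = csv.DictReader(io.StringIO(
    "businessid,businessname,categoryid\n3,Walmart,2\n5,Corner Store,3\n"))

matched, unmatched = split_join(business, categories, "categoryid")
print(len(matched), len(unmatched))  # 1 1
```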

Thanks
joe

On Fri, Nov 13, 2015 at 6:18 PM, Yaismel Miranda Pons
<[hidden email]> wrote:

> [quoted text hidden]
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Merging-csv-files-based-on-criterias-tp4711p4873.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.