ConvertCSVtoAvro | support for "||" delimiter

ConvertCSVtoAvro | support for "||" delimiter

shweta
Hi All,

It seems "ConvertCSVtoAvro" only supports a single character as the delimiter in NiFi. Is there a way to specify "||"
as the delimiter?

Thanks,
Shweta

Re: ConvertCSVtoAvro | support for "||" delimiter

trkurc
Administrator
With that processor alone it doesn't appear so. The validator for that
property requires it to be one character.
On Feb 3, 2016 1:01 AM, "shweta" <[hidden email]> wrote:

> Hi All,
>
> It seems "ConvertCSVtoAvro" only support single character as delimiter in
> Nifi. Is there a way to specify "||"
> delimiter.
>
> Thanks,
> Shweta
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/ConvertCSVtoAvro-support-for-delimiter-tp7116.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>

Re: ConvertCSVtoAvro | support for "||" delimiter

Joe Witt
Not a direct answer but:
  With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will
have a great option in scripting (Lua, Python, Ruby, Groovy,
Javascript) that will let you rapidly get past these hurdles without
having to build your own custom processor until you are sure what you
need.

Thanks
Joe

On Thu, Feb 4, 2016 at 6:48 AM, Tony Kurc <[hidden email]> wrote:

> With that processor alone it doesn't appear so. The validator for that
> property requires it to be one character.

Re: ConvertCSVtoAvro | support for "||" delimiter

Joe Percivall
A more direct workaround would be to use the ReplaceText processor first to change every instance of "||" to "|", so that ConvertCSVtoAvro can then parse the data with "|" as its delimiter.
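
The effect of that replacement step can be sketched outside NiFi. The Python below is only an illustration of the idea, not NiFi configuration: the function name collapse_double_pipe and the sample data are made up for the example, and it assumes no field contains a literal "|".

```python
import csv
import io


def collapse_double_pipe(text):
    """Replace every "||" with a single "|", as a literal search-and-replace would."""
    return text.replace("||", "|")


# Hypothetical sample input using "||" as the field delimiter.
sample = "id||name||city\n1||Ada||London\n2||Linus||Helsinki\n"

# After the replacement, a single-character delimiter is enough to parse it.
rows = list(csv.reader(io.StringIO(collapse_double_pipe(sample)), delimiter="|"))

for row in rows:
    print(row)
```

If a field can contain "|" itself, this collapses the distinction between data and delimiter, so a different replacement character would be needed.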
 
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [hidden email]




On Thursday, February 4, 2016 9:50 AM, Joe Witt <[hidden email]> wrote:
Not a direct answer but:
  With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will
have a great option in scripting (Lua, Python, Ruby, Groovy,
Javascript) that will let you rapidly get past these hurdles without
having to build your own custom processor until you are sure what you
need.

Thanks
Joe



Re: ConvertCSVtoAvro | support for "||" delimiter

Ryan Blue
In reply to this post by Joe Witt
The underlying CSV library only supports a single-character delimiter,
so it would be a bit of work to allow multi-char delimiters. Another
solution is to use | as your delimiter and simply account for that in
your file header. Everything is mapped by name, so you'd just have a
bunch of columns named "" and it should work fine otherwise.

That may not work if your delimiter is || because you might have | in
your data, though. If that's the case, then I'd go with the suggestion
from Joe to replace "||" with a single-character delimiter that you
won't see in the data, like ☃.
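
Both cases can be sketched with Python's csv module, which also takes a single-character delimiter. This is only an illustration of the behaviour described above, not what ConvertCSVtoAvro does internally; the parse helper and the sample data are assumptions made for the example.

```python
import csv
import io


def parse(text, delimiter):
    """Parse delimited text into a list of rows using a one-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical input: "||" separates fields, and one field contains a single "|".
sample = "id||note\n1||a|b\n"

# Treating "|" as the delimiter yields empty columns between the real ones,
# and the "|" inside the data splits that field in two.
as_single_pipe = parse(sample, "|")

# Replacing "||" with a character that never occurs in the data keeps fields intact.
SNOWMAN = "\u2603"
replaced = parse(sample.replace("||", SNOWMAN), SNOWMAN)

print(as_single_pipe)
print(replaced)
```

The header row parses to three names with an empty one in the middle, while the data row parses to four fields, which is the misalignment the second paragraph warns about.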

rb



--
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: ConvertCSVtoAvro | support for "||" delimiter

Alan Jackoway
Though I love the concept of ☃ as your separator, my belief is that the
correct way to do this is to replace your custom delimiter with one of the
delimiters defined in ASCII (and therefore extremely unlikely to appear in
your data): https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text

That said, I have not actually tried this with NiFi, so I don't know how
easy it is to specify ASCII character 31 as your separator in the UI.
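
As a rough sketch of the idea outside NiFi, the Python below swaps "||" for the ASCII unit separator (character 31) and parses the result. The helper name to_unit_separated and the sample data are hypothetical, and the check for an existing unit separator is an added precaution rather than anything NiFi does.

```python
import csv
import io

UNIT_SEPARATOR = "\x1f"  # ASCII character 31


def to_unit_separated(text, old_delimiter="||"):
    """Swap a multi-character delimiter for the ASCII unit separator.

    Refuses to convert if the unit separator is already present, because the
    replacement would then be ambiguous.
    """
    if UNIT_SEPARATOR in text:
        raise ValueError("input already contains the unit separator")
    return text.replace(old_delimiter, UNIT_SEPARATOR)


# Hypothetical input where one field legitimately contains a single "|".
sample = "id||note\n1||a|b\n2||plain\n"

converted = to_unit_separated(sample)
rows = list(csv.reader(io.StringIO(converted), delimiter=UNIT_SEPARATOR))

for row in rows:
    print(row)
```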

On Thu, Feb 4, 2016 at 2:34 PM, Ryan Blue <[hidden email]> wrote:

> The underlying CSV library only supports a single-character delimiter, so
> it would be a bit of work to allow multi-char delimiters. Another solution
> is to use | as your delimiter and simply account for that in your file
> header. Everything is mapped by name, so you'd just have a bunch of columns
> named "" and it should work fine otherwise.
>
> That may not work if your delimiter is || because you might have | in your
> data, though. If that's the case, then I'd go with the suggestion from Joe
> to replace "||" with a single-character delimiter that you won't see in the
> data, like ☃.
>
> rb

Re: ConvertCSVtoAvro | support for "||" delimiter

Ryan Blue
I didn't know there was a unit separator character, thanks for the
suggestion. I think I have a lot of ☃ to replace.

If you can paste the unit separator character in, then it should work.
The underlying code supports escape sequences, like \t, but the
validation doesn't take those into account yet. That would be a good
starter contribution for someone out there...
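
To make the idea concrete, here is a minimal sketch in Python of a validation step that resolves escape sequences before checking the length. It is not the NiFi validator (that code is Java and is not shown in this thread); the function name resolve_delimiter is hypothetical.

```python
def resolve_delimiter(raw):
    """Resolve escapes such as \\t or \\u001f, then require exactly one character.

    Non-Latin-1 characters are turned into \\uXXXX escapes first so that the
    unicode_escape codec can decode the whole value consistently.
    """
    resolved = raw.encode("latin-1", "backslashreplace").decode("unicode_escape")
    if len(resolved) != 1:
        raise ValueError(
            "delimiter must resolve to a single character, got %r" % resolved
        )
    return resolved


# A literal character, a two-character escape, and a \uXXXX escape all resolve
# to one character each.
for candidate in ["|", "\\t", "\\u001f"]:
    print(repr(candidate), "->", repr(resolve_delimiter(candidate)))
```

Validating the resolved value rather than the raw property text is the design point: "\t" is two characters as typed, but one character once the escape is interpreted.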

rb

On 02/04/2016 12:39 PM, Alan Jackoway wrote:

> Though I love the concept of ☃ as your separator, my belief is that the
> correct way to do this to replace your custom delimiter with the ones that
> are defined in ASCII (and therefore extremely unlikely to appear in your
> data): https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text
>
> That said, I have not actually tried this with NiFi, so I don't know how
> easy it is to specify ASCII character 31 as your separator in the UI.



Re: ConvertCSVtoAvro | support for "||" delimiter

trkurc
Administrator
I believe the processor supports \uXXXX notation for a delimiter as part of
a PR for 0.4.X.

On Thu, Feb 4, 2016 at 4:11 PM, Ryan Blue <[hidden email]> wrote:

> I didn't know there was a unit separator character, thanks for the
> suggestion. I think I have a lot of ☃ to replace.
>
> If you can paste the unit separator character in, then it should work. The
> underlying code supports escape sequences, like \t, but the validation
> doesn't take those into account yet. That would be a good starter
> contribution for someone out there...
>
> rb

Re: ConvertCSVtoAvro | support for "||" delimiter

Thad Guidry
Non-printable characters are also great for this use case: the ASCII
control characters were designed for exactly this in the early days of
computing. Just copy and paste one from an editor or Notepad; on Windows
you can get Char 2 by holding down ALT and typing 002 on the numeric
keypad.

CHAR 2 (traditionally the Start Of Text or STX) is a great delimiter.
 \u0002

http://www.fileformat.info/info/unicode/char/0002/index.htm

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>