MergeContent Demarcator Question

Bryan Bende
I'm trying to use MergeContent to merge json documents. I have the Header,
Demarcator, and Footer properties pointing to files with [ , ]
respectively. I left all other properties the same, and set Max Entries to
5 and Max Bin Age to 10 seconds.

I have a simple flow with ListenUDP -> MergeContent -> PutSolrContentStream
(from the pull request). If I send a bunch of json documents over UDP, most
of them will merge correctly, but I'll see a couple where the demarcator
didn't get inserted between two json documents.

Any thoughts as to why this would happen?

I added a significant amount of logging to the getDescriptorFileContent()
method in MergeContent to see if there was a reason why it would return
null for the demarcator, but nothing obvious is really jumping out at me.
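
For readers following along, here is a minimal sketch of what the configuration above is expected to produce. It is not MergeContent's implementation; `merge_bin` is a hypothetical helper, and it assumes the three files contain exactly `[`, `,` and `]` with no trailing newline.

```python
import json


def merge_bin(entries, header=b"[", demarcator=b",", footer=b"]"):
    """Join the entries of one bin the way the post expects:
    header, then entries separated by the demarcator, then footer.
    (Hypothetical helper, not NiFi code.)"""
    return header + demarcator.join(entries) + footer


# Three JSON documents, each arriving as its own flow file.
docs = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']

merged = merge_bin(docs)
print(merged)
# If every entry is a single JSON document, the result parses as an array.
print(json.loads(merged))
```

If each entry really is one JSON document, the merged content is a valid JSON array, which is what a downstream consumer such as PutSolrContentStream would need.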

Re: MergeContent Demarcator Question

Joe Witt
Are you sure you're not sending the [ , ] over UDP as well ;-)

Can you create a template of your flow and send it over?  Perhaps just
attach to a JIRA for this.  MergeContent is a powerful and useful
thing so if you're seeing funky behavior we want to sort it out
quickly.

On Thu, Apr 23, 2015 at 8:47 PM, Bryan Bende <[hidden email]> wrote:

> I'm trying to use MergeContent to merge json documents. I have the Header,
> Demarcator, and Footer properties pointing to files with [ , ]
> respectively. I left all other properties the same, and set Max Entries to
> 5 and Max Bin Age to 10 seconds.
>
> I have a simple flow with ListenUDP -> MergeContent -> PutSolrContentStream
> (from the pull request). If I send a bunch of json documents over UDP, most
> of them will merge correctly, but I'll see a couple where the demarcator
> didn't get inserted between two json documents.
>
> Any thoughts as to why this would happen?
>
> I added a significant amount of logging to the getDescriptorFileContent()
> method in MergeContent to see if there was a reason why it would return
> null for the demarcator, but nothing obvious is really jumping out at me.

Re: MergeContent Demarcator Question

Michael Moser
At first glance, I would suspect ListenUDP is placing more than one UDP
datagram into one flowfile.  It might be worth spending some time checking
if that can happen.

-- Mike
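
A small sketch of why that suspicion would explain the symptom. The payloads are made up, and the merge is modeled as a plain join with `[`, `,` and `]`, which is an assumption about the setup rather than NiFi's code.

```python
import json


def merged_is_valid(flowfiles):
    """Merge flow file contents as header + demarcated entries + footer
    and report whether the result parses as JSON."""
    merged = b"[" + b",".join(flowfiles) + b"]"
    try:
        json.loads(merged)
        return True
    except json.JSONDecodeError:
        return False


# One datagram per flow file: the demarcator lands between every document.
clean = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']

# Two datagrams ended up in the second flow file: the merge step cannot
# know there is a document boundary inside it, so no demarcator goes there.
suspect = [b'{"id": 1}', b'{"id": 2}{"id": 3}']

print(merged_is_valid(clean))
print(merged_is_valid(suspect))
```

In the second case the merge did its job; the missing comma sits inside an entry that was already concatenated upstream.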


On Thu, Apr 23, 2015 at 9:35 PM, Joe Witt <[hidden email]> wrote:

> Are you sure you're not sending the [ , ] over UDP as well ;-)
>
> Can you create a template of your flow and send it over?  Perhaps just
> attach to a JIRA for this.  MergeContent is a powerful and useful
> thing so if you're seeing funky behavior we want to sort it out
> quickly.

Re: MergeContent Demarcator Question

Bryan Bende
Thanks for the suggestions... looks like it is in fact coming out of
ListenUDP like that. I'll try to figure out if this is expected behavior,
or possibly something with how the messages are being sent.

Sorry for the false alarm about MergeContent.
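
One possible workaround, sketched outside NiFi: if a flow file may hold several JSON documents back to back, they can be separated again before merging. The helper name `split_concatenated_json` is hypothetical; it relies on the standard library's `json.JSONDecoder.raw_decode`, which parses one document and returns the index where it stopped.

```python
import json


def split_concatenated_json(text):
    """Return the list of JSON documents found back to back in `text`.
    Whitespace between documents is skipped; malformed input raises
    json.JSONDecodeError."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs


print(split_concatenated_json('{"id": 2}{"id": 3}'))
```

This only works because JSON documents are self-delimiting; arbitrary binary payloads would need some other framing.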

On Fri, Apr 24, 2015 at 9:48 AM, Michael Moser <[hidden email]> wrote:

> At first glance, I would suspect ListenUDP is placing more than one UDP
> datagram into one flowfile.  It might be worth spending some time checking
> if that can happen.
>
> -- Mike

Re: MergeContent Demarcator Question

Joe Witt
Mike Moser: Great thinking!

Bryan,

Taken from the ListenUDP docs: "This processor listens for Datagram
Packets on a given port and concatenates the contents of those packets
together generating flow files roughly as often as the internal buffer
fills up or until no more data is currently available."

Quite honestly when this processor was originally built NiFi didn't
have the ability to do the sort of fancy 'slab allocation' mechanism
it supports today when generating a stream of flow files.  So we could
probably pretty easily reimplement this to behave more like you were
thinking it should.  But it is probably worth a bit of
discussion/exploration to see what makes sense.  The case we built it
for was data arriving in UDP packets and it was structured in such a
way that simple binary concatenation was sufficient because the data
was inherently demarcatable/stream processing friendly.  We could,
however, implement it now such that each UDP datagram becomes a flow
file.  But not sure that makes sense either.  This is sort of the
inherent challenge of providing a raw socket listener.  If the 'thing'
being exchanged is not clear then we're not sure what the boundary of
a given flow file should be.
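
The two behaviors discussed above can be compared with a simplified model. This is not the ListenUDP implementation: both functions are hypothetical, the datagrams are a plain list rather than a socket, and "buffer fills up" is modeled as "the next payload would not fit".

```python
def concatenating_listener(datagrams, buffer_size):
    """Model of the documented behavior: append payloads to a buffer and
    emit a flow file when the next payload would overflow it, or when no
    more data is available."""
    flowfiles = []
    buffer = b""
    for payload in datagrams:
        if buffer and len(buffer) + len(payload) > buffer_size:
            flowfiles.append(buffer)
            buffer = b""
        buffer += payload
    if buffer:
        flowfiles.append(buffer)
    return flowfiles


def per_datagram_listener(datagrams):
    """Model of the proposed 'datagram = flowfile' mode."""
    return [bytes(payload) for payload in datagrams]


datagrams = [b'{"id": 1}', b'{"id": 2}', b'{"id": 3}']
print(concatenating_listener(datagrams, buffer_size=20))
print(per_datagram_listener(datagrams))
```

With concatenation, datagram boundaries are lost unless the payload format is self-delimiting; with one flow file per datagram, the boundary is preserved at the cost of many small flow files.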

I'll stop rambling. If you could describe the use case a bit more, we
can think about whether providing a 'datagram = flowfile' mode makes
sense.

Thanks!
Joe

On Fri, Apr 24, 2015 at 7:44 PM, Bryan Bende <[hidden email]> wrote:

> Thanks for the suggestions... looks like it is in fact coming out of
> ListenUDP like that. I'll try to figure out if this is expected behavior,
> or possibly something with how the messages are being sent.
>
> Sorry for the false alarm about MergeContent.

Re: MergeContent Demarcator Question

Bryan Bende
Joe,

Thanks for the background on ListenUDP.

The use case I was thinking of was log aggregation... most logging
frameworks like logback, log4j, etc., have a UDP appender, and they also
generally have a json format/layout that conforms with the "logstash"
format. I was thinking it would be cool to be able to use NiFi as an
alternative to logstash, flume, and whatever other technologies are being
used to get logs into a central location. There are obviously other options
besides udp, but it seemed easy and well supported.

Maybe a property on the processor could control whether or not it buffered
datagrams vs producing a new FlowFile for each datagram?

-Bryan
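
To illustrate the sending side of that use case, here is a rough sketch using Python's standard `logging.handlers.DatagramHandler` in place of a logback or log4j UDP appender. The class name, the field names (loosely modeled on logstash-style events) and the default host and port are all assumptions for illustration.

```python
import json
import logging
import logging.handlers
from datetime import datetime, timezone


class JsonDatagramHandler(logging.handlers.DatagramHandler):
    """Send each log record as one JSON document in one UDP datagram."""

    def makePickle(self, record):
        # DatagramHandler sends whatever bytes makePickle returns, so
        # overriding it swaps the default pickle payload for JSON.
        event = {
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(event).encode("utf-8")


def build_logger(host="127.0.0.1", port=9999):
    """Create a logger whose records go out as JSON over UDP.
    Host and port are placeholders for wherever the listener runs."""
    logger = logging.getLogger("udp-json-demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(JsonDatagramHandler(host, port))
    return logger
```

Calling `build_logger().info("user %s logged in", "bob")` would emit a single datagram carrying a single JSON document, which is the shape a per-datagram mode on the receiving side could pass straight to a merge step.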



On Fri, Apr 24, 2015 at 8:45 PM, Joe Witt <[hidden email]> wrote:

> Mike Moser: Great thinking!
>
> Bryan,
>
> Taken from the ListenUDP docs: "This processor listens for Datagram
> Packets on a given port and concatenates the contents of those packets
> together generating flow files roughly as often as the internal buffer
> fills up or until no more data is currently available."
>
> Quite honestly when this processor was originally built NiFi didn't
> have the ability to do the sort of fancy 'slab allocation' mechanism
> it supports today when generating a stream of flow files.  So we could
> probably pretty easily reimplement this to behave more like you were
> thinking it should.  But it is probably worth a bit of
> discussion/exploration to see what makes sense.  The case we built it
> for was data arriving in UDP packets and it was structured in such a
> way that simple binary concatenation was sufficient because the data
> was inherently demarcatable/stream processing friendly.  We could,
> however, implement it now such that each UDP datagram becomes a flow
> file.  But not sure that makes sense either.  This is sort of the
> inherent challenge of providing a raw socket listener.  If the 'thing'
> being exchanged is not clear then we're not sure what the boundary of
> a given flow file should be.
>
> I'll stop rambling. If you could describe the use case a bit more, we
> can think about whether providing a 'datagram = flowfile' mode makes
> sense.
>
> Thanks!
> Joe

Re: MergeContent Demarcator Question

Joey Echeverria-2
A syslog processor would be useful for log aggregation. I'm pretty sure
that log4j, etc. have native syslog appenders.

-Joey
On Sat, Apr 25, 2015 at 12:13 Bryan Bende <[hidden email]> wrote:

> Joe,
>
> Thanks for the background on ListenUDP.
>
> The use case I was thinking of was log aggregation... most logging
> frameworks like logback, log4j, etc., have a UDP appender, and they also
> generally have a json format/layout that conforms with the "logstash"
> format. I was thinking it would be cool to be able to use NiFi as an
> alternative to logstash, flume, and whatever other technologies are being
> used to get logs into a central location. There are obviously other options
> besides udp, but it seemed easy and well supported.
>
> Maybe a property on the processor could control whether or not it buffered
> datagrams vs producing a new FlowFile for each datagram?
>
> -Bryan

Re: MergeContent Demarcator Question

Joe Witt
Roger that.  And I think adding a property to ListenUDP to treat
datagrams as flowfiles rather than a set of datagrams as a flowfile
would be very doable.

On Sat, Apr 25, 2015 at 9:12 PM, Joey Echeverria <[hidden email]> wrote:

> A syslog processor would be useful for log aggregation. I'm pretty sure
> that log4j, etc. have native syslog appenders.
>
> -Joey

Re: MergeContent Demarcator Question

Joe Witt
Syslog: https://issues.apache.org/jira/browse/NIFI-274

UDP Update: https://issues.apache.org/jira/browse/NIFI-548

On Sat, Apr 25, 2015 at 9:15 PM, Joe Witt <[hidden email]> wrote:

> Roger that.  And I think adding a property to ListenUDP to treat
> datagrams as flowfiles rather than a set of datagrams as a flowfile
> would be very doable.