Site to Site not working within process groups

11 messages

Site to Site not working within process groups

Ricky Saltzer
I just want to make sure this is a supported feature before I open a JIRA. It appears as if I can't create a Site-to-Site connection within a process group. It's easier to explain visually (see below). Any help would be appreciated, thanks!

Top Level (works):

[image: Inline image 1]

Inside Process Group (remote doesn't work):

[image: Inline image 1]


Re: Site to Site not working within process groups

Sean Busbey
ASF mailing lists strip attachments. Can you post the images in a pastebin?

On Fri, Apr 3, 2015 at 4:23 PM, Ricky Saltzer <[hidden email]> wrote:



--
Sean

Re: Site to Site not working within process groups

Ricky Saltzer
Thanks for the heads up!

*Top Level* (works):
https://s3.amazonaws.com/uploads.hipchat.com/108018/846877/Q5x6A0VikUraEpS/upload.png

*Inside Process Group* (remote doesn't work):
https://s3.amazonaws.com/uploads.hipchat.com/108018/846877/iSpaYoUmGRq7dIz/upload.png


On Fri, Apr 3, 2015 at 5:31 PM, Sean Busbey <[hidden email]> wrote:




--
Ricky Saltzer
http://www.cloudera.com

Re: Site to Site not working within process groups

Matt Gilman
Ricky,

What you're seeing is by design. While the approach can be limiting, especially if you're looking to expose an [in|out]put port remotely in a sub group, it was done for consistency and simplicity. Groups can have input and output ports. This facilitates data flow into and out of the groups. When this is done at the root level, it allows us to abstract a NiFi instance as a group to a remote NiFi.
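A minimal sketch of the rule described above, using a toy model rather than NiFi's actual classes: `ProcessGroup`, `Port`, and `can_accept_site_to_site` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessGroup:
    """Toy stand-in for a process group; the root group has no parent."""
    name: str
    parent: Optional[ProcessGroup] = None

    @property
    def is_root(self) -> bool:
        return self.parent is None


@dataclass
class Port:
    """Toy input or output port that lives inside a group."""
    name: str
    group: ProcessGroup


def can_accept_site_to_site(port: Port) -> bool:
    # Only root-level ports are exposed remotely, so a remote NiFi can treat
    # this whole instance as a single group with those ports as its edges.
    return port.group.is_root
```

With this model, a port in a nested group is never remotely reachable, regardless of how it is configured, which matches what the screenshots show.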

Matt Gilman

Sent from my iPhone

> On Apr 3, 2015, at 5:41 PM, Ricky Saltzer <[hidden email]> wrote:

Re: Site to Site not working within process groups

Ryan Blue
On 04/03/2015 03:36 PM, Matt Gilman wrote:
> Ricky,
>
> What you're seeing is by design. While the approach can be limiting, especially if you're looking to expose an [in|out]put port remotely in a sub group, it was done for consistency and simplicity. Groups can have input and output ports. This facilitates data flow into and out of the groups. When this is done at the root level, it allows us to abstract a NiFi instance as a group to a remote NiFi.
>
> Matt Gilman

I think the problem is that sometimes you want a process group to be
able to use the trick where you send to a local "remote" input port to
load balance. It would be great to be able to hide that detail within a
process group, but the reuse of ports for both purposes prevents it.

Could we add an option to select whether the port is for a process group
or should listen for remote connections? That seems like an easy way to
solve the problem, though I think adding an option to load balance a
connection in cluster mode would solve the problem more cleanly. But
that would be more work, right?
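One way to picture the proposed option, as a hypothetical sketch only: `PortMode` and `is_remotely_accessible` are invented names, not NiFi settings, and passing no mode stands in for today's root-only behavior.

```python
from enum import Enum
from typing import Optional


class PortMode(Enum):
    """Hypothetical per-port setting from the proposal above."""
    GROUP = "group"    # moves data into or out of the enclosing group only
    REMOTE = "remote"  # also listens for site-to-site connections


def is_remotely_accessible(in_root_group: bool,
                           mode: Optional[PortMode] = None) -> bool:
    if mode is None:
        # Current behavior: exposure depends only on where the port lives.
        return in_root_group
    # Proposed behavior: the explicit mode decides, at any nesting depth.
    return mode is PortMode.REMOTE
```

The point of such a flag would be that a group could keep its load-balancing "remote" port internal instead of surfacing it at the root.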

rb


--
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: Site to Site not working within process groups

Mark Payne
Ryan,

We definitely want to get to the point that we have the ability to
load-balance the data in a connection across a cluster. But yes, that is
a fairly large undertaking.

I can definitely appreciate that it would be more convenient to add a
port in the middle of a group, but I would shy away from implementing
something like that as a stop-gap when the load-balanced connection is
the true desire. This stop-gap really would be quite a lot of work, as
well, and I think it would introduce more confusion.

Thanks
-Mark


------ Original Message ------
From: "Ryan Blue" <[hidden email]>
To: [hidden email]
Sent: 4/6/2015 11:32:12 AM
Subject: Re: Site to Site not working within process groups


Re: Site to Site not working within process groups

Joe Witt
mark

just spitballin' here. Why would it be such a large amount of work?
My initial thought here is that you allow the user to indicate whether
a given connection should auto-load-balance, at which point NiFi would
simply create an implicit site-to-site connection. It would need to
be smart enough not to transfer data back to the node the data is coming
from, and to avoid too much rebalancing and such.
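A rough sketch of the target choice for such an implicit transfer, assuming each node's queued bytes are known. `choose_target` and `min_gap_bytes` are hypothetical, and the gap check is just one simple way to avoid churn.

```python
from typing import Mapping, Optional


def choose_target(source: str, queued_bytes: Mapping[str, int],
                  min_gap_bytes: int) -> Optional[str]:
    """Pick a peer for an implicit site-to-site transfer.

    Never returns the source node, and returns None when no peer is lighter
    than the source by at least min_gap_bytes, to avoid needless rebalancing.
    """
    peers = {n: b for n, b in queued_bytes.items() if n != source}
    if not peers:
        return None
    # Least-loaded peer (first one wins on ties).
    target = min(peers, key=peers.__getitem__)
    if queued_bytes[source] - peers[target] < min_gap_bytes:
        return None
    return target
```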

Maybe what I mean to say is: how much time do you think it would take, roughly?

Thanks
Joe

On Thu, Apr 9, 2015 at 5:16 PM, Mark Payne <[hidden email]> wrote:


Re: Site to Site not working within process groups

Mark Payne
Joe,

So there are a few pieces to the puzzle:

* Deciding when to push data around - this is not trivial. Likely
implemented on NCM
* Deciding where to push the data - not trivial either but possibly
easier
* Building the components to push the data and the component to receive
the data
* Modifying the site-to-site protocol to allow pushing data to a
connection rather than a port... this is reasonably easy but requires
very significant testing.
* Updating Node to receive a "rebalance" command from NCM and carry
out the rebalance
* Updating UI, data model, Connection objects to support setting the
flag

I'd estimate at least a week of full-time work to get this done.
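For the "where to push the data" piece, one simple possibility (not necessarily what NiFi would do) is a greedy plan that moves each node's surplus toward the cluster average. `plan_rebalance` is a hypothetical helper, and the amounts can be bytes or FlowFile counts.

```python
from typing import Dict, List, Tuple


def plan_rebalance(queued: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return (from_node, to_node, amount) moves toward the cluster average."""
    if not queued:
        return []
    avg = sum(queued.values()) // len(queued)
    # Overloaded nodes, largest surplus first.
    surplus = sorted(((n, q - avg) for n, q in queued.items() if q > avg),
                     key=lambda t: -t[1])
    # Underloaded nodes, largest deficit first.
    deficit = sorted(((n, avg - q) for n, q in queued.items() if q < avg),
                     key=lambda t: -t[1])
    plan: List[Tuple[str, str, int]] = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        src, extra = surplus[i]
        dst, room = deficit[j]
        amount = min(extra, room)
        plan.append((src, dst, amount))
        surplus[i] = (src, extra - amount)
        deficit[j] = (dst, room - amount)
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return plan
```

Integer division means any remainder simply stays on an overloaded node, which errs on the side of moving less data.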

------ Original Message ------
From: "Joe Witt" <[hidden email]>
To: "[hidden email]" <[hidden email]>
Sent: 4/9/2015 9:00:56 PM
Subject: Re: Site to Site not working within process groups


Re: Site to Site not working within process groups

Adam Taft
Why does this have to be on a "connection?"  In my mind, the solution here
is simply a new UI element that behaves like a site-to-site remote process
group, but automates/hides all the configuration parameters.  From the
backend's point of view, nothing would have to change, since site-to-site
already works. The UI element should look basically like a processor box.

The only possible change to the backend might be to the node selection
algorithm, if you wanted to exclude the current node from receiving the
flowfile in question.  In my mind, though, this might be a misfeature.  If
the current node is loaded more lightly than the other nodes, it's better
to keep the flowfile on it and continue processing.

For efficiency's sake, it might be nice to have a configuration threshold
for this element that won't attempt cluster redistribution if the current
node is not overly loaded. Let the DFM decide at what point to start
pushing files to other nodes, since the overhead for doing so is heavier
than keeping the file local.
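The threshold idea might look like the sketch below, assuming the DFM sets a backlog size and each node's backlog is known; `should_redistribute` is a hypothetical name. Data stays local below the threshold, and also stays when no other node is lighter, in line with the point about not excluding the current node.

```python
from typing import Mapping


def should_redistribute(local_node: str, queued: Mapping[str, int],
                        threshold: int) -> bool:
    """Decide whether to push work off local_node.

    threshold is the DFM-configured backlog below which data always stays
    local; even above it, data stays if no other node is lighter.
    """
    local = queued[local_node]
    if local < threshold:
        return False  # not overloaded: local processing is cheaper
    others = [q for n, q in queued.items() if n != local_node]
    return bool(others) and min(others) < local
```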

Two cents,

Adam




On Fri, Apr 10, 2015 at 8:18 AM, Mark Payne <[hidden email]> wrote:


Re: Site to Site not working within process groups

Adam Taft
One point of clarification. When I wrote, "the solution here is simply a
new UI element", I didn't mean to imply the work itself was trivial. I was
mostly voting for using UI elements that are already familiar to the
DFM, like a processor box that explicitly says, "load balance here."

I think the principle of making this work without touching the site-to-site
code should be close to achievable. That would probably be ideal from the
KISS perspective.

Adam


On Fri, Apr 10, 2015 at 9:07 AM, Adam Taft <[hidden email]> wrote:


Re: Site to Site not working within process groups

Mark Payne
In reply to this post by Adam Taft

Adam,

Interesting - I had not thought about having a new "Load Balance"
component. I'd always imagined doing this from a connection.
It's worth thinking about, though I think adding it to a connection is
a lot simpler and cleaner. With a component, if you have a big
backlog in one connection, you'd have to add the component in and move
your graph around to accommodate it, and then potentially
pull it back out if it's only intended to be temporary. If it's on a
connection, we can mark the connection to auto-rebalance, or allow a
user to simply say "Don't do it automatically, but rebalance right now."

Definitely, though, it has to be smart about moving data around, because
we don't want to push the data elsewhere when the node that already
has it can handle it. A simple algorithm would be something like "Node
has at least 1 GB of data and more than double the average of all
nodes."
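Mark's example rule, written out as a small sketch. `nodes_to_rebalance` is a hypothetical helper, and reading "1 GB" as 1 GiB is an assumption; the average includes every node, the overloaded one too.

```python
from typing import Dict, List

GIB = 1024 ** 3  # "1 GB" from the thread, assumed here to mean 1 GiB


def nodes_to_rebalance(queued_bytes: Dict[str, int], min_bytes: int = GIB,
                       factor: float = 2.0) -> List[str]:
    """Nodes that should shed data: at least min_bytes queued AND more than
    factor times the average backlog across all nodes."""
    if not queued_bytes:
        return []
    avg = sum(queued_bytes.values()) / len(queued_bytes)
    return sorted(n for n, b in queued_bytes.items()
                  if b >= min_bytes and b > factor * avg)
```

Requiring both conditions keeps a small cluster-wide backlog from triggering transfers just because one node holds most of it.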



------ Original Message ------
From: "Adam Taft" <[hidden email]>
To: [hidden email]
Sent: 4/10/2015 9:07:19 AM
Subject: Re: Site to Site not working within process groups
