determine if instance is a Cluster Coordinator or Primary Node

Mark Bean
Is there a way to get access to Cluster configuration state? Specifically,
can a Node determine which Node - or simply "itself" - is the Cluster
Coordinator or the Primary Node?

Use case: I have a custom authorizer which includes a background thread to
re-authorize users and policies in case a user's credentials have changed.
This thread can potentially change authorizations.xml and users.xml files
which are kept in sync with ZooKeeper. I do not want each Node to execute
the process making the same changes. It would be desirable to execute this
process on only one Node (Coordinator or Primary) and let ZooKeeper
coordinate the changes across the Cluster.

Thanks,
Mark

Re: determine if instance is a Cluster Coordinator or Primary Node

Bryan Bende
Mark,

I don't believe there is currently anything like this in the Authorizer API.

You would likely have to build something similar to what processors have...

In ProcessorInitializationContext they get access to a NodeTypeProvider
(via getNodeTypeProvider()), which tells them whether the node is clustered
and whether it is currently primary.

Then they can annotate a method with @OnPrimaryNodeStateChange to get
notified when the primary node changes.
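
A minimal sketch of that processor-side pattern, so the idea can be carried
over to other components. The NodeTypeProvider interface and PrimaryNodeState
enum below are local stand-ins declared only so the example compiles on its
own; they mirror the names in the NiFi API but are not the real classes.
PrimaryOnlyTask is a hypothetical helper, and treating a standalone instance
as primary is an assumption.

```java
// Local stand-in mirroring NiFi's NodeTypeProvider (not the real class).
interface NodeTypeProvider {
    boolean isClustered();
    boolean isPrimary();
}

// Local stand-in mirroring NiFi's PrimaryNodeState enum (not the real class).
enum PrimaryNodeState { ELECTED_PRIMARY_NODE, PRIMARY_NODE_REVOKED }

// Hypothetical helper: runs a task only while this node is the primary node.
final class PrimaryOnlyTask {
    private volatile boolean primary;
    private final Runnable task;

    PrimaryOnlyTask(NodeTypeProvider provider, Runnable task) {
        // Assumption: a standalone (non-clustered) instance is treated as primary.
        this.primary = !provider.isClustered() || provider.isPrimary();
        this.task = task;
    }

    // In a real processor this method would carry @OnPrimaryNodeStateChange.
    void onPrimaryNodeStateChange(PrimaryNodeState newState) {
        primary = (newState == PrimaryNodeState.ELECTED_PRIMARY_NODE);
    }

    // Called on every tick; returns true only if the task actually ran.
    boolean runIfPrimary() {
        if (!primary) {
            return false;
        }
        task.run();
        return true;
    }
}
```

The flag is updated from the notification callback rather than re-queried on
every tick, so the task stops as soon as the primary role is revoked.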

-Bryan



On Tue, Aug 29, 2017 at 8:08 AM, Mark Bean <[hidden email]> wrote:

> Is there a way to get access to Cluster configuration state? Specifically,
> can a Node determine which Node - or simply "itself" - is the Cluster
> Coordinator or the Primary Node?
>
> Use case: I have a custom authorizer which includes a background thread to
> re-authorize users and policies in case a user's credentials have changed.
> This thread can potentially change authorizations.xml and users.xml files
> which are kept in sync with ZooKeeper. I do not want each Node to execute
> the process making the same changes. It would be desirable to execute this
> process on only one Node (Coordinator or Primary) and let ZooKeeper
> coordinate the changes across the Cluster.
>
> Thanks,
> Mark

Re: determine if instance is a Cluster Coordinator or Primary Node

Matt Gilman
In reply to this post by Mark Bean
Mark,

I think you have a couple of options. First, just to provide a little more
detail on the basics of NiFi clustering with regard to
users/groups/policies. If you want to support _configurable_
users/groups/policies in the NiFi UI then consistency is required. For
instance, if an admin wants to update a user, group, or policy then all nodes
must have that same user, group, or policy. This consistency is enforced when
a node joins a cluster by invoking inheritFingerprint(...). This is
essentially the same for all components in the dataflow as well.

Option 1 - Externalize user/group/policy management (which sounds like what
you're trying to do with the background thread). From a NiFi perspective, you can still
have users/groups/policies in the NiFi UI but they will not be editable.
When merging node responses to return to the UI only an intersection is
returned. Additionally, access decisions are performed considering the
responses of all nodes (via a two phase commit).

Option 2 - Leave user/group/policy management in NiFi. However, use the
results of your background thread only within authorize(...) calls in your
Authorizer. These results are supplemental to the policies that are managed
in NiFi. In this scenario, the users/groups/policies remain consistent and
access decisions are accurate even if one of the nodes receives an update
in your background thread before the others.
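
A rough sketch of Option 2, assuming the supplemental result is simply the
set of users whose external credentials are still valid. The class name
SupplementalAuthorizer, the publish(...) method, and the map-based policy
store are all hypothetical stand-ins, not part of the NiFi Authorizer API.
The point is only that authorize(...) combines the managed policy check with
the latest background result, without changing the managed policies.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical authorizer: managed policies stay untouched, background results
// are consulted only at decision time.
final class SupplementalAuthorizer {
    // Stand-in for the policies managed in NiFi: resource -> users allowed.
    private final Map<String, Set<String>> managedPolicies;
    // Latest background result: users whose external credentials are still valid.
    private final AtomicReference<Set<String>> validUsers =
            new AtomicReference<>(Set.of());

    SupplementalAuthorizer(Map<String, Set<String>> managedPolicies) {
        this.managedPolicies = managedPolicies;
    }

    // Called by the background thread when a new result is available.
    void publish(Set<String> latest) {
        validUsers.set(Set.copyOf(latest));
    }

    // Approved only if the managed policy allows it AND the supplemental check passes.
    boolean authorize(String user, String resource) {
        boolean policyAllows =
                managedPolicies.getOrDefault(resource, Set.of()).contains(user);
        return policyAllows && validUsers.get().contains(user);
    }
}
```

In this sketch every user is denied until the first background result is
published; that default is a design choice, not something prescribed above.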

Matt


Re: determine if instance is a Cluster Coordinator or Primary Node

Mark Bean
In reply to this post by Bryan Bende
Bryan,

I'm not sure building something similar to the
ProcessorInitializationContext is possible without changes to the framework
itself. The framework is responsible for instantiating the initialization
context - in both processor and authorizer. However, the
AuthorizerInitializationContext is outside the context of the flow -
where cluster/nodes have meaning. On the other hand, the
ProcessorInitializationContext is instantiated from the FlowController
where the cluster/nodes do have meaning.

-Mark

On Tue, Aug 29, 2017 at 9:24 AM, Bryan Bende <[hidden email]> wrote:

> Mark,
>
> I don't believe there is currently anything like this in the Authorizer API.
>
> You would likely have to build something similar to what processors have...
>
> In ProcessorInitializationContext they get access to a NodeTypeProvider
> (via getNodeTypeProvider()), which tells them whether the node is clustered
> and whether it is currently primary.
>
> Then they can annotate a method with @OnPrimaryNodeStateChange to get
> notified when the primary node changes.
>
> -Bryan

Re: determine if instance is a Cluster Coordinator or Primary Node

Mark Bean
In reply to this post by Matt Gilman
Matt,

Option 2 is definitely the way we're headed. The custom authorizer utilizes
the file-based authorizer; it provides supplemental authorization when
adding a user to a policy, for example. However, it relies on the
authorizations.xml and users.xml files being correct when making
policy-based decisions such as "is the user allowed to view the flow?"

If I understand your description correctly, you are suggesting placing the
functionality I've described of the background thread in the authorize()
method itself. This is not desirable because the authorization process can
take a (relatively) long time. This is why a background thread runs
periodically which keeps the authorizations.xml and users.xml files up to
date for users previously added to the instance. Then, the authorize method
can rely on the local files rather than a slow, external service.
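
To make the time-based approach concrete, here is a minimal sketch that
keeps the slow external lookup off the authorize path. PeriodicCredentialCache
and its slowLookup supplier are hypothetical names, and an in-memory set
stands in for the users.xml and authorizations.xml files; only
ScheduledExecutorService is a real JDK API here.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical cache refreshed on a timer; readers never call the slow service.
final class PeriodicCredentialCache {
    private final Supplier<Set<String>> slowLookup; // stand-in for the external service
    private volatile Set<String> cached = Set.of();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "credential-refresh");
                t.setDaemon(true); // do not keep the JVM alive for the refresher
                return t;
            });

    PeriodicCredentialCache(Supplier<Set<String>> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Starts the periodic refresh; the next run is scheduled after the previous one ends.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::refreshOnce, 0, period, unit);
    }

    // One refresh pass. Failures keep the previous snapshot; an uncaught
    // exception would otherwise cancel all further scheduled runs.
    void refreshOnce() {
        try {
            cached = Set.copyOf(slowLookup.get());
        } catch (RuntimeException e) {
            // Keep serving the last known-good snapshot.
        }
    }

    // Fast path used at authorization time: reads only the local snapshot.
    boolean isKnown(String user) {
        return cached.contains(user);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```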

Now, maybe you were suggesting kicking off the update as a background
thread from the authorize() method. This solves the in-line problem of slow
completion time. However, now updating of authorizations is based on a call
to the authorize method rather than a time-based approach. Also, it would
still be possible for the authorize method - and therefore update thread -
to be executed from multiple Nodes at the same time.

Let me ask a different question. Let's suppose the background thread
executes at the same time on multiple Nodes. As each Node (potentially)
changes the authorizations.xml file, ZooKeeper will synchronize the changes
across all Nodes. Correct? What happens if Node 1 and Node 2 both have
changed the file differently at the point in time when ZK attempts to
synchronize? Regardless of consistency, it seems accuracy is at risk in
such a situation. Hence, it is desirable to perform any changes to these
files on only one Node. Either the Coordinator or the Primary Node satisfies
this requirement, since each is guaranteed to be a single Node.

-Mark



On Tue, Aug 29, 2017 at 9:50 AM, Matt Gilman <[hidden email]>
wrote:

> Mark,
>
> I think you have a couple options. First, just to provide a little more
> detail for the basics of NiFi clustering with regards to
> users/groups/policies. If you want to support _configurable_
> users/groups/policies in NiFi UI then consistency is required. For
> instance, if an admin wants to update a user/group/policies then all nodes
> must have that same user/group/policies. This consistency is enforced when
> a node joins a cluster by invoking inheritFingerprint(...). This is
> essentially the same for all components in the dataflow as well.
>
> Option 1 - Externalize user/group/policy management (which sounds like what
> you're trying to do with the background thread). From a NiFi perspective, you can still
> have users/groups/policies in the NiFi UI but they will not be editable.
> When merging node responses to return to the UI only an intersection is
> returned. Additionally, access decisions are performed considering the
> responses of all nodes (via a two phase commit).
>
> Option 2 - Leave user/group/policy management in NiFi. However, use the
> results of your background thread only within authorize(...) calls in your
> Authorizer. These results are supplemental to the policies that are managed
> in NiFi. In this scenario, the users/groups/policies remain consistent and
> access decisions are accurate even if one of the nodes receives an update
> in your background thread before the others.
>
> Matt

Re: determine if instance is a Cluster Coordinator or Primary Node

Matt Gilman
Mark,

My suggestion was that the Authorizer.authorize(...) call utilizes the
results of the background thread.

Your custom authorizer does not utilize the file-based authorizer. I
believe your authorizer utilizes a file-access-policy-provider and/or a
file-user-group-provider. The supplemental authorization is in addition to
the default policy-based access check. Authorization occurs when a user
attempts some action for some resource. In the example that you described
when a user is added to a policy, the authorization is whether the admin
performing the request is allowed to modify the policy for 'viewing the
UI'. Whether that user is allowed to 'view the flow' will be authorized
when that user attempts to actually open the UI. This is where the
supplemental authorization goes for that user.

Your understanding of NiFi's clustering model is not correct. When a node
joins the cluster it will go through an inheritance step that ensures that
this node's view of the users/groups/policies is consistent with the rest
of the cluster. From this point forth, all changes are applied to each
node. This is also how the flows remain consistent across the cluster. This
is why consistency is a requirement when you need to support configurable
users/groups/policies. If you externalize this configuration, NiFi will
support inconsistent users/groups/policies by taking an intersection of
those and considering the authorization results of all nodes using a two
phase commit.
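
As a toy illustration of the externalized case, the sketch below merges
per-node views by intersection and approves a request only when every node
approves. NodeResponseMerger and both method names are hypothetical; this
shows the idea only and is not NiFi's actual merging or two phase commit
code.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating "intersection of node responses".
final class NodeResponseMerger {
    // Keeps only the entries that every node reported.
    static Set<String> intersect(List<Set<String>> perNode) {
        if (perNode.isEmpty()) {
            return Set.of();
        }
        Set<String> merged = new HashSet<>(perNode.get(0));
        for (Set<String> nodeView : perNode.subList(1, perNode.size())) {
            merged.retainAll(nodeView);
        }
        return merged;
    }

    // A request goes ahead only if there is at least one vote and all votes approve.
    static boolean allApprove(Collection<Boolean> votes) {
        return !votes.isEmpty() && votes.stream().allMatch(Boolean::booleanValue);
    }
}
```

With node views {alice, bob}, {bob, carol} and {bob}, only bob survives the
merge, which matches the idea that a user missing on any node is not shown.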

Matt

On Tue, Aug 29, 2017 at 11:06 AM, Mark Bean <[hidden email]> wrote:

> Matt,
>
> Option 2 is definitely the way we're headed. The custom authorizer utilizes
> the file-based authorizer; it provides supplemental authorization when
> adding a user to a policy, for example. However, it relies on the
> authorizations.xml and users.xml files being correct when making
> policy-based decisions such as "is the user allowed to view the flow?"
>
> If I understand your description correctly, you are suggesting placing the
> functionality I've described of the background thread in the authorize()
> method itself. This is not desirable because the authorization process can
> take a (relatively) long time. This is why a background thread runs
> periodically which keeps the authorizations.xml and users.xml files up to
> date for users previously added to the instance. Then, the authorize method
> can rely on the local files rather than a slow, external service.
>
> Now, maybe you were suggesting kicking off the update as a background
> thread from the authorize() method. This solves the in-line problem of slow
> completion time. However, now updating of authorizations is based on a call
> to the authorize method rather than a time-based approach. Also, it would
> still be possible for the authorize method - and therefore update thread -
> to be executed from multiple Nodes at the same time.
>
> Let me ask a different question. Let's suppose the background thread
> executes at the same time on multiple Nodes. As each Node (potentially)
> changes the authorizations.xml file, ZooKeeper will synchronize the changes
> across all Nodes. Correct? What happens if Node 1 and Node 2 both have
> changed the file differently at the point in time when ZK attempts to
> synchronize? Regardless of consistency, it seems accuracy is at risk in
> such a situation. Hence, it is desirable to perform any changes to these
> files on only one Node. Either the Coordinator or the Primary Node satisfies
> this requirement, since each is guaranteed to be a single Node.
>
> -Mark