Contributing to Nifi


Contributing to Nifi

Chris Lundeberg
Hi all,

I hope this message finds everyone well. My company is starting to build a
few custom solutions using NiFi for several clients. We want to be more
involved in the NiFi community and start contributing back some of our work.
We have created a few processors and pushed them to open repos, and we would
like to get some of them built into the base NiFi distribution, if possible.
We are researching what that involves, and I think we are ready to start
picking up and creating Jira tickets. My main question concerns new
processors: if we have several that we think would be good additions, is
there a voting process that would help us learn which ones the wider
community would actually value, or is that decided PR by PR? Some of the
processors we have created or are creating are:

1. *EncryptValue* - Reads a list of key names from an attribute and loops
over the keys within the data. For each match, it hashes the value using
the algorithm the user selects (we support all the common ones).
2. *StandardizeDate* - Reads a key/value pair from an attribute and loops
over the keys within the incoming data. For each match, it standardizes
the value of that key to ISO-8601.
3. *AvroBulkInsert* - Uses the bulk insert functionality in MSSQL to insert
incoming Avro files.
4. *GetColumns* - The user selects the controller service and database type,
and the processor fetches the columns of the given database/schema.table and
attaches them as a comma-separated list to an attribute or the FlowFile.
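The matching-and-hashing behavior described for *EncryptValue* can be sketched in plain Java. This is not the 1904labs implementation and does not use the NiFi processor API: the class `HashMatchingValues`, the method `hashMatching`, and modeling the incoming data as a `Map<String, String>` are all hypothetical. The digest comes from the JDK's `MessageDigest`, so algorithms are named with standard JCA names such as `SHA-256` or `SHA-512`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: not the EncryptValue source, and no NiFi API involved.
public final class HashMatchingValues {

    private HashMatchingValues() {}

    // Hex-encodes a digest without java.util.HexFormat, so it also runs on Java 8.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Returns a copy of {@code data} where the value of every key named in
     * {@code keyList} (comma-separated, as it might arrive in an attribute)
     * is replaced by its hex-encoded digest. Other entries are copied as-is.
     */
    public static Map<String, String> hashMatching(Map<String, String> data,
                                                   String keyList,
                                                   String algorithm) {
        Set<String> keys = Arrays.stream(keyList.split(","))
                .map(String::trim)
                .filter(k -> !k.isEmpty())
                .collect(Collectors.toSet());
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : data.entrySet()) {
            String value = e.getValue();
            if (keys.contains(e.getKey()) && value != null) {
                out.put(e.getKey(), toHex(digest(algorithm, value)));
            } else {
                out.put(e.getKey(), value);
            }
        }
        return out;
    }

    // MessageDigest instances are not thread-safe, so a new one is created per call.
    private static byte[] digest(String algorithm, String value) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return md.digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalArgumentException("Unsupported digest algorithm: " + algorithm, ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "alice");
        row.put("ssn", "123-45-6789");
        // Only "ssn" is listed, so only its value is replaced by a SHA-256 hex digest.
        System.out.println(hashMatching(row, "ssn", "SHA-256"));
    }
}
```

Because a digest is one-way, this style of processing pseudonymizes the matched values rather than encrypting them reversibly; the original values cannot be recovered from the output.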

Any advice/suggestions would be greatly appreciated. Thanks!


Chris Lundeberg
*Modern Data Engineer / Data Engineering Practice Lead*
<https://1904labs.com/>
<https://1904labs.com/>  <https://www.linkedin.com/company/1904labs/>
<https://twitter.com/1904labs>  <https://www.facebook.com/1904labs/>
1904labs is proud to be a Top Workplace 2018
<https://1904labs.com/o/TopWorkplace2018>

Re: Contributing to Nifi

Andy LoPresto-2
Hi Chris,

Thanks for getting involved and contributing back to the community. There is no formal voting process to prioritize contributions; the Contributor Guide [1] and Developer Guide [2] have a lot of useful information on this. In general, for a conversation like the one you're asking for, an email to the list is sufficient, and anyone who feels strongly will weigh in here. Once the discussion has settled, you can prioritize your contributions, open a Jira ticket for each, and create the pull requests. A committer must formally accept the code before it can be merged, so this may take multiple rounds of comment, discussion, and patching. Committers have many responsibilities, so merges may take longer than usual right now. Opening the PRs will still get you community feedback, so I encourage you to do that.

To create and assign Jira tickets, please reply here with your username (and your colleagues' usernames, if applicable), and I will give you the proper permissions in our Jira instance.

Personally, I am most interested in the EncryptValue processor, so depending on my workload over the next few days, that's where I would likely focus my attention.


[1] https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide
[2] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html

Andy LoPresto
[hidden email]
[hidden email]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Mar 26, 2019, at 7:11 AM, Chris Lundeberg <[hidden email]> wrote:


Re: Contributing to Nifi

Chris Lundeberg
Perfect, thank you for the information, Andy.

I will read over the docs you suggested and move forward with your advice.
I really appreciate your input.

Two of us currently need Jira access (we can already log in and see all
the tickets):

Myself: clundeberg
Nathan Bruce (co-worker): nathan.bruce

Thanks again!

Chris Lundeberg


On Tue, Mar 26, 2019 at 10:58 AM Andy LoPresto <[hidden email]> wrote:


Re: Contributing to Nifi

Andy LoPresto-2
No worries, Chris. I added both of you to the Contributors role, so you should be ready to go now.

Andy LoPresto

> On Mar 26, 2019, at 9:10 AM, Chris Lundeberg <[hidden email]> wrote:


Re: Contributing to Nifi

Mike Thomsen
In reply to this post by Chris Lundeberg
Chris,

I don't know if you've created them yet, but here are a few things you
might want to consider:

> 2. *StandardizeDate* - Reads a key/value pair from an attribute and loops
> over the keys within the incoming data.  If it finds a match, it will
> standardize the value of that key as ISO-8601.

We had to implement something similar and chose to do it as a
LookupService so that it can operate on a record set. If you're working
with large volumes of data and need to standardize dates, pivoting to the
Record API would be a really good idea.
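Mike's point about operating on a whole record set, rather than one value at a time, can be illustrated with a plain-Java sketch. It deliberately avoids the NiFi Record API and LookupService interfaces: each record is modeled as a `Map<String, Object>`, and the class name, method names, and accepted input patterns are all hypothetical. Date handling uses `java.time` and emits ISO-8601 calendar dates (`yyyy-MM-dd`).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: records are plain Maps, not NiFi Record objects.
public final class StandardizeDateSketch {

    private StandardizeDateSketch() {}

    // Hypothetical accepted input patterns, tried in order. STRICT resolution with
    // "uuuu" rejects impossible dates such as 02/30/2019 instead of adjusting them.
    private static final List<DateTimeFormatter> INPUT_FORMATS = Arrays.asList(
            DateTimeFormatter.ISO_LOCAL_DATE,
            DateTimeFormatter.ofPattern("MM/dd/uuuu", Locale.US).withResolverStyle(ResolverStyle.STRICT),
            DateTimeFormatter.ofPattern("dd-MMM-uuuu", Locale.US).withResolverStyle(ResolverStyle.STRICT));

    /** Returns the ISO-8601 date (yyyy-MM-dd) for {@code raw}, or null if no pattern matches. */
    static String toIsoDate(String raw) {
        for (DateTimeFormatter f : INPUT_FORMATS) {
            try {
                return LocalDate.parse(raw.trim(), f).format(DateTimeFormatter.ISO_LOCAL_DATE);
            } catch (DateTimeParseException ignored) {
                // Not this pattern; try the next one.
            }
        }
        return null;
    }

    /**
     * Processes a whole batch of records and returns copies in which {@code field}
     * is rewritten to ISO-8601 when it parses. Unparseable or non-string values
     * are left untouched, and the input records are not modified.
     */
    public static List<Map<String, Object>> standardize(List<Map<String, Object>> records, String field) {
        List<Map<String, Object>> out = new ArrayList<>(records.size());
        for (Map<String, Object> record : records) {
            Map<String, Object> copy = new LinkedHashMap<>(record);
            Object value = copy.get(field);
            if (value instanceof String) {
                String iso = toIsoDate((String) value);
                if (iso != null) {
                    copy.put(field, iso);
                }
            }
            out.add(copy);
        }
        return out;
    }
}
```

Using `ResolverStyle.STRICT` is a deliberate choice here: under the default SMART style, `MM/dd/uuuu` would turn `02/30/2019` into `2019-02-28` instead of rejecting it, which is surprising behavior for a standardization step.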

> *AvroBulkInsert* - We utilize the bulk insert functionality within MSSQL
> to insert incoming avro files.

You might want to look at PutDatabaseRecord, if you haven't already, and
see whether it meets your use case:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.8.0/org.apache.nifi.processors.standard.PutDatabaseRecord/index.html

I just wanted to throw those out there in case you hadn't considered them,
because a lot of us have similar use cases.


On Tue, Mar 26, 2019 at 10:11 AM Chris Lundeberg <[hidden email]> wrote:


Re: Contributing to Nifi

Chris Lundeberg
Hi Mike,

Thank you for the heads-up. I will check out the processors you listed
and make sure I don't duplicate efforts. Thanks!


Chris Lundeberg



On Wed, Mar 27, 2019, 8:13 AM Mike Thomsen <[hidden email]> wrote:
