[PutElasticSearchProcessor] - Put Elastic search Processor with External version

Ulisses Franca
Hi,
We have been using Apache NiFi for a while, and it's a core part of our
system.
Our main use case is to consume documents from an Apache Kafka cluster
and persist them to an Elasticsearch cluster. The version of these documents
is controlled externally. We use the PutElasticsearch5 processor to push
data to Elasticsearch. Currently we are only interested in the most recent
version of a document, but we insert every update into Elasticsearch and
rely on searching to get the latest version of the document.
As you can see, this creates a lot of redundant copies of the same document,
which we don't use.
We are exploring options to overcome this problem, and one of them would be
to extend the PutElasticsearch5 processor and add support for external
versioning of the Elasticsearch documents. We checked, and the Elasticsearch
_bulk API supports external versioning.
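
To illustrate the idea, here is a minimal sketch of the kind of _bulk
payload we have in mind. It is not NiFi code; the index name, type name,
and the "id"/"version"/"source" keys are made up for the example. Each
action line carries the externally managed version together with
version_type "external", so Elasticsearch should reject (with a version
conflict) any write whose version is not higher than the one already stored.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Build an NDJSON _bulk payload that uses external versioning.

    `docs` is a list of dicts with hypothetical keys "id", "version"
    and "source"; these names are assumptions for this sketch only.
    """
    lines = []
    for doc in docs:
        # Action/metadata line: the version comes from the external system.
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                "_id": doc["id"],
                "version": doc["version"],
                "version_type": "external",
            }
        }
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(doc["source"]))
    # The _bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"id": "order-1", "version": 3, "source": {"status": "shipped"}},
    {"id": "order-2", "version": 7, "source": {"status": "cancelled"}},
]
body = build_bulk_body(docs, index="documents", doc_type="doc")
print(body)
```

With something like this, a single document per id would be kept in the
index, and stale updates arriving out of order would fail instead of
creating another copy.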

My question is simple: are there any plans to add support for external
versioning to the PutElasticsearch5 processor?

Regards,
Ulisses Franca

Re: [PutElasticSearchProcessor] - Put Elastic search Processor with External version

Matt Burgess-2
Ulisses,

There is a Jira [1] along with a patch [2] to add this feature, but it
appears to have languished for most of the year. I will leave a
comment on the PR to see if the author is interested in continuing the
work; otherwise, perhaps someone else from the community will pick it
up.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-4625
[2] https://github.com/apache/nifi/pull/2287

On Tue, Nov 6, 2018 at 6:59 PM Ulisses Franca <[hidden email]> wrote:
> [...]