How to delete the data in the flowfile?

How to delete the data in the flowfile?

luby
Hi, All,

We use NiFi to import data from an Oracle database into Hive.

The first step is to extract all of the data from the Oracle database and
persist it into flowfiles,
which then 'flow' into other processors for further processing.

After persisting the data into Hive, we found that the data persisted
in the first step was not
deleted. This occupies a lot of disk space.

So is there any way to tell NiFi to delete that data after the next
processor has finished reading it?

Thanks

Boying



 
This email message may contain confidential and/or privileged information.
If you are not the intended recipient, please do not read, save, forward,
disclose or copy the contents of this email or open any file attached to
this email. We will be grateful if you could advise the sender immediately
by replying this email, and delete this email and any attachment or links
to this email completely and immediately from your computer system.




Re: How to delete the data in the flowfile?

Jeff
Hello Boying,

Once flowfiles have completed processing, their content may still be archived
within the content repository for a certain period of time before it ages off.
In the NiFi Admin guide, there is a section on Content Repository properties
[1] that you can set in nifi.properties. With these you can tweak how much
space is used for the archive, how long content is archived, or disable
archiving completely.

Lowering the "nifi.content.repository.archive.max.retention.period" and
"nifi.content.repository.archive.max.usage.percentage" properties can help
limit the amount of disk space the content repository uses for archived
flowfiles.  You can disable content archiving by setting
"nifi.content.repository.archive.enabled" to false if you prefer to have no
archive at all.
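
As a rough sketch, the relevant entries in nifi.properties could look like
the following. The values shown are illustrative assumptions, not
recommendations, and NiFi needs to be restarted for changes to
nifi.properties to take effect.

```properties
# Keep the archive, but age archived content off sooner and start
# purging it at a lower level of disk usage.
# (Illustrative values; tune them for your own disk capacity.)
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=1 hour
nifi.content.repository.archive.max.usage.percentage=20%

# Or, to turn archiving off entirely, set this instead:
# nifi.content.repository.archive.enabled=false
```

Keeping a small archive means recently processed content can still be
viewed or replayed from data provenance; disabling it gives that up in
exchange for the disk space.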

If your flow uses a processor like PutFile to place a flowfile in a
temporary directory for further processing, or to keep "backups"
of the flowfile at various stages of processing, then your flow must be
designed to clean up those files once they are no longer needed.  There
are several ways to do this, one of them being the Wait/Notify processors.
Koji has written a blog post [2] with some examples of how to use
the Wait and Notify processors.  The concepts it covers should
apply to your case: you could use Wait/Notify to signal when files
that were explicitly archived/copied by processors like PutFile
are no longer needed and can be removed.

Please let me know if neither of these solutions helps with the disk space
issues in your flow.  If you provide your flow as an example, we
can look at other ways to minimize disk usage.

[1]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#file-system-content-repository-properties
[2]
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/#alternative-solution-waitnotify

On Mon, Nov 20, 2017 at 3:16 AM <[hidden email]> wrote:

> Hi, All,
>
> We use NiFi to import data from an Oracle database into Hive.
>
> The first step is to extract all of the data from the Oracle database and
> persist it into flowfiles,
> which then 'flow' into other processors for further processing.
>
> After persisting the data into Hive, we found that the data persisted
> in the first step was not
> deleted. This occupies a lot of disk space.
>
> So is there any way to tell NiFi to delete that data after the next
> processor has finished reading it?
>
> Thanks
>
> Boying

Reply: Re: How to delete the data in the flowfile?

luby
I really appreciate your help. It's very helpful. :)



From: Jeff <[hidden email]>
To: [hidden email]
Date: 2017/11/20 22:07
Subject: Re: How to delete the data in the flowfile?


