FetchFile Cannot Allocate Enough Memory

11 messages

FetchFile Cannot Allocate Enough Memory

dale.chang13
I have been trying to run my data flow and I have run into a problem where FetchFile is unable to read files. I will detail my process below, and I would like some confirmation of my suspicions.

First I am ingesting an initial file that is fairly large, which contains the path/filename of a ton of text files within another directory. The goal is to read in the content of that large file, then read in the contents of the thousands of text files, and then store the text file content into Solr.

The problem I am having is that the second FetchFile, the one that reads in the smaller text files, frequently reports an error: FileNotFoundException xxx.txt (Cannot allocate memory); routing to failure. This FetchFile runs fine for about 20,000 files and then continuously reports the above error for the rest of the files.

My suspicion comes down to two possibilities: not enough heap space, or not enough content_repo/flowfile_repo space. Any ideas or questions?



*** EDIT: I am also running this in a cluster of 1 NCM and 2 nodes on Linux VMs through Hyper-V. The NCM and the nodes are each configured with 20 GB of assigned memory, but I have limited the heap space to 4g via -Xmx. I have not scheduled any portion of the dataflow to the primary node, i.e. the entire flow runs on both nodes.

Re: FetchFile Cannot Allocate Enough Memory

Mark Payne
Dale,

I haven't seen this issue personally. I don't believe it has to do with content/flowfile
repo space. Can you check the logs/nifi-app.log file and give us the exact error message
from the logs, with the stack trace if it is provided?

Thanks
-Mark



Re: FetchFile Cannot Allocate Enough Memory

dale.chang13
Mark Payne wrote
Dale,

I haven't seen this issue personally. I don't believe it has to do with content/flowfile
repo space. Can you check the logs/nifi-app.log file and give us the exact error message
from the logs, with the stack trace if it is provided?

Thanks
-Mark
Sure, this is from one of the slave nodes. It hardly provides any information. I suppose I could run jstat or df -h.

I've also created a MonitorMemory reporting task, but I cannot seem to provide the correct names for the memory pools. The only one that works is the G1 Old Gen memory pool.
app.log wrote
2016-04-29 10:16:28,027 ERROR [Timer-Driven Process Thread-6] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188512.txt from file system for StandardFlowFileRecord[uuid=1a2d7918-377e-4256-8610-1b12493eb16e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=47518, length=1268],offset=0,name=FW: FERC Daily News.msg,size=1268] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188512.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188512.txt (Cannot allocate memory)

2016-04-29 10:16:28,028 ERROR [Timer-Driven Process Thread-6] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188518.txt from file system for StandardFlowFileRecord[uuid=3b5fef42-2ded-47cc-aba2-6caf95f04977,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=48786, length=1640],offset=0,name=FW: FERC Docket No. EL01-47:  Removing Obstacles To Increased Eleu0020ctri c Generation And Natural Gas Supply In The Western United States.msg,size=1640] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188518.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188518.txt (Cannot allocate memory)

2016-04-29 10:16:28,029 ERROR [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188526.txt from file system for StandardFlowFileRecord[uuid=71715448-2acd-4f5c-af57-9209461fe62e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=50426, length=1272],offset=0,name=FW: workshop notice.msg,size=1272] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188526.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188526.txt (Cannot allocate memory)

2016-04-29 10:16:28,030 ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188534.txt from file system for StandardFlowFileRecord[uuid=0bf8666b-a9bc-4412-905e-6e4a2b13253d,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=51698, length=1440],offset=0,name=FW: Final Report on Workshop Report to Discuss Alternative Gas Inu0020dice s.msg,size=1440] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188534.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188534.txt (Cannot allocate memory)

2016-04-29 10:16:28,034 ERROR [Timer-Driven Process Thread-3] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188558.txt from file system for StandardFlowFileRecord[uuid=805d4127-d86b-4b4b-a7a0-9f6f300dc13e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=54433, length=1334],offset=0,name=FW: Proposed NARUC resolution on hedging.msg,size=1334] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188558.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188558.txt (Cannot allocate memory)

2016-04-29 10:16:28,035 ERROR [Timer-Driven Process Thread-3] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188570.txt from file system for StandardFlowFileRecord[uuid=38ee3345-fa72-459c-92f0-7fc2d58160ba,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=55767, length=1557],offset=0,name=Merchant Group Memo from Daniel Allegretti.msg,size=1557] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188570.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188570.txt (Cannot allocate memory)

2016-04-29 10:16:28,036 ERROR [Timer-Driven Process Thread-2] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188583.txt from file system for StandardFlowFileRecord[uuid=a6a69ead-1c1e-4e52-9cf1-536f065f4e84,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=57324, length=1450],offset=0,name=Thanksgiving pictures.msg,size=1450] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188583.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188583.txt (Cannot allocate memory)

2016-04-29 10:16:28,038 ERROR [Timer-Driven Process Thread-2] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188591.txt from file system for StandardFlowFileRecord[uuid=c23a560a-519f-4c34-9686-cda228fe3362,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=58774, length=1420],offset=0,name=and more....msg,size=1420] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188591.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188591.txt (Cannot allocate memory)

2016-04-29 10:16:28,043 ERROR [Timer-Driven Process Thread-2] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188601.txt from file system for StandardFlowFileRecord[uuid=96f51398-8b92-485c-8e8f-a791f896ce9f,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=61578, length=1349],offset=0,name=pix....msg,size=1349] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188601.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188601.txt (Cannot allocate memory)

2016-04-29 10:16:28,044 ERROR [Timer-Driven Process Thread-2] o.a.nifi.processors.standard.FetchFile FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188605.txt from file system for StandardFlowFileRecord[uuid=cdafaa63-b694-46da-9103-3da70e2c23ac,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1461938937407-463, container=default, section=463], offset=62927, length=1361],offset=0,name=more of.....msg,size=1361] due to java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188605.txt (Cannot allocate memory); routing to failure: java.io.FileNotFoundException: /tmp/hddCobrasan/Export1/VOL000001/TEXT/TEXT000001/ENR-00188605.txt (Cannot allocate memory)
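As an aside on the MonitorMemory pool names mentioned above: the valid names are whatever the running JVM's memory-pool beans report, and they vary with the garbage collector in use (with G1 they include names like "G1 Old Gen"). A small generic sketch, not NiFi-specific, to print the names the JVM actually exposes:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class ListMemoryPools {
    public static void main(String[] args) {
        // Print every memory pool name the running JVM exposes; these are
        // the exact strings a monitoring task has to match.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getName());
        }
    }
}
```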

Re: FetchFile Cannot Allocate Enough Memory

Mark Payne
Some googling of "FileNotFoundException cannot allocate memory" indicates that this is
fairly common when running in a VM that has very little RAM, as there is not enough free
memory even to create a Linux process. Do you have a reasonable amount of RAM free on the
box?

Thanks
-Mark
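For what it's worth, the parenthesized text in these FileNotFoundException messages is the operating system's errno string passed through by the JVM, so "(Cannot allocate memory)" is ENOMEM coming back from the kernel's open() call rather than from the Java heap. A minimal illustration with a different errno (the exact wording is OS-dependent):

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;

public class ErrnoDemo {
    public static void main(String[] args) {
        try {
            new FileInputStream("/no/such/file.txt");
        } catch (FileNotFoundException e) {
            // The suffix in parentheses is the OS errno text appended by the
            // JVM, e.g. "No such file or directory" here; in the logs above
            // it is ENOMEM's "Cannot allocate memory".
            System.out.println(e.getMessage());
        }
    }
}
```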



Re: FetchFile Cannot Allocate Enough Memory

dale.chang13
Mark Payne wrote
Some googling of "FileNotFoundException cannot allocate memory" indicates that this is
fairly common when running in a VM that has very little RAM, as there is not enough heap
space even to create a linux process. Do you have a reasonable amount of RAM free on the
box?

Thanks
-Mark
Each node is configured for 20 GB of memory, and the bootstrap.conf for each node specifies the JVM heap size to be at a minimum of 8g and a max of 10g:
-Xms8g
-Xmx10g

Checking my Hyper-V Manager, the activity of the boxes is fairly low: less than 10% of the 20 GB I gave each of them.

I still have the same stack trace pop up.

However, I did notice that my F: drive was pretty much maxed out in Task Manager; note that the F: drive is an HDD, and it looks like its I/O may have been saturated. I'll continue testing and looking for metrics.

Re: FetchFile Cannot Allocate Enough Memory

dale.chang13
So I still haven't deciphered this problem, and I am assuming that this is an IOPS problem instead of a RAM issue.

I have monitored the memory of the nodes in my cluster during the flow, before and after the "cannot allocate memory" exception occurs. There is no sign of a memory leak: according to jconsole, the memory used by the JVM remains steady between 50 and 100 MB. As a note, I have allocated 1 GB as a minimum and 4 GB as a maximum for the heap size on each node.

There are also no changes to the number of active threads (35) in jconsole, while the NiFi GUI shows up to 20 active threads. Additionally, the number of classes loaded and the CPU usage remain the same throughout the whole NiFi operation.

The only difference I have seen is disk activity on the drive that NiFi is configured to read from and write to.

My question is: does it make sense that this is an I/O issue, or a RAM/memory issue?
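The jconsole observation can also be confirmed from inside the JVM; the caveat, sketched below, is that these figures cover only the Java heap and say nothing about the native memory the OS needs when opening files:

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("JVM heap used (MB): " + usedMb);
        System.out.println("JVM heap max  (MB): " + rt.maxMemory() / (1024 * 1024));
        // Note: these numbers describe only the Java heap. An errno of
        // "Cannot allocate memory" is raised by the OS when a *native*
        // allocation fails, which the heap figures above cannot reveal.
    }
}
```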

Re: FetchFile Cannot Allocate Enough Memory

Joe Witt
Dale,

Where there is a FetchFile there is usually a ListFile. And while
the symptom of memory issues is showing up in FetchFile, I am curious
whether the issue might actually be caused in ListFile. How many files
are in the directory being listed?

Mark,

Are we using a stream-friendly API to list files, and do we know
whether that API is really doing things in a stream-friendly way on
all platforms?

Thanks
Joe


Re: FetchFile Cannot Allocate Enough Memory

Mark Payne
ListFile performs a listing using Java's File.listFiles(). This will provide a list of all files in the
directory. I do not believe this to be related, though. Googling indicates that when this error
occurs it is related to the ability to create a native process in order to interact with the file system.
I don't think the issue is related to Java heap but rather available RAM on the box. How much RAM
is actually available on the box? You mentioned IOPS - are you running in a virtual cloud environment?
Using remote storage such as Amazon EBS?
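To illustrate the distinction Joe raised: File.listFiles() materializes the whole directory into one array up front, whereas the NIO directory-stream API iterates entries lazily. A quick sketch of the two styles, using a throwaway temp directory rather than NiFi's actual code path:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ListingStyles {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("listing-demo");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.txt"));

        // Eager: one File object per entry, all held in memory at once.
        File[] all = new File(dir.toString()).listFiles();
        System.out.println("eager count: " + all.length);

        // Lazy: entries are produced one at a time as the stream is consumed.
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                count++;
            }
        }
        System.out.println("lazy count: " + count);
    }
}
```

For a directory with hundreds of thousands of entries, the eager form is a heap cost; as Mark notes, though, neither form explains a native ENOMEM from open().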



Re: FetchFile Cannot Allocate Enough Memory

dale.chang13
Joe Witt wrote
On May 4, 2016, at 8:56 AM, Joe Witt <[hidden email]> wrote:

Dale,

Where there is a fetch file there is usually a list file.  And while
the symptom of memory issues is showing up in fetch file i am curious
if the issue might actually be caused in ListFile.  How many files are
in the directory being listed?

Mark,

Are we using a stream friendly API to list files and do we know if
that API on all platforms really doing things in a stream friendly
way?

Thanks
Joe
So I will explain my flow first and then I will answer your question of how I am using ListFile and FetchFile.

To begin my process, I am ingesting a CSV file that contains a list of filenames. The first (and only) ListFile starts off the flow and passes the CSV to the first FetchFile, which retrieves its contents. Afterward, I use expression language (ExtractText) to extract all of the file names and put them as attributes on individual FlowFiles. Then I use a second FetchFile (this is the processor that has trouble allocating memory), using expression language on each file name to retrieve a text document.

The CSV file (189 MB) contains metadata and path/filenames for over 200,000 documents, and I am having trouble reading from a directory of about 85,000 documents (second FetchFile; each document is usually a few KB). I get stuck at around 20 MB and then NiFi moves to a crawl.

I can give you a picture of our actual flow if you need it
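The flow described above can be mimicked outside NiFi as a sanity check: stream the manifest line by line and touch each referenced file one at a time. A hypothetical standalone sketch using a tiny generated manifest, not the real 189 MB file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class FetchFromManifest {
    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in for the real manifest: one path per line.
        Path dir = Files.createTempDirectory("manifest-demo");
        Path doc = Files.write(dir.resolve("doc1.txt"), "hello".getBytes());
        Path manifest = Files.write(dir.resolve("manifest.csv"),
                doc.toString().getBytes());

        // Stream the manifest lazily; only one path is handled at a time.
        try (Stream<String> lines = Files.lines(manifest)) {
            lines.map(Paths::get)
                 .filter(Files::isReadable)
                 .forEach(p -> System.out.println("fetched " + p.getFileName()
                         + ": " + p.toFile().length() + " bytes"));
        }
    }
}
```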

Mark Payne wrote
ListFile performs a listing using Java's File.listFiles(). This will provide a list of all files in the
directory. I do not believe this to be related, though. Googling indicates that when this error
occurs it is related to the ability to create a native process in order to interact with the file system.
I don't think the issue is related to Java heap but rather available RAM on the box. How much RAM
is actually available on the box? You mentioned IOPS - are you running in a virtual cloud environment?
Using remote storage such as Amazon EBS?
I am running six Linux VMs on a Windows 8 machine. Three VMs (one NCM, two nodes) run NiFi, and those VMs have 20 GB assigned to them. Looking through Ambari and monitoring the memory on the nodes, I have a little more than 4 GB of free RAM on the nodes. It looks like the free memory dipped severely to about 1.5 GB during my NiFi flow, but no swap memory was used.

Re: FetchFile Cannot Allocate Enough Memory

Mark Payne
Dale,

I think an image of the flow would be useful. Or better yet, if you can, a template of the flow, so
that we can see all of the configuration being used.

When you said you "get stuck at around 20 MB and then NiFi moves to a crawl" I'm not clear on
what you are saying exactly. After you process 20 MB of the 189 MB CSV file? After you ingest
20 MB worth of files via the second FetchFile?

Also, which directory has 85,000 files? The first directory being polled via ListFile, or the directory
that you are picking up from via the second FetchFile?

Thanks
-Mark


> On May 4, 2016, at 9:01 AM, dale.chang13 <[hidden email]> wrote:
>
> Joe Witt wrote
>> On May 4, 2016, at 8:56 AM, Joe Witt &lt;
>
>> joe.witt@
>
>> &gt; wrote:
>>
>> Dale,
>>
>> Where there is a fetch file there is usually a list file.  And while
>> the symptom of memory issues is showing up in fetch file i am curious
>> if the issue might actually be caused in ListFile.  How many files are
>> in the directory being listed?
>>
>> Mark,
>>
>> Are we using a stream friendly API to list files and do we know if
>> that API on all platforms really doing things in a stream friendly
>> way?
>>
>> Thanks
>> Joe
>
> So I will explain my flow first and then I will answer your question of how
> I am using ListFile and FetchFile.
>
> To begin my process, I am ingesting a CSV file that contains a list of
> filenames. The first (and only) ListFile starts off the flow and passes the
> listing to the first FetchFile to retrieve the contents of the documents. Afterward,
> I use expression language (ExtractText) to extract all of the file names and
> put them as attributes to individual FlowFiles. THEN I use a second
> FetchFile (this is the processor that has trouble allocating memory) and use
> expression language to use that file name to retrieve a text document.
>
> The CSV file (189 MB) contains metadata and path/filenames for over 200,000
> documents, and I am having trouble reading from a directory of about 85,000
> documents (second FetchFile, each document is usually a few KB). I get stuck
> at around 20 MB and then NiFi moves to a crawl.
>
> I can give you a picture of our actual flow if you need it
>
>
> Mark Payne wrote
>> ListFile performs a listing using Java's File.listFiles(). This will
>> provide a list of all files in the
>> directory. I do not believe this to be related, though. Googling indicates
>> that when this error
>> occurs it is related to the ability to create a native process in order to
>> interact with the file system.
>> I don't think the issue is related to Java heap but rather available RAM
>> on the box. How much RAM
>> is actually available on the box? You mentioned IOPS - are you running in
>> a virtual cloud environment?
>> Using remote storage such as Amazon EBS?
>
> I am running six Linux VMs on a Windows 8 machine. Three VMs (one ncm, two
> nodes) use NiFi and those VMs have 20 GB assigned to them. Looking through
> Ambari and monitoring the memory on the nodes, I have a little more than 4
> GB free RAM on the nodes. It looks like the free memory dipped severely
> during my NiFi flow, but no swap memory was used.
>
>
>
> --
> View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/FetchFile-Cannot-Allocate-Enough-Memory-tp9720p9911.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: FetchFile Cannot Allocate Enough Memory

dale.chang13
Mark Payne wrote
Dale,

I think an image of the flow would be useful. Or better yet, if you can, a template of the flow, so
that we can see all of the configuration being used.

When you said you "get stuck at around 20 MB and then NiFi moves to a crawl" I'm not clear on
what you are saying exactly. After you process 20 MB of the 189 MB CSV file? After you ingest
20 MB worth of files via the second FetchFile?

Also, which directory has 85,000 files? The first directory being polled via ListFile, or the directory
that you are picking up from via the second FetchFile?

Thanks
-Mark
Attached below you should find the template of the flow in question. I had to remove or alter some information, but that does not impact the workflow.

We have to ingest a CSV (called a DAT file) with variable delimiters and qualifiers (so it's not really a CSV). The CSV has headers and many lines. Each line corresponds to one text document. Each line also contains metadata about and a URI to that document.
There are several folders which contain text files that are described by the big CSV file. Two FetchFiles later in the flow will attempt to read the text documents corresponding to the URIs.

Here's a description of the directory structure:
/dat contains the gigantic CSV
/directory1 contains thousands of text documents that are described by the CSV
/directory2 contains additional documents described by the CSV
/directory3... and so on

Here are the steps to my flow:

1) The first List-Fetch File pair you will find in the first Process Group, named "Find and Read DATFile". It reads the DAT file (the CSV), which contains hundreds of thousands of lines.

2) The Split DATFile Process Group chunks the CSV file into individual FlowFiles.
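Conceptually, what a split step like this has to do (a hedged sketch of the idea, not NiFi's SplitText implementation) is chunk the big file into fixed-size groups of lines while reading it lazily, so the multi-hundred-MB CSV never has to fit in memory all at once:

```python
import os
import tempfile

def split_lines(path, lines_per_chunk=1000):
    """Yield successive chunks of at most `lines_per_chunk` lines,
    streaming the file line by line instead of loading it whole."""
    chunk = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            chunk.append(line.rstrip("\n"))
            if len(chunk) == lines_per_chunk:
                yield chunk
                chunk = []
    if chunk:  # emit any trailing partial chunk
        yield chunk

# Example: a 5-line file split into chunks of 2 → chunk sizes [2, 2, 1]
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".csv") as f:
    f.write("\n".join(f"row{i}" for i in range(5)))
sizes = [len(c) for c in split_lines(f.name, lines_per_chunk=2)]
print(sizes)  # → [2, 2, 1]
os.remove(f.name)
```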

3) In the Clean/Extract Metadata Process Group, we use regular expressions via ExtractText to write the metadata to FlowFile attributes, then use AttributesToJSON to build the JSON documents that are later stored to Solr. The Processors in this group also use regular expressions to clean and validate the fields that end up in the generated JSON document.
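Outside of NiFi, the ExtractText-then-AttributesToJSON step amounts to the following (a sketch only: the field names, the `|` delimiter, and the `^` qualifier here are made up, since the real DAT file's delimiters and regexes were redacted):

```python
import json
import re

# Hypothetical: one line of the DAT file, using "|" as the delimiter
# and "^" as the text qualifier (the real file's delimiters differ).
line = "^DOC-0001^|^/directory1/doc-0001.txt^|^Some Title^"

# ExtractText analogue: named capture groups become FlowFile attributes.
pattern = re.compile(
    r"\^(?P<doc_id>[^^]*)\^\|\^(?P<uri>[^^]*)\^\|\^(?P<title>[^^]*)\^"
)
attributes = pattern.match(line).groupdict()

# AttributesToJSON analogue: the attributes become a JSON document
# suitable for sending on to Solr.
doc = json.dumps(attributes)
print(doc)
```

The `uri` attribute is what a later FetchFile would use to locate the text document on disk.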

4) The Read Extracted Text Process Group contains the second FetchFile that reads in files according to the URIs listed in the CSV. This is where the read/write speed dips ("NiFi moves to a crawl") once 20-30 MB of text files have been read through the second FetchFile.
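For step 4, the second FetchFile would typically build its path from an attribute written by ExtractText via Expression Language. A hypothetical configuration fragment (property names recalled from the FetchFile processor; `extracted.uri` is an invented attribute name standing in for whatever the real flow extracts):

```
FetchFile
  File to Fetch        : ${extracted.uri}
  Completion Strategy  : None
```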

5) The Store in Solr Process Group batches up JSON documents and stores them to SolrCloud.

Document_Ingestion_Redacted.xml