Ideal hardware for NiFi

Ideal hardware for NiFi

Phil H
Hi all,

I've been asked to spec some hardware for a NiFi installation. Does anyone
have any advice? My gut feel is lots of processor cores and RAM, with less
emphasis on storage (small fast disks). Are there any limitations on how
many cores the JRE/NiFi can actually make use of, or any other
considerations like that I should be aware of?

Most likely will be pairs of servers in a cluster, but again any advice to
the contrary would be appreciated.

Cheers,
Phil

Re: Ideal hardware for NiFi

Sivaprasanna Sethuraman
Phil,

The hardware requirements are driven by the nature of the dataflow you are
developing. If you're looking to play around with NiFi and gain some
hands-on experience, a 4-core machine with 8 GB of RAM, i.e. any modern
laptop or desktop, would do the job. In my case, where I have hundreds of
dataflows, I run a 3-node cluster, each node with 16 GB of RAM and 4 cores
(8 threads). I went with smaller SSDs because my flows mostly write to
object stores like Google Cloud Storage, Azure Blob Storage and Amazon S3,
and to NoSQL DBs. Hope this helps.

-
Sivaprasanna


Re: Ideal hardware for NiFi

Phil H
Thanks for that. Sorry, I should have been more specific - we already have
a flow running on non-dedicated hardware. I'm looking to identify any
limitations in NiFi or the JVM that would cap how much parallelism it can
take advantage of.


Re: Ideal hardware for NiFi

Mark Payne
Phil,

As Sivaprasanna mentioned, your bottleneck will certainly depend on your flow.
There's nothing inherent about NiFi or the JVM, AFAIK, that would limit you. I've
seen NiFi run on VMs with 4-8 cores, and I've seen it run on bare metal
servers with 96+ cores. Most often, I see people with plenty of CPU cores
but insufficient disk, so if you're running many cores, make sure you're using
SSDs/NVMe drives or enough spinning disks to accommodate the flow. NiFi does a good
job of spanning the content and FlowFile repositories across multiple disks to take
full advantage of the hardware, and it scales the CPU vertically by way of multiple
Processors and multiple concurrent tasks (threads) on a given Processor.

It really comes down to what you're doing in your flow, though. If you've got 96 cores and
you're trying to perform 5 dozen transformations against a large number of FlowFiles
but have only a single spinning disk, then those 96 cores will likely go to waste, because
your disk will bottleneck you.

Likewise, if you have 10 SSDs and only 8 cores, you're likely going to waste a lot of
disk, because you won't have the CPU needed to reach the disks' full potential.
So you'll need to strike the correct balance for your use case. Since you have the
flow running right now, I would recommend looking at tools like `top` and `iostat` in order
to understand whether you're reaching your limit on CPU, disk, etc.
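For example, a first pass on the NiFi host could look like the following (a rough sketch; assumes a Linux box, with `iostat` coming from the sysstat package):

```shell
# One-shot CPU summary: sustained high %us/%sy with low %wa suggests CPU-bound.
top -b -n 1 | head -n 5

# Extended per-device stats, 3 samples one second apart:
# %util near 100 on the repository disks means the disks are the bottleneck.
iostat -x 1 3
```

If `%wa` (I/O wait) is high in `top` while the repository devices sit near 100% utilization in `iostat`, faster or additional disks are the better investment; if the CPUs are pegged while disks idle, add cores or concurrent tasks instead.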

As far as RAM is concerned, NiFi typically only needs 4-8 GB of RAM for the heap. However,
more RAM means that your operating system can make better use of disk caching, which
can certainly speed things up, especially if you're reading the content several times for
each FlowFile.
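For reference, the heap is set through the JVM arguments in conf/bootstrap.conf; a sketch matching the 4-8 GB guidance above (argument numbering as in a stock NiFi 1.x install, values illustrative):

```
# conf/bootstrap.conf - JVM heap settings (values are illustrative)
java.arg.2=-Xms4g
java.arg.3=-Xmx8g
```

Leaving the rest of the host's RAM unallocated is deliberate: the OS page cache uses it for the repositories.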

Does this help at all?

Thanks
-Mark




Re: Ideal hardware for NiFi

Phil H
Thanks Mark, this is great advice.

Disk access is certainly an issue with the current setup, so I will
shoot for NVMe disks in the build. How does NiFi get configured to span
its repositories across multiple physical disks?

Thanks,
Phil


Re: Ideal hardware for NiFi

Mark Payne
Phil,

For the content repository, you can configure the directory by changing the value of
the "nifi.content.repository.directory.default" property in nifi.properties. The suffix here,
"default" is the name of this "container". You can have multiple containers by adding extra
properties. So, for example, you could set:

nifi.content.repository.directory.content1=/nifi/repos/content-1
nifi.content.repository.directory.content2=/nifi/repos/content-2
nifi.content.repository.directory.content3=/nifi/repos/content-3
nifi.content.repository.directory.content4=/nifi/repos/content-4

Similarly, the Provenance Repo property is named "nifi.provenance.repository.directory.default"
and can have any number of "containers":

nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4

When NiFi writes to these, it round-robins across them, so if you're writing to 4 FlowFiles'
content simultaneously with different threads, you're able to get the full throughput of each
disk. (So if you have 4 disks for your content repo, each capable of writing 100 MB/sec, then
your effective write rate to the content repo is 400 MB/sec.) The same goes for the Provenance Repository.
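As a back-of-the-envelope sketch of that arithmetic (the disk count and per-disk rate are illustrative, matching the example above):

```shell
# Four content-repo disks, each sustaining roughly 100 MB/sec of writes.
DISKS=4
PER_DISK_MB_S=100

# With enough concurrent writer threads, round-robin placement lets writes
# land on different disks simultaneously, so bandwidth roughly sums.
EFFECTIVE=$((DISKS * PER_DISK_MB_S))
echo "Effective content-repo write rate: ${EFFECTIVE} MB/sec"
```

Note the "enough concurrent writer threads" caveat: a single-threaded flow still writes to one disk at a time, so the aggregate rate only materializes under parallel load.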

Doing this also will allow you to hold a larger 'archive' of content and provenance data, because
it will span the archive across all of the listed directories, as well.

Thanks
-Mark





Re: Ideal hardware for NiFi

Phil H
Fantastic, thanks Mark


Re: Ideal hardware for NiFi

Phil H
In reply to this post by Mark Payne
Follow up question - how do I transition to this new structure? Should I
shut down NiFi and move the contents of the legacy single directories into
one of the new ones? For example:

mv /usr/nifi/content_repository /nifi/repos/content-1

TIA
Phil



Re: Ideal hardware for NiFi

Joe Witt
Phil,

As you add directories, NiFi will just start using them. If you want to stop
using the current directory, it might be more involved.

Does that help?

thanks

On Thu, Sep 13, 2018, 4:36 PM Phil H <[hidden email]> wrote:

> Follow up question - how do I transition to this new structure? Should I
> shut down NiFi and move the contents of the legacy single directories into
> one of the new ones? For example:
>
> mv /usr/nifi/content_repository
> /nifi/repos/content-1
>
> TIA
> Phil
>
>
> On Wed, 12 Sep 2018 at 06:15, Mark Payne <[hidden email]> wrote:
>
> > Phil,
> >
> > For the content repository, you can configure the directory by changing
> > the value of
> > the "nifi.content.repository.directory.default" property in
> > nifi.properties. The suffix here,
> > "default" is the name of this "container". You can have multiple
> > containers by adding extra
> > properties. So, for example, you could set:
> >
> > nifi.content.repository.directory.content1=
> > /nifi/repos/content-1
> >
> > nifi.content.repository.directory.content2=/nifi/repos/content-2
> > nifi.content.repository.directory.content3=/nifi/repos/content-3
> > nifi.content.repository.directory.content4=/nifi/repos/content-4
> >
> > Similarly, the Provenance Repo property is named
> > "nifi.provenance.repository.directory.default"
> > and can have any number of "containers":
> >
> > nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
> > nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
> > nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
> > nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4
> >
> > When NiFi writes to these, it does a Round Robin so that if you're
> writing
> > to 4 Flow Files'
> > content simultaneously with different threads, you're able to get the
> full
> > throughput of each
> > disk. (So if you have 4 disks for your content repo, each capable of
> > writing 100 MB/sec, then
> > your effective write rate to the content repo is 400 MB/sec). Similar
> with
> > Provenance Repository.
> >
> > Doing this also will allow you to hold a larger 'archive' of content and
> > provenance data, because
> > it will span the archive across all of the listed directories, as well.
> >
> > Thanks
> > -Mark

Re: Ideal hardware for NiFi

Phil H
Potentially. We're looking to see how the multiple disks help before
committing to spending money on new hardware :)

On Fri, 14 Sep 2018 at 10:48, Joe Witt <[hidden email]> wrote:

> Phil,
>
> As you add dirs, NiFi will just start using them. If you want to stop
> using the current dir, it might be more involved.
>
> Does that help?
>
> Thanks

Re: Ideal hardware for NiFi

Joe Witt
If they are physically separate, the difference should be quite noticeable.

On Thu, Sep 13, 2018, 7:36 PM Phil H <[hidden email]> wrote:

> Potentially. We're looking to see how the multiple disks help before
> committing to spending money on new hardware :)

Re: Ideal hardware for NiFi

Phil H
Hi Joe,

I moved the content and provenance repositories off to two new disks, but
it seems like the vast majority of the writes are still occurring on the
disk where the flowfile and database repositories live. I note that they
don't appear to be splittable across disks in the same way?
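For context, the FlowFile repository (and the H2 database directory) each take a single path in nifi.properties, with no numbered "containers" like the content and provenance repos have, so the usual remedy is to give each of those directories its own fast disk. A sketch, with illustrative mount paths:

```
# FlowFile repo accepts exactly one directory; dedicate a fast disk to it
nifi.flowfile.repository.directory=/mnt/ff/flowfile_repository

# The H2 database directory can likewise be relocated
nifi.database.directory=/mnt/db/database_repository
```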

On Fri, 14 Sep 2018 at 12:37, Joe Witt <[hidden email]> wrote:

> If they are physically separate, the difference should be quite noticeable.

Re: Ideal hardware for NiFi

Joe Witt
The ff disk needs to be the quickest disk and should have no other
contention, just as a db transaction log would require.

The prov repo should also have its own disk.

The content repo can have one or more physical disks.

The best case is each repo on physically separate disks/underlying
storage. Not always an option, I realize, but for maximum performance it
matters.

Then it's about proper config and optimal flow design.

It's OK if your ff repo disk is always busy... that's a good thing. But if
iostat shows 100% usage all the time, things probably aren't ideal.

Thanks
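A nifi.properties sketch of the layout described above, assuming one dedicated mount per repository (the mount points are illustrative):

```
# FlowFile repo: fastest disk, no other contention
nifi.flowfile.repository.directory=/mnt/ff/flowfile_repository

# Provenance repo on its own disk
nifi.provenance.repository.directory.default=/mnt/prov/provenance_repository

# Content repo spread over one or more disks
nifi.content.repository.directory.content1=/mnt/content1/content_repository
nifi.content.repository.directory.content2=/mnt/content2/content_repository
```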

On Thu, Sep 13, 2018, 9:16 PM Phil H <[hidden email]> wrote:

> Hi joe,
>
> I moved the content and providence repositories off to two new disks, but
> it seems like the vast majority of the writes are still occurring on the
> disk where the flowfile and database repositories are. I note they don't
> appear to be able to be split across disks in the same way?
>
> On Fri, 14 Sep 2018 at 12:37, Joe Witt <[hidden email]> wrote:
>
> > if they are physically seperate the diff should be quite noticable.
> >
> > On Thu, Sep 13, 2018, 7:36 PM Phil H <[hidden email]> wrote:
> >
> > > Potentially. We're looking to see how the multiple disks help before
> > > committing to spending money on new hardware :)
> > >
> > > On Fri, 14 Sep 2018 at 10:48, Joe Witt <[hidden email]> wrote:
> > >
> > > > phil,
> > > >
> > > > as you add dirs it will just start using them.  if you want to no
> > longer
> > > > use the current dir it might be more involved.
> > > >
> > > > does that help?
> > > >
> > > > thanks
> > > >
> > > > On Thu, Sep 13, 2018, 4:36 PM Phil H <[hidden email]> wrote:
> > > >
> > > > > Follow up question - how do I transition to this new structure?
> > Should
> > > I
> > > > > shut down NiFi and move the contents of the legacy single
> directories
> > > > into
> > > > > one of the new ones? For example:
> > > > >
> > > > > mv /usr/nifi/content_repository
> > > > > /nifi/repos/content-1
> > > > >
> > > > > TIA
> > > > > Phil
Reply | Threaded
Open this post in threaded view
|

RE: Ideal hardware for NiFi

Phil H
Thanks Joe,

We are just trying to get a sense of load.

Out of interest, given the work to spread the other repos across disks, is there a reason why the FlowFile repo isn't splittable? It seems like that is going to be our bottleneck going forward.

Cheers,
Phil

Sent from Mail for Windows 10

From: Joe Witt
Sent: Friday, 14 September 2018 1:37 PM
To: [hidden email]
Subject: Re: Ideal hardware for NiFi

The ff disk needs to be the quickest disk and should have no other
contention, just like a DB transaction log would demand.

The prov repo should also have its own disk.

The content repo can have one or more physical disks.

The best case is each repo on physically separate disks/underlying
storage. Not always an option, I realize, but for maximizing performance it
matters.

Then it's about proper config and optimal flow design.

It's OK if your ff repo disk is busy... that's a good thing. But if iostat
shows it pegged at 100% utilization all the time, things probably aren't ideal.

thanks
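The layout Joe describes can be sketched in nifi.properties. The property names are the standard repository entries; the mount points below are hypothetical placeholders, not defaults, so adjust them to your own hardware:

```properties
# One physical disk (or array) per repository - mount points are examples only.
# FlowFile repo and its companion database dir on the fastest disk, alone:
nifi.flowfile.repository.directory=/mnt/disk1/flowfile_repository
nifi.database.directory=/mnt/disk1/database_repository
# Provenance repo on its own disk:
nifi.provenance.repository.directory.default=/mnt/disk2/provenance_repository
# Content repo can span multiple disks via multiple "container" properties:
nifi.content.repository.directory.content1=/mnt/disk3/content_repository
nifi.content.repository.directory.content2=/mnt/disk4/content_repository
```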

On Thu, Sep 13, 2018, 9:16 PM Phil H <[hidden email]> wrote:

> Hi joe,
>
> I moved the content and provenance repositories off to two new disks, but
> it seems like the vast majority of the writes are still occurring on the
> disk where the flowfile and database repositories are. I note they don't
> appear to be able to be split across disks in the same way?

Reply | Threaded
Open this post in threaded view
|

Re: Ideal hardware for NiFi

Mark Payne
Phil,

Each of the repositories is designed and implemented to meet specific goals. The FlowFile Repository is
designed in such a way that each update to the repo must be replayed in exactly the same order when it is
read from disk. Due to the nature of disk caching and operating system caching, this can lead to problems
if the updates are striped across files on multiple disks, as some files may be fully flushed to disk while
others are not, resulting in either data loss or incorrect ordering. So the FlowFile repo does not span
multiple disks.

That being said, it's also the least expensive repo to update and can often handle hundreds of
thousands of updates per second. In short, I've never seen it become the bottleneck on any system I've run :)

-Mark
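A toy model (not NiFi's actual implementation) of the ordering argument above: a single append-only log trivially preserves the global order of updates on replay, while striping the same updates across two logs leaves each log knowing only its own local order, so a partial flush can lose updates from the middle of the sequence.

```python
# One append-only log: replay recovers the exact write order.
single_log = []
for i in range(4):
    single_log.append(("update", i))
assert single_log == [("update", i) for i in range(4)]

# Stripe the same updates across two logs, as a multi-disk repo would:
log_a, log_b = [], []
for i in range(4):
    (log_a if i % 2 == 0 else log_b).append(("update", i))

# Each log preserves only its own local order...
assert log_a == [("update", 0), ("update", 2)]
assert log_b == [("update", 1), ("update", 3)]

# ...so if log_b is flushed to disk but log_a is not when power is lost,
# recovery sees updates 1 and 3 with 0 and 2 missing from between them:
# the global order (and possibly data) is gone.
```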


> On Sep 14, 2018, at 12:43 AM, Phil H <[hidden email]> wrote:
>
> Thanks Joe,
>
> We are just trying to get a sense of load.
>
> Out of interest, given the work to spread the other repo across disks, is there a reason why the FF repo isn’t split-able?  Seems like that is going to be our bottleneck going forward
>
> Cheers,
> Phil