Apache NiFi vs Spring XD

Apache NiFi vs Spring XD

Ryan Riddel
Hi,

I've been studying possible products for GUI-based pipeline creation, and
the three applications that I've come across are Apache NiFi, Spring XD,
and Cask CDAP.  Cask runs Spark under the hood, and so it's too slow, but
the differences between NiFi and Spring XD seem to me more subtle.  They
are both performant enough to handle my requirements: <25 ms end-to-end
latency for a simple pipeline, with 600 GB/day of throughput (500 Mbps peak).
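
As a rough sanity check on those figures, here is the arithmetic as a
small sketch.  It assumes decimal units (1 GB = 10^9 bytes) and traffic
spread evenly over 24 hours, neither of which is stated above; if the
volume is concentrated in market hours, the in-session average would be
higher than this.

```python
# Assumptions (not stated in the post): 1 GB = 10**9 bytes, even 24-hour spread.
GB = 10**9
daily_bytes = 600 * GB
seconds_per_day = 24 * 60 * 60

# Average rate in megabits per second implied by the daily volume.
avg_mbps = daily_bytes * 8 / seconds_per_day / 10**6
peak_mbps = 500

print(f"average: {avg_mbps:.1f} Mbps")                      # average: 55.6 Mbps
print(f"peak/average ratio: {peak_mbps / avg_mbps:.1f}x")   # peak/average ratio: 9.0x
```

So under these assumptions the stated peak is roughly nine times the
daily average, which suggests sizing for the peak rather than the mean.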

I work at a prop (trading) shop, and my goal is to make a platform with
which traders can implement their own algorithms without writing a line of
code.  NiFi and Spring XD seem very similar, except XD seems to be slightly
more powerful (where NiFi can't do joining and complex windowing, XD can).

I've trawled both mailing lists, but haven't found such a comparison.
Would anyone care to add some points of comparison between Spring XD and
NiFi?  I'd be eager to contribute to the conversation with whatever stuff
I've learned.

Ryan

Re: Apache NiFi vs Spring XD

Joe Witt
Ryan - it is way too easy/lazy to throw around FUD about other
projects/communities, so we just try to avoid that and focus on what we
know here, which is Apache NiFi.

So, you'll probably not find a lot of help here on the Apache NiFi
mailing lists for a real comparison, but what we can do is talk about
your specific requirements and whether NiFi would be a good fit now and
going forward.

First, NiFi's goal is not really about the direct execution of
streaming analytics.  For that you want a system that offers full
complex event processing capabilities and things like windowing
capabilities for temporal and spatial correlation (for example).
There are good options out there for specifically that purpose such as
Apache Storm, Apache Flink, and others.  Now, Apache NiFi can be used
for a great range of powerful streaming data processing cases, and in
traditional terms these would be simple event processing cases.  There
are a ton of data transformations of format/schema, feature
extraction, etc. that are done in NiFi all the time.  We've stopped
short of going into complex analytics or providing first-class
support for windowing.
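
To make the distinction concrete, here is a minimal sketch in plain
Python.  It is not NiFi, Storm, or Flink code, and the event fields
(`ts`, `symbol`, `price`) are made up for illustration.  The first
function handles each event on its own (simple event processing); the
second has to group events by time window before it can produce
anything (the windowing case).

```python
from collections import defaultdict

def simple_transform(event):
    """Simple event processing: each event is handled independently."""
    return {"symbol": event["symbol"].upper(), "price": event["price"]}

def tumbling_window_avg(events, window_s):
    """Windowed processing: bucket events into fixed-size time windows,
    then compute an average price per (window start, symbol)."""
    buckets = defaultdict(list)
    for e in events:
        window_start = (e["ts"] // window_s) * window_s
        buckets[(window_start, e["symbol"])].append(e["price"])
    return {key: sum(prices) / len(prices) for key, prices in sorted(buckets.items())}

# Hypothetical events; timestamps are whole seconds.
events = [
    {"ts": 0, "symbol": "abc", "price": 10.0},
    {"ts": 3, "symbol": "abc", "price": 12.0},
    {"ts": 7, "symbol": "abc", "price": 20.0},
]

transformed = [simple_transform(e) for e in events]
averages = tumbling_window_avg(events, window_s=5)
print(averages)  # {(0, 'abc'): 11.0, (5, 'abc'): 20.0}
```

The sketch works on a finished list; a real streaming engine also has to
decide when a window is complete and what to do with late or
out-of-order events, which is where most of the complexity lives.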

There are some really important and differentiating features in NiFi
that you'll want to consider for your case.  If they're not important
for you then it is probably not worth using NiFi.  First, NiFi offers
an interactive command and control model that works on both single
nodes and clusters of NiFi systems, whereby changes you make to the
system take effect immediately.  This is a very powerful construct that
allows authorized users to make live changes to the flow as data is
flowing.  This makes for very rapid evolution from development
through production.  We made branching data extremely cheap and
easy thanks to the underlying content repository approach we have.
Second, we have an extremely fine-grained data tracking/data
provenance capability.  This makes replay, click-to-content, and
troubleshooting an extremely powerful part of the tool.  It
shows you how data came into NiFi, what we learned about it, what we
did to it, where we sent it, whether we dropped it, etc.  It works even
across complex flow graphs with branching, merging, etc.  You really
have to check that part out.
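
As a rough illustration of the provenance idea only (this is a toy
sketch, not NiFi's implementation or API; the class, method, and event
names are made up), an append-only event log with parent links is
enough to reconstruct the history of an item across branches and
merges:

```python
from itertools import count

class ProvenanceLog:
    """Toy append-only event log; names here are illustrative only."""

    def __init__(self):
        self.events = []      # each entry: (seq, event_type, item_id, parent_ids)
        self._ids = count(1)

    def new_item(self, event_type, parents=()):
        """Create a new item and record the event that produced it."""
        item_id = next(self._ids)
        self.record(event_type, item_id, parents)
        return item_id

    def record(self, event_type, item_id, parents=()):
        self.events.append((len(self.events), event_type, item_id, tuple(parents)))

    def lineage(self, item_id):
        """Return every event touching item_id or its ancestors, oldest first."""
        seen, frontier, found = set(), [item_id], []
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            for ev in self.events:
                if ev[2] == current:
                    found.append(ev)
                    frontier.extend(ev[3])  # follow links to parent items
        return sorted(found)

log = ProvenanceLog()
raw = log.new_item("RECEIVE")
left = log.new_item("FORK", parents=[raw])
right = log.new_item("FORK", parents=[raw])
log.record("MODIFY", left)
merged = log.new_item("JOIN", parents=[left, right])
log.record("SEND", merged)

history = [event_type for _, event_type, _, _ in log.lineage(merged)]
print(history)  # ['RECEIVE', 'FORK', 'FORK', 'MODIFY', 'JOIN', 'SEND']
```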

The other point I'll build on is the content repository I mentioned.
NiFi is a system which can handle extremely large objects
right next to really small objects.  That is the case because of how
its repositories and API work.  Other systems tend to load data into
memory and become very memory sensitive for even really easy cases, or
have questionable data guarantees unless flows are built in very
simple linear chains.  We talk about the repositories a lot more here
[1].
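
The general idea can be sketched as follows.  This is a simplified
illustration and not NiFi's actual repository code or API; the
`ContentStore` class and its methods are hypothetical.  Content lives on
disk, the flow passes around a small reference (a "claim"), and
processing streams the bytes in chunks so object size does not dictate
memory use.

```python
import os
import tempfile
import uuid

class ContentStore:
    """Toy on-disk content store; flows pass small claim IDs, not payloads."""

    def __init__(self, root):
        self.root = root

    def write(self, chunks):
        """Write an iterable of byte chunks to a new file; return its claim ID."""
        claim = uuid.uuid4().hex
        with open(os.path.join(self.root, claim), "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        return claim

    def read(self, claim, chunk_size=64 * 1024):
        """Yield the content in chunks so it is never fully held in memory."""
        with open(os.path.join(self.root, claim), "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk

def uppercase(store, claim):
    """Stream-transform content into a new claim; the original is untouched."""
    return store.write(chunk.upper() for chunk in store.read(claim))

with tempfile.TemporaryDirectory() as root:
    store = ContentStore(root)
    original = store.write([b"tick data " * 1000])
    derived = uppercase(store, original)
    # Branching hands out the same reference twice; no bytes are copied.
    branch_a, branch_b = original, original
    result = b"".join(store.read(derived))
    untouched = b"".join(store.read(original))
```

Because a transform writes a new claim instead of modifying content in
place, every branch holding the old reference still sees the original
bytes.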

Be sure to take a look at the activity of the communities and the
projects overall as you think through these things.

Hopefully this helps a bit.

Thanks
Joe

[1] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html

On Tue, Sep 5, 2017 at 5:58 PM, Ryan Riddel <[hidden email]> wrote:

> [...]