Need help regarding "How to skip inserting null value in Hive during merging/updating to those column whose field is missing from input"

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Need help regarding "How to skip inserting null value in Hive during merging/updating to those column whose field is missing from input"

Dinesh Goyal
Hey,
Need little help. I am streaming incremental web-log through Kafka and then
storing into HDFS further we are inserting data into Elastic-search
incrementally as soon as it reach in HDFS by pulling the data through HIVE
Select Query (HDFS).
My web log contains partial information ie. if i have 10 columns in my hive
table schema i may receiving only 6 fields through logs whose value get
modified. Now During updating/merging into hive table 6 fields get modified
by new value and remaining 4 is replaced by Null. Currently I am exploring
Kylo/ Nifi framework for these task.What i'm looking for a way in Hive that
those 4 column will remains same as it was in previous state.

let's take an example:

*Our use case is let's assume our web-logs looks like this *
{"id": 1, "first_name": "A", "last_name": "B"}
{"id": 2, "first_name": "C", "last_name": "D"}
{"id": 1, "last_name": "E"}
{"id": 2, "first_name": "F"}

*After storing above data into HDFS, HIVE select query look like this:*

*in case of no merging:*

*id*

*first_name*

*last_name*

1

A

B

2

C

D

1

NULL

E

2

F

NULL


*in case of merging:*

*id*

*first_name*

*last_name*

1

NULL

E

2

F

NULL


In above cases when we insert data into ES we get resultant data as
(Assuming id column represent unique id in ES index)

*id*

*first_name*

*last_name*

1

NULL

E

2

F

NULL


*What we expecting is o/p something like *

*id*

*first_name*

*last_name*

1

A

E

2

F

D


We don't want to store null in case key is not present in logs during
merging. Can you guide us the right approach for merging?
Note: We are using Kylo (Nifi processor) for executing all of these task.
--
--
Thanks
Dinesh Goyal
Senior Software Developer