Converts Elasticsearch documents into an R data.table. It uses fromJSON with flatten = TRUE to convert the JSON into an R data.frame, then formats the result as a data.table.
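The flattening step described above can be sketched outside the package. This is a minimal illustration and not the package's actual implementation: it assumes the jsonlite and data.table packages are installed, the input string is a made-up miniature of a hits array, and the prefix-stripping step is an assumption inferred from the example output, where column names appear without `_source.`.

```r
library(jsonlite)
library(data.table)

# Hypothetical miniature of a hits array (not output from a real cluster)
hits <- '[{"_source":{"cust_name":"Austin","details":{"location":"chicago"}}},
          {"_source":{"cust_name":"James","details":{"location":"cannes"}}}]'

# flatten = TRUE turns nested objects into dot-separated columns,
# e.g. "_source.details.location"
flat_df <- jsonlite::fromJSON(hits, flatten = TRUE)

# Convert the data.frame into a data.table
flat_dt <- data.table::as.data.table(flat_df)

# Assumption: drop the leading "_source." so names match the documented output
data.table::setnames(flat_dt, sub("^_source\\.", "", names(flat_dt)))

print(names(flat_dt))
```

Nested objects become ordinary columns this way, while nested arrays remain list columns, which is why the pastPurchases column in the example below needs a separate unpacking step.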
Arguments
- hits_json
A character vector. If its length is greater than 1, its elements will be pasted together. This can contain a JSON returned from a search query in Elasticsearch, or a filepath or URL pointing at one.
- keep_nested_data_cols
A boolean (default TRUE); whether to keep columns that are nested arrays in the original JSON. A warning will be given if these columns are deleted.
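To make the two arguments concrete, here is a rough sketch of the behavior they describe, written with jsonlite and data.table directly rather than with chomp_hits. Everything in it is illustrative: the input is hypothetical, joining the vector elements with no separator is an assumption, and the warning text is invented for the sketch and is not the package's message.

```r
library(jsonlite)
library(data.table)

# Hypothetical input split across two vector elements
json_parts <- c('[{"_source":{"id":1,"tags":[{"k":"a"}]}},',
                '{"_source":{"id":2,"tags":[{"k":"b"},{"k":"c"}]}}]')

# Assumption: the elements are joined with no separator before parsing
joined <- paste(json_parts, collapse = "")
dt <- as.data.table(fromJSON(joined, flatten = TRUE))

# Columns that came from nested arrays arrive as list columns
nested_cols <- names(dt)[vapply(dt, is.list, logical(1))]

# Mimic keep_nested_data_cols = FALSE: warn, then drop the list columns
keep_nested_data_cols <- FALSE
if (!keep_nested_data_cols && length(nested_cols) > 0) {
  warning("Dropping nested columns: ", paste(nested_cols, collapse = ", "))
  dt[, (nested_cols) := NULL]
}

print(names(dt))
```

Dropping the list columns leaves a table of atomic columns only, which is convenient when the nested detail is not needed downstream.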
Examples
# A sample raw result from a hits query:
result <- '[{"_source":{"timestamp":"2017-01-01","cust_name":"Austin","details":{
"cust_class":"big_spender","location":"chicago","pastPurchases":[{"film":"The Notebook",
"pmt_amount":6.25},{"film":"The Town","pmt_amount":8.00},{"film":"Zootopia","pmt_amount":7.50,
"matinee":true}]}}},{"_source":{"timestamp":"2017-02-02","cust_name":"James","details":{
"cust_class":"peasant","location":"chicago","pastPurchases":[{"film":"Minions",
"pmt_amount":6.25,"matinee":true},{"film":"Rogue One","pmt_amount":10.25},{"film":"Bridesmaids",
"pmt_amount":8.75},{"film":"Bridesmaids","pmt_amount":6.25,"matinee":true}]}}},{"_source":{
"timestamp":"2017-03-03","cust_name":"Nick","details":{"cust_class":"critic","location":"cannes",
"pastPurchases":[{"film":"Aala Kaf Ifrit","pmt_amount":0,"matinee":true},{
"film":"Dopo la guerra (Apres la Guerre)","pmt_amount":0,"matinee":true},{
"film":"Avengers: Infinity War","pmt_amount":12.75}]}}}]'
# Chomp into a data.table
sampleChompedDT <- chomp_hits(hits_json = result, keep_nested_data_cols = TRUE)
#> INFO [2025-02-10 16:21:07] Keeping the following nested data columns. Consider using unpack_nested_data for one:
#> details.pastPurchases
print(sampleChompedDT)
#>     timestamp cust_name details.cust_class details.location
#>        <char>    <char>             <char>           <char>
#> 1: 2017-01-01    Austin        big_spender          chicago
#> 2: 2017-02-02     James            peasant          chicago
#> 3: 2017-03-03      Nick             critic           cannes
#>    details.pastPurchases
#>                   <list>
#> 1:     <data.frame[3x3]>
#> 2:     <data.frame[4x3]>
#> 3:     <data.frame[3x3]>
# (Note: use es_search() to get here in one step)
# Unpack by details.pastPurchases
unpackedDT <- unpack_nested_data(chomped_df = sampleChompedDT
                                 , col_to_unpack = "details.pastPurchases")
print(unpackedDT)
#>      timestamp cust_name details.cust_class details.location
#>         <char>    <char>             <char>           <char>
#>  1: 2017-01-01    Austin        big_spender          chicago
#>  2: 2017-01-01    Austin        big_spender          chicago
#>  3: 2017-01-01    Austin        big_spender          chicago
#>  4: 2017-02-02     James            peasant          chicago
#>  5: 2017-02-02     James            peasant          chicago
#>  6: 2017-02-02     James            peasant          chicago
#>  7: 2017-02-02     James            peasant          chicago
#>  8: 2017-03-03      Nick             critic           cannes
#>  9: 2017-03-03      Nick             critic           cannes
#> 10: 2017-03-03      Nick             critic           cannes
#>                                 film pmt_amount matinee
#>                               <char>      <num>  <lgcl>
#>  1:                     The Notebook       6.25      NA
#>  2:                         The Town       8.00      NA
#>  3:                         Zootopia       7.50    TRUE
#>  4:                          Minions       6.25    TRUE
#>  5:                        Rogue One      10.25      NA
#>  6:                      Bridesmaids       8.75      NA
#>  7:                      Bridesmaids       6.25    TRUE
#>  8:                   Aala Kaf Ifrit       0.00    TRUE
#>  9: Dopo la guerra (Apres la Guerre)       0.00    TRUE
#> 10:           Avengers: Infinity War      12.75      NA