Unpack a nested data.table — unpack_nested_data

After calling a chomp_* function or es_search, any nested array in the source JSON ends up as a list column in the resulting data.table: each element of that column is itself a data.frame (or a vector). This function expands one such nested column, adding its data to the original data.table and repeating the values of the other columns down the rows as necessary.

This is a side-effect-free function: it returns a new data.table and the input data.table is unmodified.
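
To make the expansion concrete, here is a minimal sketch of the same idea using only data.table. It is not the package's implementation. The toy table and the names nestedDT, stackedDT, metaDT, flatDT, and parent_row are assumptions made up for illustration.

```r
library(data.table)

# Hypothetical toy input: a list column of data.frames, shaped like the
# nested columns that chomp_hits() can return
nestedDT <- data.table(
    cust_name = c("Austin", "James")
    , purchases = list(
        data.frame(film = c("The Notebook", "The Town"), pmt_amount = c(6.25, 8.00))
        , data.frame(film = "Minions", pmt_amount = 6.25)
    )
)

# Stack the nested data.frames; idcol records which parent row each came from
stackedDT <- rbindlist(nestedDT[["purchases"]], fill = TRUE, idcol = "parent_row")

# Repeat the parent-row metadata once per nested row
metaDT <- nestedDT[stackedDT[["parent_row"]], .(cust_name)]

# Bind metadata and nested fields into a new table; nestedDT is left untouched
flatDT <- cbind(metaDT, stackedDT[, .(film, pmt_amount)])
print(flatDT)
```

In this sketch flatDT has one row per purchase, while nestedDT still has its original two rows and its list column, which mirrors the "returns a new data.table" behavior described above.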

Usage

unpack_nested_data(chomped_df, col_to_unpack)

Arguments

chomped_df: a data.table
col_to_unpack: a character vector of length one: the column name to unpack

Examples

# A sample raw result from a hits query:
result <- '[{"_source":{"timestamp":"2017-01-01","cust_name":"Austin","details":{
"cust_class":"big_spender","location":"chicago","pastPurchases":[{"film":"The Notebook",
"pmt_amount":6.25},{"film":"The Town","pmt_amount":8.00},{"film":"Zootopia","pmt_amount":7.50,
"matinee":true}]}}},{"_source":{"timestamp":"2017-02-02","cust_name":"James","details":{
"cust_class":"peasant","location":"chicago","pastPurchases":[{"film":"Minions",
"pmt_amount":6.25,"matinee":true},{"film":"Rogue One","pmt_amount":10.25},{"film":"Bridesmaids",
"pmt_amount":8.75},{"film":"Bridesmaids","pmt_amount":6.25,"matinee":true}]}}},{"_source":{
"timestamp":"2017-03-03","cust_name":"Nick","details":{"cust_class":"critic","location":"cannes",
"pastPurchases":[{"film":"Aala Kaf Ifrit","pmt_amount":0,"matinee":true},{
"film":"Dopo la guerra (Apres la Guerre)","pmt_amount":0,"matinee":true},{
"film":"Avengers: Infinity War","pmt_amount":12.75}]}}}]'

# Chomp into a data.table
sampleChompedDT <- chomp_hits(hits_json = result, keep_nested_data_cols = TRUE)
#> INFO [2025-02-10 16:21:10] Keeping the following nested data columns. Consider using unpack_nested_data for one:
#>  details.pastPurchases
print(sampleChompedDT)
#>     timestamp cust_name details.cust_class details.location
#>        <char>    <char>             <char>           <char>
#> 1: 2017-01-01    Austin        big_spender          chicago
#> 2: 2017-02-02     James            peasant          chicago
#> 3: 2017-03-03      Nick             critic           cannes
#>    details.pastPurchases
#>                   <list>
#> 1:     <data.frame[3x3]>
#> 2:     <data.frame[4x3]>
#> 3:     <data.frame[3x3]>

# (Note: use es_search() to get here in one step)

# Unpack by details.pastPurchases
unpackedDT <- unpack_nested_data(chomped_df = sampleChompedDT
                                 , col_to_unpack = "details.pastPurchases")
print(unpackedDT)
#>      timestamp cust_name details.cust_class details.location
#>         <char>    <char>             <char>           <char>
#>  1: 2017-01-01    Austin        big_spender          chicago
#>  2: 2017-01-01    Austin        big_spender          chicago
#>  3: 2017-01-01    Austin        big_spender          chicago
#>  4: 2017-02-02     James            peasant          chicago
#>  5: 2017-02-02     James            peasant          chicago
#>  6: 2017-02-02     James            peasant          chicago
#>  7: 2017-02-02     James            peasant          chicago
#>  8: 2017-03-03      Nick             critic           cannes
#>  9: 2017-03-03      Nick             critic           cannes
#> 10: 2017-03-03      Nick             critic           cannes
#>                                 film pmt_amount matinee
#>                               <char>      <num>  <lgcl>
#>  1:                     The Notebook       6.25      NA
#>  2:                         The Town       8.00      NA
#>  3:                         Zootopia       7.50    TRUE
#>  4:                          Minions       6.25    TRUE
#>  5:                        Rogue One      10.25      NA
#>  6:                      Bridesmaids       8.75      NA
#>  7:                      Bridesmaids       6.25    TRUE
#>  8:                   Aala Kaf Ifrit       0.00    TRUE
#>  9: Dopo la guerra (Apres la Guerre)       0.00    TRUE
#> 10:           Avengers: Infinity War      12.75      NA