Main function
- es_search() executes an Elasticsearch query and gets a data.table
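As a rough illustration of how es_search() is typically called, here is a minimal sketch. The host URL, index name, and match_all query are placeholders, and the argument names es_host, es_index, and query_body are an assumption about the package's interface (check ?es_search for the authoritative signature). The call is wrapped in a hypothetical helper, run_search(), so nothing touches the network until you invoke it against a real cluster.

```r
library(jsonlite)

# Placeholder query: match every document in the index.
query_body <- '{"query": {"match_all": {}}}'

# Confirm the query string is valid JSON before sending it anywhere.
stopifnot(isTRUE(as.logical(jsonlite::validate(query_body))))

# Hypothetical wrapper (not part of uptasticsearch): defers the HTTP call
# until a real host and index are supplied.
run_search <- function(es_host, es_index) {
    uptasticsearch::es_search(
        es_host = es_host,
        es_index = es_index,
        query_body = query_body
    )
}

# Example call (requires a reachable cluster; both values are placeholders):
# resultDT <- run_search("http://localhost:9200", "my_index")
```

If the call succeeds, resultDT should be a data.table with one row per matching document.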
NEWS.md
- _all as a way to reference all indices, changing the response format of hits.total into an object like {"hits": {"total": {"value": 50, "relation": "eq"}}}, and restricting all indices to have a single type of document. More details can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html.
- closeAllConnections() in unit tests because they were superfluous and causing problems on certain operating systems in the CRAN check farm.
- unique(outDT) to unique(outDT, by = "_id"). This was prompted by Rdatatable/data.table#3332 (changes in data.table 1.12.0), but it’s actually faster and safer anyway!
- Content-Type header. Previous versions of Elasticsearch tried to guess the Content-Type when none was declared.
- uptasticsearch will now hit the cluster to try to figure out which version of Elasticsearch it is running, then use the appropriate scrolling strategy.
get_fields() when your index has no aliases
- get_fields() broke on some legacy versions of Elasticsearch where no aliases had been created. The response on the _cat/aliases endpoint has changed from major version to major version. #66 fixed this for all major versions of Elasticsearch from 1.0 to 6.2.
get_fields() when your index has multiple aliases
- get_fields() would only return one of those. As of #73, mappings for the underlying physical index will now be duplicated once per alias in the table returned by get_fields().
- uptasticsearch attempts to query the Elasticsearch host to figure out what major version of Elasticsearch is running there. Implementation errors in that PR led to versions being parsed incorrectly but silently passing tests. This was fixed in #66. NOTE: this only impacted the dev version of the library on GitHub.
ignore_scroll_restriction not being respected
- uptasticsearch, the value passed to es_search() for ignore_scroll_restriction was not actually respected. This was possible because an internal function had defaults specified, so we never caught the fact that the value wasn’t getting passed through. #66 instituted the practice of not specifying defaults on function arguments in internal functions, so similar bugs won’t be able to silently get through testing in the future.
- get_counts(). This function was outside the core mission of the package and exposed us unnecessarily to changes in the Elasticsearch DSL.
unpack_nested_data()
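The deduplication change noted above, from unique(outDT) to unique(outDT, by = "_id"), can be illustrated with a small sketch. The table below is made-up data standing in for scroll output in which one document was returned twice; the column name value is hypothetical, and only the by = "_id" call comes from the changelog.

```r
library(data.table)

# Made-up scroll output: document "a" came back twice across pages.
outDT <- data.table(
    "_id" = c("a", "a", "b"),
    value = c(1, 1, 2)
)

# Deduplicate on the document ID only, instead of comparing every column.
dedupedDT <- unique(outDT, by = "_id")

print(dedupedDT)
```

Keying on "_id" keeps the first row seen for each document and does not need to compare the remaining columns.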
- httr::RETRY instead of one-shot POST or GET calls
- get_fields() returns a data.table with the names and types of all indexed fields across one or more indices
- es_search() now accepts an intermediates_dir parameter, giving users control over the directory used for temporary I/O at query time
- es_search() executes an Elasticsearch query and gets a data.table
- chomp_aggs() converts a raw aggs JSON to data.table
- chomp_hits() converts a raw hits JSON to data.table
- unpack_nested_data() deals with nested Elasticsearch data not in a tabular format
- parse_date_time() parses date-times from Elasticsearch records
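The switch to httr::RETRY mentioned in the list above can be sketched roughly as follows. This is an illustration of the general pattern, not the package's internal code: post_with_retry() is a hypothetical helper, the retry count of 3 is arbitrary, and the URL in the commented call is a placeholder.

```r
library(httr)

# Hypothetical helper: POST a JSON body and retry on failure,
# instead of making a single one-shot httr::POST() call.
post_with_retry <- function(url, body, times = 3) {
    httr::RETRY(
        verb = "POST",
        url = url,
        body = body,
        # Declare the body type explicitly rather than relying on the server to guess.
        httr::content_type_json(),
        times = times
    )
}

# Example call (requires a reachable cluster; the URL is a placeholder):
# response <- post_with_retry(
#     "http://localhost:9200/my_index/_search",
#     '{"query": {"match_all": {}}}'
# )
# httr::stop_for_status(response)
```

httr::RETRY waits between attempts with a backoff, so a transient network failure does not immediately abort a long-running query.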