Welcome to CEDA Facet Scanner’s documentation!
This documentation describes the CEDA facet scanner, the package used to extract facets from collections of datasets so that they can be fed into OpenSearch.
The extracted data is indexed in Elasticsearch.
Installation
Install the requirements:
pip install -r requirements.txt
Install the library:
pip install git+https://github.com/cedadev/facet-scanner
Basic Usage
This code can be used to bulk process a dataset for testing and initialisation:
usage: facet_scanner [-h] [--rerun] [--num-files NUM_FILES] [--conf CONF]
                     path processing_path

Process path for facets and update the index

positional arguments:
  path                  Path to process
  processing_path       Path to output intermediate files

optional arguments:
  -h, --help            show this help message and exit
  --rerun               Disable paging to disk on rerun
  --num-files NUM_FILES
                        Number of files per lotus job
  --conf CONF
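To make the argument layout concrete, here is a minimal sketch of an argparse parser that would accept the same command line. It is a reconstruction from the help text alone, not the package's actual source: the `build_parser` helper, the `int` type on `--num-files`, the placeholder help string for `--conf`, and the example paths are all assumptions.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction from the help text above; the real
    # facet_scanner parser may use different types, defaults or help strings.
    parser = argparse.ArgumentParser(
        prog="facet_scanner",
        description="Process path for facets and update the index",
    )
    parser.add_argument("path", help="Path to process")
    parser.add_argument(
        "processing_path", help="Path to output intermediate files"
    )
    parser.add_argument(
        "--rerun", action="store_true", help="Disable paging to disk on rerun"
    )
    # Assumption: the value is parsed as an integer.
    parser.add_argument(
        "--num-files", type=int, help="Number of files per lotus job"
    )
    # The help text gives no description for --conf, so none is invented here.
    parser.add_argument("--conf", help="(not described in the help text)")
    return parser


# Example paths are placeholders, not real dataset locations.
args = build_parser().parse_args(
    ["--num-files", "500", "/example/dataset", "/example/work"]
)
print(args.path, args.processing_path, args.num_files, args.rerun)
```

Both positional arguments are required, while the flags can appear in any position on the command line.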
The script queries Elasticsearch for all files under the supplied path. The --num-files
flag sets the page size, which determines how many files end up in each LOTUS batch job.
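The effect of the page size on batching can be illustrated with a short sketch. This is not the scanner's own code: the `paginate` helper and the file names are hypothetical, and a plain list stands in for the results of the Elasticsearch query.

```python
from itertools import islice


def paginate(file_paths, num_files):
    """Yield successive lists of at most num_files paths (hypothetical helper)."""
    if num_files < 1:
        raise ValueError("num_files must be a positive integer")
    iterator = iter(file_paths)
    while True:
        page = list(islice(iterator, num_files))
        if not page:
            return
        yield page


# Stand-in for the file list an Elasticsearch query would return.
files = [f"/example/dataset/file_{i}.nc" for i in range(7)]

# With a page size of 3, seven files are split into batches of 3, 3 and 1.
batches = list(paginate(files, num_files=3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

A smaller page size means more batch jobs with fewer files each; a larger one means fewer, longer-running jobs.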
Contents: