Collection Handlers
Collection handlers extract facets from a set of files that share a common structure.
Base
class facet_scanner.collection_handlers.base.CollectionHandler(*args, **kwargs)
Base class for all collection handlers.
- Parameters
kwargs – Passed to the ElasticsearchConnection class
- Attr extensions
File extensions to include. If none are provided, all files are included.
- Attr filters
Additional filters to add to the must_not clause of the Elasticsearch query used to retrieve the initial file list.
export_facets(path, index, processing_path, lotus=True, rerun=False, batch_size=500)
Dumps the list of files to process and calls LOTUS to add their facets to the index.
- Parameters
path – directory root of the collection
index – index to add the facets to
processing_path – directory in which to place the Elasticsearch pages for processing by LOTUS
lotus – Boolean. If True, the processing jobs run on LOTUS
batch_size – Size of the pages sent for processing
get_facets(path)
Extracts the facets from a file path. Each collection handler must implement this method.
- Parameters
path – File path
- Returns
dict of facet:value pairs
property project_name
Abstract property for the name of the project, e.g. opensearch. Because it is abstract, every collection handler must set a project name.
update_facets(path, index)
Takes a file containing Elasticsearch documents and updates the index with the facets for the files at these locations.
- Parameters
path – Path to elasticsearch input file
index – Index to update
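To make the base-class contract concrete, here is a minimal sketch of the abstract-base pattern described above. SketchCollectionHandler is a hypothetical stand-in for CollectionHandler (the real class passes its kwargs to ElasticsearchConnection, which is omitted here). ExampleHandler, the build_query method and the Elasticsearch field names info.directory, info.type and info.name are all assumptions made for illustration. The sketch shows project_name and get_facets as abstract members, and one way extensions and filters could feed the must and must_not clauses of a bool query.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SketchCollectionHandler(ABC):
    """Hypothetical stand-in for CollectionHandler (no Elasticsearch connection)."""

    # File extensions to include; None means "include all files".
    extensions: Optional[List[str]] = None
    # Extra clauses copied into the bool query's must_not list.
    filters: List[dict] = []

    @property
    @abstractmethod
    def project_name(self) -> str:
        """Name of the project, e.g. 'opensearch'."""

    @abstractmethod
    def get_facets(self, path: str) -> Dict[str, str]:
        """Return facet:value pairs for a single file path."""

    def build_query(self, path: str) -> dict:
        """Build a bool query for files under path.

        Assumption: the field names info.directory and info.type are
        illustrative, not taken from the real index mapping.
        """
        must: List[dict] = [{"prefix": {"info.directory": path}}]
        if self.extensions:
            must.append({"terms": {"info.type": self.extensions}})
        return {"query": {"bool": {"must": must, "must_not": list(self.filters)}}}


class ExampleHandler(SketchCollectionHandler):
    """Hypothetical handler that takes one facet from the parent directory."""

    extensions = [".nc"]
    # Assumption: exclude README files via a hypothetical info.name field.
    filters = [{"term": {"info.name": "README"}}]

    @property
    def project_name(self) -> str:
        return "opensearch"

    def get_facets(self, path: str) -> Dict[str, str]:
        # Use the parent directory name as a stand-in facet value.
        parts = path.rstrip("/").split("/")
        return {"variable": parts[-2]} if len(parts) > 1 else {}
```

Because both members are abstract, instantiating SketchCollectionHandler directly raises TypeError; a subclass has to supply project_name and get_facets before it can be used.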
CCI
class facet_scanner.collection_handlers.cci.CCI(*args, **kwargs)
Collection handler for the CCI project.
- Parameters
collection_root – Used when building the root object for this collection
facet_json – Used?
- Attr collection_id
The collection ID for the root collection
- Attr collection_title
The collection title for the root collection
- Attr project_name
The project to attach the metadata to
- Attr extensions
File extension filters
- Attr filters
Additional filters
- Attr facets
Facet mappings for use in get_facets method
get_facets(path)
Extracts the facets from the file path.
- Parameters
path – File path
- Returns
dict of facet:value pairs
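The CCI handler's extraction is driven by its facets mappings, which are not shown here. As a rough illustration only, the sketch below pulls candidate facets out of a filename, assuming the ESA CCI naming pattern <date>-ESACCI-<level>_<project>-<data type>-<product>[-...]-fv<version>.nc. The function name parse_cci_filename, the facet keys and the example path are hypothetical and do not reflect the real get_facets implementation.

```python
import os
from typing import Dict


def parse_cci_filename(path: str) -> Dict[str, str]:
    """Hypothetical facet extraction from a CCI-style filename.

    Assumes names of the form
    <date>-ESACCI-<level>_<project>-<data_type>-<product>[-...]-fv<version>.nc
    and returns an empty dict for anything else.
    """
    stem, _ = os.path.splitext(os.path.basename(path))
    parts = stem.split("-")
    if len(parts) < 5 or parts[1] != "ESACCI":
        return {}
    # The third segment joins processing level and project with "_".
    level, _, project = parts[2].partition("_")
    facets = {
        "processing_level": level,
        "cci_project": project,
        "data_type": parts[3],
        "product_string": parts[4],
    }
    # The file version is the last segment when it starts with "fv".
    if parts[-1].startswith("fv"):
        facets["product_version"] = parts[-1][2:]
    return facets
```

Applied to a name such as 20120101-ESACCI-L4_GHRSST-SSTdepth-OSTIA-GLOB_LT-v02.0-fv01.0.nc, this yields processing_level 'L4', cci_project 'GHRSST', data_type 'SSTdepth', product_string 'OSTIA' and product_version '01.0'.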
Utils
Collection Map
A Python dictionary that maps dataset paths to collection handlers.
Each key-value pair has the form:
'/path/to/dataset': dict(handler='facet_scanner.collection_handlers.<handler_module>.<handler_class>')
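Since each handler is stored as a dotted string, it has to be imported before use. The sketch below shows one standard way to turn such a string into a class with importlib; load_handler_class and EXAMPLE_MAP are hypothetical names, and the map points at a standard-library class so the example runs without facet_scanner installed.

```python
import importlib
from typing import Any, Dict

# Illustrative map in the documented form. The path is hypothetical; the
# handler string names a standard-library class so the sketch is runnable.
EXAMPLE_MAP: Dict[str, dict] = {
    "/path/to/dataset": dict(handler="collections.OrderedDict"),
}


def load_handler_class(dotted_path: str) -> Any:
    """Import '<module>.<Class>' and return the named attribute."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected '<module>.<class>', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

With facet_scanner installed, load_handler_class('facet_scanner.collection_handlers.cci.CCI') would resolve the CCI class in the same way.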
Facet Factory
class facet_scanner.collection_handlers.utils.facet_factory.FacetFactory
Factory class that returns the correct collection handler for a given file path.
get_collection_map(path)
Takes an arbitrary path and returns a collection path.
- Parameters
path (str) – Path to the data of interest
- Returns
The value from the map object
- Return type
str, str
get_handler(path: str) → Tuple[Optional[facet_scanner.collection_handlers.base.CollectionHandler], Optional[str]]
Takes a system path and returns the correct handler for the collection.
- Parameters
path (str) – Filepath
- Returns
handler class, collection path
- Return type
CollectionHandler, str
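The lookup performed by get_collection_map and get_handler can be pictured as walking up from the given path until a key of the collection map matches. The sketch below is an assumption about that behaviour, not the FacetFactory code: find_collection and COLLECTION_MAP are hypothetical names and the dataset path is illustrative. It returns the map value together with the matching collection path, mirroring the pair of values documented above.

```python
import posixpath
from typing import Dict, Optional, Tuple

# Hypothetical map entry in the documented form; the path is illustrative.
COLLECTION_MAP: Dict[str, dict] = {
    "/neodc/esacci/sst": dict(handler="facet_scanner.collection_handlers.cci.CCI"),
}


def find_collection(
    path: str, collection_map: Dict[str, dict]
) -> Tuple[Optional[dict], Optional[str]]:
    """Walk up from path until a key of collection_map matches."""
    current = posixpath.normpath(path)
    while True:
        if current in collection_map:
            return collection_map[current], current
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root without a match
            return None, None
        current = parent
```

Walking up the parents means a deep file path such as /neodc/esacci/sst/data/2012/file.nc still resolves to the collection registered at /neodc/esacci/sst, while a path under no registered collection gives (None, None), in line with the Optional return type of get_handler.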
Moles Datasets
class facet_scanner.collection_handlers.utils.moles_datasets.CatalogueDatasets(moles_base='http://api.catalogue.ceda.ac.uk')
Class to map a file path to the related MOLES record.
- Parameters
moles_base (str) – Base URL to the MOLES api server (default: http://api.catalogue.ceda.ac.uk).
get_moles_record_metadata(path)
Tries to find metadata for the MOLES record associated with the path.
Example API response:
{
    "title": "ESA Fire Climate Change Initiative Project (Fire CCI)",
    "url": "http://catalogue.ceda.ac.uk/uuid/6c3584d985bd484e8beb23ff0df91292",
    "record_type": "Project",
    "record_path": "",
    "publication_state": "published"
}
- Parameters
path (str) – Directory path
- Returns
Dictionary containing MOLES title, url and record_type
- Return type
dict
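The request endpoint is not documented here, so the sketch below covers only the last step: reducing a response body like the example above to the title, url and record_type fields listed under Returns. The function name summarise_moles_response is hypothetical, and whether the real method trims the extra fields is an assumption.

```python
import json
from typing import Dict, Optional


def summarise_moles_response(body: str) -> Dict[str, Optional[str]]:
    """Keep only the fields listed as returned (hypothetical helper)."""
    record = json.loads(body)
    # .get() leaves a field as None if the API omits it.
    return {key: record.get(key) for key in ("title", "url", "record_type")}
```

Using .get() rather than indexing keeps the sketch tolerant of records that lack one of the fields.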