Collection Handlers

Collection handlers extract facets from a set of files that share a common structure.

Base

class facet_scanner.collection_handlers.base.CollectionHandler(*args, **kwargs)

Base Class for all collection handlers.

Parameters

kwargs – Passed to ElasticsearchConnection class

Attr extensions

File extensions to include. If none are provided, all extensions are included.

Attr filters

Additional filters added to the must_not clause of the Elasticsearch query when retrieving the initial file list.
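
As a rough illustration of where these filters end up, the sketch below builds an Elasticsearch bool query whose must_not clause carries the handler's filters. The function name build_file_list_query and the field names info.directory, info.type and info.name are assumptions made for this example; the query the handler actually builds may differ.

```python
def build_file_list_query(path, extensions=None, filters=None):
    """Sketch of a bool query for the initial file list (hypothetical helper)."""
    # Field names are placeholders for illustration, not taken from facet_scanner.
    must = [{"prefix": {"info.directory": path}}]
    if extensions:
        # No extensions means no restriction, mirroring "default to all".
        must.append({"terms": {"info.type": list(extensions)}})
    return {
        "query": {
            "bool": {
                "must": must,
                # Extra filters exclude documents from the file list.
                "must_not": list(filters or []),
            }
        }
    }


query = build_file_list_query(
    "/path/to/dataset",
    extensions=[".nc"],
    filters=[{"term": {"info.name": "00README"}}],
)
```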

export_facets(path, index, processing_path, lotus=True, rerun=False, batch_size=500)

Dumps the list of files to process and calls lotus to add them to the index

Parameters
  • path – directory root of the collection

  • index – index to add the facets to

  • processing_path – directory to place the elasticsearch pages for processing by lotus

  • lotus – Boolean. True will set processes to run on lotus

  • batch_size – Size of pages to send for processing
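
To make the role of batch_size concrete, here is a minimal sketch of splitting a file list into fixed-size pages. The paginate helper is hypothetical and is not part of facet_scanner; it only illustrates the paging idea, not how the pages are written to processing_path or submitted to lotus.

```python
from itertools import islice


def paginate(items, batch_size=500):
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        page = list(islice(iterator, batch_size))
        if not page:
            return
        yield page


# Example: 1200 file paths split into pages of 500.
pages = list(paginate((f"/path/to/dataset/file_{i}.nc" for i in range(1200)), 500))
page_sizes = [len(page) for page in pages]
```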

get_facets(path)

Each collection handler must specify the method for extracting the facets

Parameters

path – File path

Returns

dict Facet:value pairs

property project_name

Makes setting a project name mandatory. Abstract property for the name of the project, e.g. opensearch
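
The contract described above (a mandatory project_name and a get_facets method that returns facet:value pairs) can be sketched as follows. HandlerSketch is a self-contained stand-in written for this example and is not the real CollectionHandler; the facet names and directory layout are also assumptions.

```python
import posixpath
from abc import ABC, abstractmethod


class HandlerSketch(ABC):
    """Hypothetical stand-in for the base class, for illustration only."""

    @property
    @abstractmethod
    def project_name(self):
        """Subclasses must provide a project name."""

    @abstractmethod
    def get_facets(self, path):
        """Subclasses must return a dict of facet:value pairs."""


class ExampleHandler(HandlerSketch):
    project_name = "example"

    # Assumed layout: .../<institute>/<model>/<experiment>/<file>
    facet_order = ("institute", "model", "experiment")

    def get_facets(self, path):
        parts = posixpath.dirname(path).strip("/").split("/")
        values = parts[-len(self.facet_order):]
        return dict(zip(self.facet_order, values))


facets = ExampleHandler().get_facets("/data/MOHC/HadGEM2/rcp85/file.nc")
```

Because both members are abstract, a subclass that omits either one cannot be instantiated, which is one way to make the project name mandatory.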

update_facets(path, index)

Take a file containing elasticsearch documents and update the index with facets at these locations

Parameters
  • path – Path to elasticsearch input file

  • index – Index to update

CCI

class facet_scanner.collection_handlers.cci.CCI(*args, **kwargs)

Collection Handler for the CCI project

Parameters
  • collection_root – Used when building the root object for this collection

  • facet_json – Used?

Attr collection_id

The collection id for the root collection

Attr collection_title

The collection title for the root collection

Attr project_name

The project to attach the metadata to

Attr extensions

File extension filters

Attr filters

Additional filters

Attr facets

Facet mappings for use in get_facets method

get_facets(path)

Extract the facets from the file path

Parameters

path – File path

Returns

Dict Facet:value pairs

CMIP5

class facet_scanner.collection_handlers.cmip5.CMIP5(*args, **kwargs)

get_facets(path)

Extract the facets from the file path

Parameters

path – File path

Returns

Dict Facet:value pairs

Utils

Collection Map

A Python dictionary which contains the mapping from path to collection handler.

The key-value pairs take the form:

'/path/to/dataset': dict(handler='facet_scanner.collection_handlers.<handler_module>.<handler_class>')

Facet Factory

class facet_scanner.collection_handlers.utils.facet_factory.FacetFactory

Factory Class to return the correct collection handler based on the given filepath.

get_collection_map(path)

Takes an arbitrary path and returns a collection path

Parameters

path (str) – Path to the data of interest

Returns

The value from the map object

Return type

str, str
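
One plausible way to resolve an arbitrary path against the collection map is to walk up the directory tree until a key matches. The sketch below assumes that approach; the find_collection helper and the dataset paths in the example map are hypothetical, and the real lookup may work differently.

```python
import posixpath

# Example map in the documented format; the dataset paths are placeholders.
collection_map = {
    "/path/to/cci": dict(handler="facet_scanner.collection_handlers.cci.CCI"),
    "/path/to/cmip5": dict(handler="facet_scanner.collection_handlers.cmip5.CMIP5"),
}


def find_collection(path, mapping):
    """Return (map value, collection path) for the nearest matching ancestor."""
    current = posixpath.normpath(path)
    while True:
        if current in mapping:
            return mapping[current], current
        parent = posixpath.dirname(current)
        if parent == current:
            # Reached the top of the tree without finding a match.
            return None, None
        current = parent


entry, collection_path = find_collection("/path/to/cci/fire/data/file.nc", collection_map)
```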

get_handler(path: str) → Tuple[Optional[facet_scanner.collection_handlers.base.CollectionHandler], Optional[str]]

Takes a system path and returns the correct handler for the collection.

Parameters

path (str) – Filepath

Returns

handler class, collection path

Return type

CollectionHandler, str
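
Since the collection map stores handlers as dotted strings, the factory has to turn a string into a class at some point. A minimal sketch of that step using importlib is shown below; load_class is a hypothetical helper, and the example loads a standard library class so that it runs without facet_scanner installed.

```python
from importlib import import_module


def load_class(dotted_path):
    """Import and return the class named by 'package.module.ClassName'."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = import_module(module_name)
    return getattr(module, class_name)


# With facet_scanner installed, the string would look like
# 'facet_scanner.collection_handlers.cci.CCI'; a stdlib class is used here.
ordered_dict_class = load_class("collections.OrderedDict")
```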

Moles Datasets

class facet_scanner.collection_handlers.utils.moles_datasets.CatalogueDatasets(moles_base='http://api.catalogue.ceda.ac.uk')

Class to map a filepath to the related MOLES record

Parameters

moles_base (str) – Base URL to the MOLES api server (default: http://api.catalogue.ceda.ac.uk).

get_moles_record_metadata(path)

Try and find metadata for a MOLES record associated with the path.

Example API response:

{
    "title": "ESA Fire Climate Change Initiative Project  (Fire CCI)",
    "url": "http://catalogue.ceda.ac.uk/uuid/6c3584d985bd484e8beb23ff0df91292",
    "record_type": "Project",
    "record_path": "",
    "publication_state": "published"
}

Parameters

path (str) – Directory path

Returns

Dictionary containing MOLES title, url and record_type

Return type

dict
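
As an illustration of the documented return value, the sketch below reduces the example API response to the three keys mentioned above. It works on the example JSON only and makes no network call; summarise_record is a hypothetical helper, not a facet_scanner function.

```python
import json

# The example API response from the documentation above.
EXAMPLE_RESPONSE = """
{
    "title": "ESA Fire Climate Change Initiative Project  (Fire CCI)",
    "url": "http://catalogue.ceda.ac.uk/uuid/6c3584d985bd484e8beb23ff0df91292",
    "record_type": "Project",
    "record_path": "",
    "publication_state": "published"
}
"""


def summarise_record(raw):
    """Keep only the title, url and record_type of a MOLES response."""
    record = json.loads(raw)
    return {key: record.get(key) for key in ("title", "url", "record_type")}


summary = summarise_record(EXAMPLE_RESPONSE)
```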