LIDOInspector guide
The basic functionality of LIDOInspector is to analyse metadata records for objects and works based on the LIDO metadata standard for formal data quality aspects. Inspections are performed in the following steps:
Read metadata records for objects and works
Customise inspection settings
Carry out inspections
Further process or output the results
The inspection process is mainly determined by the settings made. It makes sense to use different configuration files for different purposes.
In addition, LIDOInspector provides numerous methods for advanced users to design their own inspection processes.
Quick start
Import LIDOInspector:
from nfdinspector.lido_inspector import LIDOInspector
Initialize a LIDOInspector. You can specify a language (currently available: “en” or “de”) for the error messages:
lido_inspector = LIDOInspector(error_lang="de")
Read LIDO files you want to inspect:
lido_inspector.read_lido_files("files_path")
Refer to a configuration file (optional). Without this step the inspections are executed with a default configuration:
lido_inspector.config_file("file_path")
Execute the inspections:
lido_inspector.inspect()
Save the inspections as a JSON file. You can specify the indentation (default: None):
lido_inspector.to_json("file_path", indent=4)
Save the inspections as a CSV file. You can specify a field separator/delimiter (default: “,”):
lido_inspector.to_csv("file_path", delimiter=";")
Reading metadata records
LIDO metadata records can be read either as files or as XML strings. The data must be valid LIDO XML in order for subsequent inspections to work correctly.
nfdinspector.lido_inspector.LIDOInspector.read_lido_files() reads multiple LIDO files at once, and nfdinspector.lido_inspector.LIDOInspector.read_lido_file() reads a single LIDO file. nfdinspector.lido_inspector.LIDOInspector.read_lido(), in turn, reads an XML string.
The data is stored in nfdinspector.lido_inspector.LIDOInspector.lido_objects.
Note that each read call overwrites any data that was read previously.
Examples:
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
# Read multiple LIDO files
lido_inspector.read_lido_files("files_path")
# Read single LIDO file
lido_inspector.read_lido_file("file_path")
# Read LIDO as an XML string
lido_inspector.read_lido("xml_string")
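Because the inspections assume valid LIDO XML, it can help to pre-check a string before passing it to read_lido(). The following is a minimal sketch using only the Python standard library. The helper name looks_like_lido is hypothetical and not part of NFDInspector, and the check covers only well-formedness and the root element, not validation against the LIDO schema:

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"  # namespace used by LIDO elements


def looks_like_lido(xml_string: str) -> bool:
    """Return True if xml_string is well-formed XML with a LIDO root element.

    Hypothetical helper: this is a cheap pre-check, not schema validation.
    """
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError:
        return False
    # A LIDO document has either a single record or a wrapper as its root.
    return root.tag in (f"{{{LIDO_NS}}}lido", f"{{{LIDO_NS}}}lidoWrap")


sample = (
    '<lido:lido xmlns:lido="http://www.lido-schema.org">'
    "<lido:lidoRecID>record-1</lido:lidoRecID>"
    "</lido:lido>"
)

print(looks_like_lido(sample))      # well-formed, LIDO root element
print(looks_like_lido("<lido>"))    # not well-formed
print(looks_like_lido("<other/>"))  # well-formed, but not LIDO
```

A string that passes this check can still be invalid LIDO, so full schema validation remains a separate step.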
Configuration
The configuration of LIDOInspector is stored in the attribute nfdinspector.lido_inspector.LIDOInspector.configuration.
This is a dict that lists the data fields for which specific settings can be made.
For example, for the data field ‘title’ the settings ‘inspect’, ‘unique’, ‘distinct_from_type’, ‘min_word_num’ and ‘max_word_num’ can be specified.
setting | dtype | description |
---|---|---|
inspect | bool | specifies if a data field should be inspected |
ref | bool | specifies if a reference to a vocabulary or similar should be given |
unique | bool | specifies if an appellation should be unique in the records |
distinct_from_type | bool | specifies if an appellation should be different from the object/work type |
min_word_num | int | specifies the minimum word number of a text |
max_word_num | int | specifies the maximum word number of a text |
min_num | int | specifies the minimum number of terms |
pattern | str | specifies a valid pattern based on regular expressions |
patterns | dict | specifies valid patterns based on regular expressions |
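To picture what the word-number settings express, here is a minimal sketch of a word-count check. It assumes that words are whitespace-separated tokens. How LIDOInspector actually counts words is not stated here, and the function name word_count_ok is hypothetical:

```python
def word_count_ok(text: str, min_word_num: int, max_word_num: int) -> bool:
    """Check that text has between min_word_num and max_word_num words.

    Assumption: a word is any run of non-whitespace characters.
    """
    word_count = len(text.split())
    return min_word_num <= word_count <= max_word_num


print(word_count_ok("Vase", 3, 12))                   # one word, below the minimum
print(word_count_ok("Blue vase with handle", 3, 12))  # four words, within bounds
```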
The settings available depend on the data field.
data field | settings |
---|---|
work_id | pattern |
title | inspect, unique, distinct_from_type, min_word_num, max_word_num |
category | inspect, ref, patterns |
object_work_type | inspect, ref, patterns |
classification | inspect, ref, patterns |
object_description | inspect, unique, min_word_num, max_word_num |
materials_tech | inspect, ref |
object_measurements | inspect |
event | inspect, ref |
subject_concept | inspect, ref, min_num |
resource | inspect |
record_type | inspect, ref, patterns |
repository_name | inspect, ref |
record_source | inspect, ref |
record_rights | inspect, ref, patterns |
record_info | inspect |
It is recommended that you output nfdinspector.lido_inspector.LIDOInspector.configuration as a JSON file to familiarise yourself with the structure.
This JSON file can also be used as the basis for a new configuration file:
import json
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
# Write the default configuration to a JSON file
with open("default_config.json", "w") as outfile:
    json.dump(lido_inspector.configuration, outfile, indent=4)
The easiest way to configure LIDOInspector is to read a JSON configuration file with nfdinspector.lido_inspector.LIDOInspector.config_file().
The structure of the JSON file must match nfdinspector.lido_inspector.LIDOInspector.configuration.
Changes to nfdinspector.lido_inspector.LIDOInspector.configuration can also be made using nfdinspector.lido_inspector.LIDOInspector.configure().
Examples:
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
# Read a configuration file
lido_inspector.config_file("file_path")
# Change specific configurations
lido_inspector.configure({
    "title": {
        "inspect": True,
        "unique": False,
        "distinct_from_type": True,
        "min_word_num": 3,
        "max_word_num": 12,
    }
})
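Keeping separate configuration files for separate purposes can be sketched as follows, using only the standard library. The helper write_config_variant is hypothetical, and base_config is a small stand-in with invented values; in practice the starting point would be the dumped default configuration. Whether config_file() accepts a file that lists only some fields is not stated here, so the sketch always writes the complete structure:

```python
import copy
import json
import tempfile
from pathlib import Path


def write_config_variant(base_config: dict, overrides: dict, path: Path) -> dict:
    """Write base_config with per-field overrides applied to a JSON file.

    Hypothetical helper; the base configuration itself is left unchanged.
    """
    config = copy.deepcopy(base_config)
    for field, settings in overrides.items():
        config.setdefault(field, {}).update(settings)
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(config, outfile, indent=4)
    return config


# Stand-in for the default configuration (values invented for illustration).
base_config = {
    "title": {"inspect": True, "unique": True, "min_word_num": 1, "max_word_num": 20},
    "resource": {"inspect": True},
}

config_path = Path(tempfile.mkdtemp()) / "strict_titles.json"
strict = write_config_variant(
    base_config,
    {"title": {"min_word_num": 3, "max_word_num": 12}},
    config_path,
)
print(strict["title"])
```

The resulting file path could then be passed to config_file().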
Patterns
Since version 0.2 it is possible to specify patterns based on regular expressions for some fields. If a value does not match its pattern, an error message is returned.
For example, you can specify a pattern for the “workID” field. In this case, the value must be a sequence of exactly 12 digits:
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
lido_inspector.configure({
    "work_id": {
        # Raw string, so that \d is not treated as a string escape sequence
        "pattern": r"^\d{12}$",
    }
})
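A pattern can be tried out with Python's re module before it goes into a configuration. This is a minimal sketch: it assumes the pattern is applied with re.search, which is not confirmed by this guide, and the helper name matches_work_id is hypothetical:

```python
import re

WORK_ID_PATTERN = r"^\d{12}$"  # exactly 12 digits


def matches_work_id(value: str) -> bool:
    """Return True if value satisfies WORK_ID_PATTERN (assumes re.search)."""
    return re.search(WORK_ID_PATTERN, value) is not None


for candidate in ["123456789012", "12345", "12345678901a"]:
    print(candidate, matches_work_id(candidate))
```

Only the first candidate satisfies the pattern; the second is too short and the third contains a letter.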
For fields that refer to concepts/entities, patterns can be specified for both the label and the reference. In the following example, the label in the “category” field must be “Human-made object” and the reference must be “http://terminology.lido-schema.org/lido00096”:
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
lido_inspector.configure({
    "category": {
        "patterns": {
            "label": "Human-made object",
            "ref": "http://terminology.lido-schema.org/lido00096",
        }
    }
})
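Since patterns are regular expressions, an unescaped "." in a URL matches any character, not only a literal dot. The sketch below shows the difference with re.escape from the standard library. It uses re.fullmatch for the comparison, which is an assumption about how patterns are applied, and the lookalike URL is invented for illustration:

```python
import re

ref_literal = "http://terminology.lido-schema.org/lido00096"
escaped_pattern = re.escape(ref_literal)  # dots become literal dots

# Invented lookalike: the first dot is replaced by another character.
lookalike = "http://terminologyXlido-schema.org/lido00096"

# The unescaped pattern also accepts the lookalike.
print(re.fullmatch(ref_literal, lookalike) is not None)
# The escaped pattern rejects the lookalike but accepts the exact URL.
print(re.fullmatch(escaped_pattern, lookalike) is not None)
print(re.fullmatch(escaped_pattern, ref_literal) is not None)
```

If an exact reference is required, passing the escaped pattern is the stricter choice.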
Inspections
Inspections are performed using nfdinspector.lido_inspector.LIDOInspector.inspect() based on the data read in and the configurations made.
The results are stored in nfdinspector.metadata_inspector.MetadataInspector.inspections and can be processed further.
Example:
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
# Read multiple LIDO files
lido_inspector.read_lido_files("files_path")
# Read a configuration file
lido_inspector.config_file("file_path")
# Perform inspections
lido_inspector.inspect()
nfdinspector.lido_inspector.LIDOInspector.inspect() performs collective inspections of all configured data fields.
In principle, methods like nfdinspector.lido_inspector.LIDOInspector.inspect_title() can be used to inspect a specific field directly.
In that case, the results are returned to the caller and are not stored in nfdinspector.metadata_inspector.MetadataInspector.inspections.
File output
The results of the inspections can be output as a JSON file using nfdinspector.metadata_inspector.MetadataInspector.to_json(). The indentation level can be specified.
They can also be output as a CSV file using nfdinspector.metadata_inspector.MetadataInspector.to_csv(). The delimiter can be specified here.
Examples:
from nfdinspector.lido_inspector import LIDOInspector
lido_inspector = LIDOInspector()
# Read multiple LIDO files
lido_inspector.read_lido_files("files_path")
# Read a configuration file
lido_inspector.config_file("file_path")
# Perform inspections
lido_inspector.inspect()
# Output as JSON file
lido_inspector.to_json("file_path", indent=4)
# Output as CSV file
lido_inspector.to_csv("file_path", delimiter=";")
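When a CSV file written with a custom delimiter is read back, the same delimiter has to be passed to the reader. The sketch below illustrates this with the standard csv module. The column names and the row are invented for illustration, since the columns that to_csv() actually writes are not described in this guide:

```python
import csv
import io

# Build a small ";"-delimited CSV in memory (hypothetical columns and values).
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
writer.writerow(["record_id", "field", "message"])
writer.writerow(["000000000001", "title", "Title too short, fewer than 3 words"])
csv_text = buffer.getvalue()

# Reading with the matching delimiter recovers the columns.
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=";"))
print(rows[0]["field"])

# Reading with the default "," delimiter leaves the header as a single column.
wrong_rows = list(csv.reader(io.StringIO(csv_text)))
print(len(wrong_rows[0]))
```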