Reference
nfdinspector.ead_inspector module
- class nfdinspector.ead_inspector.EADInspector(error_lang: str = 'en')#
Bases:
MetadataInspector
Class for inspectors that examine records in EAD-XML.
- config_file(file_path: str) None #
Read a configuration file and alter the default configurations of an inspector.
- Parameters:
file_path (str) – File path to a JSON file with configurations in the required syntax
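The idea of overriding defaults from a JSON file can be sketched as follows. This is a minimal illustration using only the standard library, not NFDInspector's implementation: the configuration keys (`unittitle`, `abstract`, `inspect`, `min_word_num`), the helpers `merge_config` and `load_config`, and the recursive merge strategy are all assumptions.

```python
import json
import os
import tempfile

# Hypothetical defaults; the real default configuration may look different.
DEFAULTS = {
    "unittitle": {"inspect": True, "min_word_num": 2},
    "abstract": {"inspect": False, "min_word_num": 20},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied recursively."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(file_path: str, defaults: dict) -> dict:
    """Read a JSON file and overlay its content on the defaults."""
    with open(file_path, encoding="utf-8") as handle:
        return merge_config(defaults, json.load(handle))


# Write a small override file to a temporary folder and load it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"abstract": {"inspect": True}}, handle)
    config = load_config(path, DEFAULTS)

print(config["abstract"])
```

Only the values named in the file change; every other default stays in place.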
- property configuration: dict#
Get or set the configuration. The inspection is carried out based on the configuration.
- configure(config: dict) None #
Alter the default configurations of an inspector.
- Parameters:
config (dict) – Dict of configurations with the syntax of the default configurations
- configure_level(setting: str, change: dict | list) None #
Alter a specific level in the configurations of an inspector.
- Parameters:
setting (str) – Name of the setting which should be altered
change (dict | list) – New configurations for the specific setting
- configure_setting(setting: str, level: str, change: list | dict) None #
Alter a specific setting in the configurations of an inspector.
- Parameters:
setting (str) – Name of the setting which should be altered
level (str) – Name of the level which should be altered
change (dict | list) – New configurations for the specific setting
- property cs: list[object]#
Get or set the list of EAD components. These components are examined during the inspection.
- property ead_namespace: str#
Get the EAD namespace when needed for reading attributes.
- inspect() None #
Carry out an inspection based on the read-in EAD components.
- inspect_abstract(c, level: str) list | None #
Inspect abstract.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_daogrp(daogrp) list #
Inspect a digital archival object.
- Parameters:
daogrp (etree._Element) – XML element of a digital archival object group
- Returns:
List of error messages, None if there are no errors
- Return type:
list
- inspect_daos(c, level: str) list | None #
Inspect digital archival objects.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_date(date) list #
Inspect date.
- Parameters:
date (etree._Element) – Date to inspect
- Returns:
List of error messages
- Return type:
list
- inspect_dates(dates: list) list #
Inspect multiple dates.
- Parameters:
dates (list) – Dates to inspect
- Returns:
List of error messages
- Return type:
list
- inspect_dimensions(c, level: str) list | None #
Inspect dimensions.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_extent(c, level: str) list | None #
Inspect extent.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_genreform(c, level: str) list | None #
Inspect genreform.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_id(c) str #
Inspect component ID.
- Parameters:
c (etree._Element) – Component of an EAD record
- Returns:
Component ID or error message if missing
- Return type:
str
- inspect_index(c, level: str) list | None #
Inspect index.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_indexentry(indexentry, level: str) list #
Inspect an index entry.
- Parameters:
indexentry (etree._Element) – XML element of an index entry
level (str) – Level of the inspected EAD component
- Returns:
List of error messages
- Return type:
list
- inspect_language(c, level: str) list | None #
Inspect language.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_materialspec(c, level: str) list | None #
Inspect materialspec.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_origination(origination, level: str) list #
Inspect origination.
- Parameters:
origination (etree._Element) – XML element of origination
level (str) – Level of the inspected EAD component
- Returns:
List of error messages
- Return type:
list
- inspect_originations(c, level: str) list | None #
Inspect originations.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_scopecontent(c, level: str) list | None #
Inspect scope content.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_sub_dating(dates: list, sub_unitid: str, sub_dating: dict) list #
Inspect dates in comparison to sub dates.
- Parameters:
dates (list) – Dates of the inspected component
sub_unitid (str) – Unit id of subordinate component
sub_dating (dict) – Dating of subordinate component
- Returns:
List of error messages
- Return type:
list
- inspect_text(element, config: dict) list | None #
Inspect a text element.
- Parameters:
element (etree._Element) – XML element with supposed text
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
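A text inspection that returns a list of messages, or None when nothing is found, might look like the sketch below. It is not the library's code: the configuration keys (`min_word_num`, `max_word_num`, `pattern`), the message strings, and the use of the standard library's ElementTree in place of `etree._Element` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def inspect_text(element, config: dict):
    """Return a list of messages, or None if nothing was found."""
    if element is None or not (element.text or "").strip():
        return ["missing information"]
    text = element.text.strip()
    word_count = len(text.split())
    messages = []
    if word_count < config.get("min_word_num", 0):
        messages.append("text too short")
    if "max_word_num" in config and word_count > config["max_word_num"]:
        messages.append("text too long")
    if re.search(r" {2,}", text):
        messages.append("duplicate blanks")
    pattern = config.get("pattern")
    if pattern and not re.fullmatch(pattern, text):
        messages.append("wrong pattern")
    return messages or None


short_title = ET.fromstring("<unittitle>Letters</unittitle>")
good_title = ET.fromstring("<unittitle>Letters to the editor</unittitle>")

print(inspect_text(short_title, {"min_word_num": 2}))
print(inspect_text(good_title, {"min_word_num": 2}))
```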
- inspect_unitdates(c, level: str) list | None #
Inspect unit dates.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_unitdates_consistency(unitdates: list, c) list #
Inspect consistency of unit dates.
- Parameters:
unitdates (list) – Unit dates of the inspected component
c (etree._Element) – Component of an EAD record
- Returns:
List of error messages
- Return type:
list
- inspect_unitid(c) str #
Inspect unit ID.
- Parameters:
c (etree._Element) – Component of an EAD record
- Returns:
Unit ID or error message if missing
- Return type:
str
- inspect_unittitle(c, level: str) list | None #
Inspect unit title.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_userestrict(c, level: str) list | None #
Inspect use restrict.
- Parameters:
c (etree._Element) – Component of an EAD record
level (str) – Level of the inspected EAD component
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- is_consistent_date(sub_date: dict, dates: list) bool #
Check if sub date (earliest and latest) is consistent.
- Parameters:
sub_date (dict) – Sub date of a subordinate component
dates (list) – Dates of the superordinate component
- Returns:
True if date is consistent, False if not
- Return type:
bool
- is_consistent_earliest_date(sub_date: dict, date: dict) bool #
Check if sub date (earliest) is consistent.
- Parameters:
sub_date (dict) – Sub date of a subordinate component
date (dict) – Date of the superordinate component
- Returns:
True if date is consistent, False if not
- Return type:
bool
- is_consistent_latest_date(sub_date: dict, date: dict) bool #
Check if sub date (latest) is consistent.
- Parameters:
sub_date (dict) – Sub date of a subordinate component
date (dict) – Date of the superordinate component
- Returns:
True if date is consistent, False if not
- Return type:
bool
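The three consistency checks can be pictured as range containment. The sketch below assumes that dates are dicts with the keys `earliest` and `latest` holding `datetime.date` values, and that a subordinate date is consistent as soon as one superordinate range covers it. Both are assumptions, not the documented data model.

```python
from datetime import date


def is_consistent_earliest_date(sub_date: dict, parent_date: dict) -> bool:
    """The subordinate range must not start before the superordinate range."""
    return sub_date["earliest"] >= parent_date["earliest"]


def is_consistent_latest_date(sub_date: dict, parent_date: dict) -> bool:
    """The subordinate range must not end after the superordinate range."""
    return sub_date["latest"] <= parent_date["latest"]


def is_consistent_date(sub_date: dict, parent_dates: list) -> bool:
    """Assumption: one covering superordinate range is enough."""
    return any(
        is_consistent_earliest_date(sub_date, parent)
        and is_consistent_latest_date(sub_date, parent)
        for parent in parent_dates
    )


parent_dates = [{"earliest": date(1900, 1, 1), "latest": date(1950, 12, 31)}]
inside = {"earliest": date(1920, 5, 1), "latest": date(1921, 5, 1)}
too_late = {"earliest": date(1940, 1, 1), "latest": date(1960, 1, 1)}

print(is_consistent_date(inside, parent_dates))
print(is_consistent_date(too_late, parent_dates))
```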
- is_future(norm_date: dict) bool #
Check if a date is in the future.
- Parameters:
norm_date (dict) – Normalized form of the inspected date
- Returns:
True if date is in the future, False if not
- Return type:
bool
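A future-date check reduces to a comparison with today's date. The sketch below assumes a normalized date is a dict of `datetime.date` values (or None for unknown ends). The optional `today` parameter is a hypothetical addition that keeps the example reproducible.

```python
from datetime import date
from typing import Optional


def is_future(norm_date: dict, today: Optional[date] = None) -> bool:
    """True if any date in the normalized range lies after today."""
    today = today or date.today()
    return any(d is not None and d > today for d in norm_date.values())


reference = date(2024, 1, 1)  # fixed "today" so the example is reproducible
past = {"earliest": date(1900, 1, 1), "latest": date(1950, 1, 1)}
ahead = {"earliest": date(2020, 1, 1), "latest": date(2030, 1, 1)}

print(is_future(past, reference))
print(is_future(ahead, reference))
```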
- normal_date_range(date) dict #
Get normalized date range.
- Parameters:
date (etree._Element) – Date to normalize
- Returns:
Normalized date range
- Return type:
dict
- normalized_unitdates(unitdates: list) list #
Normalize unit dates.
- Parameters:
unitdates (list) – List of unit dates
- Returns:
List of normalized dates
- Return type:
list
- read_ead(xml_str: str) None #
Parse EAD-XML from a string and assign EAD components to the inspector.
- Parameters:
xml_str (str) – String with EAD-XML syntax
- read_ead_file(file_path: str) None #
Parse EAD-XML from a file and assign EAD components to the inspector.
- Parameters:
file_path (str) – File path to an EAD-XML file
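Reading EAD-XML and collecting its components can be sketched with the standard library as below. The sketch assumes the EAD 2002 schema namespace and unnumbered `<c>` components. The helper `read_components` is hypothetical, and numbered components (`c01` to `c12`) are deliberately not handled.

```python
import xml.etree.ElementTree as ET

EAD_NS = "urn:isbn:1-931666-22-9"  # namespace of the EAD 2002 XML schema

SAMPLE = f"""<ead xmlns="{EAD_NS}">
  <archdesc level="collection">
    <dsc>
      <c level="class" id="c1">
        <c level="file" id="c2"><did><unittitle>Letters</unittitle></did></c>
        <c level="file" id="c3"><did><unittitle>Maps</unittitle></did></c>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def read_components(xml_str: str) -> list:
    """Parse EAD-XML and return all <c> components in document order."""
    root = ET.fromstring(xml_str)
    return list(root.iter(f"{{{EAD_NS}}}c"))


components = read_components(SAMPLE)
for c in components:
    print(c.get("id"), c.get("level"))
```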
- property rights_ead: list#
Get or set the EAD metadata rights.
- subordinate_unitdates(c) dict #
Get all subordinate unit dates of a component.
- Parameters:
c (etree._Element) – Component of an EAD record
- Returns:
Dict of unit dates
- Return type:
dict
nfdinspector.error module
- class nfdinspector.error.Error(language: str)#
Bases:
object
Class with various error messages for the metadata inspections.
- dist(compare: str) str #
Get error message for missing distinction.
- Parameters:
compare (str) – Comparison
- Returns:
Error message
- Return type:
str
- dupl_blanks() str #
Get error message for duplicate blanks.
- Returns:
Error message
- Return type:
str
- dupl_text() str #
Get error message for duplicate text.
- Returns:
Error message
- Return type:
str
- empty_elem(tag: str) str #
Get error message for empty XML element.
- Parameters:
tag (str) – Tag of the concerned element
- Returns:
Error message
- Return type:
str
- few() str #
Get error message for too few entries.
- Returns:
Error message
- Return type:
str
- future(date: str) str #
Get error message for date in future.
- Parameters:
date (str) – Date string
- Returns:
Error message
- Return type:
str
- inconsistent_date(id: str, inconsistency: str) str #
Get error message for inconsistent date.
- Parameters:
id (str) – ID of concerned file
inconsistency (str) – Inconsistent date
- Returns:
Error message
- Return type:
str
- property language: str#
Get or set the language for the error messages.
- long() str #
Get error message for length.
- Returns:
Error message
- Return type:
str
- miss_actor(event_type: str) str #
Get error message for missing actor.
- Parameters:
event_type (str) – Event type
- Returns:
Error message
- Return type:
str
- miss_date(event_type: str) str #
Get error message for missing date.
- Parameters:
event_type (str) – Event type
- Returns:
Error message
- Return type:
str
- miss_earl_date(event_type: str) str #
Get error message for missing earliest date.
- Parameters:
event_type (str) – Event type
- Returns:
Error message
- Return type:
str
- miss_event_info(event_type: str) str #
Get error message for missing event info.
- Parameters:
event_type (str) – Event type
- Returns:
Error message
- Return type:
str
- miss_event_type() str #
Get error message for missing event type.
- Returns:
Error message
- Return type:
str
- miss_info() str #
Get error message for missing information.
- Returns:
Error message
- Return type:
str
- miss_label(id: str) str #
Get error message for missing label.
- Parameters:
id (str) – ID of the concerned entity
- Returns:
Error message
- Return type:
str
- miss_lang_code() str #
Get error message for missing language code.
- Returns:
Error message
- Return type:
str
- miss_lat_date(event_type: str) str #
Get error message for missing latest date.
- Parameters:
event_type (str) – Event type
- Returns:
Error message
- Return type:
str
- miss_link() str #
Get error message for missing link.
- Returns:
Error message
- Return type:
str
- miss_mat() str #
Get error message for missing explicit material.
- Returns:
Error message
- Return type:
str
- miss_meas_type() str #
Get error message for missing measurement type.
- Returns:
Error message
- Return type:
str
- miss_meas_unit(meas_type: str) str #
Get error message for missing measurement unit.
- Parameters:
meas_type (str) – Measurement type
- Returns:
Error message
- Return type:
str
- miss_meas_value(meas_type: str) str #
Get error message for missing measurement value.
- Parameters:
meas_type (str) – Measurement type
- Returns:
Error message
- Return type:
str
- miss_norm_date(text_date: str) str #
Get error message for missing normalized date.
- Parameters:
text_date (str) – Date as text
- Returns:
Error message
- Return type:
str
- miss_norm_term(term: str) str #
Get error message for missing normalized term.
- Parameters:
term (str) – Term that is not normalized
- Returns:
Error message
- Return type:
str
- miss_place(event_type: str) str #
Get error message for missing place.
- Parameters:
event_type (str) – Event type
- Returns:
Error message
- Return type:
str
- miss_ref(label: str) str #
Get error message for missing reference/ID.
- Parameters:
label (str) – Label of the concerned entity
- Returns:
Error message
- Return type:
str
- miss_res_type(add: str) str #
Get error message for missing resource type.
- Parameters:
add (str) – Additional information
- Returns:
Error message
- Return type:
str
- miss_rights(add: str) str #
Get error message for missing rights statement.
- Parameters:
add (str) – Additional information
- Returns:
Error message
- Return type:
str
- miss_tech() str #
Get error message for missing explicit technique.
- Returns:
Error message
- Return type:
str
- not_uniq() str #
Get error message for text that is not unique.
- Returns:
Error message
- Return type:
str
- pattern(add: str) str #
Get error message for wrong pattern.
- Parameters:
add (str) – Additional information
- Returns:
Error message
- Return type:
str
- short() str #
Get error message for shortness.
- Returns:
Error message
- Return type:
str
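A class that hands out error messages in a configurable language could be organized as a small catalogue keyed by language code. The sketch below is purely illustrative: the class name `Messages`, the message texts, the supported language codes, and the fallback to English are assumptions and do not reflect the actual contents of `nfdinspector.error.Error`.

```python
class Messages:
    """Tiny message catalogue keyed by language code."""

    _CATALOGUE = {
        "en": {"few": "Too few entries", "empty_elem": "Empty element: {tag}"},
        "de": {"few": "Zu wenige Einträge", "empty_elem": "Leeres Element: {tag}"},
    }

    def __init__(self, language: str = "en") -> None:
        self.language = language  # goes through the setter below

    @property
    def language(self) -> str:
        return self._language

    @language.setter
    def language(self, value: str) -> None:
        # Assumption: unknown languages fall back to English.
        self._language = value if value in self._CATALOGUE else "en"

    def few(self) -> str:
        return self._CATALOGUE[self._language]["few"]

    def empty_elem(self, tag: str) -> str:
        return self._CATALOGUE[self._language]["empty_elem"].format(tag=tag)


messages = Messages("de")
print(messages.few())
messages.language = "en"
print(messages.empty_elem("unittitle"))
```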
nfdinspector.lido_inspector module
- class nfdinspector.lido_inspector.LIDOInspector(error_lang: str = 'en')#
Bases:
MetadataInspector
Class for inspectors that examine records in LIDO-XML.
- about(element) str #
Get value from about attribute.
- Parameters:
element (etree._Element) – XML element with supposed about attribute
- Returns:
Value of about
- Return type:
str
- actor_id(parent) str #
Get ID of an actor.
- Parameters:
parent (etree._Element) – Parent element of the supposed actor element.
- Returns:
actorID
- Return type:
str
- concept_id(parent) str #
Get ID of a concept.
- Parameters:
parent (etree._Element) – Parent element of the supposed conceptID or Concept element.
- Returns:
conceptID or Concept
- Return type:
str
- config_file(file_path: str) None #
Read a configuration file and alter the default configurations of an inspector.
- Parameters:
file_path (str) – File path to a JSON file with configurations in the required syntax
- property configuration: dict#
Get or set the configuration. The inspection is carried out based on the configuration.
- configure(config: dict) None #
Alter the default configurations of an inspector.
- Parameters:
config (dict) – Dict of configurations with the syntax of the default configurations
- configure_setting(setting: str, change: dict) None #
Alter a specific setting in the configurations of an inspector.
- Parameters:
setting (str) – Name of the setting to change
change (dict) – Desired change for the specific setting
- property duplicate_descriptions: set#
Get or set the set of duplicate descriptions.
- property duplicate_titles: set#
Get or set the set of duplicate titles.
- find_duplicate_descriptions() set #
Find duplicate descriptions.
- Returns:
All duplicate descriptions in lido_objects
- Return type:
set
- find_duplicate_titles() set #
Find duplicate titles.
- Returns:
All duplicate titles in lido_objects
- Return type:
set
- find_duplicates(xpath: str) set #
Find duplicates based on an XPath expression.
- Parameters:
xpath (str) – XPath expression
- Returns:
All duplicate values in lido_objects that match the XPath expression
- Return type:
set
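Finding duplicates across records amounts to counting the text values that a path selects. The sketch below uses the standard library's ElementTree, which supports only a subset of XPath, and a heavily simplified record structure. The helpers `make_record` and `find_duplicates` are hypothetical stand-ins, not the library's methods.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"lido": "http://www.lido-schema.org"}


def make_record(title: str):
    """Build a heavily simplified LIDO record holding only a title."""
    return ET.fromstring(
        '<lido:lido xmlns:lido="http://www.lido-schema.org">'
        "<lido:titleSet><lido:appellationValue>"
        f"{title}"
        "</lido:appellationValue></lido:titleSet></lido:lido>"
    )


def find_duplicates(records: list, path: str) -> set:
    """Collect text values that occur more than once under the given path."""
    counts = Counter(
        (element.text or "").strip()
        for record in records
        for element in record.findall(path, NS)
    )
    return {text for text, count in counts.items() if text and count > 1}


records = [make_record("Vase"), make_record("Vase"), make_record("Lamp")]
print(find_duplicates(records, ".//lido:titleSet/lido:appellationValue"))
```

In this sketch a value repeated inside a single record also counts as a duplicate. Whether the library treats that case the same way is not stated in this reference.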
- has_material(materials_tech: list) bool #
Check if record contains information about material.
- Parameters:
materials_tech (list) – XML elements of materials/techniques
- Returns:
True if record contains information about material, False if not
- Return type:
bool
- has_tech(materials_tech: list) bool #
Check if record contains information about technique.
- Parameters:
materials_tech (list) – XML elements of materials/techniques
- Returns:
True if record contains information about technique, False if not
- Return type:
bool
- has_valid_type(elements: list, valid_types: list) bool #
Check if elements have valid types.
- Parameters:
elements (list) – Elements with supposed type attributes
valid_types (list) – Type values that are considered valid
- Returns:
True if type is valid, False if not
- Return type:
bool
- inspect() None #
Carry out an inspection based on the read-in LIDO records.
- inspect_actor(actor, event_type, config: dict) list #
Inspect actor.
- Parameters:
actor (etree._Element) – XML element of supposed actor
event_type (etree._Element) – XML element of corresponding event type
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages
- Return type:
list
- inspect_actors(actors: list, event_type, config: dict) list #
Inspect multiple actors.
- Parameters:
actors (list) – XML elements of supposed actors
event_type (etree._Element) – XML element of corresponding event type
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages
- Return type:
list
- inspect_category(lido_object) list | None #
Inspect category.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_classifications(lido_object) list | None #
Inspect classifications.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_concept(concept, config: dict) list #
Inspect concept.
- Parameters:
concept (etree._Element) – XML element of the concept
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages
- Return type:
list
- inspect_concepts(concept_list: list, config: dict) list | None #
Inspect multiple concepts.
- Parameters:
concept_list (list) – List of XML elements of concepts
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_date(date, event_type) list #
Inspect date.
- Parameters:
date (etree._Element) – XML element of supposed date
event_type (etree._Element) – XML element of corresponding event type
- Returns:
List of error messages
- Return type:
list
- inspect_event(event) list #
Inspect event.
- Parameters:
event (etree._Element) – XML element of supposed event
- Returns:
List of error messages
- Return type:
list
- inspect_event_type(event_type) list #
Inspect event type.
- Parameters:
event_type (etree._Element) – XML element of supposed event type
- Returns:
List of error messages
- Return type:
list
- inspect_events(lido_object) list | None #
Inspect events.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_lido_rec_id(lido_object) str #
Inspect record ID.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
Record ID or error message if missing
- Return type:
str
- inspect_materials_tech(lido_object) list | None #
Inspect materials and techniques.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_measurements_set(measurements_set) list #
Inspect a measurements set.
- Parameters:
measurements_set (etree._Element) – XML element of supposed measurements set
- Returns:
List of error messages
- Return type:
list
- inspect_object_description(lido_object) list | None #
Inspect object description.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_object_measurements(lido_object) list | None #
Inspect object measurements.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_object_work_types(lido_object) list | None #
Inspect object/work types.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_place(place, event_type, config: dict) list #
Inspect place.
- Parameters:
place (etree._Element) – XML element of supposed place
event_type (etree._Element) – XML element of corresponding event type
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages
- Return type:
list
- inspect_places(places: list, event_type, config: dict) list #
Inspect multiple places.
- Parameters:
places (list) – XML elements of supposed places
event_type (etree._Element) – XML element of corresponding event type
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages
- Return type:
list
- inspect_record_info_set(lido_object) list | None #
Inspect record information.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_record_rights(lido_object) list | None #
Inspect record rights.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_record_source(record_source) list #
Inspect record source.
- Parameters:
record_source (etree._Element) – XML element of supposed record source
- Returns:
List of error messages
- Return type:
list
- inspect_record_sources(lido_object) list | None #
Inspect record sources.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_record_type(lido_object) list | None #
Inspect record type.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_repository_name(lido_object) list | None #
Inspect repository name.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_resource_set(resource_set) list #
Inspect resource set.
- Parameters:
resource_set (etree._Element) – XML element of supposed resource set
- Returns:
List of error messages
- Return type:
list
- inspect_resource_sets(lido_object) list | None #
Inspect resource sets.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_subject_concepts(lido_object) list | None #
Inspect subject concepts.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_text(element, lido_object, config: dict) list | None #
Inspect a text element.
- Parameters:
element (etree._Element) – XML element with supposed text
lido_object (etree._Element) – Record of an object in LIDO-XML
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_title(lido_object) list | None #
Inspect title.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
List of error messages, None if there are no errors
- Return type:
list | None
- inspect_work_id(lido_object) str #
Inspect work ID.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
- Returns:
Work ID or error message if missing
- Return type:
str
- is_distinct_from_type(lido_object, value: str) bool #
Check if title is distinct from object/work type.
- Parameters:
lido_object (etree._Element) – Record of an object in LIDO-XML
value (str) – Text value of the inspected record and element
- Returns:
True if title is distinct from object/work type, False if not
- Return type:
bool
- is_uniq(text: str, element) bool #
Check if title or object description is unique.
- Parameters:
text (str) – Text that is checked
element (etree._Element) – XML element with supposed text
- Returns:
True if title/description is unique, False if not
- Return type:
bool
- legal_body_id(parent) str #
Get ID of a legal body.
- Parameters:
parent (etree._Element) – Parent element of the supposed legal body element.
- Returns:
legalBodyID
- Return type:
str
- property lido_namespace: str#
Get the LIDO namespace when needed for reading attributes.
- property lido_objects: list#
Get or set the list of LIDO records. These records are examined during the inspection.
- lido_type(element) str #
Get value from type attribute.
- Parameters:
element (etree._Element) – XML element with supposed type attribute
- Returns:
Value of type
- Return type:
str
- meas_type(measurement_type) str #
Get text or ID of measurement type.
- Parameters:
measurement_type (etree._Element) – XML element of measurement type.
- Returns:
Text or ID
- Return type:
str
- place_id(parent) str #
Get ID of a place.
- Parameters:
parent (etree._Element) – Parent element of the supposed place element.
- Returns:
placeID
- Return type:
str
- read_lido(xml_str: str) None #
Parse LIDO-XML from a string and assign LIDO records to the inspector.
- Parameters:
xml_str (str) – String with LIDO-XML syntax
- read_lido_file(file_path: str) None #
Parse LIDO-XML from a file and assign LIDO records to the inspector.
- Parameters:
file_path (str) – File path to a LIDO-XML file
- read_lido_files(files_path: str) None #
Parse LIDO-XML from multiple files in a folder and assign LIDO records to the inspector.
- Parameters:
files_path (str) – Path to a folder with LIDO-XML files
- summarize_event_messages(messages: list, event_type: str) list #
Summarize several event-specific error messages (missing actor, place and date).
- Parameters:
messages (list) – Error messages of an event
event_type (str) – Event type of the corresponding event
- Returns:
List of error messages
- Return type:
list
- term(parent) str #
Get term or prefLabel of a concept.
- Parameters:
parent (etree._Element) – Parent element of the supposed term or prefLabel element.
- Returns:
Term or label
- Return type:
str
- value(parent) str #
Get value (appellation or descriptiveNote) of a text field.
- Parameters:
parent (etree._Element) – Parent element of the supposed value element.
- Returns:
Text value
- Return type:
str
nfdinspector.metadata_inspector module
- class nfdinspector.metadata_inspector.MetadataInspector(error_lang: str = 'en')#
Bases:
object
Superclass for the inspectors of the individual metadata standards.
- attr(element, attribute_name: str) str #
Get attribute text from an XML element.
- Parameters:
element (etree._Element) – XML element with supposed attribute
attribute_name (str) – Supposed attribute name
- Returns:
Attribute text from an XML Element
- Return type:
str
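Namespaced attributes are the reason the inspectors expose namespace properties: parsers store such attributes under the key `{namespace}name`. The sketch below shows this with the standard library's ElementTree and the XLink namespace. The helper `attr`, its empty-string fallback, and the sample element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def attr(element, attribute_name: str) -> str:
    """Return the attribute value, or an empty string if it is missing."""
    if element is None:
        return ""
    return element.get(attribute_name, "")


dao = ET.fromstring(
    f'<dao xmlns:xlink="{XLINK_NS}" '
    'xlink:href="https://example.org/scan.jpg" id="d1"/>'
)

# Namespaced attributes are addressed as "{namespace}name".
print(attr(dao, f"{{{XLINK_NS}}}href"))
print(attr(dao, "id"))
```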
- create_element(tag_name: str = 'element', text: str = '')#
Create an XML element from a tag name and text.
- Parameters:
tag_name (str) – Tag name for the XML element
text (str) – Text for the XML element
- Returns:
XML element
- Return type:
etree._Element
- date_object(date_str: str)#
Get a date object from a date string (ISO 8601).
- Parameters:
date_str (str) – Date string (ISO 8601)
- Returns:
Date object if valid ISO 8601 format, None if not valid
- Return type:
datetime.date | None
- date_range(date_str: str) dict #
Split a date into an earliest and a latest date.
- Parameters:
date_str (str) – Date string (ISO 8601)
- Returns:
Dict with date objects where earliest and latest date are separated
- Return type:
dict
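ISO 8601 writes a time interval as two dates separated by a slash, so splitting a date string into its earliest and latest part can be sketched as below. The dict keys `earliest` and `latest` and the handling of a single date (both ends are the same day) are assumptions, not the documented return structure.

```python
from datetime import date


def date_object(date_str: str):
    """Return a date for a valid ISO 8601 calendar date, otherwise None."""
    try:
        return date.fromisoformat(date_str)
    except ValueError:
        return None


def date_range(date_str: str) -> dict:
    """Split 'start/end' into two dates; a single date fills both ends."""
    earliest, _, latest = date_str.partition("/")
    return {
        "earliest": date_object(earliest),
        "latest": date_object(latest or earliest),
    }


print(date_range("1900-01-01/1950-12-31"))
print(date_range("1923-04-05"))
print(date_object("not a date"))
```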
- property error: Error#
Get or set an Error object. The Error object is needed for adding error messages to the inspections.
- exists(element) bool #
Check if an XML element exists.
- Parameters:
element (etree._Element) – Supposed XML element
- Returns:
True if element exists, False if not
- Return type:
bool
- has_attribute(element, attribute_name: str) bool #
Check if an XML element has a specific attribute.
- Parameters:
element (etree._Element) – XML element with supposed attribute
attribute_name (str) – Supposed attribute name
- Returns:
True if element has a specific attribute, False if not
- Return type:
bool
- has_duplicate_blanks(text: str) bool #
Check if a text has duplicate blanks.
- Parameters:
text (str) – Text with possible duplicate blanks
- Returns:
True if text has duplicate blanks, False if not
- Return type:
bool
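A check for duplicate blanks can be a single regular expression. The sketch below assumes that "blank" means the plain space character. Tabs and line breaks are not counted.

```python
import re


def has_duplicate_blanks(text: str) -> bool:
    """True if the text contains two or more consecutive spaces."""
    return re.search(r" {2,}", text) is not None


print(has_duplicate_blanks("Letters  to the editor"))
print(has_duplicate_blanks("Letters to the editor"))
```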
- has_subelems(element) bool #
Check if an XML element has subelements.
- Parameters:
element (etree._Element) – XML element with supposed subelements
- Returns:
True if element has subelements, False if not
- Return type:
bool
- has_text(element) bool #
Check if an XML element has text.
- Parameters:
element (etree._Element) – XML element with supposed text
- Returns:
True if element has text, False if not
- Return type:
bool
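The existence, subelement, and text checks can be sketched as below with the standard library's ElementTree as a stand-in for `etree._Element`. The explicit `is not None` test matters because an ElementTree element without children evaluates as false. Treating whitespace-only text as "no text" is an assumption.

```python
import xml.etree.ElementTree as ET


def exists(element) -> bool:
    """True if a lookup such as find() actually returned an element."""
    return element is not None


def has_subelems(element) -> bool:
    """True if the element has at least one child element."""
    return exists(element) and len(element) > 0


def has_text(element) -> bool:
    """True if the element has text other than whitespace."""
    return exists(element) and bool((element.text or "").strip())


did = ET.fromstring("<did><unittitle>Letters</unittitle><unitid>  </unitid></did>")

print(exists(did.find("unitdate")))
print(has_text(did.find("unittitle")))
print(has_text(did.find("unitid")))
print(has_subelems(did))
```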
- inspect_entity(label: str, entity_id: str, config: dict) list #
Inspect label and ID of an entity (person, organisation etc.).
- Parameters:
label (str) – Label of an entity
entity_id (str) – ID of an entity
config (dict) – Configuration of the specific inspection.
- Returns:
List of error messages
- Return type:
list
- property inspections: list#
Get or set the inspections list. The list is filled while inspecting a data set.
- property rdf_namespace: str#
Get the RDF namespace when needed for reading attributes.
- static read_xml(xml_str: str)#
Parse XML from a string.
- Parameters:
xml_str (str) – String with XML syntax
- Returns:
Root element of an ElementTree
- Return type:
etree._Element
- static read_xml_file(file_path: str)#
Parse XML from a file.
- Parameters:
file_path (str) – File path to an XML file
- Returns:
Root element of an ElementTree
- Return type:
etree._Element
- static read_xml_files(files_path: str) list #
Parse XML from multiple XML files in a folder.
- Parameters:
files_path (str) – Path to a folder with XML files
- Returns:
List of root elements of multiple ElementTrees
- Return type:
list
- text(element) str #
Get text from an XML element.
- Parameters:
element (etree._Element) – XML element with supposed text
- Returns:
Text from an XML Element
- Return type:
str
- to_csv(file_path: str, delimiter: str = ',') None #
Generate a CSV file of the inspections.
- Parameters:
file_path (str) – File path for the CSV file
delimiter (str) – Delimiter for the columns in the CSV file
- to_json(file_path: str, indent: int | str | None = None) None #
Generate a JSON file of the inspections.
- Parameters:
file_path (str) – File path for the JSON file
indent (int | str | None) – Indent level of the JSON file
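Exporting inspection results amounts to serializing a list of per-record findings. The sketch below writes to strings instead of files so that it runs anywhere. The structure of `inspections` (one dict per record, with a list of messages or None per field) and the helpers `to_json_str`, `to_csv_str`, and `flatten` are assumptions, not the library's actual output format.

```python
import csv
import io
import json

# Hypothetical inspection results: one dict per inspected record.
inspections = [
    {"id": "c2", "unittitle": ["too short"], "unitdate": None},
    {"id": "c3", "unittitle": None, "unitdate": ["missing normalized date"]},
]


def flatten(value) -> str:
    """Turn None into an empty cell and a message list into one string."""
    if value is None:
        return ""
    if isinstance(value, list):
        return " | ".join(value)
    return str(value)


def to_json_str(rows: list, indent=None) -> str:
    return json.dumps(rows, indent=indent, ensure_ascii=False)


def to_csv_str(rows: list, delimiter: str = ",") -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({key: flatten(value) for key, value in row.items()})
    return buffer.getvalue()


print(to_csv_str(inspections, delimiter=";"))
print(to_json_str(inspections, indent=2))
```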
- property xlink_namespace: str#
Get the XLINK namespace when needed for reading attributes.
Module contents
NFDInspector by Andreas Ketelaer (andreas.ketelaer@bergbaumuseum.de)
A Python package to inspect formal quality problems in research data.