Reference#

nfdinspector.ead_inspector module#

class nfdinspector.ead_inspector.EADInspector(error_lang: str = 'en')#

Bases: MetadataInspector

Class for inspectors that examine records in EAD-XML.

config_file(file_path: str) None#

Read a configuration file and alter the default configurations of an inspector.

Parameters:

file_path (str) – File path to a JSON file with configurations in the required syntax

property configuration: dict#

Get or set the configuration. The inspection is carried out based on the configuration.

configure(config: dict) None#

Alter the default configurations of an inspector.

Parameters:

config (dict) – Dict of configurations with the syntax of the default configurations

configure_level(setting: str, change: dict | list) None#

Alter a specific level in the configurations of an inspector.

Parameters:
  • setting (str) – Name of the setting which should be altered

  • change (dict | list) – New configurations for the specific setting

configure_setting(setting: str, level: str, change: list | dict) None#

Alter a specific setting in the configurations of an inspector.

Parameters:
  • setting (str) – Name of the setting which should be altered

  • level (str) – Name of the level which should be altered

  • change (dict | list) – New configurations for the specific setting

property cs: list[object]#

Get or set the list of EAD components. These components are examined during the inspection.

property ead_namespace: str#

Get the EAD namespace when needed for reading attributes.

inspect() None#

Carry out an inspection based on the read-in EAD components.

inspect_abstract(c, level: str) list | None#

Inspect abstract.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_daogrp(daogrp) list#

Inspect a digital archival object.

Parameters:

daogrp (etree._Element) – XML element of a digital archival object group

Returns:

List of error messages

Return type:

list

inspect_daos(c, level: str) list | None#

Inspect digital archival objects.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_date(date) list#

Inspect date.

Parameters:

date (etree._Element) – Date to inspect

Returns:

List of error messages

Return type:

list

inspect_dates(dates: list) list#

Inspect multiple dates.

Parameters:

dates (list) – Dates to inspect

Returns:

List of error messages

Return type:

list

inspect_dimensions(c, level: str) list | None#

Inspect dimensions.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_extent(c, level: str) list | None#

Inspect extent.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_genreform(c, level: str) list | None#

Inspect genreform.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_id(c) str#

Inspect component ID.

Parameters:

c (etree._Element) – Component of an EAD record

Returns:

Component ID or error message if missing

Return type:

str

inspect_index(c, level: str) list | None#

Inspect index.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_indexentry(indexentry, level: str) list#

Inspect an index entry.

Parameters:
  • indexentry (etree._Element) – XML element of an index entry

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages

Return type:

list

inspect_language(c, level: str) list | None#

Inspect language.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_materialspec(c, level: str) list | None#

Inspect materialspec.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_origination(origination, level: str) list#

Inspect origination.

Parameters:
  • origination (etree._Element) – XML element of origination

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages

Return type:

list

inspect_originations(c, level: str) list | None#

Inspect originations.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_scopecontent(c, level: str) list | None#

Inspect scope content.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_sub_dating(dates: list, sub_unitid: str, sub_dating: dict) list#

Inspect dates in comparison to sub dates.

Parameters:
  • dates (list) – Dates of the inspected component

  • sub_unitid (str) – Unit id of subordinate component

  • sub_dating (dict) – Dating of subordinate component

Returns:

List of error messages

Return type:

list

inspect_text(element, config: dict) list | None#

Inspect a text element.

Parameters:
  • element (etree._Element) – XML element with supposed text

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_unitdates(c, level: str) list | None#

Inspect unit dates.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_unitdates_consistency(unitdates: list, c) list#

Inspect consistency of unit dates.

Parameters:
  • unitdates (list) – Unit dates of the inspected component

  • c (etree._Element) – Component of an EAD record

Returns:

List of error messages

Return type:

list

inspect_unitid(c) str#

Inspect unit ID.

Parameters:

c (etree._Element) – Component of an EAD record

Returns:

Unit ID or error message if missing

Return type:

str

inspect_unittitle(c, level: str) list | None#

Inspect unit title.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_userestrict(c, level: str) list | None#

Inspect use restrict.

Parameters:
  • c (etree._Element) – Component of an EAD record

  • level (str) – Level of the inspected EAD component

Returns:

List of error messages, None if there are no errors

Return type:

list | None

is_consistent_date(sub_date: dict, dates: list) bool#

Check if sub date (earliest and latest) is consistent.

Parameters:
  • sub_date (dict) – Sub date of a subordinate component

  • dates (list) – Dates of the superordinate component

Returns:

True if date is consistent, False if not

Return type:

bool

is_consistent_earliest_date(sub_date: dict, date: dict) bool#

Check if sub date (earliest) is consistent.

Parameters:
  • sub_date (dict) – Sub date of a subordinate component

  • date (dict) – Date of the superordinate component

Returns:

True if date is consistent, False if not

Return type:

bool

is_consistent_latest_date(sub_date: dict, date: dict) bool#

Check if sub date (latest) is consistent.

Parameters:
  • sub_date (dict) – Sub date of a subordinate component

  • date (dict) – Date of the superordinate component

Returns:

True if date is consistent, False if not

Return type:

bool
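The three consistency checks above can be read as interval containment: a subordinate date range should fall inside a range of the superordinate component. The following is only a sketch of that idea, not the package's implementation. The dict keys `earliest` and `latest`, the helper names, and the rule that one matching superordinate range is enough are all assumptions made for illustration.

```python
from datetime import date


def earliest_ok(sub_date: dict, parent: dict) -> bool:
    """The sub date must not start before the parent date starts."""
    return sub_date["earliest"] >= parent["earliest"]


def latest_ok(sub_date: dict, parent: dict) -> bool:
    """The sub date must not end after the parent date ends."""
    return sub_date["latest"] <= parent["latest"]


def within_any(sub_date: dict, parents: list) -> bool:
    """Assumed rule: consistent if at least one parent range contains the sub date."""
    return any(earliest_ok(sub_date, p) and latest_ok(sub_date, p) for p in parents)


# Hypothetical normalized dates (keys are assumptions, not the package's format)
parent_dates = [{"earliest": date(1900, 1, 1), "latest": date(1950, 12, 31)}]
inside = {"earliest": date(1920, 5, 1), "latest": date(1930, 5, 1)}
outside = {"earliest": date(1890, 1, 1), "latest": date(1910, 1, 1)}

print(within_any(inside, parent_dates))
print(within_any(outside, parent_dates))
```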

is_future(norm_date: dict) bool#

Check if a date is in the future.

Parameters:

norm_date (dict) – Normalized form of the inspected date

Returns:

True if date is in the future, False if not

Return type:

bool
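A future-date check of this kind usually compares each boundary of the normalized date with today's date. The sketch below assumes the normalized date is a dict whose values are `datetime.date` objects or `None`; that layout and the function name are illustrative assumptions, not the documented structure.

```python
from datetime import date, timedelta


def is_future_sketch(norm_date: dict) -> bool:
    """Return True if any boundary of the normalized date lies after today."""
    today = date.today()
    return any(d is not None and d > today for d in norm_date.values())


today = date.today()
# Hypothetical normalized dates (keys are assumptions)
past = {"earliest": date(1900, 1, 1), "latest": date(1950, 12, 31)}
future = {"earliest": today, "latest": today + timedelta(days=365)}

print(is_future_sketch(past))
print(is_future_sketch(future))
```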

normal_date_range(date) dict#

Get normalized date range.

Parameters:

date (etree._Element) – Date to normalize

Returns:

Normalized date range

Return type:

dict

normalized_unitdates(unitdates: list) list#

Normalize unit dates.

Parameters:

unitdates (list) – List of unit dates

Returns:

List of normalized dates

Return type:

list

read_ead(xml_str: str) None#

Parse EAD-XML from a string and assign EAD components to the inspector.

Parameters:

xml_str (str) – String with EAD-XML syntax

read_ead_file(file_path: str) None#

Parse EAD-XML from a file and assign EAD components to the inspector.

Parameters:

file_path (str) – File path to an EAD-XML file

property rights_ead: list#

Get or set the EAD metadata rights.

subordinate_unitdates(c) dict#

Get all subordinate unit dates of a component.

Parameters:

c (etree._Element) – Component of an EAD record

Returns:

Dict of unit dates

Return type:

dict

nfdinspector.error module#

class nfdinspector.error.Error(language: str)#

Bases: object

Class with various error messages for the metadata inspections.

dist(compare: str) str#

Get error message for missing distinction.

Parameters:

compare (str) – Comparison

Returns:

Error message

Return type:

str

dupl_blanks() str#

Get error message for duplicate blanks.

Returns:

Error message

Return type:

str

dupl_text() str#

Get error message for duplicate text.

Returns:

Error message

Return type:

str

empty_elem(tag: str) str#

Get error message for empty XML element.

Parameters:

tag (str) – Tag of the concerned element

Returns:

Error message

Return type:

str

few() str#

Get error message for too few entries.

Returns:

Error message

Return type:

str

future(date: str) str#

Get error message for date in future.

Parameters:

date (str) – Date string

Returns:

Error message

Return type:

str

inconsistent_date(id: str, inconsistency: str) str#

Get error message for inconsistent date.

Parameters:
  • id (str) – ID of concerned file

  • inconsistency (str) – Inconsistent date

Returns:

Error message

Return type:

str

property language: str#

Get or set the language for the error messages.

long() str#

Get error message for length.

Returns:

Error message

Return type:

str

miss_actor(event_type: str) str#

Get error message for missing actor.

Parameters:

event_type (str) – Event type

Returns:

Error message

Return type:

str

miss_date(event_type: str) str#

Get error message for missing date.

Parameters:

event_type (str) – Event type

Returns:

Error message

Return type:

str

miss_earl_date(event_type: str) str#

Get error message for missing earliest date.

Parameters:

event_type (str) – Event type

Returns:

Error message

Return type:

str

miss_event_info(event_type: str) str#

Get error message for missing event info.

Parameters:

event_type (str) – Event type

Returns:

Error message

Return type:

str

miss_event_type() str#

Get error message for missing event type.

Returns:

Error message

Return type:

str

miss_info() str#

Get error message for missing information.

Returns:

Error message

Return type:

str

miss_label(id: str) str#

Get error message for missing label.

Parameters:

id (str) – ID of the concerned entity

Returns:

Error message

Return type:

str

miss_lang_code() str#

Get error message for missing language code.

Returns:

Error message

Return type:

str

miss_lat_date(event_type: str) str#

Get error message for missing latest date.

Parameters:

event_type (str) – Event type

Returns:

Error message

Return type:

str

Get error message for missing link.

Returns:

Error message

Return type:

str

miss_mat() str#

Get error message for missing explicit material.

Returns:

Error message

Return type:

str

miss_meas_type() str#

Get error message for missing measurement type.

Returns:

Error message

Return type:

str

miss_meas_unit(meas_type: str) str#

Get error message for missing measurement unit.

Parameters:

meas_type (str) – Measurement type

Returns:

Error message

Return type:

str

miss_meas_value(meas_type: str) str#

Get error message for missing measurement value.

Parameters:

meas_type (str) – Measurement type

Returns:

Error message

Return type:

str

miss_norm_date(text_date: str) str#

Get error message for missing normalized date.

Parameters:

text_date (str) – Date as text

Returns:

Error message

Return type:

str

miss_norm_term(term: str) str#

Get error message for missing normalized term.

Parameters:

term (str) – Term that is not normalized

Returns:

Error message

Return type:

str

miss_place(event_type: str) str#

Get error message for missing place.

Parameters:

event_type (str) – Event type

Returns:

Error message

Return type:

str

miss_ref(label: str) str#

Get error message for missing reference/ID.

Parameters:

label (str) – Label of the concerned entity

Returns:

Error message

Return type:

str

miss_res_type(add: str) str#

Get error message for missing resource type.

Parameters:

add (str) – Additional information

Returns:

Error message

Return type:

str

miss_rights(add: str) str#

Get error message for missing rights statement.

Parameters:

add (str) – Additional information

Returns:

Error message

Return type:

str

miss_tech() str#

Get error message for missing explicit technique.

Returns:

Error message

Return type:

str

not_uniq() str#

Get error message for text that is not unique.

Returns:

Error message

Return type:

str

pattern(add: str) str#

Get error message for wrong pattern.

Parameters:

add (str) – Additional information

Returns:

Error message

Return type:

str

short() str#

Get error message for shortness.

Returns:

Error message

Return type:

str

nfdinspector.lido_inspector module#

class nfdinspector.lido_inspector.LIDOInspector(error_lang: str = 'en')#

Bases: MetadataInspector

Class for inspectors that examine records in LIDO-XML.

about(element) str#

Get value from about attribute.

Parameters:

element (etree._Element) – XML element with supposed about attribute

Returns:

Value of about

Return type:

str

actor_id(parent) str#

Get ID of an actor.

Parameters:

parent (etree._Element) – Parent element of the supposed actor element.

Returns:

actorID

Return type:

str

concept_id(parent) str#

Get ID of a concept.

Parameters:

parent (etree._Element) – Parent element of the supposed conceptID or Concept element.

Returns:

conceptID or Concept

Return type:

str

config_file(file_path: str) None#

Read a configuration file and alter the default configurations of an inspector.

Parameters:

file_path (str) – File path to a JSON file with configurations in the required syntax

property configuration: dict#

Get or set the configuration. The inspection is carried out based on the configuration.

configure(config: dict) None#

Alter the default configurations of an inspector.

Parameters:

config (dict) – Dict of configurations with the syntax of the default configurations

configure_setting(setting: str, change: dict) None#

Alter a specific setting in the configurations of an inspector.

Parameters:
  • setting (str) – Name of the setting to change

  • change (dict) – Desired change for the specific setting

property duplicate_descriptions: set#

Get or set the set of duplicate descriptions.

property duplicate_titles: set#

Get or set the set of duplicate titles.

find_duplicate_descriptions() set#

Find duplicate descriptions.

Returns:

All duplicate descriptions in lido_objects

Return type:

set

find_duplicate_titles() set#

Find duplicate titles.

Returns:

All duplicate titles in lido_objects

Return type:

set

find_duplicates(xpath: str) set#

Find duplicates based on an XPath expression.

Parameters:

xpath (str) – XPath expression

Returns:

All duplicates in lido_objects that match the XPath expression

Return type:

set
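Duplicate detection of this kind can be pictured as counting the text values selected by a path across all records and keeping those that occur more than once. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, and simplified, non-LIDO sample data. The function name and the XML are assumptions for illustration, not the package's implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified sample data (not real LIDO)
XML = """<records>
  <record><title>Lamp</title></record>
  <record><title>Lamp</title></record>
  <record><title>Helmet</title></record>
</records>"""


def find_duplicates_sketch(records, path: str) -> set:
    """Collect text values that occur more than once across all records."""
    counts = Counter(
        el.text.strip()
        for record in records
        for el in record.findall(path)
        if el.text and el.text.strip()
    )
    return {text for text, n in counts.items() if n > 1}


records = ET.fromstring(XML).findall("record")
duplicates = find_duplicates_sketch(records, "title")
print(duplicates)
```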

has_material(materials_tech: list) bool#

Check if record contains information about material.

Parameters:

materials_tech (list) – XML elements of materials/techniques

Returns:

True if record contains information about material, False if not

Return type:

bool

has_tech(materials_tech: list) bool#

Check if record contains information about technique.

Parameters:

materials_tech (list) – XML elements of materials/techniques

Returns:

True if record contains information about technique, False if not

Return type:

bool

has_valid_type(elements: list, valid_types: list) bool#

Check if elements have valid types.

Parameters:
  • elements (list) – Elements with supposed type attributes

  • valid_types (list) – Types that count as valid

Returns:

True if type is valid, False if not

Return type:

bool

inspect() None#

Carry out an inspection based on the read-in LIDO records.

inspect_actor(actor, event_type, config: dict) list#

Inspect actor.

Parameters:
  • actor (etree._Element) – XML element of supposed actor

  • event_type (etree._Element) – XML element of corresponding event type

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages

Return type:

list

inspect_actors(actors: list, event_type, config: dict) list#

Inspect multiple actors.

Parameters:
  • actors (list) – XML elements of supposed actors

  • event_type (etree._Element) – XML element of corresponding event type

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages

Return type:

list

inspect_category(lido_object) list | None#

Inspect category.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_classifications(lido_object) list | None#

Inspect classifications.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_concept(concept, config: dict) list#

Inspect concept.

Parameters:
  • concept (etree._Element) – XML element of the concept

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages

Return type:

list

inspect_concepts(concept_list: list, config: dict) list | None#

Inspect multiple concepts.

Parameters:
  • concept_list (list) – List of XML elements of concepts

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_date(date, event_type) list#

Inspect date.

Parameters:
  • date (etree._Element) – XML element of supposed date

  • event_type (etree._Element) – XML element of corresponding event type

Returns:

List of error messages

Return type:

list

inspect_event(event) list#

Inspect event.

Parameters:

event (etree._Element) – XML element of supposed event

Returns:

List of error messages

Return type:

list

inspect_event_type(event_type) list#

Inspect event type.

Parameters:

event_type (etree._Element) – XML element of supposed event type

Returns:

List of error messages

Return type:

list

inspect_events(lido_object) list | None#

Inspect events.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_lido_rec_id(lido_object) str#

Inspect record ID.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

Record ID or error message if missing

Return type:

str

inspect_materials_tech(lido_object) list | None#

Inspect materials and techniques.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_measurements_set(measurements_set) list#

Inspect a measurements set.

Parameters:

measurements_set (etree._Element) – XML element of supposed measurements set

Returns:

List of error messages

Return type:

list

inspect_object_description(lido_object) list | None#

Inspect object description.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_object_measurements(lido_object) list | None#

Inspect object measurements.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_object_work_types(lido_object) list | None#

Inspect object/work types.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_place(place, event_type, config: dict) list#

Inspect place.

Parameters:
  • place (etree._Element) – XML element of supposed place

  • event_type (etree._Element) – XML element of corresponding event type

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages

Return type:

list

inspect_places(places: list, event_type, config: dict) list#

Inspect multiple places.

Parameters:
  • places (list) – XML elements of supposed places

  • event_type (etree._Element) – XML element of corresponding event type

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages

Return type:

list

inspect_record_info_set(lido_object) list | None#

Inspect record information.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_record_rights(lido_object) list | None#

Inspect record rights.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_record_source(record_source) list#

Inspect record source.

Parameters:

record_source (etree._Element) – XML element of supposed record source

Returns:

List of error messages

Return type:

list

inspect_record_sources(lido_object) list | None#

Inspect record sources.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_record_type(lido_object) list | None#

Inspect record type.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_repository_name(lido_object) list | None#

Inspect repository name.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_resource_set(resource_set) list#

Inspect resource set.

Parameters:

resource_set (etree._Element) – XML element of supposed resource set

Returns:

List of error messages

Return type:

list

inspect_resource_sets(lido_object) list | None#

Inspect resource sets.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_subject_concepts(lido_object) list | None#

Inspect subject concepts.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_text(element, lido_object, config: dict) list | None#

Inspect a text element.

Parameters:
  • element (etree._Element) – XML element with supposed text

  • lido_object (etree._Element) – Record of an object in LIDO-XML

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_title(lido_object) list | None#

Inspect title.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

List of error messages, None if there are no errors

Return type:

list | None

inspect_work_id(lido_object) str#

Inspect work ID.

Parameters:

lido_object (etree._Element) – Record of an object in LIDO-XML

Returns:

Work ID or error message if missing

Return type:

str

is_distinct_from_type(lido_object, value: str) bool#

Check if title is distinct from object/work type.

Parameters:
  • lido_object (etree._Element) – Record of an object in LIDO-XML

  • value (str) – Text value of the inspected record and element

Returns:

True if title is distinct from object/work type, False if not

Return type:

bool

is_uniq(text: str, element) bool#

Check if title or object description is unique.

Parameters:
  • text (str) – Text that is checked

  • element (etree._Element) – XML element with supposed text

Returns:

True if title/description is unique, False if not

Return type:

bool

legal_body_id(parent) str#

Get ID of a legal body.

Parameters:

parent (etree._Element) – Parent element of the supposed legal body element.

Returns:

legalBodyID

Return type:

str

property lido_namespace: str#

Get the LIDO namespace when needed for reading attributes.

property lido_objects: list#

Get or set the list of LIDO records. These records are examined during the inspection.

lido_type(element) str#

Get value from type attribute.

Parameters:

element (etree._Element) – XML element with supposed type attribute

Returns:

Value of type

Return type:

str

meas_type(measurement_type) str#

Get text or ID of measurement type.

Parameters:

measurement_type (etree._Element) – XML element of measurement type.

Returns:

Text or ID

Return type:

str

place_id(parent) str#

Get ID of a place.

Parameters:

parent (etree._Element) – Parent element of the supposed place element.

Returns:

placeID

Return type:

str

read_lido(xml_str: str) None#

Parse LIDO-XML from a string and assign LIDO records to the inspector.

Parameters:

xml_str (str) – String with LIDO-XML syntax

read_lido_file(file_path: str) None#

Parse LIDO-XML from a file and assign LIDO records to the inspector.

Parameters:

file_path (str) – File path to a LIDO-XML file

read_lido_files(files_path: str) None#

Parse LIDO-XML from multiple files in a folder and assign LIDO records to the inspector.

Parameters:

files_path (str) – Path to a folder with LIDO-XML files

summarize_event_messages(messages: list, event_type: str) list#

Summarize several event-specific error messages (missing actor, place and date).

Parameters:
  • messages (list) – Error messages of an event

  • event_type (str) – Corresponding event type

Returns:

List of error messages

Return type:

list

term(parent) str#

Get term or prefLabel of a concept.

Parameters:

parent (etree._Element) – Parent element of the supposed term or prefLabel element.

Returns:

Term or label

Return type:

str

value(parent) str#

Get value (appellation or descriptiveNote) of a text field.

Parameters:

parent (etree._Element) – Parent element of the supposed value element.

Returns:

Text value

Return type:

str

nfdinspector.metadata_inspector module#

class nfdinspector.metadata_inspector.MetadataInspector(error_lang: str = 'en')#

Bases: object

Superclass for various metadata standard-specific inspectors.

attr(element, attribute_name: str) str#

Get attribute text from an XML element.

Parameters:
  • element (etree._Element) – XML element with supposed attribute

  • attribute_name (str) – Supposed attribute name

Returns:

Attribute text from an XML Element

Return type:

str
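Attributes in LIDO and EAD are often namespace-qualified, which is why the inspectors expose namespace properties. With ElementTree-style APIs, such an attribute is addressed in Clark notation, `{namespace}name`. The sketch below uses the standard library rather than lxml and takes the namespace as an explicit argument. The function name, the extra parameter and the empty-string fallback are assumptions, not the documented behaviour of `attr`.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"

xml_str = (
    '<lido:recordID xmlns:lido="http://www.lido-schema.org" '
    'lido:type="local">rec-1</lido:recordID>'
)
element = ET.fromstring(xml_str)


def attr_sketch(element, namespace: str, attribute_name: str) -> str:
    """Read a namespaced attribute; fall back to an empty string (assumed)."""
    return element.get(f"{{{namespace}}}{attribute_name}", "")


print(attr_sketch(element, LIDO_NS, "type"))
```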

create_element(tag_name: str = 'element', text: str = '')#

Create an XML element from a tag name and text.

Parameters:
  • tag_name (str) – Tag name for the XML element

  • text (str) – Text for the XML element

Returns:

XML element

Return type:

etree._Element

date_object(date_str: str)#

Get a date object from a date string (ISO 8601).

Parameters:

date_str (str) – Date string (ISO 8601)

Returns:

Date object if valid ISO 8601 format, None if not valid

Return type:

datetime.date | None
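The documented contract (a date object for a valid ISO 8601 string, None otherwise) can be sketched with the standard library as follows. This is an illustration under the assumption that full calendar dates such as `1923-04-05` are meant. The function name is hypothetical and the package may accept other ISO 8601 forms.

```python
from datetime import date


def date_object_sketch(date_str: str):
    """Return a date for a valid ISO 8601 calendar date, otherwise None."""
    try:
        return date.fromisoformat(date_str)
    except ValueError:
        return None


print(date_object_sketch("1923-04-05"))
print(date_object_sketch("not a date"))
```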

date_range(date_str: str) dict#

Split a date to earliest and latest date.

Parameters:

date_str (str) – Date string (ISO 8601)

Returns:

Dict with date objects where earliest and latest date are separated

Return type:

dict
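Splitting a date into earliest and latest parts could look like the sketch below. It assumes that a range is written as an ISO 8601 interval with `/` between the two dates and that a single date serves as both boundaries. The dict keys `earliest` and `latest` and the function name are assumptions, not the documented return structure.

```python
from datetime import date


def date_range_sketch(date_str: str) -> dict:
    """Split 'start/end' into two date objects; a single date fills both."""
    earliest, _, latest = date_str.partition("/")
    latest = latest or earliest
    return {
        "earliest": date.fromisoformat(earliest),
        "latest": date.fromisoformat(latest),
    }


print(date_range_sketch("1900-01-01/1950-12-31"))
print(date_range_sketch("1923-04-05"))
```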

property error: Error#

Get or set an Error object. The Error object is needed for adding error messages to the inspections.

exists(element) bool#

Check if an XML element exists.

Parameters:

element (etree._Element) – Supposed XML element

Returns:

True if element exists, False if not

Return type:

bool

has_attribute(element, attribute_name: str) bool#

Check if an XML element has a specific attribute.

Parameters:
  • element (etree._Element) – XML element with supposed attribute

  • attribute_name (str) – Supposed attribute name

Returns:

True if element has a specific attribute, False if not

Return type:

bool

has_duplicate_blanks(text: str) bool#

Check if a text has duplicate blanks.

Parameters:

text (str) – Text with possible duplicate blanks

Returns:

True if text has duplicate blanks, False if not

Return type:

bool
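A check for duplicate blanks can be expressed as a search for two or more consecutive space characters. The sketch below assumes that "blank" means the plain space character. The function name is hypothetical and the package may treat other whitespace differently.

```python
import re


def has_duplicate_blanks_sketch(text: str) -> bool:
    """Return True if the text contains two or more consecutive spaces."""
    return re.search(r" {2,}", text) is not None


print(has_duplicate_blanks_sketch("Miner's  lamp"))
print(has_duplicate_blanks_sketch("Miner's lamp"))
```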

has_subelems(element) bool#

Check if an XML element has subelements.

Parameters:

element (etree._Element) – XML element with supposed subelements

Returns:

True if element has subelements, False if not

Return type:

bool

has_text(element) bool#

Check if an XML element has text.

Parameters:

element (etree._Element) – XML element with supposed text

Returns:

True if element has text, False if not

Return type:

bool
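The element checks above (existence, subelements, text) map onto simple ElementTree operations. The sketch below uses the standard library instead of lxml and simplified sample XML. Treating whitespace-only content as "no text" is an assumption, as are the function names.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<c><unittitle>Letters</unittitle><abstract>   </abstract></c>"
)
title = root.find("unittitle")
abstract = root.find("abstract")
missing = root.find("scopecontent")  # find() returns None if nothing matches


def has_text_sketch(element) -> bool:
    """True if the element exists and holds non-whitespace text (assumed rule)."""
    return element is not None and bool((element.text or "").strip())


def has_subelems_sketch(element) -> bool:
    """True if the element exists and has at least one child element."""
    return element is not None and len(element) > 0


print(has_text_sketch(title), has_text_sketch(abstract), has_text_sketch(missing))
print(has_subelems_sketch(root), has_subelems_sketch(title))
```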

inspect_entity(label: str, entity_id: str, config: dict) list#

Inspect label and ID of an entity (person, organisation etc.).

Parameters:
  • label (str) – Label of an entity

  • entity_id (str) – ID of an entity

  • config (dict) – Configuration of the specific inspection.

Returns:

List of error messages

Return type:

list

property inspections: list#

Get or set the inspections list. The list is filled while inspecting a data set.

property rdf_namespace: str#

Get the RDF namespace when needed for reading attributes.

static read_xml(xml_str: str)#

Parse XML from a string.

Parameters:

xml_str (str) – String with XML syntax

Returns:

Root element of an ElementTree

Return type:

etree._Element

static read_xml_file(file_path: str)#

Parse XML from a file.

Parameters:

file_path (str) – File path to an XML file

Returns:

Root element of an ElementTree

Return type:

etree._Element

static read_xml_files(files_path: str) list#

Parse XML from multiple XML files in a folder.

Parameters:

files_path (str) – File path to a folder with XML files

Returns:

List of root elements of multiple ElementTrees

Return type:

list

text(element) str#

Get text from an XML element.

Parameters:

element (etree._Element) – XML element with supposed text

Returns:

Text from an XML Element

Return type:

str

to_csv(file_path: str, delimiter: str = ',') None#

Generate a CSV file of the inspections.

Parameters:
  • file_path (str) – File path for the CSV file

  • delimiter (str) – Delimiter for the columns in the CSV file
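The effect of the delimiter parameter can be illustrated with the standard `csv` module. The sketch below writes to an in-memory buffer rather than a file. The row layout (`record_id`, `title`) is a hypothetical stand-in, since the structure of the inspections list is not described in this reference.

```python
import csv
import io

# Hypothetical inspection rows; the real structure may differ
inspections = [
    {"record_id": "rec-1", "title": "Too short"},
    {"record_id": "rec-2", "title": ""},
]


def to_csv_sketch(rows: list, delimiter: str = ",") -> str:
    """Serialize a list of dicts to CSV text using the given column delimiter."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_sketch(inspections, delimiter=";")
print(csv_text)
```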

to_json(file_path: str, indent: int | str | None = None) None#

Generate a JSON file of the inspections.

Parameters:
  • file_path (str) – File path for the JSON file

  • indent (int | str | None) – Indent level of the JSON file

Get the XLINK namespace when needed for reading attributes.

Module contents#

NFDInspector By Andreas Ketelaer andreas.ketelaer@bergbaumuseum.de

A Python package to inspect formal quality problems in research data.
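
Taken together, the reference suggests a workflow of creating an inspector, reading records, running the inspection and exporting the results. The helper below is a sketch that uses only names documented on this page (`LIDOInspector`, `read_lido_file`, `inspect`, `to_json`). The helper name and the file paths passed to it are placeholders, and the package must be installed for the call to succeed.

```python
def inspect_lido_file(xml_path: str, report_path: str) -> None:
    """Inspect one LIDO-XML file and write the inspections to a JSON file."""
    # Imported here so the helper can be defined even where the package
    # is not installed; calling it still requires nfdinspector.
    from nfdinspector.lido_inspector import LIDOInspector

    inspector = LIDOInspector(error_lang="en")
    inspector.read_lido_file(xml_path)
    inspector.inspect()
    inspector.to_json(report_path, indent=2)
```

A call such as `inspect_lido_file("records.xml", "inspections.json")` would then produce the report, assuming both paths are valid on the local system.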