bartocsuggest documentation¶

bartocsuggest is a Python module that suggests vocabularies for a given list of words, using the BARTOC FAST API (https://bartoc-fast.ub.unibas.ch/bartocfast/api).

Documentation available at: https://bartocsuggest.readthedocs.io/en/latest/

Codebase available at: https://github.com/MHindermann/bartocsuggest

Core functionality¶

class bartocsuggest.Session(words, preload_folder=None, language='und')¶

Vocabulary suggestion session using the BARTOC FAST API.

Parameters

words (Union[List[str], str, _ConceptScheme]) – the input words (list of strings, or path to XLSX file, or JSKOS concept scheme)
preload_folder (Optional[str]) – the path to the preload folder, defaults to None
language (str) – the language of the words given as RFC 3066 language tag, defaults to “und” (for undefined)

preload(max=100000, min=0, verbose=False)¶

Preload responses.

For each word in self.words, a query is sent to the BARTOC FAST API and the response is saved to self.preload_folder. Use this method to process large word lists (more than 100 words) in batches.

Parameters

max (int) – stop with the max-th word in self.words, defaults to 100000
min (int) – start with min-th word in self.words, defaults to 0
verbose (bool) – toggle running comment printed to console, defaults to False

Return type

None
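Batchwise preloading can be sketched as follows. The helper names batch_bounds and preload_in_batches are hypothetical and not part of bartocsuggest; only the preload(max, min, verbose) call comes from the signature above. The sketch assumes max acts as an exclusive upper bound. If it is inclusive, each batch simply re-fetches one word of the next batch.

```python
from typing import Iterator, Tuple


def batch_bounds(n_words: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover range(n_words) in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, n_words, batch_size):
        yield start, min(start + batch_size, n_words)


def preload_in_batches(session, n_words: int, batch_size: int = 100) -> None:
    """Call session.preload once per batch.

    `session` is expected to be a bartocsuggest.Session created with a
    preload_folder. Treating `max` as exclusive is an assumption of this
    sketch, not documented behaviour.
    """
    for start, stop in batch_bounds(n_words, batch_size):
        session.preload(min=start, max=stop, verbose=True)
```

Splitting the index arithmetic from the API call makes it easy to resume an interrupted run: restart from the first (start, stop) pair that has not been preloaded yet.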

suggest(remote=True, sensitivity=1, score_type=<class 'bartocsuggest.Recall'>, verbose=False)¶

Suggest vocabularies based on self.words.

Parameters

remote (bool) – toggle between remote BARTOC FAST querying and preload folder, defaults to True
sensitivity (int) – set the maximum allowed Levenshtein distance between word and result, defaults to 1
score_type (ScoreType) – set the score type on which the suggestion is based, defaults to bartocsuggest.Recall
verbose (bool) – toggle running comment printed to console, defaults to False

Return type

Suggestion
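To make the sensitivity parameter concrete, here is a minimal sketch of the Levenshtein distance and of the "distance at most sensitivity" check. This is a textbook dynamic-programming implementation written for illustration; it is not taken from the bartocsuggest source, and the function names are hypothetical. It compares the raw strings, so any normalisation the library may apply (for example case folding) is not modelled.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def within_sensitivity(word: str, result: str, sensitivity: int = 1) -> bool:
    """Check whether result counts as a match for word at this sensitivity."""
    return levenshtein(word, result) <= sensitivity
```

With the default sensitivity of 1, "market" and "markets" would count as a match under this check, while "market" and "marketing" would not.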

class bartocsuggest.Suggestion(_scheme, _vocabularies, _sensitivity, _score_type)¶

A suggestion of vocabularies.

Parameters

_scheme (_ConceptScheme) – the input concept scheme
_vocabularies (List[_Source]) – the suggested vocabularies
_sensitivity (int) – the used sensitivity
_score_type (ScoreType) – the used score type

get(scores=False, max=None)¶

Return the suggested vocabularies sorted from best to worst.

Parameters

scores (bool) – toggle returning results and their scores, defaults to False
max (Optional[int]) – limit the number of suggestions to max, defaults to None

Return type

Union[List[str], List[Tuple[str, int]]]
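Put together, a typical session might look like the sketch below. It uses only the constructor and method signatures documented above; the wrapper function suggest_top and its parameter n are hypothetical. Running it requires the bartocsuggest package and network access to the BARTOC FAST API.

```python
from typing import List


def suggest_top(words: List[str], n: int = 3, sensitivity: int = 1) -> list:
    """Return up to n suggested vocabularies together with their scores."""
    # Imported inside the function so that this sketch can be loaded
    # without the package installed; calling it needs bartocsuggest.
    from bartocsuggest import Recall, Session

    session = Session(words)  # language defaults to "und"
    suggestion = session.suggest(
        remote=True, sensitivity=sensitivity, score_type=Recall
    )
    # scores=True asks for (vocabulary, score) pairs instead of plain names.
    return suggestion.get(scores=True, max=n)
```

A call such as suggest_top(["auction", "market", "marketing"]) would then return the best-ranked vocabularies first, as described for get above.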

get_score_type()¶

Return the suggestion’s score type.

Return type

ScoreType

get_sensitivity()¶

Return the suggestion’s sensitivity.

Return type

int

print()¶

Print the suggestion to the console.

print_concordance(vocabulary_uri=None)¶

Print the concordance as JSKOS to the console.

The concordance is between the session’s input words from which this suggestion was derived and a vocabulary to be chosen by URI. If no vocabulary URI is selected, the most highly suggested vocabulary is used. To see the suggested vocabularies and their URIs, use the print method of this class. For JSKOS, see https://gbv.github.io/jskos/jskos.html (version 0.4.6).

Parameters

vocabulary_uri (Optional[str]) – the URI of the vocabulary, defaults to None

Return type

None

save_concordance(folder, filename=None, vocabulary_uri=None)¶

Save the concordance as JSKOS in the JSON format.

Parameters

folder (str) – the path to the save folder
filename (Optional[str]) – the name of the file, defaults to None
vocabulary_uri (Optional[str]) – the URI of the vocabulary, defaults to None

Return type

None

save_mappings(folder, filename=None, vocabulary_uri=None)¶

Save the mappings as JSKOS in the NDJSON format.

Mappings in this format can be used in the Cocoda Mapping Tool, see https://coli-conc.gbv.de/cocoda/app/ (version 1.3.6). For NDJSON, see https://github.com/ndjson/ndjson-spec (version 1.0.0).

Parameters

folder (str) – the path to the save folder
filename (Optional[str]) – the name of the file, defaults to None
vocabulary_uri (Optional[str]) – the URI of the vocabulary, defaults to None

Return type

None
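NDJSON stores one JSON document per line. The sketch below writes and reads such a file with the standard library only. The two records are illustrative stand-ins for JSKOS mappings (the example.org URIs are made up), and the exact fields that save_mappings emits are not shown here.

```python
import io
import json
from typing import Iterable, List, TextIO


def write_ndjson(records: Iterable[dict], stream: TextIO) -> None:
    """Write each record as one JSON document on its own line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_ndjson(stream: TextIO) -> List[dict]:
    """Parse one JSON document per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical mapping records in the spirit of JSKOS (from/to concept sets).
EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
records = [
    {
        "from": {"memberSet": [{"uri": "http://example.org/words/auction"}]},
        "to": {"memberSet": [{"uri": "http://example.org/vocab/auction"}]},
        "type": [EXACT_MATCH],
    },
    {
        "from": {"memberSet": [{"uri": "http://example.org/words/market"}]},
        "to": {"memberSet": [{"uri": "http://example.org/vocab/markets"}]},
        "type": [EXACT_MATCH],
    },
]

buffer = io.StringIO()
write_ndjson(records, buffer)
buffer.seek(0)
round_trip = read_ndjson(buffer)
```

Because every line is a complete JSON document, a file in this format can be processed line by line without loading it whole.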

class bartocsuggest.ScoreType¶

A score type.

All score types are relative to a specific vocabulary and a list of words. There are four score type classes: bartocsuggest.Recall, bartocsuggest.Average, bartocsuggest.Coverage, bartocsuggest.Sum. Call Python's built-in help() on any of these classes for more information.

class bartocsuggest.Recall¶

The number of words over a vocabulary’s coverage.

The lower the better (minimum is 1). See https://en.wikipedia.org/wiki/Precision_and_recall#Recall.

For example, for words [a,b,c] and coverage 2, recall is len(words)/coverage = len([a,b,c])/2 = 1.5.
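The arithmetic above can be written as a small function. This is a sketch of the formula as stated, not the library's implementation; the function name is hypothetical, and raising an error for a coverage of zero is a choice made here, since the behaviour in that case is not documented.

```python
from typing import Sequence


def recall(words: Sequence[str], coverage: int) -> float:
    """Return len(words) / coverage, following the definition above."""
    if coverage <= 0:
        # Assumption of this sketch: a vocabulary without matches has no score.
        raise ValueError("coverage must be positive")
    return len(words) / coverage
```

For the words [a,b,c], a coverage of 2 gives 1.5 and a coverage of 3 gives the minimum of 1.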

class bartocsuggest.Average¶

The average over a vocabulary’s match scores.

The lower the better (minimum is 0). The score of a match is defined by the Levenshtein distance between word and match.

For example, for scores [1,1,4], the average is sum(scores)/len(scores) = (1+1+4)/3 = 2.

class bartocsuggest.Coverage¶

The number of a vocabulary’s matches in the list of words.

Note that this is dependent on the sensitivity parameter of bartocsuggest.Session.suggest().

For example, for words [a,b,c] and vocabulary matches a,c, the coverage is len([a,c]) = 2.
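The dependence on sensitivity can be illustrated with a short sketch. It assumes that, for one vocabulary, the smallest Levenshtein distance found for each word is already known (None if the vocabulary returned nothing for that word). The function name and this input shape are hypothetical and chosen for illustration only.

```python
from typing import Dict, Optional


def coverage(best_distances: Dict[str, Optional[int]], sensitivity: int = 1) -> int:
    """Count the words whose best match lies within the allowed distance."""
    return sum(
        1
        for distance in best_distances.values()
        if distance is not None and distance <= sensitivity
    )


# Word "a" matches exactly, "b" has no result, "c" is one edit away.
distances = {"a": 0, "b": None, "c": 1}
```

With these distances, a sensitivity of 1 yields a coverage of 2, while a sensitivity of 0 only counts the exact match and yields 1.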

class bartocsuggest.Sum¶

The sum over a vocabulary’s match scores.

The lower the sum the better (minimum is 0). The score of a match is defined by the Levenshtein distance between word and match.

For example, for scores [1,1,4], the sum is (1+1+4) = 6.
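Sum and Average are computed from the same match scores but can rank vocabularies differently. The sketch below uses hypothetical function names and made-up score lists for two vocabularies; it follows the formulas given for the two score types and is not taken from the library.

```python
from typing import Sequence


def score_sum(scores: Sequence[int]) -> int:
    """Sum of the Levenshtein distances of all matches."""
    return sum(scores)


def score_average(scores: Sequence[int]) -> float:
    """Mean Levenshtein distance over all matches."""
    if not scores:
        # Assumption of this sketch: no matches means no average.
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


vocabulary_a = [1, 1, 4]  # three matches, one of them poor
vocabulary_b = [3]        # a single mediocre match
```

Here Sum prefers vocabulary_b (3 against 6), whereas Average prefers vocabulary_a (2 against 3). Since every additional inexact match can only increase the sum, Sum tends to favour vocabularies with few matches.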

Wrappers¶

class bartocsuggest.AnnifSession(text, project_id, limit=None, threshold=None, preload_folder=None)¶

Wrapper for the Annif REST API based on the Annif-client module.

Annif indexes the input text based on the project identifier with an optional limit or threshold. Use this Session to get vocabulary suggestions for full texts instead of words. bartocsuggest.AnnifSession inherits its methods preload and suggest from bartocsuggest.Session.

Parameters

text (str) – the input text
project_id (str) – the project identifier
limit (Optional[int]) – the maximum number of results to return, defaults to None
threshold (Optional[int]) – the minimum score threshold, defaults to None
preload_folder (Optional[str]) – the path to the preload folder, defaults to None
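A full-text workflow might be sketched as follows, using only the constructor signature above and the inherited suggest method. The wrapper function suggest_for_text is hypothetical, and the project identifier must name a project that exists on the Annif instance being queried. Running the sketch requires bartocsuggest, its Annif client dependency, and network access.

```python
def suggest_for_text(text: str, project_id: str, limit: int = 10) -> list:
    """Return vocabulary suggestions for a full text indexed by Annif."""
    # Imported inside the function so that this sketch can be loaded
    # without the package installed; calling it needs bartocsuggest.
    from bartocsuggest import AnnifSession

    session = AnnifSession(text, project_id, limit=limit)
    suggestion = session.suggest()  # inherited from bartocsuggest.Session
    return suggestion.get()
```

The limit caps how many terms Annif returns for the text, which in turn bounds the number of queries sent to BARTOC FAST.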
