git clone https://@opensourceprojects.eu/git/p/timbus/dpes/preservation-identifier/kbgen timbus-dpes-preservation-identifier-kbgen



File                Date        Author           Commit
libs                2014-01-07  Carlos Coutinho  [ffd0f2] First Version on Git, synchronised with TIMBUS ...
ontologies          2014-04-23  Johannes Binder  [45d98c] Use official base IRI for the toolKB_instance, ...
src                 2014-05-23  Johannes Binder  [cba377] Consider xpuids in addition to puids, ignore fm...
.gitignore          2014-05-23  Johannes Binder  [cba377] Consider xpuids in addition to puids, ignore fm...
README.md           2014-05-23  Johannes Binder  [cba377] Consider xpuids in addition to puids, ignore fm...
license_header.txt  2014-01-07  Carlos Coutinho  [ffd0f2] First Version on Git, synchronised with TIMBUS ...
pom.xml             2014-05-12  Johannes Binder  [5a9341] Ignore missing license for tmp files

Read Me

kbgen

This tool populates a toolKB ontology [1] with tools and file formats extracted
from Freebase [2] and Pronom [3].
A resulting ontology is available at [4].

Usage

Build the project with Maven, then run it with: java -jar target/kbgen-1.0-SNAPSHOT.jar
The tool takes ontologies/toolKB_instance_empty.owl as its starting point and inserts the formats and tools extracted from Freebase and Pronom.

The resulting ontology is stored in toolKB_instance.owl.

If memory errors occur, increase the JVM memory limit, e.g. with the VM option -Xmx2g (java -Xmx2g -jar target/kbgen-1.0-SNAPSHOT.jar).

To handle versions of file formats that are not part of Freebase, you can provide CSV files listing additional formats
to consider. There is one CSV file per type of tool-to-file-format mapping (read, write, read/write),
and the tool looks for them in the working directory under the following names:

additional_formats_{r|w|rw}.csv

The format of the CSV files is:

([name], [puid], [tool]*).
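
As a rough illustration of the CSV layout described above, the following sketch parses a single line. It is not taken from the kbgen sources: the class AdditionalFormat and its methods are hypothetical, and it assumes plain comma separation without quoting, with the parentheses in the format description being notation rather than literal characters. The sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one line of an additional_formats_{r|w|rw}.csv file. */
class AdditionalFormat {
    final String name;        // [name]: format name
    final String puid;        // [puid]: Pronom identifier of the format
    final List<String> tools; // [tool]*: zero or more tool names

    AdditionalFormat(String name, String puid, List<String> tools) {
        this.name = name;
        this.puid = puid;
        this.tools = Collections.unmodifiableList(tools);
    }

    /**
     * Parses one line. Assumption: fields are separated by plain commas,
     * with no quoting or escaping.
     */
    static AdditionalFormat parse(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split(",", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least name and puid: " + line);
        }
        List<String> tools = new ArrayList<>();
        for (int i = 2; i < fields.length; i++) {
            String tool = fields[i].trim();
            if (!tool.isEmpty()) {
                tools.add(tool); // skip empty trailing fields
            }
        }
        return new AdditionalFormat(fields[0].trim(), fields[1].trim(), tools);
    }

    /** The three file names that are looked up in the working directory. */
    static List<String> fileNames() {
        return Arrays.asList(
                "additional_formats_r.csv",   // read mappings
                "additional_formats_w.csv",   // write mappings
                "additional_formats_rw.csv"); // read/write mappings
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the project.
        AdditionalFormat f = parse("Example Format 2.0, fmt/000, ExampleViewer, ExampleConverter");
        System.out.println(f.name + " | " + f.puid + " | " + f.tools);
    }
}
```

Under these assumptions, a line with only a name and a PUID is valid and yields an empty tool list.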

The Pronom importer does not retrieve newer formats (Pronom IDs higher than about 450). It may therefore be necessary to run kbgen once,
add the missing required formats to the cache_pronom_formats.json file, and then rerun kbgen.

Build

Use Maven to build the project (for example, mvn package, which places the jar under target/).

References

[1] http://timbus.teco.edu/ontologies/preservationIdentifier/toolKB.owl

[2] http://www.freebase.com/

[3] http://www.nationalarchives.gov.uk/PRONOM/

[4] http://timbus.teco.edu/ontologies/preservationIdentifier/toolKB_instance.owl