File | Date | Author | Commit |
---|---|---|---|
META-INF | 2014-02-03 | Jorge Simões | [a743e2] Updated to new Extractors' architecture. |
src | 2015-01-02 | miguelnunes | [2bb53d] Merge branch 'master' of https://opensourceproj... |
.gitignore | 2014-01-07 | Carlos Coutinho | [5901d6] First Version on Git, synchronised with TIMBUS ... |
Readme.md | 2015-01-07 | miguelnunes | [594f4e] Updated Readme |
build.properties | 2014-01-07 | Carlos Coutinho | [5901d6] First Version on Git, synchronised with TIMBUS ... |
dspace_output_result_UL.txt | 2014-03-12 | Luís Marques | [a63cd4] Added Markdown readme |
pom.xml | 2015-01-02 | miguelnunes | [2bb53d] Merge branch 'master' of https://opensourceproj... |
readme.txt | 2014-01-07 | Carlos Coutinho | [5901d6] First Version on Git, synchronised with TIMBUS ... |
Read Me
DSpace Software Extractor
This extractor and its converter are DEPRECATED.
Use the DSpace AIP Extractor and Converter instead.
DSpace Software Extractor is a tool built in Java that extracts metadata about the content (articles, authors, institutions, etc.) held in a DSpace repository.
How to get the code
git clone https://@opensourceprojects.eu/git/p/timbus/context-population/extractors/dspace timbus-context-population-extractors-dspace
Install Requirements
Requirements for the extraction target
Note: If the target machine runs DSpace 1.8.x or earlier, the extractor will not work, because it needs OAI-PMH to reach the proper endpoints.
Collected Information
The dspace-extractor returns the metadata of the content held in the target DSpace, in JSON format for easier parsing and reading.
How to execute
java -jar dspace-extractor
Structure of the output
The output has the following structure:
DSpace-Content:
- ListRecords
DSpace-Sets:
- ListSets
DSpace-Identify:
- Identify Request
This structure reflects the internal organization of DSpace. The organization and process can be found here.
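The three sections line up with the three OAI-PMH verbs that appear in the sample output (ListRecords, ListSets, Identify). As a rough illustration only, the corresponding request URLs could be assembled as below. The base URL and the oai_dc metadata prefix are copied from the "request" fields of the sample output; the helper name build_oai_url is hypothetical and is not part of the extractor.

```python
from urllib.parse import urlencode

# Base URL as it appears in the sample output; adjust for a real repository.
BASE_URL = "http://localhost:8080/oai/request"


def build_oai_url(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and extra arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


# One request per top-level section of the extractor output.
SECTION_REQUESTS = {
    "DSpace-Content": build_oai_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"),
    "DSpace-Sets": build_oai_url(BASE_URL, "ListSets"),
    "DSpace-Identify": build_oai_url(BASE_URL, "Identify"),
}

for section, url in SECTION_REQUESTS.items():
    print(section, "->", url)
```

Opening the Identify URL in a browser is a quick way to check that the target repository exposes OAI-PMH before running the extractor.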
Expected output - an example of an extraction
{
    "Dspace-Content": {"OAI-PMH": {
        "responseDate": "2013-09-05T12:47:16Z",
        "ListRecords": {"record": {
            "header": {
                "setSpec": [
                    "com_123456789_3",
                    "col_123456789_4"
                ],
                "datestamp": "2013-08-09T11:54:50Z",
                "identifier": "oai:localhost:123456789/5"
            },
            "metadata": {"oai_dc:dc": {
                "dc:subject": "article, how to",
                "xmlns:doc": "http://www.lyncode.com/xoai",
                "xmlns:oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
                "dc:creator": "Dspace, dspace",
                "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
                "dc:date": [
                    "2013-08-09T10:54:50Z",
                    "2013-08-09T10:54:50Z",
                    "2013-08-09"
                ],
                "dc:identifier": "http://hdl.handle.net/123456789/5",
                "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
                "dc:title": "DSpace How to Guide",
                "xmlns:dc": "http://purl.org/dc/elements/1.1/"
            }}
        }},
        "request": {
            "content": "http://localhost:8080/oai/request",
            "verb": "ListRecords",
            "metadataPrefix": "oai_dc"
        },
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xmlns": "http://www.openarchives.org/OAI/2.0/",
        "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
    }},
    "DSpace-Sets": {"OAI-PMH": {
        "responseDate": "2013-09-05T12:47:16Z",
        "ListSets": {"set": [
            {
                "setSpec": "com_123456789_3",
                "setName": "Universidade"
            },
            {
                "setSpec": "col_123456789_4",
                "setName": "Artigos"
            }
        ]},
        "request": {
            "content": "http://localhost:8080/oai/request",
            "verb": "ListSets"
        },
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xmlns": "http://www.openarchives.org/OAI/2.0/",
        "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
    }},
    "DSpace-Identify": {"OAI-PMH": {
        "responseDate": "2013-09-05T12:47:16Z",
        "Identify": {
            "protocolVersion": 2,
            "description": {"XOAIDescription": {
                "content": "XOAI: OAI-PMH Java Toolkit",
                "xmlns": "http://www.lyncode.com/XOAIConfiguration"
            }},
            "granularity": "YYYY-MM-DDThh:mm:ssZ",
            "baseURL": "http://localhost:8080/oai/request",
            "repositoryName": "DSpace at My University",
            "adminEmail": "dspace-help@myu.edu",
            "deletedRecord": "persistent",
            "earliestDatestamp": "2013-08-09T10:54:50Z"
        },
        "request": {
            "content": "http://localhost:8080/oai/request",
            "verb": "Identify"
        },
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xmlns": "http://www.openarchives.org/OAI/2.0/",
        "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
    }}
}
Generated Concepts and Properties
Dspace-Content:
- ListRecords
- Record
DSpace-Sets:
- ListSets
- Set
DSpace-Identify:
- Identify Request
The response uses the DSpace OAI-PMH interface, as specified in this link.
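As an informal sketch of how the Record concept could be read back out of the extractor's JSON, the snippet below lists the identifier, title and creator of each record. The embedded data is a trimmed copy of the sample output in this README. The function names are hypothetical, and the handling of multiple records is an assumption: the sample contains a single record stored as an object, and the code guesses that several records would arrive as a list, the same way setSpec and dc:date do.

```python
import json

# Trimmed copy of the sample output; a real run would load the extractor's JSON instead.
SAMPLE = json.loads("""
{
  "Dspace-Content": {"OAI-PMH": {
    "ListRecords": {"record": {
      "header": {
        "setSpec": ["com_123456789_3", "col_123456789_4"],
        "identifier": "oai:localhost:123456789/5"
      },
      "metadata": {"oai_dc:dc": {
        "dc:title": "DSpace How to Guide",
        "dc:creator": "Dspace, dspace"
      }}
    }}
  }}
}
""")


def as_list(value):
    """Wrap a single value in a list so one record and many records look alike."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_records(output):
    """Return (identifier, title, creator) for every record in the output."""
    list_records = output["Dspace-Content"]["OAI-PMH"].get("ListRecords", {})
    rows = []
    for record in as_list(list_records.get("record")):
        # Assumption: every record carries its Dublin Core fields under "oai_dc:dc".
        dc = record.get("metadata", {}).get("oai_dc:dc", {})
        rows.append((record["header"]["identifier"], dc.get("dc:title"), dc.get("dc:creator")))
    return rows


print(summarize_records(SAMPLE))
```

Note that the top-level key for the records is spelled "Dspace-Content" in the sample output, while the other two sections use the "DSpace-" spelling.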
Mapping to TIMBUS CUDF
The ontology reflects the output: every record in a ListRecords belongs to a set. In turn, that set belongs to a ListSets, which belongs to a Community.
This page shows an example of an output ontology.
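The "belongs to" relation between records and sets can be illustrated with a small sketch that joins the two sections of the output on setSpec. This is not the TIMBUS converter and produces no ontology; it only shows the grouping that such a mapping would start from. The data is reduced from the sample output in this README, and the function names are hypothetical.

```python
# Values copied from the sample output, reduced to the fields used here.
OUTPUT = {
    "Dspace-Content": {"OAI-PMH": {"ListRecords": {"record": {
        "header": {
            "setSpec": ["com_123456789_3", "col_123456789_4"],
            "identifier": "oai:localhost:123456789/5",
        },
    }}}},
    "DSpace-Sets": {"OAI-PMH": {"ListSets": {"set": [
        {"setSpec": "com_123456789_3", "setName": "Universidade"},
        {"setSpec": "col_123456789_4", "setName": "Artigos"},
    ]}}},
}


def as_list(value):
    """Treat a single object and a list of objects the same way (an assumption)."""
    return value if isinstance(value, list) else [value]


def records_by_set(output):
    """Map each set name to the identifiers of the records that belong to it."""
    sets = as_list(output["DSpace-Sets"]["OAI-PMH"]["ListSets"]["set"])
    names = {s["setSpec"]: s["setName"] for s in sets}
    grouped = {name: [] for name in names.values()}
    records = as_list(output["Dspace-Content"]["OAI-PMH"]["ListRecords"]["record"])
    for record in records:
        identifier = record["header"]["identifier"]
        for spec in as_list(record["header"]["setSpec"]):
            # A setSpec with no matching set is kept under its raw value, not dropped.
            grouped.setdefault(names.get(spec, spec), []).append(identifier)
    return grouped


print(records_by_set(OUTPUT))
```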
TIMBUS Use Cases
Task 7.4 Digitally Preserving an Open Source System
RCAAP is a Portuguese repository aggregator for open access digital articles.
Each repository is a DSpace instance in its own right and represents a university or other educational institution that holds scientific articles. To preserve the content in the long term and prevent loss of information, this tool extracts the content's metadata.
An example:
The output of an extraction from a DSpace repository contains the following entry:
{ "setSpec": "hdl_10451_3220", "setName": "ICS - DEMOLINE - Artigos em Site Nacionais" }
The "setSpec" field has the value "hdl_10451_3220". The first number, 10451, identifies which institution the article belongs to; the second, 3220, is the article identifier.
The "setName" field is the name of the set the article belongs to.
This is useful when mapping into an ontology, as shown in the image.
As shown, the article is related to a Set of articles with the proper identifier.
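The reading of setSpec described in this example can be expressed as a short parsing sketch. The field names (prefix, institution, identifier) follow the interpretation given in this README and are illustrative only; the function and class names are hypothetical.

```python
from typing import NamedTuple


class SetSpec(NamedTuple):
    prefix: str       # e.g. "hdl"
    institution: str  # first number, the institution
    identifier: str   # second number, the identifier


def parse_set_spec(set_spec):
    """Split a setSpec such as 'hdl_10451_3220' into its three parts."""
    parts = set_spec.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected setSpec format: {set_spec!r}")
    return SetSpec(*parts)


spec = parse_set_spec("hdl_10451_3220")
print(spec.institution, spec.identifier)  # prints: 10451 3220
```

The values are kept as strings rather than converted to integers, so that any leading zeros in an identifier would survive.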
Author
Luís Marques luis.marques@caixamagica.pt
License
Copyright (c) 2014, Caixa Magica Software Lda (CMS).
The work has been developed in the TIMBUS Project and the above-mentioned are Members of the TIMBUS Consortium.
TIMBUS is supported by the European Union under the 7th Framework Programme for research and technological development and demonstration activities (FP7/2007-2013) under grant agreement no. 269940.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law or agreed to in writing, shall any Contributor be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work.
See the License for the specific language governing permissions and limitations under the License.