DSpace Software Extractor

 

This extractor and its converter are DEPRECATED.
Use the DSpace AIP Extractor and Converter instead.

DSpace Software Extractor is a tool built in Java that extracts metadata about the content (articles, authors, institutions, etc.) held in a DSpace repository.

 

How to get the code

git clone https://@opensourceprojects.eu/git/p/timbus/context-population/extractors/dspace timbus-context-population-extractors-dspace

 

Install Requirements

  1. Oracle Java JDK 1.7

Requirements for the extraction target

  1. DSpace >= 3.x installed

 

Note: If the target machine runs DSpace 1.8.x or earlier, the extractor will not work, because it needs OAI-PMH to reach the proper endpoints.
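
One way to check this requirement before running the extractor is to send an OAI-PMH Identify request to the target and see whether a valid response comes back. The sketch below is not part of the extractor: the function names are hypothetical, and the endpoint path used in the usage note is the one that appears in the example output of this README, which may differ on your installation.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_identify(base_url, timeout=10):
    """Send an OAI-PMH Identify request and return the raw XML body."""
    with urllib.request.urlopen(base_url + "?verb=Identify", timeout=timeout) as resp:
        return resp.read()


def is_oai_pmh_identify(xml_bytes):
    """Return True if the XML is an OAI-PMH response containing an Identify element."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return root.tag == OAI_NS + "OAI-PMH" and root.find(OAI_NS + "Identify") is not None


def target_supports_oai_pmh(base_url, fetch=fetch_identify):
    """Probe the target; any network or HTTP error counts as 'not available'."""
    try:
        return is_oai_pmh_identify(fetch(base_url))
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError
        return False
```

For a local installation this could be called as target_supports_oai_pmh("http://localhost:8080/oai/request"). The fetch parameter exists only so the check can be exercised without a network connection.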

 

Collected Information

The dspace-extractor returns the metadata of the content held in the target DSpace, in JSON format for easier parsing and reading.

 

How to execute

java -jar dspace-extractor

 

Structure of the output

The output has the following structure:

Dspace-Content:
    - ListRecords
DSpace-Sets:
    - ListSets
DSpace-Identify:
    - Identify
Request

This structure reflects the internal organization of DSpace. The organization and process can be found here.
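
Judging from the "request" entries in the example output further down, each top-level section appears to correspond to one OAI-PMH request. The following sketch shows how those three request URLs could be built. The section-to-verb mapping is read off the example output, and the helper name is hypothetical; this is not the extractor's actual code.

```python
from urllib.parse import urlencode

# Output section -> OAI-PMH query parameters, as seen in the "request"
# entries of the example output (assumed, not taken from the extractor source).
SECTION_REQUESTS = {
    "Dspace-Content": {"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    "DSpace-Sets": {"verb": "ListSets"},
    "DSpace-Identify": {"verb": "Identify"},
}


def build_request_urls(base_url):
    """Return {section name: full OAI-PMH request URL} for the given endpoint."""
    return {
        section: base_url + "?" + urlencode(params)
        for section, params in SECTION_REQUESTS.items()
    }


if __name__ == "__main__":
    for section, url in build_request_urls("http://localhost:8080/oai/request").items():
        print(section, "->", url)
```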

 

Expected output - an example of an extraction

{
   "Dspace-Content": {"OAI-PMH": {
    "responseDate": "2013-09-05T12:47:16Z",
    "ListRecords": {"record": {
        "header": {
            "setSpec": [
                "com_123456789_3",
                "col_123456789_4"
            ],
            "datestamp": "2013-08-09T11:54:50Z",
            "identifier": "oai:localhost:123456789/5"
        },
        "metadata": {"oai_dc:dc": {
            "dc:subject": "article, how to",
            "xmlns:doc": "http://www.lyncode.com/xoai",
            "xmlns:oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
            "dc:creator": "Dspace, dspace",
            "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
            "dc:date": [
                "2013-08-09T10:54:50Z",
                "2013-08-09T10:54:50Z",
                "2013-08-09"
            ],
            "dc:identifier": "http://hdl.handle.net/123456789/5",
            "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
            "dc:title": "DSpace How to Guide",
            "xmlns:dc": "http://purl.org/dc/elements/1.1/"
        }}
    }},
    "request": {
        "content": "http://localhost:8080/oai/request",
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc"
    },
    "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xmlns": "http://www.openarchives.org/OAI/2.0/",
    "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
}},
"DSpace-Sets": {"OAI-PMH": {
    "responseDate": "2013-09-05T12:47:16Z",
    "ListSets": {"set": [
        {
            "setSpec": "com_123456789_3",
            "setName": "Universidade"
        },
        {
            "setSpec": "col_123456789_4",
            "setName": "Artigos"
        }
    ]},
    "request": {
        "content": "http://localhost:8080/oai/request",
        "verb": "ListSets"
    },
    "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xmlns": "http://www.openarchives.org/OAI/2.0/",
    "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
}},
"DSpace-Identify": {"OAI-PMH": {
    "responseDate": "2013-09-05T12:47:16Z",
    "Identify": {
        "protocolVersion": 2,
        "description": {"XOAIDescription": {
            "content": "XOAI: OAI-PMH Java Toolkit",
            "xmlns": "http://www.lyncode.com/XOAIConfiguration"
        }},
        "granularity": "YYYY-MM-DDThh:mm:ssZ",
        "baseURL": "http://localhost:8080/oai/request",
        "repositoryName": "DSpace at My University",
        "adminEmail": "dspace-help@myu.edu",
        "deletedRecord": "persistent",
        "earliestDatestamp": "2013-08-09T10:54:50Z"
    },
    "request": {
        "content": "http://localhost:8080/oai/request",
        "verb": "Identify"
    },
    "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xmlns": "http://www.openarchives.org/OAI/2.0/",
    "xsi:schemaLocation": "http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
}}
}
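
To consume this output, a script can load the JSON and walk down to the records. The sketch below is illustrative only. It assumes the output was saved to a file chosen by the caller, and that a section with a single "record" holds an object while one with several holds a list (the way "set" does in the example above); that behavior is inferred from the sample, not confirmed by the extractor. The function names are hypothetical.

```python
import json


def as_list(value):
    """Normalize a value that may be absent, a single item, or a list of items."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def iter_records(output):
    """Yield (identifier, title, setSpecs) for each record in the extractor output."""
    list_records = output["Dspace-Content"]["OAI-PMH"].get("ListRecords", {})
    for record in as_list(list_records.get("record")):
        header = record["header"]
        # Records without a metadata block simply get a title of None.
        dc = record.get("metadata", {}).get("oai_dc:dc", {})
        yield header["identifier"], dc.get("dc:title"), as_list(header.get("setSpec"))


def load_output(path):
    """Load a saved extractor output file (the path is up to the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

With the example above saved to disk, list(iter_records(load_output(path))) would give one tuple per record, regardless of whether the file contains one record or many.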

 

Generated Concepts and Properties

Dspace-Content:
    - ListRecords
        - Record
DSpace-Sets:
    - ListSets
        - Set
DSpace-Identify:
    - Identify
Request

The response follows OAI-PMH as implemented by DSpace, as specified in this link.

Mapping to TIMBUS CUDF

The ontology reflects the output: every record in a ListRecords belongs to a set. In turn, this set belongs to a ListSets, which belongs to a Community.

This page shows an example of an output ontology.
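
The record-to-set relationship described above can be sketched in code by joining each record's "setSpec" values against the ListSets entries. This only illustrates the links, using values from the example output; it does not produce CUDF or an ontology, and the function name is hypothetical.

```python
def link_records_to_sets(records, sets):
    """Map each known setSpec to its name and the identifiers of its records.

    records: iterable of (identifier, [setSpec, ...]) pairs
    sets:    list of {"setSpec": ..., "setName": ...} dicts from ListSets
    """
    linked = {
        entry["setSpec"]: {"setName": entry["setName"], "records": []}
        for entry in sets
    }
    for identifier, set_specs in records:
        for spec in set_specs:
            # Records pointing at a set missing from ListSets are skipped here.
            if spec in linked:
                linked[spec]["records"].append(identifier)
    return linked


if __name__ == "__main__":
    sets = [
        {"setSpec": "com_123456789_3", "setName": "Universidade"},
        {"setSpec": "col_123456789_4", "setName": "Artigos"},
    ]
    records = [("oai:localhost:123456789/5", ["com_123456789_3", "col_123456789_4"])]
    for spec, info in link_records_to_sets(records, sets).items():
        print(spec, info["setName"], info["records"])
```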

 

TIMBUS Use Cases

Task 7.4 Digitally Preserving an Open Source System

RCAAP is a Portuguese aggregator of digital repositories of open-access articles.
Each repository is a DSpace instance and represents a university or other educational institution that holds scientific articles. To preserve the content in the long term and prevent loss of information, this tool extracts the metadata of the content.

An example:

The output of an extraction from a DSpace repository contains the following entry:

        {
            "setSpec": "hdl_10451_3220",
            "setName": "ICS - DEMOLINE - Artigos em Site Nacionais"
        }

The field "setSpec" has the value "hdl_10451_3220". The first value, 10451, identifies which institution the article belongs to; the second value, 3220, is the article identifier.
The field "setName" is the name of the set the article belongs to.

This will be useful for mapping into an ontology, as shown in the image.

As shown, the article is related to a set of articles with the proper identifier.
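
The reading of "setSpec" described above can be expressed as a small parser. The sketch below simply splits the value on underscores and names the parts the way this section does (institution, then identifier); the function and field names are hypothetical.

```python
from collections import namedtuple

SetSpec = namedtuple("SetSpec", ["prefix", "institution", "identifier"])


def parse_set_spec(value):
    """Split a setSpec such as "hdl_10451_3220" into its three parts."""
    parts = value.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError("unexpected setSpec format: %r" % value)
    return SetSpec(*parts)
```

For the entry above, parse_set_spec("hdl_10451_3220").institution gives "10451". The parts are kept as strings so that leading zeros, if any, are not lost.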

 

Author

Luís Marques luis.marques@caixamagica.pt

 

License

Copyright (c) 2014, Caixa Magica Software Lda (CMS).
The work has been developed in the TIMBUS Project and the above-mentioned are Members of the TIMBUS Consortium.
TIMBUS is supported by the European Union under the 7th Framework Programme for research and technological development and demonstration activities (FP7/2007-2013) under grant agreement no. 269940.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law or agreed to in writing, shall any Contributor be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work.
See the License for the specific language governing permissions and limitations under the License.