
File Date Author Commit
.idea 2015-01-07 miguelnunes miguelnunes [ccb3d9] Created Readme
src 2015-01-07 miguelnunes miguelnunes [ccb3d9] Created Readme
target 2015-01-07 miguelnunes miguelnunes [ccb3d9] Created Readme
Readme.md 2015-01-14 miguelnunes miguelnunes [3cc0d8] Updated Readme
pom.xml 2015-01-07 miguelnunes miguelnunes [ccb3d9] Created Readme
rcaap-extractor.iml 2015-01-07 miguelnunes miguelnunes [ccb3d9] Created Readme
timbusproject-bundle-archetype.iml 2014-05-19 Luís Marques Luís Marques [b83cf7] Initial commit

Read Me

RCAAP Network Topology Extractor

RCAAP is a web portal that gathers information from several active DSpace instances. By periodically indexing each DSpace instance's documents, the portal provides a search engine that integrates thousands of scientific papers, doctoral theses and other document types from the institutions' own repositories.

 

How to get the code

git clone https://opensourceprojects.eu/git/p/timbus/context-population/extractors/rcaap-network-topology

 

Install Requirements

  1. Oracle Java JDK 1.7
  2. Apache Maven installed

Requirements for the extraction target

  1. Linux installed
  2. An instance of RCAAP installed
  3. MySQL server running and publicly accessible

How to install

This project, like most others in Timbus, is built with Maven. To build the entire project, run the following command in the project's root folder:

$> mvn clean package

This creates a target folder containing two different .jar files: the cli module, which is run locally on the machine, and the bundle module, which is deployed into the Virgo container.
A tutorial on how to properly install Virgo and deploy Timbus artefacts into it can be found here.
 

Collected Information

Essentially, RCAAP is a system that indexes information from various DSpace instances in one place. When connected to an institution's repository, RCAAP harvests the information regarding all groups, collections and documents from this repository and indexes it in a database.
It also contains a web service that exposes a website on which a user can search through the documents of all DSpace instances in a simple, seamless way.
Hence the need for this extractor: in order to preserve the RCAAP environment, a vital step is to preserve each repository's mapping from the database. This extractor accesses the MySQL database and returns the network topology: which repositories are being mapped, their URLs and further metadata.
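
The idea of turning database rows into a topology document can be sketched as follows. This is only an illustration under stated assumptions, not the extractor's actual code (which is a Java/Maven project): an in-memory SQLite database stands in for the RCAAP MySQL server, and the `repositories` table, its column names, the sample row and the `extract_topology` function are all hypothetical.

```python
import json
import sqlite3

# Stand-in for the RCAAP MySQL database: an in-memory SQLite database with a
# hypothetical "repositories" table. The real schema is not documented here.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute(
    "CREATE TABLE repositories "
    "(name TEXT, homepage TEXT, country TEXT, metadata_format TEXT)"
)
conn.execute(
    "INSERT INTO repositories VALUES (?, ?, ?, ?)",
    ("Example Repository", "http://repository.example.org/", "portugal", "OAI_DC"),
)


def extract_topology(connection):
    """Read every mapped repository and wrap it in a result document."""
    rows = connection.execute(
        "SELECT name, homepage, country, metadata_format "
        "FROM repositories ORDER BY name"
    ).fetchall()
    return {
        "extractor": "rcaap-extractor",
        "result": [{"repository": dict(row)} for row in rows],
    }


topology = extract_topology(conn)
print(json.dumps(topology, indent=2))
```

The returned dictionary only mirrors the general nesting of the example output (an `extractor` name and a `result` list of `repository` objects); fields such as `uuid`, `format` and `tags_pt` are left out of the sketch.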

 

Example output:

{
    "extractor": "SSH Wrapper Extractor",
    "format": {
        "multiple": true
    },
    "result": [
        {
            "extractor": "rcaap-extractor",
            "format": {
                "id": "3c7ebef7-568f-3fc9-be8b-baa211c5c54bf",
                "multiple": false
            },
            "uuid": "e77aa4e4-2c69-11e4-81de-0373d6ff88e6",
            "result": [
                {
                    "repository": {
                        "country": "portugal",
                        "homepage": "http://arca.igc.gulbenkian.pt/",
                        "metadata_format": "OAI_DC",
                        "directory": "http://directorio.rcaap.pt",
                        "name": "ARCA - Access to Research and Communication Annals "
                    },
                    "tags_pt": [
                        "SARI  Laboratório  Repositório \n"
                    ]
                },

    ... ... ...
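
A consumer of this output might flatten the nested structure along the following lines. This is a minimal sketch, assuming the complete document is valid JSON with the same nesting as the excerpt above: an outer wrapper whose `result` list holds per-extractor results, each with its own `result` list of repository entries. The function name `list_repositories` and the shortened `SAMPLE` document are illustrative and not part of the extractor.

```python
import json

# Hypothetical, closed-off version of the excerpt above. The real output is
# longer; its remaining entries are not reproduced here.
SAMPLE = """
{
  "extractor": "SSH Wrapper Extractor",
  "format": {"multiple": true},
  "result": [
    {
      "extractor": "rcaap-extractor",
      "format": {"multiple": false},
      "result": [
        {
          "repository": {
            "country": "portugal",
            "homepage": "http://arca.igc.gulbenkian.pt/",
            "metadata_format": "OAI_DC",
            "name": "ARCA - Access to Research and Communication Annals "
          },
          "tags_pt": ["SARI  Laboratório  Repositório "]
        }
      ]
    }
  ]
}
"""


def list_repositories(document):
    """Return (name, homepage, tags) tuples for every repository entry."""
    repositories = []
    for extraction in document.get("result", []):
        # Only look at results produced by the RCAAP extractor.
        if extraction.get("extractor") != "rcaap-extractor":
            continue
        for entry in extraction.get("result", []):
            repo = entry.get("repository", {})
            # Values carry stray surrounding whitespace; trim it.
            tags = [raw.strip() for raw in entry.get("tags_pt", [])]
            repositories.append(
                (repo.get("name", "").strip(), repo.get("homepage"), tags)
            )
    return repositories


sample = json.loads(SAMPLE)
for name, homepage, _tags in list_repositories(sample):
    print(name, homepage)
```

Filtering on the inner `extractor` field keeps the sketch usable if the wrapper ever carries results from other extractors, which the `"multiple": true` flag suggests but the excerpt does not confirm.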

 

How to execute

Remotely:

The Context Population GUI was developed as an interface for the Core Extraction Manager tool.
To perform an extraction on a remote machine through the GUI, all that is needed is to open the GUI, select Rcaap network topology in the extractors selection box and provide the target machine's information.
Further information on how to install and use the Extractors Manager and Context Population GUI is available here.

 

TIMBUS Use Cases

RCAAP digital preservation UC

This use case consists of an open digital repository platform that combines most of Portugal's most relevant scientific digital repositories.
It is a centralized platform that allows searching across all the repositories in a seamless way.
As each repository is a single DSpace instance, the DSpace AIP extractor is used to perform individual backups of all instances and to search for possibly customized files to be preserved later on.

 

Author

Miguel Gama Nunes miguel.nunes@caixamagica.pt
Luis Marques luis.marques@caixamagica.pt

 

License

Copyright (c) 2014, Caixa Magica Software Lda (CMS).
The work has been developed in the TIMBUS Project and the above-mentioned are Members of the TIMBUS Consortium.
TIMBUS is supported by the European Union under the 7th Framework Programme for research and technological development and demonstration activities (FP7/2007-2013) under grant agreement no. 269940.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law or agreed to in writing, shall any Contributor be liable for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work.
See the License for the specific language governing permissions and limitations under the License.