From 4ffe80c05da8cf20e40d0b13d2603933c767f094 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Paulo=20Gra=C3=A7a?=
Date: Mon, 7 Sep 2015 10:26:10 +0100
Subject: [PATCH] update readme text

---
 ingester/Readme.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/ingester/Readme.md b/ingester/Readme.md
index e69de29..2179e96 100644
--- a/ingester/Readme.md
+++ b/ingester/Readme.md
@@ -0,0 +1,33 @@
+# Prepare and ingest
+
+This tool was developed to prepare and ingest PDF files from a filesystem. The preparation process consists of extracting a valid identifier (Digital Object Identifier) from each file and using that ID to fetch the associated metadata, which is then used to build a DSpace submission information package for ingestion into the repository.
+
+## Installation
+
+This software depends on several libraries, such as:
+libxml, libxslt, pdftotext
+
+We prepared an executable for some Linux distributions (CentOS and Ubuntu).
+You can download the source code and execute:
+    bash [PATH]/setup.sh
+
+It will install the required packages and Perl libraries.
+
+## How to use
+
+You can use the Linux find command to fetch all the PDF files you want to ingest, for example:
+    find /tmp/pdfs -type f -name '*.pdf' -exec prepare {} \;
+
+This command prepares each file it finds. We recommend preparing items one collection at a time.
+
+For each collection, you can ingest the items by executing the DSpace command:
+    [/dspace]/bin/dspace import --add --eperson=sama-saw@asa.fccn.pt --collection=10400.25/300 --source=/tmp/prepared_items --mapfile=mapfile1
+
+
+## Need help, or want to contribute?
+
+Please contact us at [FCT|FCCN](http://www.fccn.pt) or any committer.
+
+## License
+
+Please contact us at [FCT|FCCN](http://www.fccn.pt).
\ No newline at end of file
-- 
2.0.0