Commit 4ffe80c05da8cf20e40d0b13d2603933c767f094

Authored by Paulo Graça
1 parent 43f5c274c4
Exists in master

update readme text

Showing 1 changed file with 33 additions and 0 deletions
ingester/Readme.md
... ... @@ -0,0 +1,33 @@
  1 +# Prepare and ingest
  2 +
  3 +This tool was developed to prepare and ingest PDF files from a filesystem. The preparation process consists of extracting a valid identifier (a Digital Object Identifier, DOI) from each file and using that ID to fetch the associated metadata, which is then used to build a DSpace submission information package for ingestion into the repository.
  4 +
  5 +## Installation
  6 +
  7 +This software depends on several libraries and tools, such as:
  8 +libxml, libxslt, pdftotext
  9 +
  10 +We prepared a setup script for some Linux distributions (CentOS and Ubuntu).
  11 +You can download the source code and execute:
  12 + bash [PATH]/setup.sh
  13 +
  14 +It will install the required packages and Perl libraries.
  15 +
  16 +## How to use
  17 +
  18 +You can use the Linux find command to fetch all the PDF files you want to ingest and prepare each of them by executing:
  19 +find /tmp/pdfs -type f -name '*.pdf' -exec prepare {} \;
  20 +
  21 +This command prepares each file it finds. We recommend preparing items one collection at a time.
  22 +
  23 +For each collection, you can ingest the items by executing the DSpace command:
  24 +[/dspace]/bin/dspace import --add --eperson=sama-saw@asa.fccn.pt --collection=10400.25/300 --source=/tmp/prepared_items --mapfile=mapfile1
  25 +
  26 +
  27 +## Need help, or want to contribute?
  28 +
  29 +Please contact us at [FCT|FCCN](http://www.fccn.pt) or any committer.
  30 +
  31 +## License
  32 +
  33 +Please contact us at [FCT|FCCN](http://www.fccn.pt).
0 34 \ No newline at end of file
... ...
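
The README describes the preparation step only in prose: extract a DOI from the PDF, then fetch metadata for it. A minimal sketch of the extraction part might look like the following. It is not taken from the `prepare` script in this commit. The function names `extract_doi` and `pdf_doi` are hypothetical, and the DOI pattern is a common approximation that may capture trailing punctuation or miss unusual identifiers.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real prepare tool may work differently.

# Hypothetical helper: read text on stdin and print the first DOI-like string.
# The pattern is an approximation: "10." + 4 to 9 digits + "/" + a suffix
# that runs until whitespace, a quote, or an angle bracket.
extract_doi() {
  grep -oiE '10\.[0-9]{4,9}/[^[:space:]"<>]+' | head -n 1
}

# Hypothetical wrapper: convert the first two pages of a PDF to text with
# pdftotext (one of the dependencies listed in the README) and search that
# text for a DOI. Assumes the DOI is printed near the start of the document.
pdf_doi() {
  pdftotext -l 2 "$1" - 2>/dev/null | extract_doi
}
```

With these functions sourced, `pdf_doi /tmp/pdfs/article.pdf` would print the first DOI candidate found, or nothing if the text contains none. The file path here is only an example.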