Commit a42d6ca9071e7325d521a9cb0a772f3990a96b9a

Authored by Paulo Graça
Exists in master

Merge branch 'master' of gitlab.fccn.pt:dev-b-on/sama-scripts

ingester/Readme.md
... ... @@ -0,0 +1,36 @@
  1 +# Prepare and ingest
  2 +
  3 +This tool was developed to prepare and ingest PDF files from a filesystem. The preparation process consists of extracting a valid identifier (Digital Object Identifier) from each file and, with that ID, fetching the associated metadata to build a DSpace Submission Information Package for ingestion into the repository.
  4 +
  5 +## Installation
  6 +
  7 +This software depends on several libraries and tools, such as:
  8 +libxml, libxslt, pdftotext
  9 +
  10 +We prepared a setup script for some Linux distributions (CentOS and Ubuntu).
  11 +You can download the source code and execute:
  12 +
  13 + bash [PATH]/setup.sh
  14 +
  15 +It will install the required packages and Perl libraries.
  16 +
  17 +## How to use
  18 +
  19 +You can use the Linux find command to fetch all the PDF files you want to ingest and pass each one to the prepare step:
  20 +
  21 + find ./examples -type f -name '*.pdf' -exec prepare {} \;
  22 +
  23 +This command prepares each file it finds. We recommend preparing items one collection at a time.
  24 +
  25 +For each collection, you can ingest the items by executing the DSpace import command:
  26 +
  27 + [/dspace]/bin/dspace import --add --eperson=eperson@dspaceuser.com --collection=10000.01/100 --source=/tmp/prepared_items --mapfile=mapfile1
  28 +
  29 +
  30 +## Need help, or want to contribute?
  31 +
  32 +Please contact us at [FCT|FCCN](http://www.fccn.pt) or any committer.
  33 +
  34 +## License
  35 +
  36 +Please contact us at [FCT|FCCN](http://www.fccn.pt).
0 37 \ No newline at end of file
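The two steps described in the README (prepare each PDF, then import per collection) could be wrapped in a small script. The following is only a sketch and is not part of the commit. It assumes that `prepare` is on the `PATH`, that DSpace is installed under `/dspace`, and that the PDFs of each collection live in their own directory. The variable names `PREPARE_CMD` and `DSPACE_DIR` and both function names are hypothetical. The import command is only printed, so it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch only: the names and defaults below are assumptions, not part of sama-scripts.

PREPARE_CMD="${PREPARE_CMD:-prepare}"   # assumed: the prepare tool is on the PATH
DSPACE_DIR="${DSPACE_DIR:-/dspace}"     # assumed: DSpace installation directory

# Run the prepare step on every PDF found below one collection directory.
prepare_collection() {
    local dir="$1"
    find "$dir" -type f -name '*.pdf' -exec "$PREPARE_CMD" {} \;
}

# Print (do not execute) the DSpace import command for one collection.
build_import_cmd() {
    local eperson="$1" collection="$2" source_dir="$3" mapfile="$4"
    printf '%s/bin/dspace import --add --eperson=%s --collection=%s --source=%s --mapfile=%s\n' \
        "$DSPACE_DIR" "$eperson" "$collection" "$source_dir" "$mapfile"
}
```

For example, `prepare_collection ./examples` followed by `build_import_cmd eperson@dspaceuser.com 10000.01/100 /tmp/prepared_items mapfile1` prepares the example files and prints the matching import command with the values used in the README.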