Prepare and ingest

This tool was developed to prepare PDF files stored on a filesystem and ingest them into a DSpace repository. Preparation consists of extracting a valid Digital Object Identifier (DOI) from each PDF, using that DOI to fetch the associated metadata, and building a DSpace Submission Information Package (SIP) for ingestion into the repository.
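The first two preparation steps (finding the DOI and fetching its metadata) can be sketched with shell functions. This is a minimal illustration under stated assumptions, not the tool's implementation: it assumes the DOI appears in the text of the first pages of the PDF, matches it with the regular expression Crossref recommends for modern DOIs, and assumes the metadata comes from Crossref through DOI content negotiation on https://doi.org. The function names extract_doi, pdf_doi and fetch_crossref_xml are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: DOI extraction and metadata lookup as shell functions.
# The function names are hypothetical; they are not part of the tool.

# Print the first DOI-like string found on stdin, or nothing.
# The pattern follows Crossref's recommended regex for modern DOIs
# (10.NNNN.../suffix); a trailing '.' or ';' from the sentence is stripped.
extract_doi() {
  grep -oiE '10\.[0-9]{4,9}/[-._;()/:a-z0-9]+' \
    | head -n 1 \
    | sed -E 's/[.;]+$//'
}

# Extract the text of pages 1-2 of a PDF (pdftotext writes to stdout
# when the output file is "-") and look for a DOI in it.
pdf_doi() {
  pdftotext -f 1 -l 2 "$1" - 2>/dev/null | extract_doi
}

# Fetch Crossref XML metadata for a DOI through content negotiation on
# doi.org (assumption: the DOI is registered with Crossref).
fetch_crossref_xml() {
  curl -sSfL -H 'Accept: application/vnd.crossref.unixref+xml' \
    "https://doi.org/$1"
}
```

For example, doi=$(pdf_doi article.pdf) followed by fetch_crossref_xml "$doi" > metadata.xml would give an XML record that could then be transformed with libxslt; whether the tool uses Crossref XML in this way is an assumption.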


This software depends on several libraries and tools, including libxml, libxslt and pdftotext.

We provide an installation script for some Linux distributions (CentOS and Ubuntu). Download the source code and run: bash [PATH]/

It will install the required packages and Perl libraries.
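To confirm that the dependencies are in place after installation, you can check that the command-line tools shipped with them are on your PATH. The sketch below is an illustration, not part of the distributed script: it assumes pdftotext (Poppler), xmllint (libxml2) and xsltproc (libxslt) are the binaries worth checking, and the helper name check_deps is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: report which required commands are missing from PATH.
# check_deps is a hypothetical helper, not part of the install script.

# Usage: check_deps CMD...
# Prints one "missing: CMD" line on stderr per command that is not found
# and returns non-zero if at least one command is missing.
check_deps() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_deps perl pdftotext xmllint xsltproc. On Ubuntu these tools typically come from the poppler-utils, libxml2-utils and xsltproc packages.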

How to use

You can use the Linux find command to locate all the PDF files you want to ingest and prepare each of them by executing: find /tmp/pdfs -type f -name '*.pdf' -exec prepare {} \;

This command runs prepare on each file found. We recommend preparing items one collection at a time.
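Preparing one collection at a time can be scripted so that failures are reported per collection. This is a minimal sketch under two hypothetical assumptions: the PDFs are grouped in one directory per collection, and prepare exits with a non-zero status when it cannot prepare a file. The names prepare_collection and PREPARE_CMD are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch only: prepare every PDF of one collection directory and report
# failures. Assumptions (hypothetical): one directory per collection, and
# "prepare" exits non-zero when it cannot prepare a file.

# Command used to prepare a single PDF; override for testing or dry runs.
PREPARE_CMD=${PREPARE_CMD:-prepare}

# Usage: prepare_collection DIR
# Returns non-zero if at least one PDF could not be prepared.
prepare_collection() {
  local dir=$1 pdf total=0 failed=0
  # -print0 with read -d '' keeps file names containing spaces intact;
  # -iname also matches upper-case ".PDF" extensions.
  while IFS= read -r -d '' pdf; do
    total=$((total + 1))
    # </dev/null stops the command from consuming the file list on stdin.
    if ! "$PREPARE_CMD" "$pdf" </dev/null; then
      echo "could not prepare: $pdf" >&2
      failed=$((failed + 1))
    fi
  done < <(find "$dir" -type f -iname '*.pdf' -print0)
  echo "$dir: $((total - failed)) of $total PDF(s) prepared" >&2
  [ "$failed" -eq 0 ]
}
```

With one subdirectory per collection under /tmp/pdfs, a loop such as for col in /tmp/pdfs/*/; do prepare_collection "$col"; done processes each collection in turn.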

For each collection, ingest the prepared items with the DSpace import command: [/dspace]/bin/dspace import --add --collection=10400.25/300 --source=/tmp/prepared_items --mapfile=mapfile1
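The directory passed to --source must follow DSpace's Simple Archive Format: one subdirectory per item, each holding the item's files, a contents file listing them, and a dublin_core.xml file with the metadata. The sketch below writes one such item by hand to show the layout. It is not how the tool builds its packages; the helper names make_saf_item and xml_escape are hypothetical, and recording the DOI as dc.identifier.uri is an assumption (your repository may use another field).

```bash
#!/usr/bin/env bash
# Sketch only: create one DSpace Simple Archive Format (SAF) item.
# make_saf_item and xml_escape are hypothetical; the metadata fields
# chosen here are assumptions.

# Escape the XML special characters &, < and > for use in element text.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Usage: make_saf_item SOURCE_DIR ITEM_NAME PDF TITLE DOI
make_saf_item() {
  local item="$1/$2" pdf=$3 title doi
  title=$(xml_escape "$4")
  doi=$(xml_escape "$5")
  mkdir -p "$item" || return 1
  cp "$pdf" "$item/" || return 1
  # "contents" lists the item's bitstreams, one file name per line.
  basename "$pdf" > "$item/contents"
  # dublin_core.xml holds the item's metadata in the "dc" schema.
  cat > "$item/dublin_core.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
  <dcvalue element="title" qualifier="none">$title</dcvalue>
  <dcvalue element="identifier" qualifier="uri">https://doi.org/$doi</dcvalue>
</dublin_core>
EOF
}
```

Items written under /tmp/prepared_items can then be imported with the command above. Depending on the DSpace version, dspace import --add may also require an --eperson option giving the e-mail address of the submitting user.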

Need help, or want to contribute?

Please contact us at FCT|FCCN or any committer.


