Commit 4ffe80c05da8cf20e40d0b13d2603933c767f094 (parent 43f5c274c4, branch master): update readme text

Changed file: ingester/Readme.md (33 additions, 0 deletions)

# Prepare and ingest

This tool was developed to prepare and ingest PDF files from a filesystem into a DSpace repository. Preparation consists of extracting a valid identifier (a Digital Object Identifier, or DOI) from each file, using that identifier to fetch the associated metadata, and building a DSpace submission information package that can then be ingested into the repository.

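The preparation steps above can be sketched in shell. This is a minimal illustration under stated assumptions, not the tool's actual implementation (which, per the installation notes, relies on Perl libraries). It assumes the DOI appears in the text that `pdftotext` extracts, and that the submission package is DSpace's Simple Archive Format: one directory per item holding a `contents` file, a `dublin_core.xml` file, and the PDF, with `--source` pointing at the parent directory of those item directories. The function names `extract_doi`, `xml_escape`, and `build_saf_item` are hypothetical, and only a title and the DOI are written as metadata.

```shell
# Hypothetical helpers illustrating the preparation step; not part of this tool.

# Read text on stdin and print the first DOI-like string (10.<registrant>/<suffix>).
# Trailing punctuation picked up from the surrounding sentence is stripped.
extract_doi() {
  grep -oE '10\.[0-9]{4,9}/[^[:space:]"<>]+' \
    | head -n 1 \
    | sed -E 's/[.,;)]+$//'
}

# Escape the characters that are special in XML text content (& first).
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Build one Simple Archive Format item directory:
#   $1 = item directory to create, $2 = PDF path, $3 = title, $4 = DOI
build_saf_item() {
  item_dir=$1 pdf=$2 title=$3 doi=$4
  mkdir -p "$item_dir" || return 1
  cp "$pdf" "$item_dir/" || return 1
  # The contents file lists the bitstreams to attach, one per line.
  basename "$pdf" > "$item_dir/contents"
  # Minimal Dublin Core record. Assumes the repository's metadata registry
  # defines dc.identifier.doi; adjust the element/qualifier to your schema.
  cat > "$item_dir/dublin_core.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
  <dcvalue element="title" qualifier="none">$(xml_escape "$title")</dcvalue>
  <dcvalue element="identifier" qualifier="doi">$(xml_escape "$doi")</dcvalue>
</dublin_core>
EOF
}
```

For example, `pdftotext -l 1 paper.pdf - | extract_doi` scans only the first page of `paper.pdf`. Metadata for the DOI could then be fetched through DOI content negotiation, e.g. `curl -sL -H 'Accept: application/vnd.citationstyles.csl+json' "https://doi.org/$doi"`, although which metadata service this tool actually queries is not stated here.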
## Installation

This software depends on several libraries and tools, including libxml, libxslt, and pdftotext.

We prepared an installation script for some Linux distributions (CentOS and Ubuntu). Download the source code and run the following, where `[PATH]` is the directory containing `setup.sh`:

```shell
bash [PATH]/setup.sh
```

The script installs the required packages and Perl libraries.

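After running the installation script, a quick way to confirm that the command-line tools are available is a preflight check like the one below. It is a hypothetical helper, not part of `setup.sh`, and it only checks commands on `PATH`; it does not verify the Perl libraries, whose names are not listed here.

```shell
# Hypothetical preflight check (not part of setup.sh): report every command
# in the argument list that cannot be found on PATH.
# Prints one missing command per line and returns 1 if any are missing.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_commands pdftotext perl` prints whichever of the two is not installed; extend the list with any other tools your setup needs.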
## How to use

Use the Linux `find` command to locate all PDF files you want to ingest and prepare each one:

```shell
find /tmp/pdfs -type f -name '*.pdf' -exec prepare {} \;
```

This runs `prepare` once for every PDF found under `/tmp/pdfs`. We recommend preparing items one collection at a time.

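One way to follow the per-collection recommendation is to keep each collection's PDFs in its own subdirectory and prepare one subdirectory at a time. The sketch assumes a hypothetical layout `/tmp/pdfs/<collection>/...` and calls `prepare` with a single file argument, just as the `find` command does; where `prepare` writes its output is not covered here. The helper names are hypothetical.

```shell
# Hypothetical helper: run `prepare` on every PDF below one collection
# directory, mirroring the README's find command but scoped to that directory.
# Filenames containing newlines are not handled.
prepare_collection() {
  find "$1" -type f -name '*.pdf' | while IFS= read -r pdf; do
    # Give prepare its own stdin so it cannot consume the file list.
    prepare "$pdf" </dev/null
  done
}

# Prepare each immediate subdirectory of $1 as a separate collection.
prepare_all_collections() {
  for coll_dir in "$1"/*/; do
    coll_dir=${coll_dir%/}
    [ -d "$coll_dir" ] || continue   # the glob matched nothing
    printf 'Preparing collection: %s\n' "$coll_dir" >&2
    prepare_collection "$coll_dir"
  done
}
```

For example, `prepare_all_collections /tmp/pdfs` prepares `/tmp/pdfs/<collection>` directories one after another, so each collection's output can be checked and imported separately.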
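For each collection, ingest the prepared items with the DSpace import command:

```shell
[/dspace]/bin/dspace import --add --eperson=sama-saw@asa.fccn.pt --collection=10400.25/300 --source=/tmp/prepared_items --mapfile=mapfile1
```

Here `--eperson` is the submitting user, `--collection` is the handle of the target collection, `--source` is the directory of prepared items, and `--mapfile` is the file in which DSpace records the handle assigned to each imported item.
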
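When several collections are prepared, the import command can be wrapped so that each collection gets its own source directory and its own mapfile; DSpace reads the mapfile again when items are later replaced or deleted, so one mapfile per collection keeps those operations separate. The wrapper and the `DSPACE_BIN` and `EPERSON` variables are hypothetical; the flags are the ones used in the command above.

```shell
# Hypothetical wrapper around the DSpace import command from this README.
# DSPACE_BIN defaults to an assumed install location matching [/dspace];
# EPERSON defaults to the example account. Override both for your setup.
DSPACE_BIN=${DSPACE_BIN:-/dspace/bin/dspace}
EPERSON=${EPERSON:-sama-saw@asa.fccn.pt}

# $1 = collection handle, $2 = directory of prepared items, $3 = mapfile path
import_collection() {
  "$DSPACE_BIN" import --add \
    --eperson="$EPERSON" \
    --collection="$1" \
    --source="$2" \
    --mapfile="$3"
}
```

For example, `import_collection 10400.25/300 /tmp/prepared_items/col300 mapfile-col300` imports one collection; repeat with the next collection's handle, source directory, and a new mapfile name (the directory and mapfile names here are illustrative).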
## Need help, or want to contribute?

Please contact us at [FCT|FCCN](http://www.fccn.pt) or any committer.

## License

For licensing information, please contact us at [FCT|FCCN](http://www.fccn.pt).