Commit a42d6ca9071e7325d521a9cb0a772f3990a96b9a
Exists in master
Merge branch 'master' of gitlab.fccn.pt:dev-b-on/sama-scripts
ingester/Readme.md
# Prepare and ingest

This tool was developed to prepare and ingest PDF files from a filesystem. Preparation consists of extracting a valid Digital Object Identifier (DOI) from each file, using that DOI to fetch the associated metadata, and building a DSpace Submission Information Package (SIP) for ingestion into the repository.

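
The first preparation step, finding a DOI in a PDF, can be illustrated with a short shell sketch. It is not the tool's own code. It assumes the DOI appears in the PDF's text layer in the usual 10.NNNN/suffix form and relies on pdftotext to produce that text. The function name extract_doi is hypothetical. Fetching the metadata and building the package are not shown, because this README does not say which metadata service the tool queries.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tool): read plain text on stdin and
# print the first DOI-like string (10.NNNN/suffix), or nothing if none is found.
extract_doi() {
  grep -oE '10\.[0-9]{4,9}/[^[:space:]"<>]+' \
    | head -n 1 \
    | sed -E 's/[.,;)]+$//'   # drop trailing punctuation picked up from prose
}

# Usage (requires pdftotext): "-" makes pdftotext write the text to stdout.
#   pdftotext article.pdf - | extract_doi
```

Stripping trailing punctuation avoids capturing the full stop when a DOI ends a sentence, at the cost of truncating the rare DOI that genuinely ends in one of those characters.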
## Installation

This software depends on several libraries and tools, including libxml, libxslt and pdftotext.

We provide a setup script for some Linux distributions (CentOS and Ubuntu). Download the source code and run:

    bash [PATH]/setup.sh

The script installs the required system packages and Perl libraries.

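
If setup.sh does not support your distribution, a quick check like the following can confirm that the dependencies are installed. This is a hypothetical helper; check_commands is not part of the tool. It assumes pdftotext and perl are expected on the PATH and, in the commented example, that libxml and libxslt are visible to pkg-config as libxml-2.0 and libxslt, which is normally the case once their development packages are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: <name>" line per absent command; returns 1 if any is missing.
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the command-line tools, then the C libraries via pkg-config
# (assumes the -dev/-devel packages that ship the .pc files are installed).
#   check_commands pdftotext perl && pkg-config --exists libxml-2.0 libxslt
```

Using command -v keeps the check POSIX-friendly and avoids depending on which, whose behaviour varies between distributions.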
## How to use

Use the Linux find command to locate all the PDF files you want to ingest and run prepare on each of them:

    find ./examples -type f -name '*.pdf' -exec prepare {} \;

This command prepares each file that find returns. We recommend preparing the items of one collection at a time.

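
One way to follow the per-collection recommendation is to keep the PDFs of each collection in their own subdirectory under a common root such as ./examples, and to check what each one contains before running prepare. That directory layout and the function name count_pdfs_per_collection are hypothetical, not part of the tool. The sketch uses -iname so that files ending in .PDF are counted too, and counts NUL-separated results so that unusual file names do not skew the totals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each immediate subdirectory of ROOT (assumed to hold
# the PDFs of one collection), print "<directory> <number of PDF files>".
count_pdfs_per_collection() {
  local root="$1" dir count
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched no directory
    # Count NUL-terminated results, which is safe even for names with newlines
    count=$(find "$dir" -type f -iname '*.pdf' -print0 | tr -cd '\0' | wc -c)
    echo "${dir%/} $((count))"
  done
}

# Then prepare one collection at a time, for example (hypothetical directory):
#   find ./examples/collection1 -type f -iname '*.pdf' -exec prepare {} \;
```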
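For each collection, ingest the prepared items with the DSpace import command:

    [/dspace]/bin/dspace import --add --eperson=eperson@dspaceuser.com --collection=10000.01/100 --source=/tmp/prepared_items --mapfile=mapfile1

Replace [/dspace] with your DSpace installation directory and adjust the options: --eperson is the e-mail address of the DSpace user recorded as the submitter, --collection is the handle of the target collection, --source is the directory containing the prepared items, and --mapfile is the file in which DSpace records the handle assigned to each imported item. Keep the map file, since it is needed to replace or delete that batch later.
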
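
The DSpace import command reads items in DSpace's Simple Archive Format: the source directory holds one subdirectory per item, each containing a dublin_core.xml metadata file, a contents file listing the item's bitstreams (one file name per line), and the bitstream files themselves. Assuming prepare writes this layout (the import command implies it, but its exact output is not documented here), the hypothetical sketch below builds one such item by hand, which can help when testing an import. The function make_saf_item, the item name and the metadata fields chosen are illustrative; in particular, which field the real tool uses to store the DOI is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a minimal DSpace Simple Archive Format item.
# Arguments: <source dir> <item name> <pdf file> <title> <doi>
make_saf_item() {
  local src="$1" item="$2" pdf="$3" title="$4" doi="$5"
  local dir="$src/$item"
  mkdir -p "$dir"
  cp "$pdf" "$dir/"
  # "contents" lists the item's bitstreams, one file name per line
  basename "$pdf" > "$dir/contents"
  # Dublin Core metadata; values are NOT XML-escaped in this sketch,
  # and storing the DOI as dc.identifier.uri is an assumption
  cat > "$dir/dublin_core.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
  <dcvalue element="title" qualifier="none">$title</dcvalue>
  <dcvalue element="identifier" qualifier="uri">https://doi.org/$doi</dcvalue>
</dublin_core>
EOF
}
```

The --source option of the import command then points at the parent directory (the first argument), and DSpace imports every item subdirectory it finds there.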
## Need help or want to contribute?

Please contact us at [FCT|FCCN](http://www.fccn.pt) or any committer.

## License

For licensing information, please contact us at [FCT|FCCN](http://www.fccn.pt).