As my former colleague Jen Mitcham discovered, a SPRUCE Mash-up is a very productive thing to be involved in. This time I took along a collection of our PDF and PDF/A files to test a tool that is being developed. The idea is that the tool will be able to identify PDF files whose content poses a preservation risk. This is not necessarily the same thing as a PDF/A file presenting itself as valid according to the various PDF/A validators out there (or at least it might not be; the jury is still out on this point). The tool uses Apache PDFBox Preflight as its validator, but we also tried PDFTron PDF/A Manager and Adobe Acrobat Preflight, all of which gave different results! The hope is that, with further development, this tool will provide a customisable traffic-light system for identifying preservation risks in PDFs, and that it will be possible to embed it in repository software. Good luck with the future development!
Other than that, there was lots of great work done on file identification. Although it was not possible to get on to my other issue, matching equivalent files across different formats, I’m hoping to put in a bid for a SPRUCE follow-up grant to tackle it.
More information on the issues and solutions is available from the event website.