Case Study

Automating Review Process through Imaging


www.cignex.com

Large European Publishing House

Business Need:
• Lack of ability to extract content and metadata from images (GIF, PNG, JPG)
• Unstructured input/output format
• Need to build rules/training set along with self-learning ability

Approach:
• Apache Tika to extract complex image content and metadata
• Content storage in a scalable NoSQL repository – MongoDB
• Entity extraction / NLP through IBM Alchemy
• Continuous self-learning and machine learning through Mahout for improving accuracy

Key Highlights:
• Ability to extract data from 100+ disparate file formats (GIF, PNG, JPG)
• Accuracy in metadata extraction / output improved by 70% - 86%
• Automated 95% of the review processes
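The metadata-extraction step in the approach above can be illustrated with a minimal Python sketch. It is not Apache Tika and does not reproduce the case study's implementation: it reads only the header bytes of PNG, GIF, and JPEG data using the standard library and returns format and pixel dimensions as a flat record, the kind of structure that could then be stored as a document in a repository such as MongoDB. The function names and record fields are hypothetical and chosen for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _jpeg_dimensions(data):
    """Scan JPEG segments for a start-of-frame marker.

    Returns (width, height), or None if no frame header is found.
    Simplified: assumes every segment before the frame header carries a length.
    """
    i = 2  # skip the two-byte start-of-image marker (FF D8)
    while i + 9 <= len(data):
        if data[i] != 0xFF:
            return None  # not at a marker boundary; give up
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        # SOF0..SOF15 carry the frame size; C4, C8 and CC are not frame headers.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        (segment_length,) = struct.unpack(">H", data[i + 2:i + 4])
        i += 2 + segment_length  # marker bytes plus the segment itself
    return None


def extract_image_metadata(data):
    """Return a metadata record (hypothetical field names) for raw image bytes."""
    if data.startswith(PNG_SIGNATURE) and len(data) >= 24 and data[12:16] == b"IHDR":
        # IHDR stores width and height as big-endian 32-bit integers.
        width, height = struct.unpack(">II", data[16:24])
        fmt = "PNG"
    elif data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        # The logical screen descriptor stores little-endian 16-bit sizes.
        width, height = struct.unpack("<HH", data[6:10])
        fmt = "GIF"
    elif data.startswith(b"\xff\xd8"):
        size = _jpeg_dimensions(data)
        if size is None:
            raise ValueError("JPEG frame header not found")
        width, height = size
        fmt = "JPEG"
    else:
        raise ValueError("unrecognized or truncated image data")
    return {"format": fmt, "width": width, "height": height, "size_bytes": len(data)}


# Usage: a hand-built PNG header (signature plus the start of the IHDR chunk).
sample = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
record = extract_image_metadata(sample)
print(record)
```

A production pipeline of the kind the case study describes would delegate this step to a parser library covering many more formats and metadata fields; the sketch only shows the shape of the output that the later storage and entity-extraction stages would consume.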
