Introduction
I chose this project because it is the only one that, by some miracle, remains partially saved in my personal Figma account. Like the other former ABBYY designers, I no longer have access to any ABBYY projects.
Context: What are Vantage and its Document Skill?
ABBYY Vantage is a platform for automating document processing and data extraction. It uses machine learning to classify documents, extract key information, and integrate this data into various business systems.
The Document Skill is one of the key components in Vantage, and its job is to extract data from documents. Users create a custom Document Skill when none of the default skills meets their needs. It works as follows:
- The user uploads a set of documents of the same type (say, invoices) from which data needs to be extracted.
- Next, the user labels fields: they manually create each field by linking it to the region of the document the data is extracted from, and give it a meaningful name.
- In addition to field labeling, some fields can be extracted or verified using validation rules (for example, the total amount should equal the sum of the individual item amounts and taxes).
- After this, the user trains the skill and checks that the model has learned to extract the user-defined fields correctly. This is done on the "Results" tab.
- The goal is the highest possible accuracy, ideally 100%. Once the desired accuracy is reached, the user can apply the skill to extract data automatically from large volumes of their organisation's documents.
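The validation rule mentioned in the list above boils down to a simple arithmetic check. Below is a minimal Python sketch of that logic. It is not Vantage's rule syntax, since Vantage configures rules in its own interface. The function name `total_matches`, the field names, the sample values and the one-cent tolerance are all hypothetical and chosen only to illustrate the idea.

```python
from decimal import Decimal


def total_matches(item_amounts, taxes, total, tolerance=Decimal("0.01")):
    """Hypothetical rule: the total must equal the sum of item amounts plus taxes.

    Decimal avoids floating-point rounding errors on money values; the
    tolerance (an assumption) absorbs small rounding differences on a document.
    """
    expected = sum(item_amounts, Decimal("0")) + sum(taxes, Decimal("0"))
    return abs(expected - total) <= tolerance


# Hypothetical values extracted from one invoice.
invoice = {
    "item_amounts": [Decimal("120.00"), Decimal("35.50")],
    "taxes": [Decimal("31.10")],
    "total": Decimal("186.60"),
}

print(total_matches(**invoice))  # True: 120.00 + 35.50 + 31.10 = 186.60
```

If the check fails, the document would be flagged so a person can review the extracted fields.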
https://youtu.be/8dqAY7wGfYY?si=e7iSCDBejGktsgU7
How was the task set?
The task was set as follows: as part of the AI-first strategy, Vantage (especially its Document Skill) should show users that it knows a lot about documents, for example typical document fields, and should require as little manual effort as possible to train on new document types.
In intelligent document processing, "AI-first" stands for a number of user expectations:
- the system knows a lot about documents out of the box and extracts generic fields without additional training (zero-shot learning).
- the system supports few-shot learning, meaning it can be adjusted to the required set of fields and to custom document types with just a few documents, or at least the user sees improvement quickly.
- the system handles multimodal documents without the user having to explain how to apply the technology to this type of document.
Steps to achieve the task