OpenText OCR: Intelligent Capture

We live in a digital age, yet relatively few invoices arrive in structured electronic formats (XML, EDIFACT, etc.) whose data can be mapped directly into an ERP. In fact, a high percentage of companies still rely on paper documents that must later be converted into digital format, which means a significant investment of resources in document processing. If this is the case for your company, this post is for you.

Companies manage several types of documents on a daily basis

Optical Character Recognition (OCR), the technology used to extract data from documents, is one we know well at Brait. For many years we have been carrying out accounts payable automation projects, where OCR technologies are commonly applied.

OpenText has developed its own OCR technology, which it incorporates into its Business Center Capture (BCC) for SAP Solutions. We see the following advantages in using BCC to process supplier invoices:

Document type detection: the OCR determines what type of document it is (payment, advance payment, etc.).
Amount recognition: it distinguishes between gross and net amounts and calculates the tax percentage.
Purchase order matching: it identifies the purchase order number on an invoice and links it to the corresponding order in SAP. VIM (Vendor Invoice Management) can then check the goods receipt against the invoice and detect any discrepancies in prices or quantities.
Duplicate detection: it automatically compares the data extracted from documents to flag possible duplicates.
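To make these checks more concrete, here is a minimal Python sketch of the logic behind them. It is an illustration under stated assumptions, not how BCC or VIM implement it: the `Invoice` fields, the price tolerance of 0.01 and the duplicate key (vendor, normalised invoice number, gross amount) are hypothetical choices. The tax percentage is derived as (gross - net) / net * 100.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Invoice:
    # Hypothetical set of fields extracted by the OCR.
    vendor_id: str
    invoice_number: str
    po_number: str
    net_amount: Decimal
    gross_amount: Decimal
    unit_price: Decimal
    quantity: Decimal


def tax_percentage(net: Decimal, gross: Decimal) -> Decimal:
    """Tax rate implied by the net and gross amounts, rounded to two decimals."""
    if net <= 0:
        raise ValueError("net amount must be positive")
    return ((gross - net) / net * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def po_discrepancies(inv: Invoice, po_price: Decimal, received_qty: Decimal,
                     price_tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Compare the invoice with the purchase order price and the goods receipt quantity."""
    issues = []
    if abs(inv.unit_price - po_price) > price_tolerance:
        issues.append("price")
    if inv.quantity != received_qty:
        issues.append("quantity")
    return issues


def find_duplicates(invoices: List[Invoice]) -> List[Tuple[Invoice, Invoice]]:
    """Pair each invoice with an earlier one that has the same vendor, number and gross amount."""
    seen: Dict[tuple, Invoice] = {}
    duplicates = []
    for inv in invoices:
        # Normalise the invoice number so "inv-42" and " INV-42 " compare equal.
        key = (inv.vendor_id, inv.invoice_number.strip().upper(), inv.gross_amount)
        if key in seen:
            duplicates.append((seen[key], inv))
        else:
            seen[key] = inv
    return duplicates
```

For example, `tax_percentage(Decimal("100.00"), Decimal("121.00"))` returns `Decimal("21.00")`. `Decimal` is used instead of `float` so that monetary amounts are not affected by binary rounding.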

Data Flow in the Business Center for SAP Solutions

The latest solutions OpenText has developed (compatible only with the latest versions of Business Center and VIM) are:

Intelligent Capture for SAP Solutions (IC4SAP): an on-premise document recognition solution built on a very powerful OCR engine.
Core Capture for SAP Solutions (CC4SAP): the same OCR, delivered as a cloud-based SaaS solution.

OpenText consolidates its capture offering into two solutions

Both solutions include machine learning, which improves the recognition rate with each run and moves the process toward full automation. Furthermore, because they are fully integrated with SAP, they are managed and configured from within SAP itself.
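The idea of a recognition rate that improves with each run can be illustrated with a deliberately simplified sketch: when a user validates a field, the system remembers where that value was found for that vendor and reads it from the same place next time. This is a toy under explicit assumptions, not OpenText's machine learning: the `TemplateMemory` class, the (line, column) token positions and the single `invoice_number` field are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]       # hypothetical (line, column) of a token on the page
Document = Dict[Position, str]   # OCR tokens keyed by their position


class TemplateMemory:
    """Toy per-vendor memory of where each field was found on validated documents."""

    def __init__(self) -> None:
        self._templates: Dict[str, Dict[str, Position]] = {}

    def extract(self, vendor: str, doc: Document, field: str) -> Optional[str]:
        # Read the field from the learned position, if this vendor has one.
        pos = self._templates.get(vendor, {}).get(field)
        return doc.get(pos) if pos is not None else None

    def learn(self, vendor: str, doc: Document, field: str, confirmed: str) -> None:
        # Remember the position of the token the user confirmed as the correct value.
        for pos, token in doc.items():
            if token == confirmed:
                self._templates.setdefault(vendor, {})[field] = pos
                return


def process_batch(memory: TemplateMemory,
                  batch: List[Tuple[str, Document, str]]) -> int:
    """Return how many documents still needed manual validation of the invoice number."""
    manual = 0
    for vendor, doc, correct in batch:
        if memory.extract(vendor, doc, "invoice_number") != correct:
            manual += 1                                           # a user fixes the field...
            memory.learn(vendor, doc, "invoice_number", correct)  # ...and the memory learns from it
    return manual
```

With a batch of two invoices from one vendor (same layout) and one invoice from another vendor, the first pass needs two manual validations, one for the first invoice of each vendor, and reprocessing the batch needs none. A production system has to cope with layout variations that exact positions cannot handle, so treat this only as an intuition for the feedback loop.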

OpenText applies machine learning to its OCR technology

Through OpenText solutions, we can achieve fully automated document processing:

Capture: through almost any channel, such as scanners, fax, mobile devices, mailboxes, or API integration (SOAP or REST).
Processing: documents are classified and their metadata is extracted.
1. Document preparation:
  - Electronic documents: these do not require quality enhancement.
  - Image documents: extraction quality can be improved through image enhancement.
2. Document classification, based on the nature of the document:
  - Structured documents: the data is always in the same position, as in forms or questionnaires. They are processed with graphic templates.
  - Semi-structured documents: processed with keywords and PAL learning (an automatic learning system), which can create new, specific templates based on the data it finds (Intelligent Capture).
  - Unstructured documents: classified by keywords or text matching. Groups of paragraphs are compared in order to classify the documents.
3. Metadata extraction:
  - Area extraction: an area is defined in which the OCR searches for the data. It is used for structured documents.
  - Free form: the extraction mode used for semi-structured and unstructured documents. It is based on regular expressions.
Delivery: the results are prepared for the target system using different exporters:
- Email
- PDF
- Content Server
- Documentum
- SharePoint
- Others
- Delivery can also be extended with a custom table
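The classification, extraction and delivery steps above can be sketched as a toy Python pipeline. It is a sketch under assumptions, not OpenText's implementation or API: the keyword lists, the regular expressions, the `Word` coordinates and the exporter functions are all hypothetical. `classify` mirrors keyword classification, `area_extract` mirrors area extraction on structured documents, `free_form_extract` mirrors regex-based free-form extraction, and `deliver` hands the result to a configured exporter.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Word:
    text: str
    x: int  # hypothetical coordinates of the word on the page image
    y: int


# 2. Classification: hypothetical keyword lists per document class.
CLASS_KEYWORDS: Dict[str, List[str]] = {
    "invoice": ["invoice", "vat", "total due"],
    "credit_note": ["credit note", "credit memo", "refund"],
}


def classify(text: str) -> str:
    """Return the class whose keywords occur most often, or 'unknown' if none occur."""
    lowered = text.lower()
    scores = {cls: sum(lowered.count(k) for k in kws) for cls, kws in CLASS_KEYWORDS.items()}
    best = max(scores, key=lambda cls: scores[cls])
    return best if scores[best] > 0 else "unknown"


# 3a. Area extraction: read whatever lies inside a fixed rectangle (structured documents).
def area_extract(words: List[Word], x0: int, y0: int, x1: int, y1: int) -> str:
    inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
    return " ".join(w.text for w in inside)


# 3b. Free form: hypothetical regular expressions searched over the whole text.
PATTERNS = {
    "invoice_number": re.compile(r"\binvoice\s*(?:no\.?|number)[:\s]*([A-Z0-9-]+)", re.IGNORECASE),
    "total": re.compile(r"\btotal\s*(?:due)?[:\s]*([0-9]+[.,][0-9]{2})", re.IGNORECASE),
}


def free_form_extract(text: str) -> Dict[str, Optional[str]]:
    result: Dict[str, Optional[str]] = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


# Delivery: hypothetical exporters standing in for Email, Content Server, etc.
EXPORTERS: Dict[str, Callable[[dict], str]] = {
    "email": lambda r: f"mail to accounts payable: {r}",
    "content_server": lambda r: f"archived as {r['doc_class']}",
}


def deliver(result: dict, target: str) -> str:
    if target not in EXPORTERS:
        raise KeyError(f"no exporter configured for {target!r}")
    return EXPORTERS[target](result)


def process(text: str, target: str) -> str:
    """Classify, extract with free-form rules and hand the result to an exporter."""
    result = {"doc_class": classify(text), **free_form_extract(text)}
    return deliver(result, target)
```

The `\b` word boundary in the `total` pattern keeps a "Subtotal" line from being captured as the invoice total, the kind of detail that free-form rules need to get right.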

Data capture process, from document entry to its delivery

You may also be interested in other solutions

There are, of course, other alternatives for extracting data from indexable PDFs: Cloudtrade and Tradeshift extract the data directly from PDFs that were generated by a program rather than scanned. Extracting data from indexable PDFs avoids character "interpretation" errors, such as the software confusing the digit 1 with the letter l.

If you are looking for this type of tool or would like more information about these or any other solutions, get in touch with us. We would be happy to help you.
