A basic DRP template includes the following elements:
• Recognition objects - Objects that the system recognizes automatically when the Document Template Designer window is opened.
• Correction objects - Manual corrections that a user makes with the toolbox in the Document Template Designer window. These can include corrections to keywords, columns, tables, and graphs.
• Output XML template map - Maps parsed data to a UXML structure, which is the XML format into which information is extracted. The document file is exported after it contains the extracted information.
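The three elements listed above can be pictured as one small data structure. The following Python sketch is illustrative only and is not the product's internal format: the class names, the field names, the `Document` root tag, and the rule that a manual correction overrides an automatically recognized object of the same name are all assumptions made for the example.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class TemplateObject:
    """One object on the document image (hypothetical model)."""
    kind: str        # for example "keyword", "table", or "graph"
    name: str        # identifier used by the output map
    value: str = ""  # text captured from the document


@dataclass
class DrpTemplate:
    """Hypothetical container for the three elements of a basic DRP template."""
    recognition_objects: list = field(default_factory=list)
    correction_objects: list = field(default_factory=list)
    # Output map: object name -> element name in the output XML (assumed layout).
    output_map: dict = field(default_factory=dict)

    def effective_objects(self):
        # Assumption: a manual correction replaces the recognized object
        # that has the same name.
        merged = {obj.name: obj for obj in self.recognition_objects}
        merged.update({obj.name: obj for obj in self.correction_objects})
        return list(merged.values())

    def to_xml(self, root_tag="Document"):
        # Write only the objects that the output map links to an element.
        root = ET.Element(root_tag)
        for obj in self.effective_objects():
            tag = self.output_map.get(obj.name)
            if tag is None:
                continue
            ET.SubElement(root, tag).text = obj.value
        return ET.tostring(root, encoding="unicode")
```

In this sketch, an object that has no entry in the output map is recognized but not exported, which mirrors the idea that a keyword "can be linked" to the output template rather than being linked automatically.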
When a document is uploaded into the SDMS system, the system may recognize an existing DRP for the document. If no DRP template is available for the document, internal decoders construct a basic data extraction template from the objects that the system identified automatically. A user may need to edit the template, and must approve it, before it is saved in the system.
When using automatic recognition, the system may recognize the following components and color-code them in the Document Template Designer:
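The upload flow described above (reuse an existing DRP, otherwise build a basic template, and save only after user approval) can be sketched as follows. This is a minimal Python illustration under stated assumptions: `Template`, `TemplateStore`, and `template_for` are hypothetical names, and matching a document to a DRP by a simple document-type string is a simplification of whatever recognition the system actually performs.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical data extraction template."""
    document_type: str
    objects: list
    approved: bool = False  # set to True once a user approves the template


class TemplateStore:
    """Hypothetical in-memory stand-in for the templates saved in the system."""

    def __init__(self):
        self._saved = {}

    def find(self, document_type):
        # Returns the saved template for this document type, or None.
        return self._saved.get(document_type)

    def save(self, template):
        # Approval is mandatory before a template is saved.
        if not template.approved:
            raise ValueError("template must be approved by a user before saving")
        self._saved[template.document_type] = template


def template_for(document_type, detected_objects, store):
    """Reuse an existing template, or draft a basic one from detected objects."""
    existing = store.find(document_type)
    if existing is not None:
        return existing
    # No saved template: build an unapproved draft from the objects
    # that were identified automatically.
    return Template(document_type, list(detected_objects))
```

A draft returned by `template_for` can be edited freely. It only becomes reusable for later documents after `approved` is set and `save` succeeds.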
• Keywords (highlighted in green) - Text values in the document. Can be linked to a UXML template.
• Tables (highlighted in light yellow) - Indexed area with columns and rows. The values in the table fields can be extracted and used to update external tables.
NOTE Tables that are recognized as being part of the designed table (similar tables) are highlighted in light red.
• Graphs (highlighted in light pink) - Used to capture a screenshot of a chart area. The user can mark a graph and click the Graph option.
• Textbox (highlighted in light blue) - A box that captures the values of keywords or any other element (except pictures) in special cases, such as when keywords or other elements cannot be captured correctly.
• Page Header and Footer (highlighted in light green) - The beginning area and the end area of a document.
• Anchors (indicated in yellow) - Markers that indicate the start and end of the mandatory areas.
• Repeated sections (multirun) - The same data extraction applied to multiple sections of a document, such as in a multi-sample report.
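The idea behind repeated sections (multirun) can be illustrated with a short Python sketch: the document is split into sections, and one extraction rule is applied unchanged to each section. This is not how the product implements multirun. The section heading `Sample Report`, the keyword names, and the use of regular expressions are assumptions chosen to keep the example small.

```python
import re

# Assumption: each repeated section starts with a line reading "Sample Report".
SECTION_START = re.compile(r"^Sample Report[ \t]*$", re.MULTILINE)
# Assumption: keywords appear as "Name: value" lines inside each section.
KEYWORD = re.compile(r"^(Sample ID|Result):[ \t]*(.+)$", re.MULTILINE)


def split_sections(text):
    """Cut the text at every section heading."""
    starts = [m.start() for m in SECTION_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end] for start, end in zip(starts, ends)]


def extract_keywords(section):
    """The single extraction rule that is reused for every section."""
    return {name: value.strip() for name, value in KEYWORD.findall(section)}


def extract_multirun(text):
    """Apply the same extraction to each repeated section."""
    return [extract_keywords(section) for section in split_sections(text)]
```

For a multi-sample report with two `Sample Report` sections, `extract_multirun` returns a list of two dictionaries, one per sample, each produced by the same rule.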
After the system identifies the objects in the document, the user reviews the data extraction method proposed by the system and decides whether objects need to be added, removed, or modified. In the Document Template Designer window, the properties of an existing component can be edited from the Properties tab in the right-side pane or from the menu that appears when an object is right-clicked. To add a new object to the pattern, click the object type in the toolbox and paint it (mark it with the cursor) on the document image.
NOTE After the confirmation is completed in the Document Template Designer window, the correction template is stored inside the DRP, so the next document of the same type can be identified and parsed automatically.
The following sections provide information about adding and setting the properties of each of the components and other tools that are used in configuring the data recognition process: