Restructuring a destructured document is a highly complex operation. As you’ll see, a wide range of techniques must be mastered to recreate the structure. The aim is to recover the initial structures used when the documents were created. The main techniques we use are:
- Reading and interpreting numerous file formats containing these destructured documents
- Applying stringology algorithms to the deciphered information to find repetitions or patterns
- Data mining
- Knowledge discovery in databases (KDD)
- Deep-learning algorithms
- Artificial intelligence (AI) algorithms
- Computer-human interfaces (CHI)
There are many kinds of destructured documents. We are mostly interested in graphic documents, i.e., documents containing both drawings and text.
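
To make the idea of searching for repetitions more concrete, here is a minimal sketch in Python. It is not the method used in this work: it is a naive n-gram count, chosen for brevity, where a real stringology approach would more likely rely on suffix arrays or suffix trees. The function name `repeated_ngrams` and the operator names in `ops` are hypothetical, standing in for a sequence of drawing and text instructions already decoded from a file.

```python
from collections import defaultdict


def repeated_ngrams(tokens, n, min_count=2):
    """Map each window of n consecutive tokens to its start positions,
    keeping only windows that occur at least min_count times."""
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    return {gram: pos for gram, pos in positions.items() if len(pos) >= min_count}


# Hypothetical instruction stream: the same shape is drawn twice,
# with a text instruction in between.
ops = ["moveto", "lineto", "lineto", "close", "text",
       "moveto", "lineto", "lineto", "close"]

print(repeated_ngrams(ops, 4))
# → {('moveto', 'lineto', 'lineto', 'close'): [0, 5]}
```

A repeated run of instructions such as this one is a hint that the document originally contained a reusable element (a symbol, a table cell, a frame) that was flattened when the file was produced.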