Named Entity Recognition Using a Pre-Trained LayoutLMv2 Model
blog.doxray.com
The amount of data generated by humans is ever-increasing, and much of it is stored in documents and forms that must be analyzed to extract relevant information. Today, many companies still extract this data manually, a tedious and time-consuming task. Rule-based systems are sometimes used instead, but they require careful engineering and tend to break when the environment changes, leading to costly algorithm adjustments. Recently, with the rapid advances in NLP fueled by the success of deep learning, neural networks have been recognized as a promising approach to automatic document processing, and LSTMs and transformers began to outperform humans at document understanding and data extraction, especially once speed and cost are taken into account. However, because standard language models operate only on word sequences, they miss the information encoded in a document's visual aspects, such as its layout. As humans, we rely on that information constantly. For example, imagine reading C code purely as a sequence of symbols: