<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI in Production]]></title><description><![CDATA[From customers and annotations to large language models and AI platforms, everything doXray does to run AI in production. ]]></description><link>https://blog.doxray.com</link><image><url>https://substackcdn.com/image/fetch/$s_!W7bt!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F702cad06-cd6d-4b83-ad33-05ace831d092_372x372.png</url><title>AI in Production</title><link>https://blog.doxray.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 27 Apr 2026 12:41:25 GMT</lastBuildDate><atom:link href="https://blog.doxray.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[doXray]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[doxray@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[doxray@substack.com]]></itunes:email><itunes:name><![CDATA[Dragan Kraljević]]></itunes:name></itunes:owner><itunes:author><![CDATA[Dragan Kraljević]]></itunes:author><googleplay:owner><![CDATA[doxray@substack.com]]></googleplay:owner><googleplay:email><![CDATA[doxray@substack.com]]></googleplay:email><googleplay:author><![CDATA[Dragan Kraljević]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Generative (TrOCR) models calibration]]></title><description><![CDATA[Teaching a generative TrOCR model to know what it doesn't know]]></description><link>https://blog.doxray.com/p/generative-trocr-models-calibration</link><guid 
isPermaLink="false">https://blog.doxray.com/p/generative-trocr-models-calibration</guid><dc:creator><![CDATA[Tin Ferković]]></dc:creator><pubDate>Wed, 13 Dec 2023 12:12:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Introduction to model calibration</h1><p>Calibrating a model means aligning its predicted confidence with the probability that the prediction is actually correct. Essentially, when the model&#8217;s softmax output for a certain class (or a certain token in the generative scenario) is high, we want the model to be correct most of the time. Likewise, low confidence should signal that the model is having a hard time choosing the correct class. Often, this is not the case. Simply training the model with the cross-entropy loss for generative next-token prediction does not guarantee that the model will be calibrated. A calibrated model has its&nbsp;<strong>confidence</strong>&nbsp;and&nbsp;<strong>probability</strong>&nbsp;distributions aligned, i.e. high-confidence examples are mostly classified correctly and vice versa.</p><p>There are numerous simple techniques for calibrating classification models, for tasks such as text classification or named entity recognition.&nbsp;<strong>Histogram binning</strong>,&nbsp;<strong>isotonic regression</strong>, and&nbsp;<strong>temperature scaling</strong>&nbsp;are just some of them. However, such approaches cannot be applied directly to generative models, for three reasons:</p><ol><li><p>the number of classes is the size of the vocabulary, which is usually ~50k tokens (classes);</p></li><li><p>the ground truth text can be tokenized differently from the output tokens of our model, even though both token sequences, when combined, make up the same text. 
Example: the ground truth "SCHMITZ" can be obtained via different combinations of generated tokens, for instance, S, CH, MIT, Z and S, CH, M, IT, Z.</p></li><li><p>output tokens need to be aligned with the tokenized ground truth. Example&nbsp;<em>i)</em>:&nbsp;<strong>ground truth</strong>: S, CH, MIT, Z;&nbsp;<strong>obtained</strong>: SCH, N, IT, Z (only 1 letter differs but 3/4 tokens are incorrect). Example&nbsp;<em>ii)</em>:&nbsp;<strong>ground truth</strong>: S, CH, MIT, Z;&nbsp;<strong>obtained</strong>: S, CH, N, IT, Z (lengths differ and it is unclear how to align the two).</p></li></ol><h1>Generative calibration</h1><p>A different approach needs to be taken for generative models. A paper examining this field is&nbsp;<a href="https://openreview.net/pdf?id=0qSOodKmJaN">Calibrating sequence likelihood improves conditional language generation</a>. The idea is to use the&nbsp;<strong>beam search</strong>&nbsp;from the model&#8217;s generate function, which is normally used during inference, to get multiple candidate outputs for a single image input. The procedure is as follows:</p><ul><li><p>After fine-tuning your generative model, introduce a calibration stage, which is also done on the&nbsp;<strong>training</strong>&nbsp;data.</p></li><li><p>Choose a similarity function that will later be used to either rank the candidates for loss calculation (rank loss) or to determine the scale of the loss (margin loss).&nbsp;<strong>Character Error Rate (CER)</strong>&nbsp;was used as the similarity metric here.</p></li><li><p>Using the input pixel values, the ground truth&nbsp;<strong>y</strong>, and the candidate targets&nbsp;<strong>y</strong>&nbsp;produced by the model's generate function, calculate the calibration loss. 
This loss aims to&nbsp;<strong>align the sequence likelihoods of the model's decoded candidates with their similarity to the ground truth target sequence</strong>, measured here with the CER metric.&nbsp;<strong>Rank loss</strong>&nbsp;optimizes the ranking order of uniformly sampled positive and negative candidate pairs&nbsp;<strong>y</strong>+,&nbsp;<strong>y</strong>&#8722;, where&nbsp;<em>s</em>(<strong>y</strong>+,&nbsp;<strong>y</strong>;&nbsp;<strong>x</strong>) &gt;&nbsp;<em>s</em>(<strong>y</strong>&#8722;,&nbsp;<strong>y</strong>;&nbsp;<strong>x</strong>).&nbsp;<strong>Margin loss</strong>&nbsp;maximizes the sequence probability gap of positive and negative candidate pairs. List-rank and reward losses gave the worst results in the paper, so they weren't considered.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CtjV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CtjV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 424w, https://substackcdn.com/image/fetch/$s_!CtjV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 848w, https://substackcdn.com/image/fetch/$s_!CtjV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CtjV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CtjV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png" width="1274" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92450,&quot;alt&quot;:&quot;Image 1: Possible calibration loss variations [1]&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 1: Possible calibration loss variations [1]" title="Image 1: Possible calibration loss variations [1]" srcset="https://substackcdn.com/image/fetch/$s_!CtjV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 424w, https://substackcdn.com/image/fetch/$s_!CtjV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 848w, https://substackcdn.com/image/fetch/$s_!CtjV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 1272w, 
https://substackcdn.com/image/fetch/$s_!CtjV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30b8c5a4-d4d1-4f47-b8a0-6d469ad089cf_1274x297.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Image 1: Possible calibration loss variations <a href="https://openreview.net/pdf?id=0qSOodKmJaN">[1]</a></figcaption></figure></div></li><li><p>(Optional) Apply the&nbsp;<strong>regularization loss</strong>&nbsp;to prevent models from deviating significantly from their fine-tuned MLE objective.&nbsp;<strong>Cross entropy</strong>&nbsp;is the standard fine-tuning MLE objective.&nbsp;<strong>KL divergence</strong>&nbsp;directly minimizes the probability distribution distance between the calibrated model and the fine-tuned model at each token on the observed target sequence. The main difference is that cross entropy loss regularizes the model toward the gold reference, while KL divergence regularizes the model toward the fine-tuned-only model. Calculating the&nbsp;<strong>KL divergence</strong>&nbsp;loss requires the probability tensors of the initial (fine-tuned) and calibrated models to be of the same length, but the two models often produce outputs of different lengths, which makes them difficult to align. Thus, KL divergence loss wasn't considered.</p></li></ul><p>The experiments differed mainly in how the calibration loss was calculated. Initial experimentation showed that the CE regularization loss is not beneficial for the TrOCR task. In total, 6 model variations were evaluated: the initial (fine-tuned) model, rank loss, rank loss with if clause variants 1, 2, and 3, and margin loss.</p><p>Rank loss and margin loss are calculated according to the equations in Image 1. 
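</p><p>As a minimal sketch, the two pairwise losses can be written in a few lines of plain Python. This follows our reading of the pairwise rank and margin losses defined in the linked paper, so it may not match Image 1 term by term. The function names, the default <em>beta</em> value, and the choice of 1 &#8722; CER as the similarity <em>s</em> are assumptions made for illustration; a real implementation would operate on batched sequence log-probabilities in a tensor library.</p>
<pre>
```python
# Sketch of the pairwise rank and margin calibration losses.
# Assumptions (not from the post): the similarity is s = 1 - CER and
# beta is a hypothetical hyperparameter with an arbitrary default.


def rank_loss(logp_pos: float, logp_neg: float, beta: float = 1.0) -> float:
    """Hinge on the log-likelihood gap between the two candidates.

    The loss is zero once log P(y+) exceeds log P(y-) by at least beta.
    """
    return max(0.0, beta - logp_pos + logp_neg)


def margin_loss(
    logp_pos: float,
    logp_neg: float,
    sim_pos: float,
    sim_neg: float,
    beta: float = 1.0,
) -> float:
    """Same hinge, but the required gap scales with the similarity difference."""
    return max(0.0, beta * (sim_pos - sim_neg) - logp_pos + logp_neg)


# y+ has CER 0.0 and y- has CER 0.25, so s(y+) = 1.0 and s(y-) = 0.75.
logp_pos, logp_neg = -0.5, -1.0
print(rank_loss(logp_pos, logp_neg))                # 0.5
print(margin_loss(logp_pos, logp_neg, 1.0, 0.75))   # 0.0
```
</pre><p>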
The calibration losses for the rank loss with if clause variants 1, 2, and 3 are calculated as follows:</p><ul><li><p>Variant 1</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dVf9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dVf9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 424w, https://substackcdn.com/image/fetch/$s_!dVf9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 848w, https://substackcdn.com/image/fetch/$s_!dVf9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 1272w, https://substackcdn.com/image/fetch/$s_!dVf9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dVf9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png" width="1203" height="150" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/384c119d-5f17-42bb-8152-58494e55a649_1203x150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:150,&quot;width&quot;:1203,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:47959,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dVf9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 424w, https://substackcdn.com/image/fetch/$s_!dVf9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 848w, https://substackcdn.com/image/fetch/$s_!dVf9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 1272w, https://substackcdn.com/image/fetch/$s_!dVf9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384c119d-5f17-42bb-8152-58494e55a649_1203x150.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p>Variant 2</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nMII!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nMII!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 424w, https://substackcdn.com/image/fetch/$s_!nMII!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 848w, https://substackcdn.com/image/fetch/$s_!nMII!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 1272w, https://substackcdn.com/image/fetch/$s_!nMII!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nMII!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png" width="1162" height="136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:136,&quot;width&quot;:1162,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:40419,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!nMII!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 424w, https://substackcdn.com/image/fetch/$s_!nMII!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 848w, https://substackcdn.com/image/fetch/$s_!nMII!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 1272w, https://substackcdn.com/image/fetch/$s_!nMII!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65a71ffc-3e0d-49e7-bff1-ea9481f667af_1162x136.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p>Variant 3</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bup0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bup0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 424w, https://substackcdn.com/image/fetch/$s_!Bup0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 848w, 
https://substackcdn.com/image/fetch/$s_!Bup0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 1272w, https://substackcdn.com/image/fetch/$s_!Bup0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bup0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png" width="878" height="143" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa704f21-f523-4dac-866b-04472c348363_878x143.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:143,&quot;width&quot;:878,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bup0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 424w, https://substackcdn.com/image/fetch/$s_!Bup0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 848w, 
https://substackcdn.com/image/fetch/$s_!Bup0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 1272w, https://substackcdn.com/image/fetch/$s_!Bup0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa704f21-f523-4dac-866b-04472c348363_878x143.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul><p>These loss variations are not mentioned in the paper; we introduced and examined them ourselves. There were several reasons for modifying the rank loss from the paper. First, when both&nbsp;<strong>y</strong>&nbsp;candidates are correct (have CER 0), we don&#8217;t necessarily want to increase the generation probability of one of them while decreasing the other, which is what the default rank loss function does. Instead, we encourage an increase in the generation probability of either the first beam (variants 2 and 3) or both beams (variant 1). Second, when both&nbsp;<strong>y</strong>&nbsp;candidates are incorrect, we don&#8217;t necessarily want to increase the probability of either of them. Instead, we reduce the probability of both beams (variants 1 and 3) or just the first one (variant 2). Third, when the candidates are incorrect and the improvement between them, i.e. the absolute difference of their CERs, is minimal, we again might not want to increase one while decreasing the other (variants 1 and 2). The rank loss variations were added to mitigate these effects.</p><h1>Results</h1><p>Below is an overview of the performance of each model. All models, except the first (uncalibrated) one, are&nbsp;<strong>calibrated on the train set</strong>. All models are tested on the test set. 
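</p><p>The evaluation rule used for the results in this section can be sketched as follows. An example is skipped when any predicted token probability is below 50%, and otherwise it counts as correct only when its CER is 0. The helper names are hypothetical, CER is assumed to be the character-level edit distance divided by the reference length, and the token probabilities are made-up values.</p>
<pre>
```python
# Sketch of the evaluation buckets: skipped / correct / incorrect.
# levenshtein, cer and classify_example are illustrative helper names.
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    return levenshtein(prediction, reference) / max(1, len(reference))


def classify_example(
    prediction: str,
    reference: str,
    token_probs: Sequence[float],
    threshold: float = 0.5,
) -> str:
    """Skip when any token probability is below the threshold, else judge by CER."""
    if any(p < threshold for p in token_probs):
        return "skipped"
    return "correct" if cer(prediction, reference) == 0 else "incorrect"


print(classify_example("SCHMITZ", "SCHMITZ", [0.99, 0.97, 0.95, 0.98]))  # correct
print(classify_example("SCHNITZ", "SCHMITZ", [0.99, 0.97, 0.61, 0.98]))  # incorrect
print(classify_example("SCHNITZ", "SCHMITZ", [0.99, 0.97, 0.42, 0.98]))  # skipped
```
</pre><p>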
The originally published&nbsp;<a href="https://huggingface.co/microsoft/trocr-base-handwritten">trocr-base-handwritten</a>&nbsp;model and the&nbsp;<a href="https://fki.tic.heia-fr.ch/databases/iam-handwriting-database">IAM dataset</a>&nbsp;were used.&nbsp;<strong>An example is skipped during evaluation if the probability of any predicted token is less than 50%. These examples are still included in the graphs, where they are denoted as&nbsp;</strong><em><strong>skipped</strong></em><strong>.&nbsp;</strong>Examples that aren&#8217;t skipped and have CER &gt; 0 are classified as incorrect. Although TrOCR is a generative model with a vision encoder and text decoder, the task differs from those used in the paper, namely abstractive summarization, question answering, and structured data-to-text. These are more open-ended than the TrOCR task, which has a strict ground truth. Thus, a different calibration loss proved to work better in our scenario.</p><h2>Initial (fine-tuned) model</h2><p>Average CER:  0.0385<br>Average sequence probability:  0.5496<br>Correct  :  902<br>Incorrect:  408<br>Skipped  :  551</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RX_E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RX_E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 424w, 
https://substackcdn.com/image/fetch/$s_!RX_E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!RX_E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!RX_E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RX_E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70451,&quot;alt&quot;:&quot;Image 2: Test set example distribution of the initial (fine-tuned-only) model&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 2: Test set example distribution of the initial (fine-tuned-only) model" title="Image 2: Test set example distribution of the initial (fine-tuned-only) model" 
srcset="https://substackcdn.com/image/fetch/$s_!RX_E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!RX_E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!RX_E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!RX_E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a937729-87d6-48c3-83de-fd960eb45ee2_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image 2: Test set example distribution of the initial (fine-tuned-only) model</figcaption></figure></div><div><hr></div><h2>Calibrated - rank loss</h2><p>Average CER:  0.0334<br>Average sequence probability:  0.6627<br>Correct  :  1001<br>Incorrect:  451<br>Skipped  :  409</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2vYb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2vYb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!2vYb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!2vYb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!2vYb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2vYb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61653,&quot;alt&quot;:&quot;Image 3: Test set distribution of the model calibrated using rank loss&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 3: Test set distribution of the model calibrated using rank loss" title="Image 3: Test set distribution of the model calibrated using rank loss" srcset="https://substackcdn.com/image/fetch/$s_!2vYb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!2vYb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!2vYb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2vYb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23f5e202-a5a0-4734-91cf-12c3593cf7d8_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image 3: Test set example distribution of the model calibrated using rank loss</figcaption></figure></div><div><hr></div><h2>Calibrated - rank loss if clause variant 1</h2><p>Average CER:  0.0313<br>Average sequence probability:  0.4965<br>Correct  :  972<br>Incorrect:  375<br>Skipped  :  514</p><div class="captioned-image-container"><figure><a 
class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iLSv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iLSv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!iLSv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!iLSv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!iLSv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iLSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66809,&quot;alt&quot;:&quot;Image 4: Test set example distribution of the model calibrated using the if 
clause variant 1 loss&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 4: Test set example distribution of the model calibrated using the if clause variant 1 loss" title="Image 4: Test set example distribution of the model calibrated using the if clause variant 1 loss" srcset="https://substackcdn.com/image/fetch/$s_!iLSv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!iLSv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!iLSv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!iLSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1465263b-ecd0-48ba-b8b3-09dd0eea16e3_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image 4: Test set example distribution of the model calibrated using the if clause variant 1 loss</figcaption></figure></div><div><hr></div><h2>Calibrated - rank loss if clause variant 2</h2><p>Average CER:  0.0338<br>Average sequence probability:  0.6724<br>Correct  :  1001<br>Incorrect:  480<br>Skipped  :  380</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LiBr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LiBr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 424w, 
https://substackcdn.com/image/fetch/$s_!LiBr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!LiBr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!LiBr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LiBr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:62491,&quot;alt&quot;:&quot;Image 5: Test set example distribution of the model calibrated using the if clause variant 2 loss&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 5: Test set example distribution of the model calibrated using the if clause variant 2 loss" title="Image 5: Test set example distribution of the model calibrated using the if clause variant 2 loss" 
srcset="https://substackcdn.com/image/fetch/$s_!LiBr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!LiBr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!LiBr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!LiBr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52f2d890-af87-4b2a-bce6-9e5e897247df_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image 5: Test set example distribution of the model calibrated using the if clause variant 2 loss</figcaption></figure></div><div><hr></div><h2>Calibrated - rank loss if clause variant 3</h2><p>Average CER:  0.0385<br>Average sequence probability:  0.6386<br>Correct  :  972<br>Incorrect:  461<br>Skipped  :  428</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lsVJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lsVJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!lsVJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!lsVJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!lsVJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lsVJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63604,&quot;alt&quot;:&quot;Image 6: Test set example distribution of the model calibrated using the if clause variant 3 loss&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 6: Test set example distribution of the model calibrated using the if clause variant 3 loss" title="Image 6: Test set example distribution of the model calibrated using the if clause variant 3 loss" srcset="https://substackcdn.com/image/fetch/$s_!lsVJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!lsVJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!lsVJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lsVJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6926c8-7a8a-4df8-88d1-a77c3b0e88a4_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image 6: Test set example distribution of the model calibrated using the if clause variant 3 loss</figcaption></figure></div><div><hr></div><h2>Calibrated - margin loss</h2><p>Average CER:  0.0299<br>Average sequence probability:  0.6321<br>Correct  :  1059<br>Incorrect:  381<br>Skipped  :  421</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!27-3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!27-3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!27-3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!27-3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!27-3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!27-3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59842,&quot;alt&quot;:&quot;Image 7: Test set example 
distribution of the model calibrated using margin loss&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 7: Test set example distribution of the model calibrated using margin loss" title="Image 7: Test set example distribution of the model calibrated using margin loss" srcset="https://substackcdn.com/image/fetch/$s_!27-3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!27-3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!27-3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!27-3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67a1fcf0-aa65-4311-ade6-f244726b8320_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image 7: Test set example distribution of the model calibrated using margin loss</figcaption></figure></div><div><hr></div><h1>Conclusion</h1><p>Generative models also proved to be suitable for calibration. For the TrOCR problem specifically, margin loss yielded the best results. Compared to the initial, fine-tuned-only model, the test set metrics improved substantially: the average CER decreased from 0.0385 to 0.0299 (a 22.3% relative decrease), the average sequence probability increased from 0.5496 to 0.6321, the number of correctly classified examples increased by 17.4%, the number of incorrect ones decreased by 6.6%, and the number of skipped examples dropped by 23.6%. The margin-loss plot (Image 7) also shows a more desirable 2-D CER-probability example distribution than that of the fine-tuned-only model. 
Thus, the method proved useful, not only for calibrating a generative model but also for increasing its performance in general.</p>]]></content:encoded></item><item><title><![CDATA[LinguaAI: AI-Powered Personalized Conversations for Comprehensive Language Learning]]></title><description><![CDATA[Beyond Traditional Learning: The GPT-4 Edge in Adaptive Language Tutoring for any Language and any Level.]]></description><link>https://blog.doxray.com/p/linguaai-ai-powered-personalized</link><guid isPermaLink="false">https://blog.doxray.com/p/linguaai-ai-powered-personalized</guid><dc:creator><![CDATA[Mihaela Bakšić]]></dc:creator><pubDate>Thu, 02 Nov 2023 18:57:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As the accessibility of Large Language Models (LLMs) via APIs surges, a proliferation of opportunities for AI-centric innovations becomes evident. 
Answering this call is the nascent discipline of prompt engineering, distinguishing itself from the more resource-intensive fine-tuning of LLMs. Prompt engineering demands less in terms of data, resources, and deep model development expertise, facilitating the swift deployment of robust production-grade solutions. </p><p>The project is open-sourced and available on <a href="https://github.com/doXrayAI/LinguaAI/tree/main">GitHub</a>.</p><p><strong>Prompt engineering</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wX1Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wX1Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 424w, https://substackcdn.com/image/fetch/$s_!wX1Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 848w, https://substackcdn.com/image/fetch/$s_!wX1Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 1272w, https://substackcdn.com/image/fetch/$s_!wX1Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wX1Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png" width="1078" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1078,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wX1Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 424w, https://substackcdn.com/image/fetch/$s_!wX1Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 848w, https://substackcdn.com/image/fetch/$s_!wX1Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 1272w, https://substackcdn.com/image/fetch/$s_!wX1Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba8e8c99-b85b-4aba-9adb-cda3ac267978_1078x720.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Prompt engineering has been applied to various tasks, from arithmetic reasoning to domain-specific question answering. At its most fundamental level, a prompt is the input to the language model that shapes the response of the model. For instance, the input &#8220;In the afternoon it will &#8230;&#8221; yields a response such as &#8220;In the afternoon, it will typically become warmer and brighter as the day progresses, assuming it's a typical sunny day. 
However, the specific weather conditions can vary depending on your location, time of year, and local climate patterns.&#8221; In the same vein, prompts can serve various purposes, including facilitating question answering (Figure 1) and fostering creativity in tasks like crafting poems (Figure 2).&nbsp;</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wDPg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wDPg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 424w, https://substackcdn.com/image/fetch/$s_!wDPg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 848w, https://substackcdn.com/image/fetch/$s_!wDPg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 1272w, https://substackcdn.com/image/fetch/$s_!wDPg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wDPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png" width="373" height="158" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:158,&quot;width&quot;:373,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wDPg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 424w, https://substackcdn.com/image/fetch/$s_!wDPg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 848w, https://substackcdn.com/image/fetch/$s_!wDPg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 1272w, https://substackcdn.com/image/fetch/$s_!wDPg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fade95ebf-b6d3-4d75-ad22-50123d54366a_373x158.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Figure 1: Question answering task using ChatGPT<br><br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_V2P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_V2P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 424w, https://substackcdn.com/image/fetch/$s_!_V2P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 848w, https://substackcdn.com/image/fetch/$s_!_V2P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 1272w, https://substackcdn.com/image/fetch/$s_!_V2P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_V2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png" width="714" height="589" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:589,&quot;width&quot;:714,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!_V2P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 424w, https://substackcdn.com/image/fetch/$s_!_V2P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 848w, https://substackcdn.com/image/fetch/$s_!_V2P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 1272w, https://substackcdn.com/image/fetch/$s_!_V2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31df8255-ac9c-4b09-9e95-6cd4313984f8_714x589.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Figure 2: Creative writing task using ChatGPT</p><p>As tasks become more intricate, prompts become correspondingly more complex. This frequently requires task decomposition, nuanced contextualisation, and careful choice of phrasing within the prompt.</p><p>Several prompt engineering techniques leverage well-known capabilities of deep neural networks, such as zero-shot and few-shot learning. Zero-shot learning is the ability of deep networks to produce correct solutions for unseen tasks. In prompt engineering, this means providing only a task description and the input itself, with no solved examples, to obtain the desired output. For more intricate tasks, though, model performance can improve significantly when the prompt also includes a small set of input-output examples, creating a few-shot learning scenario. 
Figures 3 and 4 illustrate concrete instances of zero-shot and few-shot learning in action.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NU4L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NU4L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 424w, https://substackcdn.com/image/fetch/$s_!NU4L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 848w, https://substackcdn.com/image/fetch/$s_!NU4L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 1272w, https://substackcdn.com/image/fetch/$s_!NU4L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NU4L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png" width="689" height="223" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:223,&quot;width&quot;:689,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NU4L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 424w, https://substackcdn.com/image/fetch/$s_!NU4L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 848w, https://substackcdn.com/image/fetch/$s_!NU4L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 1272w, https://substackcdn.com/image/fetch/$s_!NU4L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a179719-7439-44d7-8829-c4cea2ea9134_689x223.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Figure 3: Zero-shot learning task with ChatGPT</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VQFq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VQFq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 424w, https://substackcdn.com/image/fetch/$s_!VQFq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 848w, https://substackcdn.com/image/fetch/$s_!VQFq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 1272w, https://substackcdn.com/image/fetch/$s_!VQFq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VQFq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png" width="520" height="308" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:308,&quot;width&quot;:520,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!VQFq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 424w, https://substackcdn.com/image/fetch/$s_!VQFq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 848w, https://substackcdn.com/image/fetch/$s_!VQFq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 1272w, https://substackcdn.com/image/fetch/$s_!VQFq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F782d0855-df32-4e64-89a7-a79b4bc7b525_520x308.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Figure 4: Few-shot learning task with ChatGPT</p><p>Overall, there is no one-size-fits-all solution for developing quality prompts. Crafting a high-quality prompt is a dynamic and iterative process tailored to specific use cases. Nevertheless, there have been efforts to formalise and systemise effective prompt creation methods, resulting in prompt patterns, a prompt engineering equivalent of design patterns.</p><p><strong>Application of prompt engineering to a personalised language learning task</strong></p><p>This article delves into prompt engineering techniques for developing a language learning chatbot powered by GPT models. The objective is to create a highly adaptable chatbot for language learning and practice. Users should be able to select their desired language, language proficiency level, and the setting in which they want to engage with the chatbot. These three key components collectively shape the conversational context. The ultimate goal is for the chatbot to generate coherent and contextually appropriate conversations that align with the chosen language proficiency level. 
Illustrative conversations conducted with the LinguaAI chatbot can be found in Figures 5 and 6.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VmMG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VmMG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 424w, https://substackcdn.com/image/fetch/$s_!VmMG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 848w, https://substackcdn.com/image/fetch/$s_!VmMG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 1272w, https://substackcdn.com/image/fetch/$s_!VmMG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VmMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png" width="773" height="807" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:807,&quot;width&quot;:773,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VmMG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 424w, https://substackcdn.com/image/fetch/$s_!VmMG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 848w, https://substackcdn.com/image/fetch/$s_!VmMG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 1272w, https://substackcdn.com/image/fetch/$s_!VmMG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dff57fb-757c-4d82-b995-ff12339676d7_773x807.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Figure 5: Conversation with LinguaAI &#8220;In a restaurant&#8221;. 
The conversation is carried out in English at B2 level.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kCO9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kCO9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 424w, https://substackcdn.com/image/fetch/$s_!kCO9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 848w, https://substackcdn.com/image/fetch/$s_!kCO9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 1272w, https://substackcdn.com/image/fetch/$s_!kCO9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kCO9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png" width="774" height="870" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:774,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kCO9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 424w, https://substackcdn.com/image/fetch/$s_!kCO9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 848w, https://substackcdn.com/image/fetch/$s_!kCO9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 1272w, https://substackcdn.com/image/fetch/$s_!kCO9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0000cc4b-0c3d-48cc-98b9-3e2dafa59607_774x870.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Figure 6: Conversation with LinguaAI &#8220;In a restaurant&#8221;. The conversation is carried out in French at A2 level.</p><p>Before going into implementation details, we outline the core objectives and essential attributes we want our solution to have. The fundamental requirement is that the conversation follows the prescribed dialogue format, yielding a single, grammatically correct response per chatbot prompt. We'll refer to these as "format requirements" henceforth. Furthermore, the chatbot should possess common language skills such as fluency and coherence, and it should adhere to the chosen language proficiency level. 
Lastly, the chatbot should be able to embody its designated role effectively, employing common phrases, vocabulary, and a speech style befitting the assigned persona or context.</p><p>Taking into account the qualities of an effective language-learning chatbot as outlined above, the proposed solution comprises the following key subtasks:</p><ul><li><p>Asserting user inputs</p></li><li><p>Inferring conversation roles from the setting description</p></li><li><p>Main chatbot dialogue generation</p></li><li><p>Chatbot response refinements</p></li></ul><p>The LinguaAI chatbot architecture, outlined in Figure 7, consists of several components, each responsible for carrying out one of the aforementioned subtasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YXh3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YXh3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 424w, https://substackcdn.com/image/fetch/$s_!YXh3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 848w, https://substackcdn.com/image/fetch/$s_!YXh3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YXh3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YXh3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png" width="731" height="681" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:681,&quot;width&quot;:731,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YXh3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 424w, https://substackcdn.com/image/fetch/$s_!YXh3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 848w, https://substackcdn.com/image/fetch/$s_!YXh3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YXh3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe629fa92-2edb-46c2-a5a7-a4a9e5d4bce0_731x681.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Figure 7. LinguaAI chatbot architecture</p><p><em><strong>Asserting user inputs</strong></em></p><p>Given the open-ended nature of the user's description of the conversational context, it is essential to verify its suitability and clarity before moving forward. 
To address this, the solution uses a straightforward LLM-based binary classifier, where LLM stands for Large Language Model. An output of "1" signifies that the input qualifies as a description of a situation, place, or activity, typically involving two interlocutors, while an output of "0" indicates that it does not meet these criteria.</p><p><em><strong>Role inference</strong></em></p><p>Once the user input has been validated, the context is passed to the role inference model. The purpose of this step is to make the dialogue flow more naturally and to make the task easier to understand for both the model and the user. This is achieved by inferring two distinct, natural roles for the participants in the conversation, tailored to the given context. The role that guides the conversation is assigned to the chatbot, while the other role is allocated to the user. This allocation makes the interaction more coherent and intuitive.</p><p><em><strong>Main chatbot dialogue generation</strong></em></p><p>After role inference, all the relevant information is available and the dialogue can begin. To initiate the dialogue, a zero-shot task definition is used as a prompt. The prompt includes the full conversation context, the Persona Pattern instructions, the CEFR language level definition and the format guidelines. Every subsequent prompt is expected to be a user&#8217;s message. In response, the chatbot generates a single conversational line as its reply.</p><p><em><strong>LLM response refinement</strong></em></p><p>Following the initial response generated by the primary chatbot, a series of targeted refinements is applied to the response. 
This iterative refinement process is motivated by the observation that it improves the output of Large Language Models (LLMs) compared to the baseline zero-shot approach.</p><p>The goal of the individual refinements is to ensure that the properties of a good language-learning chatbot outlined above are met. The first refinement in the pipeline checks that the assigned role is adhered to, while the subsequent refinement ensures that the appropriate language proficiency level is maintained.</p><p>We observed that common language skills, such as fluency and coherence, and the format requirements are satisfied in the vast majority of text produced by the model, so no refinement was added for them.</p><p><em><strong>The trade-off between role fitness and language level</strong></em></p><p>Our refinements are primarily directed at two crucial properties: language proficiency level and role fitness. It's essential to emphasise that these two properties tend to trade off against each other. We prioritise language proficiency level because it forms the foundation of the language learning process, and in most cases, generic roles do not significantly suffer as a result. However, for more specific roles, such as mimicking a particular individual, some nuances in the style and tone of the text may be sacrificed in favour of maintaining the desired language proficiency level.</p><p><strong>Evaluation of refinement prompt alternatives</strong></p><p>Recognising that the input space for a refinement pipeline is virtually limitless and that there are multiple equally valid prompt solutions, we found that simply experimenting with prompts to assess refinement quality was no longer sufficient. 
Our objective was to demonstrate that refining responses genuinely enhances their quality in terms of role fitness and language proficiency level matching.</p><p>To achieve this, we created a dataset that consists of 18 dialogues, with three dialogues for each CEFR language level, for each refinement prompt alternative. The dialogues were generated automatically, with a chatbot without a refinement pipeline acting as the user. The evaluation process involved the use of two evaluation bots developed through prompt engineering. Each message generated from the assessed prompt received a score, and the scores were then averaged for each conversation. This approach allowed for a more comprehensive and objective assessment of refinement quality. The results differed somewhat depending on the GPT model used.</p><p>When dealing with GPT-4, it was observed that all refinements and initial responses exhibited a near-perfect language proficiency level. However, refinements introduced only moderate improvements in terms of role fitness.</p><p>On the other hand, with GPT-3.5-turbo, refinements led to significant enhancements in both language proficiency and role fitness criteria. Nevertheless, GPT-3.5-turbo encountered difficulties in maintaining text simplicity for the A1 and A2 language levels. To address this issue, a text simplification refinement bot was added to the end of the pipeline. This bot is designed to activate exclusively for A1 and A2 language levels, ensuring that the text remains basic and easy to understand for users at these proficiency levels.</p><p><strong>Challenges Encountered During the Development Process</strong></p><p>Tuning the temperature setting to suit the specific task at hand is a common practice when working with Large Language Models (LLMs). However, when adjusting the temperature, we encountered a dual effect. 
Increasing the temperature elevates the likelihood of format requirements not being met, but it also introduces more variability in the chatbot's responses. This variability allows two dialogues with identical context to evolve in various directions, which is valuable for the language learning process.</p><p>In the end, we opted for a moderate temperature setting for the model. It's important to note that this issue was more pronounced with GPT-3.5 than with GPT-4.</p><p>Furthermore, it was challenging to develop an evaluation process for prompt performance. To automate the score assignment process, the quality of the evaluation bots had to be checked manually beforehand. This was done using small, manually crafted datasets targeting the edge cases. Because such manually built datasets warrant some caution, we took the results of the evaluation process with a grain of salt. Several of the best-performing prompt alternatives were further examined in a real-use setting before we made small adjustments to the prompts and selected the most effective one.</p><p><strong>Future developments</strong></p><p>There are several ways to extend the functionality of the chatbot and improve its user utility for language learning.&nbsp;</p><ul><li><p>Feedback constitutes a foundational stage in the learning process. User messages can be assessed and constructive feedback can be generated to enhance the user's experience.</p></li><li><p>Speaking and listening are crucial aspects of learning a foreign language because they help learners develop their oral communication skills and fluency, enabling them to engage in real-life conversations and effectively convey their thoughts and ideas. These skills also enhance comprehension, pronunciation, and cultural understanding, making the language learning experience more immersive and practical. 
Therefore, to further enhance LinguaAI, it's essential to incorporate the ability to record and listen to messages, coupled with robust text-to-speech and speech-to-text technologies.&nbsp;</p></li><li><p>The current refinement process issues a separate API call for each refinement, which slows down responses for users. Response speed could be improved either by reducing the number of API calls or by considering alternative approaches.</p></li><li><p>The temperature parameter could be tuned further, since it shapes both the variability of the generated responses and how reliably they follow the format requirements.</p></li></ul><p><strong>Key takeaways</strong></p><p>Prompt engineering with LLMs such as GPT has allowed us to quickly develop complex AI-based solutions. Building a customizable chatbot for language learning relies on several different prompt engineering approaches, from zero-shot prompting to iterative refinements. The development process involves user input assessment, role inference, main chatbot dialogue generation, and response refinements. Notably, the interplay between language level and role fitness is crucial, and adjustments in temperature settings influence the chatbot's behaviour. To enhance the language learning experience for users, it's beneficial to expand LinguaAI with feedback, progress tracking and audio capabilities. 
These features can provide users with valuable insights into their language learning journey and help them make meaningful improvements over time.</p>]]></content:encoded></item><item><title><![CDATA[Multi-Task Learning with Intermediate Continual Learning for Industry NLP Use Cases]]></title><description><![CDATA[Utilizing adapters and hypernetworks to efficiently, effectively, and continuously train multiple tasks]]></description><link>https://blog.doxray.com/p/multi-task-learning-with-intermediate</link><guid isPermaLink="false">https://blog.doxray.com/p/multi-task-learning-with-intermediate</guid><dc:creator><![CDATA[Tin Ferković]]></dc:creator><pubDate>Mon, 21 Aug 2023 07:16:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!L_Sj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_rZK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_rZK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 424w, https://substackcdn.com/image/fetch/$s_!_rZK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 848w, 
https://substackcdn.com/image/fetch/$s_!_rZK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 1272w, https://substackcdn.com/image/fetch/$s_!_rZK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_rZK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png" width="1033" height="292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:292,&quot;width&quot;:1033,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60295,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_rZK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 424w, https://substackcdn.com/image/fetch/$s_!_rZK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 848w, 
https://substackcdn.com/image/fetch/$s_!_rZK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 1272w, https://substackcdn.com/image/fetch/$s_!_rZK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a6489ab-9f1d-480a-a649-6be8e50e8e7d_1033x292.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Image 0: a) Shared Encoder, b) Adapter, c) Hypernetwork.</figcaption></figure></div><h4>Background and 
motivation</h4><p>The pre-train and fine-tune transfer learning paradigm introduced by <a href="https://arxiv.org/abs/1810.04805">Devlin et al. (2019)</a> has remained the dominant approach ever since. Generative alternatives now exist, but they only become competitive as model size increases <a href="https://arxiv.org/abs/2104.08691">(Lester et al., 2021)</a>, which puts them out of reach for many practitioners. The problem with the pre-train and fine-tune paradigm is that it requires fine-tuning a new large language model (LLM) for each new task, which is unsustainable in terms of time, storage, and energy.</p><p>We would like to find an efficient yet effective <strong>multi-task learning (MTL)</strong> method that can learn multiple tasks using less time and storage space, and therefore less energy, while matching the performance of single-task learning (STL). The method should be accessible to small and medium-sized companies, so it should be able to run on one or a few mid-range graphics processing units (GPUs). For that reason, the method should be <strong>discriminative</strong>, since generative ones require very large LLMs, e.g. 175B parameters for GPT-3 (<a href="https://arxiv.org/abs/2005.14165">Brown et al., 2020</a>) and 540B for Flan-PaLM (<a href="https://arxiv.org/abs/2210.11416">Chung et al., 2022</a>). Additionally, as such companies often face distribution shifts over time and a continuous stream of new requirements from their clients, the method should be amenable to <strong>continual learning (CL)</strong>.</p><p>In this post, we describe our search for such a method: one that can train multiple tasks jointly while remaining amenable to learning new ones continually. The CL component does not have to match lifelong learning; it only needs to cover a period of 6-12 months of new requirements. 
After such a period, the MTL model can be re-trained on all the previous MTL data together with the newly available CL data. The method should take less time, use less storage space, be able to train on one or a few mid-range GPUs, and keep the performance at least on par with STL.</p><div><hr></div><h4>Data</h4><p>Two types of tasks were examined in this work: sequence classification (<strong>CLS</strong>) and token classification, i.e. <strong>NER</strong>. Three datasets were available for the CLS task, while four were available for the NER task, resulting in a total of seven datasets. These datasets were then split into MTL and CL (data-, task-, and class-incremental), resulting in a total of 10 datasets. A multilingual version of BERT was used due to the presence of multiple languages.</p><div><hr></div><h4>Methodology</h4><h5>Adapters</h5><p>Adapters take a completely different approach to MTL. Instead of sharing as many parameters as possible, as done in the shared encoder approach, adapters share no parameters at all apart from the pre-trained model. They train as few parameters as possible and inject them into the Transformer architecture. This makes adapters modular, less resource-intensive, and faster to train. Most importantly, performance reported for the GLUE benchmark (<a href="https://arxiv.org/abs/1804.07461">Wang et al., 2018</a>) does not suffer and stays on par with STL (<a href="http://proceedings.mlr.press/v97/houlsby19a.html">Houlsby et al., 2019</a>). 
Two of the function-composition methods will be examined in this work &#8212; bottleneck adapter (<a href="https://arxiv.org/abs/2005.00052">Pfeiffer et al., 2020c</a>) and Compacter++ (<a href="https://proceedings.neurips.cc/paper/2021/hash/081be9fdff07f3bc808f935906ef70c0-Abstract.html">Karimi Mahabadi et al., 2021</a>).</p><h6>Pfeiffer configuration</h6><p>The architecture of the bottleneck adapter will be the same as the one initially introduced in <a href="http://proceedings.mlr.press/v97/houlsby19a.html">Houlsby et al. (2019)</a>, with one exception. Instead of inserting the adapter after both MHA and FF sub-layers, it will only be inserted after the latter. The reason is a two times reduction in trainable parameters, while the performance on GLUE and SuperGLUE even increases slightly (<a href="https://proceedings.neurips.cc/paper/2021/hash/081be9fdff07f3bc808f935906ef70c0-Abstract.html">Karimi Mahabadi et al., 2021</a>). A visual representation of an adapter is shown in Figure 1, where only the adapter after the Transformer&#8217;s FF sub-layer is used.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L_Sj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L_Sj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 424w, https://substackcdn.com/image/fetch/$s_!L_Sj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 848w, 
https://substackcdn.com/image/fetch/$s_!L_Sj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 1272w, https://substackcdn.com/image/fetch/$s_!L_Sj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L_Sj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png" width="220" height="327.4831763122476" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1106,&quot;width&quot;:743,&quot;resizeWidth&quot;:220,&quot;bytes&quot;:43737,&quot;alt&quot;:&quot;Alt&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Alt" title="Alt" srcset="https://substackcdn.com/image/fetch/$s_!L_Sj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 424w, https://substackcdn.com/image/fetch/$s_!L_Sj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 848w, 
https://substackcdn.com/image/fetch/$s_!L_Sj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 1272w, https://substackcdn.com/image/fetch/$s_!L_Sj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bc540c8-a77c-4402-99f5-1f515d1314d3_743x1106.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 1: Bottleneck adapter architecture (Pfeiffer configuration)</figcaption></figure></div><h6>Compacter++ 
configuration</h6><p>Compacter++ is inserted in the same place as the bottleneck adapter described above, but its parameters are formulated differently. It uses parameterized hypercomplex multiplication (PHM) layers. These layers have a form similar to FC layers, with the key difference being that the weights are learned as a sum of Kronecker products. The adapter weights are decomposed into A and B matrices, where the A matrices are shared across all adapter layers, while the B matrices hold adapter-specific parameters. Additionally, each B matrix is parameterized as a low-rank matrix. Combining the two yields the low-rank parameterized hypercomplex multiplication (LPHM) layer, shown in Figure 2 and formulated as:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;W = \\sum_{i=1}^n A_i \\otimes B_i = \\sum_{i=1}^n A_i \\otimes s_i t_i^\\top&quot;,&quot;id&quot;:&quot;GZMGESSMUX&quot;}" data-component-name="LatexBlockToDOM"></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3oRq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3oRq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 424w, https://substackcdn.com/image/fetch/$s_!3oRq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 848w, 
https://substackcdn.com/image/fetch/$s_!3oRq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 1272w, https://substackcdn.com/image/fetch/$s_!3oRq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3oRq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png" width="308" height="272.655737704918" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1026,&quot;width&quot;:1159,&quot;resizeWidth&quot;:308,&quot;bytes&quot;:47272,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3oRq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 424w, https://substackcdn.com/image/fetch/$s_!3oRq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 848w, 
https://substackcdn.com/image/fetch/$s_!3oRq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 1272w, https://substackcdn.com/image/fetch/$s_!3oRq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa075ff07-69b4-4556-88b5-0d584276cfcb_1159x1026.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 2: Compacter++ architecture</figcaption></figure></div><h6>Adapter Fusion</h6><p>Once all the adapters for 
the MTL part of the training have been trained, a natural question is whether these adapters can be reused for learning an incoming CL task. The AdapterFusion (<a href="https://arxiv.org/abs/2005.00247">Pfeiffer et al., 2020a</a>) approach trains a fusion of previously trained adapters in order to solve a single target task. It is a different approach to transfer learning. Figure 3 shows how AdapterFusion is inserted into each Transformer layer. Newly introduced fusion parameters learn to combine previously trained adapters as a dynamic function of the target task data. Similarly to attention (<a href="https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html">Vaswani et al., 2017</a>), a contextual activation of each adapter is learned. Given the adapters and target task data, AdapterFusion learns a parameterized mixer of the available adapters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pLr3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pLr3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 424w, https://substackcdn.com/image/fetch/$s_!pLr3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 848w, 
https://substackcdn.com/image/fetch/$s_!pLr3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 1272w, https://substackcdn.com/image/fetch/$s_!pLr3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pLr3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png" width="418" height="347.39731285988483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:866,&quot;width&quot;:1042,&quot;resizeWidth&quot;:418,&quot;bytes&quot;:292786,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pLr3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 424w, https://substackcdn.com/image/fetch/$s_!pLr3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 848w, 
https://substackcdn.com/image/fetch/$s_!pLr3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 1272w, https://substackcdn.com/image/fetch/$s_!pLr3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F640cf149-0365-44cb-9165-9c7ed72790c2_1042x866.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3: AdapterFusion architecture</figcaption></figure></div><h5>Hypernetworks</h5><p>Hypernetwork is a bridge 
between the shared encoder and adapters: it shares parameters through a hypernetwork, but also keeps and generates task-specific parameters. A simple visual overview of the hypernetwork approach is shown in Figure 4. Task embeddings are kept task-specific, while the hypernetwork, depending on the given embedding and its own parameters, generates the target model parameters. In our case, the target model is the bottleneck adapter.</p><p>The hypernetwork approach we used is from <a href="https://arxiv.org/abs/2205.12148">&#220;st&#252;n et al. (2022)</a>. The architecture is shown in Figure 4. Besides the <strong>task embedding</strong>, their work also introduces <strong>language and layer embeddings</strong>. Separating the task embeddings from language embeddings enables transfer to arbitrary task-language combinations at any time. In our experiments, language embeddings were kept the same for all the tasks, but were also trained end-to-end. Layer embeddings were introduced in order to avoid having a different hypernetwork at each Transformer layer. This way, layer embeddings significantly reduce the number of parameters and allow for information sharing across layers. During training, only the task and language embeddings matching the task and language that the batch was sampled from are updated for each batch. Additionally, the layer embedding corresponding to the current Transformer layer is updated. Task, language, and layer embeddings are concatenated and fed into a source projector network Ps, consisting of two FF layers and a ReLU activation. 
The idea of the Ps component is to reduce the size of the input embedding and potentially learn the interactions between the task, language, and layer embeddings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hbsS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hbsS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 424w, https://substackcdn.com/image/fetch/$s_!hbsS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 848w, https://substackcdn.com/image/fetch/$s_!hbsS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 1272w, https://substackcdn.com/image/fetch/$s_!hbsS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hbsS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png" width="376" height="321.99457994579944" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:948,&quot;width&quot;:1107,&quot;resizeWidth&quot;:376,&quot;bytes&quot;:154236,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hbsS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 424w, https://substackcdn.com/image/fetch/$s_!hbsS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 848w, https://substackcdn.com/image/fetch/$s_!hbsS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 1272w, https://substackcdn.com/image/fetch/$s_!hbsS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8bef92c-2d2e-44de-83ee-ceab492de85b_1107x948.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 4: Hyper-X architecture from <a href="https://arxiv.org/abs/2205.12148">(&#220;st&#252;n et al., 2022)</a>. The hypernetwork (1) takes the concatenation of task, language, and layer embeddings as input and generates a flat parameter vector. Before the final transformation, the source projector network projects the combination of these embeddings to a smaller dimension. The parameter vector is then reshaped and cast to weights of the adapter (2), which are inserted into a Transformer layer (3).</figcaption></figure></div><p>Besides the FF down- and up-projection matrices, a layer norm is also generated and trained via a hypernetwork. A slightly larger learning rate is used due to randomly initialized weights in a hypernetwork. A pre-trained encoder-based model is frozen. In order to achieve heterogeneous batches, gradients are accumulated over 4 steps before making an optimizer step. 
This way, the shared hypernetwork accumulates gradient directions from multiple tasks before following them. This should mitigate catastrophic interference, which is already largely mitigated by not sharing the parameters directly; instead, a proxy, i.e. a hypernetwork, is shared. Instead of size-proportional sampling, temperature-based sampling with T=2 was used. Additionally, all CLS losses were divided by 5.0 and each point-loss by log(n), where n is the number of possible labels for the corresponding task. This was done to keep training stable regardless of the task type or number of possible labels.</p><p>When continuously training new tasks on top of the jointly trained hypernetwork model, a <strong>CL regularization loss</strong> is added. This loss term is calculated using the adapter and layer normalization parameters generated by the hypernetwork, not the hypernetwork and embedding parameters directly. The reason is that changes in the embedding and hypernetwork parameters do not matter, as long as the parameters generated by these components do not change much. The generated parameters are the ones used in the model&#8217;s forward pass when the textual input is given.</p><p>Two similar CL regularization methods were examined, named type 1 and type 2. The type 1 CL method penalizes the sum of squared differences between the generated parameters before and after each optimizer step. This approach is expected to suffer in performance because the generated parameters change slightly in every training step. The regularization loss is therefore calculated against the parameters generated before the latest optimizer step, which could differ substantially from the parameters generated before the beginning of training. 
Thus, the type 2 CL method compares the parameters generated before the start of training with the currently generated ones and calculates the loss between the two.</p><div><hr></div><h4>Results</h4><h5>Overview</h5><p>The overview of the results is shown in Table 1. STL, adapters, Adapter Fusion, and the hypernetwork <strong>all perform similarly</strong>, within one macro average F1-score point. The table also shows the performance degradation of the hypernetwork after CL, measured after two rounds of CL with three and two tasks respectively. The degradation is reasonable for high-resource NER tasks, but becomes overly amplified for low-resource CLS datasets (130-475 training examples). The shared encoder approach has significantly lower results and cannot efficiently learn new tasks continuously.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WA9v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WA9v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 424w, https://substackcdn.com/image/fetch/$s_!WA9v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 848w, https://substackcdn.com/image/fetch/$s_!WA9v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 1272w, 
https://substackcdn.com/image/fetch/$s_!WA9v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WA9v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png" width="600" height="378.7087912087912" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:919,&quot;width&quot;:1456,&quot;resizeWidth&quot;:600,&quot;bytes&quot;:184143,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WA9v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 424w, https://substackcdn.com/image/fetch/$s_!WA9v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 848w, https://substackcdn.com/image/fetch/$s_!WA9v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 1272w, 
https://substackcdn.com/image/fetch/$s_!WA9v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a66d7d1-0563-48a6-919d-59eda8864572_1465x925.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Table 1: Best test set macro average F1-score results overview from each of the methods examined. The shared encoder approach is trained jointly with all 10 tasks. Pfeiffer configuration was used for both adapters and Adapter Fusion. 
Hypernetwork approach results for MTL datasets are reported after joint MTL (left) and after both rounds of CL (right).</figcaption></figure></div><p>STL does not show any superiority over adapters and hypernetworks. Additionally, it takes longer to train and requires more trainable parameters, as shown in Table 2. Although the compacter++ configuration is slightly more efficient than the Pfeiffer one, its results are notably worse (see Table 3). Adapter Fusion does not reach the performance of ST-adapters, and it also takes longer to train and is much more parameter-heavy because it fuses all the available adapters. Note that the fusion for CL CLS tasks consisted of two previously trained CLS adapters, while the fusion for CL NER tasks consisted of three NER adapters. These are small numbers; a fusion can combine even more adapters, which accordingly requires longer training and inference times. The shared encoder approach uses more parameters than the other non-STL methods because the whole encoder is trained. Its training time is the shortest, but only because it uses 16-bit mixed precision. The hypernetwork reduces the number of trainable parameters by 91.38%. Its training time is reduced by <em>only</em> 15%, making adapters faster to train. The hypernetwork training time increases further during CL, especially when there are many MTL tasks whose forgetting should be mitigated.</p><p>The longer hypernetwork CL training time caused by the CL regularization loss also needs to be taken into account as a downside. Generating the adapters of all the previous tasks after each optimizer step is an expensive operation. 
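</p><p>To make that cost concrete, here is a minimal sketch of a type 2 style regularization term in plain Python. It is not the implementation used in these experiments: the hypernetwork is reduced to a single linear map, the flat vector it generates stands in for the adapter and layer normalization parameters, and every name in it (<code>generate_params</code>, <code>snapshot_previous_params</code>, <code>cl_regularization_loss</code>, the task names) is hypothetical. It also assumes that type 2 uses the same sum of squared differences as type 1.</p><pre><code class="language-python"># Toy sketch of a type 2 CL regularization term (illustrative only).
# Assumption: the hypernetwork is one linear map and the generated flat
# vector stands in for adapter and layer normalization parameters.


def generate_params(hyper_weights, embedding):
    """Generate a flat parameter vector from one task embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in hyper_weights]


def snapshot_previous_params(hyper_weights, task_embeddings):
    """Store the parameters of every previous task before CL training starts."""
    return {task: generate_params(hyper_weights, emb)
            for task, emb in task_embeddings.items()}


def cl_regularization_loss(hyper_weights, task_embeddings, reference_params):
    """Sum of squared differences between reference and current parameters.

    The parameters of every previous task are regenerated on each call,
    which is why the term gets costly as the number of previous tasks grows.
    """
    loss = 0.0
    for task, emb in task_embeddings.items():
        current = generate_params(hyper_weights, emb)
        loss += sum((c - r) ** 2 for c, r in zip(current, reference_params[task]))
    return loss


# Two previously trained (MTL) tasks with hypothetical 2-d embeddings.
embeddings = {"ner_a": [1.0, 0.0], "cls_b": [0.0, 1.0]}
weights = [[0.5, -0.25], [1.0, 0.0], [0.0, 2.0]]

reference = snapshot_previous_params(weights, embeddings)
loss_before = cl_regularization_loss(weights, embeddings, reference)

# Simulate an optimizer step on a new task that shifts one shared weight.
weights[0][0] += 0.5
loss_after = cl_regularization_loss(weights, embeddings, reference)
</code></pre><p>In this toy setup the term is zero before any update and 0.25 after the simulated step. The loop over every previous task on each call is the part that grows with the number of MTL tasks.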
</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qDEt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qDEt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 424w, https://substackcdn.com/image/fetch/$s_!qDEt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 848w, https://substackcdn.com/image/fetch/$s_!qDEt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 1272w, https://substackcdn.com/image/fetch/$s_!qDEt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qDEt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png" width="610" height="222.46565934065933" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:531,&quot;width&quot;:1456,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:149263,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qDEt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 424w, https://substackcdn.com/image/fetch/$s_!qDEt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 848w, https://substackcdn.com/image/fetch/$s_!qDEt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 1272w, https://substackcdn.com/image/fetch/$s_!qDEt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bddfaff-08b6-4b8b-8d24-a32193d43db7_1485x542.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Table 2: Comparison of methods training time and the trainable number of parameters. Training time is calculated for a single epoch and is summed over all 5 of the datasets in case of separate training (STL, adapters, Adapter Fusion). 
The comparison is conducted using all 5 available CL datasets only, since Adapter Fusion can only be run for these datasets. Absolute numbers would differ for MTL datasets, but relative comparison stays the same. All the experiments were run on a single Tesla T4 GPU with 15 GB of VRAM. Asterisk (*) indicates a method using 16-bit Automatic Mixed Precision (AMP).</figcaption></figure></div><h5>Adapters</h5><p>A performance comparison of the Pfeiffer and compacter++ adapter configuration is shown in Table 3. Pfeiffer configuration, the one that does not further reduce the number of parameters by using LPHM layers, consistently outperforms the compacter++ configuration.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ynjv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ynjv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 424w, https://substackcdn.com/image/fetch/$s_!Ynjv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 848w, https://substackcdn.com/image/fetch/$s_!Ynjv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Ynjv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ynjv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png" width="606" height="275.685534591195" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:434,&quot;width&quot;:954,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:80008,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ynjv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 424w, https://substackcdn.com/image/fetch/$s_!Ynjv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 848w, https://substackcdn.com/image/fetch/$s_!Ynjv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Ynjv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcc6a212-3199-4f51-8ca3-886e5c077056_954x434.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Table 3: Adapter test set macro average F1 and accuracy comparison. For CLS datasets, accuracy is also reported as a number after slash (/). Average metric is calculated for macro average F1 score only.</figcaption></figure></div><h5>Adapter Fusion</h5><p>Adapter Fusion test set performance comparison is shown in Table 4. 
As expected, the fusion of Pfeiffer configuration adapters also performs better or equal for all the datasets than the one of compacter++ adapters. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ykk7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ykk7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 424w, https://substackcdn.com/image/fetch/$s_!ykk7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 848w, https://substackcdn.com/image/fetch/$s_!ykk7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 1272w, https://substackcdn.com/image/fetch/$s_!ykk7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ykk7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png" width="377" height="249.59098786828423" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:577,&quot;resizeWidth&quot;:377,&quot;bytes&quot;:51754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ykk7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 424w, https://substackcdn.com/image/fetch/$s_!ykk7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 848w, https://substackcdn.com/image/fetch/$s_!ykk7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 1272w, https://substackcdn.com/image/fetch/$s_!ykk7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d297095-3e46-41aa-bfab-1113f0f5c5e7_577x382.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Table 4: Adapter Fusion test set macro average F1 and accuracy comparison. For CLS datasets, accuracy is also reported as a number after slash (/). Average metric is calculated for macro average F1 score only.</figcaption></figure></div><div><hr></div><h4>Conclusion</h4><p>Adapters and hypernetworks proved to be more efficient than STL and equally effective. The adapter approach handles each MTL and CL task with a separate adapter, while the hypernetwork separates the MTL and CL phases. Adapters with the Pfeiffer configuration reached the performance of STL. The hypernetwork forgetting for previously trained tasks was up to 2 macro average F1-score points, while low-resource CLS tasks with only 130-475 training examples faced losses of up to 35 percentage points. 
When the hypernetwork was continuously trained on new tasks, these tasks mostly reached satisfactory performance, with the extremely low-resource CLS tasks again struggling.</p><p>The proposed adapter approach remains similar to STL in that a separate adapter is deployed for each task, but it reduces training time by 28.4% and trainable parameters by 99.16%. The hypernetwork approach, on the other hand, reduces training time by 15% and trainable parameters by 91.38%, and separates the joint MTL and CL phases. Additionally, a hypernetwork allows for some valuable knowledge sharing. However, during the CL phase, it struggles with low-resource CLS tasks, whether they were trained during the MTL or the CL phase.</p><p>These findings can be put into practice by companies serving many clients with differing use cases and requirements. They allow such companies to train models faster using only one or a few mid-range GPUs, store the models using less space, integrate them more easily with ML systems, and continually train them, all while preserving performance on par with STL.</p>]]></content:encoded></item>
<item><title><![CDATA[Exploring State-of-the-Art Models for Document-Level Entity Relation Extraction]]></title><description><![CDATA[Document-level entity relation extraction involves identifying and categorizing relationships between entities mentioned within whole documents, rather than individual sentences.]]></description><link>https://blog.doxray.com/p/exploring-state-of-the-art-models</link><guid isPermaLink="false">https://blog.doxray.com/p/exploring-state-of-the-art-models</guid><dc:creator><![CDATA[Dunja Smigovec]]></dc:creator><pubDate>Tue, 08 Aug 2023 07:30:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HMBM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Document-level entity relation extraction involves identifying and categorizing relationships between entities mentioned within whole documents, rather than individual sentences.</p><p>Document-level entity relation extraction presents unique challenges due to complex dependencies and context beyond individual sentences. 
Entities may exhibit relationships across different sections of a document, making it essential to capture meaningful connections at a broader scope.</p><p>By investigating state-of-the-art generative models, we aim to enhance our understanding of entity relations within documents and unlock the potential for more comprehensive information extraction.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HMBM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HMBM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 424w, https://substackcdn.com/image/fetch/$s_!HMBM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 848w, https://substackcdn.com/image/fetch/$s_!HMBM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 1272w, https://substackcdn.com/image/fetch/$s_!HMBM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HMBM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png" width="525" height="240" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:875,&quot;resizeWidth&quot;:525,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HMBM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 424w, https://substackcdn.com/image/fetch/$s_!HMBM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 848w, https://substackcdn.com/image/fetch/$s_!HMBM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 1272w, https://substackcdn.com/image/fetch/$s_!HMBM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fc75111-ed69-4777-a5ad-369bebb7b8ba_875x400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">Image source: <a href="https://medium.com/@roiyeho/generative-vs-discriminative-models-35b81f677822">Generative vs. Discriminative Models by Dr. Roi Yehoshua</a></figcaption></figure></div><p>The dataset consists of 4883 annotated invoice-type documents written in German, where each document has been manually annotated with entity groups. 
We split the dataset into train, validation, and test datasets, with 3418 documents, 722 documents, and 733 documents, respectively. The dataset consists of the following features:</p><ol><li><p><strong>Document:</strong> This feature is a dictionary containing the document's ID and filename, providing essential metadata for easy identification and organization.</p></li><li><p><strong>Tokens:</strong> Represented as a list of strings, this feature holds the tokens present in the document, allowing for a detailed analysis of the textual content.</p></li><li><p><strong>Token IDs:</strong> Each token in the document is assigned a unique ID, enabling efficient indexing and retrieval of information.</p></li><li><p><strong>Page IDs and Line IDs:</strong> These features are lists of integers that represent the page numbers and line numbers where the tokens appear within the document, providing spatial context to the data.</p></li><li><p><strong>Annotated Entities:</strong> This feature is a list of dictionaries that represent the named entities identified within the document. Each dictionary contains information such as the entity's name, instances of the entity (with unique IDs and corresponding text), and the token IDs associated with each instance.</p></li><li><p><strong>Annotated Entity Groups:</strong> This feature is a list of dictionaries that represent the entity groups found within the document. 
Each dictionary includes the group's name, instances of the group (with unique IDs and associated entity IDs), and the entity IDs corresponding to each group instance.</p></li></ol><p>Minimal preprocessing was performed, primarily involving structuring the annotated entity groups by converting entity IDs to their corresponding words from the text.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3AOv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3AOv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 424w, https://substackcdn.com/image/fetch/$s_!3AOv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 848w, https://substackcdn.com/image/fetch/$s_!3AOv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 1272w, https://substackcdn.com/image/fetch/$s_!3AOv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3AOv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512" width="394" height="394" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:512,&quot;width&quot;:512,&quot;resizeWidth&quot;:394,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3AOv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 424w, https://substackcdn.com/image/fetch/$s_!3AOv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 848w, https://substackcdn.com/image/fetch/$s_!3AOv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 1272w, https://substackcdn.com/image/fetch/$s_!3AOv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6beff825-e8bf-4e8c-b386-aae3c3b51ce0_512x512 1456w" sizes="100vw"></picture>
</div></a><figcaption class="image-caption">Invoice-type document</figcaption></figure></div><p>One of the major challenges during preprocessing was fitting an entire document into the models' input, given their maximum input length of 512 tokens. We first tried chunking the documents together with their associated groups, but found that in approximately 94% of the documents the groups resided within the first 512 tokens. The remaining chunks were therefore mostly empty, which both prolonged training and introduced considerable noise into the dataset. Consequently, we kept only the first 512 tokens of each document, keeping the task feasible while retaining the groups for the vast majority of documents.</p><h2>Task Modeling</h2><p>This study approaches relation extraction as the extraction of entity groups from a document, where each group consists of an item or concept along with its associated numerical values. Unlike in tasks that rely on semantic associations, the relationships here depend primarily on the text layout. 
This structural aspect adds complexity, demanding an effective understanding of text hierarchy to identify and extract accurate entity groups. By framing relation extraction in this context, the study aims to explore targeted approaches to address these unique challenges and enhance the accuracy of relation extraction algorithms and models.</p><h2>Experiments</h2><p>In this study, we conducted three experiments to explore different approaches for relation extraction (RE) in the context of document-level entity relation extraction. Each experiment aimed to investigate the effectiveness of specific methods in identifying and grouping entities based on their relationships within the document text. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="395" height="262.8108465608466" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:4024,&quot;width&quot;:6048,&quot;resizeWidth&quot;:395,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;clear glass jar with multi colored heart shaped candies&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="clear glass jar with multi colored heart shaped candies" title="clear glass jar with multi colored heart shaped candies" srcset="https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1616458964840-5108e4d3adb3?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxleHBlcmltZW50fGVufDB8fHx8MTY5MTQ4MDUwOHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p><strong>NER + RE Experiment:</strong> In the first experiment, we combined Named Entity Recognition 
(NER) with RE. This approach simulates a realistic scenario where entities are recognized and labeled within the document text. By leveraging NER, we aimed to add complexity to the task, requiring the model to recognize entities and their relationships based on the provided textual context.</p></li><li><p><strong>Tags + RE Experiment:</strong> The second experiment utilized specific tags to emphasize entities within the document text. This approach eliminated the need for a separate NER task, as entities were explicitly marked with tags. By doing so, we aimed to simplify the process of entity recognition and focus solely on the RE aspect of the task.</p></li><li><p><strong>Entities + RE Experiment:</strong> In the third experiment, we explored the use of annotated entities within the document while disregarding the entire document text. The focus here was on the arrangement and grouping of identified entities to uncover underlying patterns and dependencies. By isolating the entities and their relationships, we aimed to understand the influence of entity information alone on the RE task.</p></li></ol><h2>Prompting</h2><p>In addition to the three experiments, we utilized prompting, a technique involving specific instructions or queries to guide the language model's generation process. Prompting serves as a context for coherent and relevant responses. The model can operate in both zero-shot and one-shot settings.</p><p>In zero-shot, the model generates responses without any specific examples, relying solely on the provided instructions. In one-shot, a single example is given to guide the model's comprehension and generation.</p><p>The choice of prompt significantly influences the model's output, making it a crucial factor in our study. 
By incorporating prompting, we gained insights into how it affects the model's performance in document-level entity relation extraction, enhancing our understanding of its adaptability and responsiveness to different prompts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="293" height="439.5" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:5760,&quot;width&quot;:3840,&quot;resizeWidth&quot;:293,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;woman wearing grey shirt&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="woman wearing grey shirt" title="woman wearing grey shirt" srcset="https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1561557944-6e7860d1a7eb?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxN3x8cm9ib3R8ZW58MHx8fHwxNjkxNDc2MTA0fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@dissii">Mahdis Mousavi</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>We employed three prompts, one for each experiment:</p><ol><li><p>Experiment 1 (NER + RE) Prompt: "<strong>Find all entities from the invoice-type document written in German.</strong> After finding all the words that are entities, group all entities that are related according to the document text. Don't include dates and personal information such as email addresses, IBAN, phone numbers, and addresses. Group ONLY entities that are related, not all of them. Related entities are entities that represent a row of a table and relate to the same item. 
Groups are usually made of one entity that resembles the item or a concept, and the rest of the group are numbers that relate to it. The output must be a list of lists, each list within the main one is a group of entities that belong together."</p></li><li><p>Experiment 2 (Tags + RE) Prompt: "<strong>Extracted entities are: "{annotated_entities}" from the invoice-type document written in German.</strong> After finding all the words that are entities, group all entities that are related according to the document text. Don't include dates and personal information such as email addresses, IBAN, phone numbers, and addresses. Group ONLY entities that are related, not all of them. Related entities are entities that represent a row of a table and relate to the same item. Groups are usually made of one entity that resembles the item or a concept, and the rest of the group are numbers that relate to it. The output must be a list of lists, each list within the main one is a group of entities that belong together."</p></li><li><p>Experiment 3 (Entities + RE) Prompt: "<strong>Extracted entities are: "{annotated_entities}" from the invoice-type document written in German.</strong> Group all entities that are related. Don't include dates and personal information such as email addresses, IBAN, phone numbers, and addresses. Group ONLY entities that are related, not all of them. Related entities are entities that represent a row of a table and relate to the same item. Groups are usually made of one entity that resembles the item or a concept, and the rest of the group are numbers that relate to it. The output must be a list of lists, each list within the main one is a group of entities that belong together."</p></li></ol><h2>Models</h2><p>The models were selected based on their relevance to, and performance on, natural language processing tasks. 
Each model is introduced below, along with its architecture, strengths, and limitations, to assess its effectiveness in capturing and extracting entity relations within documents.</p><ol><li><p><strong>T5 (Text-to-Text Transfer Transformer):</strong> T5 is a transformer-based neural network architecture that has gained significant attention for its strong performance on various language tasks. It casts every task as a text-to-text problem in a sequence-to-sequence framework and can be fine-tuned on specific tasks. Its flexibility and efficiency make it a promising candidate for document-level entity relation extraction.</p></li><li><p><strong>Multilingual T5 (mT5):</strong> An extension of T5, multilingual T5 models can process multiple languages and generate text in any of those languages. They handle multiple languages without the need for separate models, reducing computational costs for multilingual NLP systems.</p></li><li><p><strong>FLAN-T5:</strong> FLAN-T5 builds on the original T5 architecture and underwent extensive fine-tuning on over 1,800 language tasks, resulting in enhanced reasoning capabilities and promptability. Its customizability and performance across NLP tasks make it a prominent player in the field.</p></li><li><p><strong>GPT (Generative Pre-trained Transformer):</strong> The GPT architecture, based on the Transformer model, revolutionized language processing with its autoregressive generation capabilities. It effectively captures long-range dependencies and contextual information, making it a powerful tool for language generation tasks.</p></li><li><p><strong>GPT-3.5 Turbo:</strong> An advancement over GPT-3, GPT-3.5 Turbo demonstrates enhanced proficiency and fewer hallucinated responses. Its versatility and integration with the user-friendly ChatGPT platform offer practical applications in various domains.</p></li><li><p><strong>GPT-4:</strong> The latest iteration, GPT-4, surpasses its predecessors in accuracy and performance. 
It excels in handling longer contexts and has an extended token span, yielding more accurate and coherent responses. OpenAI's extensive investment in human feedback enhances GPT-4's intelligence and self-control, addressing challenges related to toxicity controls.</p></li></ol><h2>Results</h2><h3>Experimental Configuration</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sTzu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sTzu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 424w, https://substackcdn.com/image/fetch/$s_!sTzu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 848w, https://substackcdn.com/image/fetch/$s_!sTzu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 1272w, https://substackcdn.com/image/fetch/$s_!sTzu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sTzu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png" width="656" height="392" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:656,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sTzu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 424w, https://substackcdn.com/image/fetch/$s_!sTzu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 848w, https://substackcdn.com/image/fetch/$s_!sTzu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 1272w, https://substackcdn.com/image/fetch/$s_!sTzu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c68520d-3324-4f5d-919b-4dc4713761dd_656x392.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>mT5 surpasses both T5 and FLAN-T5 on all ROUGE metrics. FLAN-T5, despite being trained on multiple languages, performs similarly to T5. Interestingly, using only the annotated entities, without the document text, consistently yields the highest scores across all models, whereas including tags does not significantly improve performance.</p><p>The ROUGE variants also differ among themselves. ROUGE-1 scores are higher than ROUGE-2 scores, which indicates that the models match individual words or entities better than pairs of consecutive words. ROUGE-L scores are also higher than ROUGE-2 scores, which suggests that the models capture the overall order and structure of the relations even when exact word pairs differ. 
Overall, these metrics help assess the model's ability to capture and replicate text patterns and relationships effectively.</p><h3>Dataset Size</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y_Uj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y_Uj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 424w, https://substackcdn.com/image/fetch/$s_!y_Uj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 848w, https://substackcdn.com/image/fetch/$s_!y_Uj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 1272w, https://substackcdn.com/image/fetch/$s_!y_Uj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y_Uj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png" width="491" height="378" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:378,&quot;width&quot;:491,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73247,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y_Uj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 424w, https://substackcdn.com/image/fetch/$s_!y_Uj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 848w, https://substackcdn.com/image/fetch/$s_!y_Uj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 1272w, https://substackcdn.com/image/fetch/$s_!y_Uj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7454fa1-79aa-4d89-820d-49b7caa4a8b3_491x378.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Increasing the dataset size does not lead to significant changes in performance. The highest scores are often obtained on the training dataset with only 2000 examples. This suggests that the size of the training dataset may not be the primary factor influencing performance in relation extraction tasks. Instead, other factors such as data quality, model architecture, and feature engineering may play a more significant role in determining the model's effectiveness in capturing entity relationships within documents. 
Therefore, focusing on optimizing these factors might yield better results in relation extraction tasks rather than solely relying on dataset size.</p><h3>Prompting</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WmTm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WmTm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 424w, https://substackcdn.com/image/fetch/$s_!WmTm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 848w, https://substackcdn.com/image/fetch/$s_!WmTm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 1272w, https://substackcdn.com/image/fetch/$s_!WmTm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WmTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png" width="563" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:563,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:106897,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WmTm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 424w, https://substackcdn.com/image/fetch/$s_!WmTm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 848w, https://substackcdn.com/image/fetch/$s_!WmTm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 1272w, https://substackcdn.com/image/fetch/$s_!WmTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff47483dc-ef8e-4d68-b2c0-34c80c187ac3_563x402.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Fine-tuning the mT5 model leads to consistent performance, indicating its effectiveness in capturing entity relations in document-level tasks.</p><p>However, when it comes to GPT-3.5 Turbo and GPT-4, their performance varies across different experimental setups. This highlights the importance of careful model selection for document-level entity relation extraction tasks, as different models may excel under different conditions.</p><p>In the zero-shot experiments, GPT-4 consistently outperforms GPT-3.5 Turbo across all experiment types and even surpasses the fine-tuned T5 model. This indicates that GPT-4 has a natural ability to capture entity relationships without specific fine-tuning on the task.</p><p>In the one-shot experiments, GPT-4 demonstrates superior performance compared to GPT-3.5 Turbo. 
GPT-3.5 Turbo, however, improves over its zero-shot results when given a single example, which shows how much this model depends on explicit examples to perform well.</p><p>Overall, the results underscore the importance of choosing the model for document-level entity relation extraction carefully, taking into account its performance across experimental setups and whether fine-tuning or explicit examples are available.</p><h2>Conclusion</h2><p>This study focused on document-level entity relation extraction and presented results from experiments with T5, mT5, FLAN-T5, GPT-3.5 Turbo, and GPT-4. GPT-4 outperformed the other models in ROUGE scores, indicating its effectiveness. However, performance varied with the experimental setup, for example with the inclusion of NER annotations or entity tags.</p><p>The study has some limitations. ROUGE was designed to evaluate summaries, so it does not directly measure entity relation extraction accuracy. Generalizability to other domains and datasets remains to be explored. Documents longer than 512 tokens pose challenges. Future research can address these limitations and explore new strategies to improve model accuracy on real-world entity relation extraction tasks.</p>
]]></content:encoded></item><item><title><![CDATA[Named Entity Recognition Using Question Answering in Zero- and Few-Shot Settings]]></title><description><![CDATA[The human race produces copious amounts of documents daily to aid its effective functioning.]]></description><link>https://blog.doxray.com/p/named-entity-recognition-using-question</link><guid isPermaLink="false">https://blog.doxray.com/p/named-entity-recognition-using-question</guid><dc:creator><![CDATA[Luka Pavlović]]></dc:creator><pubDate>Fri, 28 Jul 2023 07:16:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CgyL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The human race produces copious amounts of documents daily to aid its effective functioning. Although digitalization allows documents to be processed automatically when they are stored in an appropriate format, this is not the case for printed or handwritten documents. Large Language Models (LLMs), used in conjunction with Optical Character Recognition (OCR) systems, allow us to extract the relevant information from such documents effectively. However, these models are quite expensive to train from the ground up (e.g., training GPT-4 reportedly cost about $100M). For this reason, the standard paradigm for using such a big and expensive model consists of two distinct phases: pre-training and fine-tuning. 
The pre-training phase consists of training the model with a self-supervised objective such as masked language modeling, sequence reshuffling, or next-token prediction. During this phase, the model is presented with a large corpus (e.g., LLaMA 2 was pre-trained on ~2T tokens) from which it gains general knowledge. During the fine-tuning phase, the model is further specialized to perform a specific task such as document classification or Named Entity Recognition (NER).</p><p>In this post, we explore how the model performs when trained <strong>using as little data as possible</strong>. The idea is that the model has already seen similar examples during pre-training and needs only a small nudge in the right direction.</p><p>The task that we are solving is <strong>NER framed as a Question Answering (QA) task</strong>. NER is a task in which the model has to extract the relevant (named) entities from the text. With the QA framing, we prompt the model with a natural-language question about an entity type, and the model has to answer with the matching entities.</p><h2>Model</h2><p>The model of choice for this experiment is Flan-T5, an instruction-fine-tuned variant of the Text-to-Text Transfer Transformer (T5). Flan-T5 is a generative transformer-based model with a full encoder-decoder architecture: the encoder consumes the input text and produces a latent representation, which the decoder uses to generate a new sequence. Compared with T5, the model was further fine-tuned on about 1.8k different tasks. We used the base variant of the model due to its modest size of 250M parameters (for reference, GPT-4 supposedly boasts 1T parameters). However, the T5 model has its drawbacks. 
Most notably, the model is limited to 512 input tokens.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CgyL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CgyL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CgyL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CgyL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CgyL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CgyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg" width="1456" height="747" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:747,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:253640,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!CgyL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CgyL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CgyL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CgyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0108151e-9b0a-4d83-9682-16e2ba8b500b_2113x1084.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 1: Flan-T5 input and output examples.</figcaption></figure></div><h2>Data</h2><p>For our experiments, we used four different proprietary datasets containing invoice-type documents in German. To follow the QA paradigm of our experiments, we formatted each dataset to contain three fields: context, question, and answer. To create the context field, we split each document into pages and collected all of the words on a page into a single string. This helped alleviate some of the issues related to the model's input token limit. We formed the question by asking the model about a specific entity type present in the context. Lastly, the answer field contained all of the possible answers to the given question within the context. Here is a simple example: </p><p><code><br>'context': 'Luke Skywalker destroyed the Death Star. 
Young Skywalker achieved this by piloting the X-wing starfighter.'<br>'question': 'Who is the main character in this text?'<br>'answer': ['Luke Skywalker', 'Young Skywalker']</code></p><h2>Experiments</h2><p>To gauge how reducing the quantity of training data affects performance, we performed zero- and few-shot experiments. In the zero-shot setting, we trained the model only on examples from the non-target datasets. For instance, if DS1 was our target dataset, we trained the model using examples from some combination of DS2, DS3, and DS4.</p><p>The <strong>zero-shot experiments</strong> that we performed can be roughly divided into two groups. In the first group, we used all three non-target datasets for training and validation and increased the number of training examples taken from each of them from 100 to 400. In the second group, we trained on different combinations of the non-target datasets, taking 400 examples from each included dataset.</p><p>For the <strong>few-shot setting</strong>, we introduced a limited number of training examples from the target dataset. Again, we had two groups of experiments. In the first group, we included 400 examples from each of the non-target datasets and increased the number of examples from the target dataset from 100 to 400. In the second group, we kept the number of examples from the target dataset fixed at 400 while trying different combinations of the included non-target datasets, again with 400 examples from each. Since subsampling introduces noise, we ran each of the aforementioned experiments 10 times and averaged the results.</p><p>Additionally, we ran a <strong>pure zero-shot</strong> experiment, in which we did not update the model's parameters, to obtain a lower-bound baseline. 
We also performed full fine-tuning on the whole training set of the target dataset to obtain an upper-bound baseline.</p><h2>Results</h2><h3>Zero-shot</h3><h4>Increasing the number of examples from the non-target datasets</h4><p>As Figures 2-5 show, initially introducing training examples from the non-target datasets improved performance. However, further increasing the number of training examples yielded only marginal improvements. For the experiments with DS3 as the target dataset, we even note a drop in performance. This is probably because DS3 comprises a unique set of entity types: the model likely learned to solve the task during the early stages of training but overfitted to the seen entity types as the number of training samples increased.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!52Pp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!52Pp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!52Pp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!52Pp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!52Pp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!52Pp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:168646,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!52Pp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!52Pp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!52Pp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!52Pp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbc95a1-44ec-4b7d-bf3c-be15803987b3_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 2: Results of zero-shot experiments with DS1 as the target dataset.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Ufvk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ufvk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ufvk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ufvk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ufvk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ufvk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163544,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ufvk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ufvk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ufvk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ufvk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa087fd5-03fa-4848-a620-9cfcfadfd9b6_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 3: Results of zero-shot experiments with DS2 as the target dataset.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-U5I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-U5I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!-U5I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-U5I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-U5I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-U5I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:212465,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-U5I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!-U5I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-U5I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-U5I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fdadee1-f6d1-4f6b-b222-34e55942ab75_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 4: Results of zero-shot experiments with DS3 as the target dataset.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bXZL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bXZL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bXZL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bXZL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bXZL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bXZL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:226329,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bXZL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bXZL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bXZL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bXZL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9af8fa7a-3540-429d-8e90-2692226f150c_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 5: Results of zero-shot experiments with DS4 as the target dataset.</figcaption></figure></div><h4>Different combinations of non-target datasets group</h4><p>In the experiments that varied the number of datasets included in the training set, we did not find a link between the number of datasets and performance. However, the results showed that the presence of certain datasets in the training set affected performance on specific target datasets. For instance, performance on target dataset DS1 was higher when the model was trained using only DS4 than when DS2 or DS3 were included in the training set.</p><h3>Few-shot</h3><h4>Increasing the number of examples from the target dataset group</h4><p>Figures 6-9 show the results of the first set of few-shot experiments. These experiments show that adding more training examples from the target dataset improves performance. 
Even though this is an expected result and the results are not on par with the full-scale fine-tuning, this confirms the idea that including only a limited number of examples can lead the model in the right direction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rgZu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rgZu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rgZu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rgZu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rgZu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rgZu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:240450,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rgZu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rgZu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rgZu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rgZu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd871fa5c-5bf9-4e92-9c75-c394bc5495cc_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 6: Results of few-shot experiments with DS1 as the target dataset.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!px-I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!px-I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!px-I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!px-I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!px-I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!px-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:222973,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!px-I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!px-I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!px-I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!px-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3862051d-e100-4870-88ec-03952d3ff696_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 7: Results of few-shot experiments with DS2 as the target dataset.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RPPr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RPPr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RPPr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RPPr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RPPr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RPPr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:179715,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RPPr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RPPr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RPPr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RPPr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94841f15-138f-444c-8062-f395acfd0ff6_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 8: Results of few-shot experiments with DS3 as the target dataset.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Ct6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Ct6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!4Ct6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4Ct6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4Ct6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Ct6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:211785,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Ct6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!4Ct6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4Ct6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4Ct6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02fed105-a5f0-4553-8399-3730417f0103_3200x2000.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 9: Results of few-shot experiments with DS4 as the target dataset.</figcaption></figure></div><h4>Different combinations of non-target datasets group</h4><p>In the experiments where we varied the number of datasets included in the training set, we did not observe performance differences as large as those in the zero-shot setting. However, we noted a high standard deviation in the results across experiment runs. We attribute this to two main causes. The first is random sampling: a particular sample may have produced unusually low or unusually high performance, or may have contained some difficult examples. The second is possible ambiguity among the documents from different datasets. When the model was trained using DS2 or DS3 and tested using DS1, it yielded a high standard deviation, which is consistent with the results from the zero-shot case.</p><h2>Key takeaways</h2><p>The results of this research are promising. Even though the performance does not reach the level of fine-tuning the model on the high-resource dataset, the results show that the few-shot setting can be a <strong>viable option for prototyping and proof-of-concept solutions</strong>, or when only a low-resource training dataset is available. Another benefit of using a reduced dataset is a <strong>reduction in training time</strong>. For example, a single training run in the few-shot setting took about 45 minutes, while full-scale fine-tuning took around a day. 
Finally, when high performance is required, full-scale fine-tuning remains king, but this might change with the arrival of bigger and more advanced models.</p>]]></content:encoded></item><item><title><![CDATA[Secure Your Kubernetes Web Application Using Gatekeeper And Keycloak]]></title><description><![CDATA[In certain scenarios, you may need to integrate an authentication layer into a basic application or tool that doesn't demand complex authentication. This technical article addresses such situations, aiming to help users safeguard their applications without modifying the underlying code. 
To facilitate understanding, we'll walk you through the process of incorporating authentication into a rudimentary web app, as hands-on examples often yield the best learning outcomes.]]></description><link>https://blog.doxray.com/p/secure-your-kubernetes-web-application</link><guid isPermaLink="false">https://blog.doxray.com/p/secure-your-kubernetes-web-application</guid><dc:creator><![CDATA[Bojan]]></dc:creator><pubDate>Mon, 03 Jul 2023 08:43:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In today's connected world, the importance of safeguarding even the simplest of web applications cannot be overstated. In this guide, we present an efficient way to add authentication, an essential layer of security, to your applications without altering their code.</p><p>This blog post aims to give developers the tools and knowledge they need to strengthen the security of their applications. Using a hands-on, step-by-step approach, we will integrate authentication into a basic web application, because we believe that learning works best through practical examples.</p><h2>1. Keycloak</h2><p>Before proceeding, you should be familiar with <a href="https://www.keycloak.org/">Keycloak</a> and have a running instance available. If you are new to Keycloak, we recommend spending some time understanding its purpose and capabilities. In short, Keycloak is an open-source identity and access management platform. 
If you are comfortable setting it up yourself, you can quickly launch Keycloak in development mode as a standalone Docker container using the following command:</p><pre><code><code>docker run -p 8080:8080 -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin quay.io/keycloak/keycloak:21.1.1 start-dev</code></code></pre><p>After executing this command, you'll need to create a realm, add at least one user, and make sure there is network connectivity between Keycloak and the Kubernetes (k8s) cluster where the application will be deployed.</p><p>In our setup, Keycloak runs in High Availability (HA) mode on a Google Kubernetes Engine (GKE) cluster, with Microsoft Azure Active Directory (AD) configured as the Identity Provider. To enable group-level mapping between Keycloak and Azure AD, we configured the Identity Provider as OpenID Connect (<a href="https://auth0.com/docs/authenticate/protocols/openid-connect-protocol">OIDC</a>). This configuration allows users to authenticate to apps and services through Keycloak using their Azure AD credentials. A detailed explanation of this setup is left for a future blog post.</p><p></p><h2>2. Gatekeeper</h2><p>Gatekeeper is a straightforward authentication and authorization proxy, initially created by the Keycloak organization. Since development of the original project has ceased, several projects have forked the source code and continued Gatekeeper's development. We use <a href="https://github.com/gogatekeeper/gatekeeper">gogatekeeper</a> in our implementation.</p><p></p><h2>3. Implementation</h2><p>Let's dive into the practical implementation. Imagine you have an application that you wish to secure with authentication.</p><h3><strong>3.1. Create and Configure a Keycloak Client</strong></h3><p>First, navigate to your Keycloak instance, choose the desired <strong>realm</strong>, and then head to the <strong>Clients</strong> section. 
Here, create a new client (as shown in Figures 3.1 and 3.2).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FzCb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FzCb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 424w, https://substackcdn.com/image/fetch/$s_!FzCb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 848w, https://substackcdn.com/image/fetch/$s_!FzCb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 1272w, https://substackcdn.com/image/fetch/$s_!FzCb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FzCb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png" width="682" height="369.1043956043956" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1456,&quot;resizeWidth&quot;:682,&quot;bytes&quot;:146830,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!FzCb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 424w, https://substackcdn.com/image/fetch/$s_!FzCb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 848w, https://substackcdn.com/image/fetch/$s_!FzCb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 1272w, https://substackcdn.com/image/fetch/$s_!FzCb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc372fd7-b21c-4cfb-8ec0-2ca33ef3372c_2290x1240.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3.1. 
Creating a Keycloak client - General Settings</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0pA8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0pA8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 424w, https://substackcdn.com/image/fetch/$s_!0pA8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 848w, https://substackcdn.com/image/fetch/$s_!0pA8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!0pA8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0pA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png" width="666" height="383.77335164835165" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:839,&quot;width&quot;:1456,&quot;resizeWidth&quot;:666,&quot;bytes&quot;:170424,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0pA8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 424w, https://substackcdn.com/image/fetch/$s_!0pA8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 848w, https://substackcdn.com/image/fetch/$s_!0pA8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 1272w, https://substackcdn.com/image/fetch/$s_!0pA8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58d9c675-2e07-4289-a295-88da5c6de131_2132x1228.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3.2. Creating a Keycloak client - Capability config</figcaption></figure></div><p>Once you complete these steps, the client is created.</p><p>The next step is to create protocol mappers for groups and audiences. 
Head to the <strong>Client Scopes </strong>section of your client, and click on the dedicated scope and mappers, as shown in Figure 3.3.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Whb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Whb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 424w, https://substackcdn.com/image/fetch/$s_!_Whb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 848w, https://substackcdn.com/image/fetch/$s_!_Whb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!_Whb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Whb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png" width="624" height="368.57142857142856" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:1456,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:202086,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_Whb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 424w, https://substackcdn.com/image/fetch/$s_!_Whb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 848w, https://substackcdn.com/image/fetch/$s_!_Whb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!_Whb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b237a0-26ce-4d90-b8bb-22753acddf7c_1900x1122.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3.3. Adding mappers - Client scopes</figcaption></figure></div><p>Next, navigate to the <strong>Mappers</strong> tab and click <strong>Add mapper</strong>, followed by <strong>By configuration</strong>. This presents a list of the available mapper types to choose from. 
First, choose <strong>Group Membership</strong> as depicted in Figure 3.4.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iuS_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iuS_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 424w, https://substackcdn.com/image/fetch/$s_!iuS_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 848w, https://substackcdn.com/image/fetch/$s_!iuS_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!iuS_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iuS_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png" width="554" height="445.93956043956047" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1172,&quot;width&quot;:1456,&quot;resizeWidth&quot;:554,&quot;bytes&quot;:143616,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!iuS_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 424w, https://substackcdn.com/image/fetch/$s_!iuS_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 848w, https://substackcdn.com/image/fetch/$s_!iuS_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!iuS_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff30f71aa-f43a-40d0-a92a-6cce27a339cc_1578x1270.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3.4. 
Adding a group mapper</figcaption></figure></div><p>Repeat these steps and select Audience as displayed in Figure 3.5.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tf9V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tf9V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 424w, https://substackcdn.com/image/fetch/$s_!tf9V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 848w, https://substackcdn.com/image/fetch/$s_!tf9V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!tf9V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tf9V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png" width="606" height="475.31043956043953" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1142,&quot;width&quot;:1456,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:136337,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tf9V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 424w, https://substackcdn.com/image/fetch/$s_!tf9V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 848w, https://substackcdn.com/image/fetch/$s_!tf9V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!tf9V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1d967e3-1857-4cdf-bfbe-264cad893b4d_1622x1272.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3.5. Adding audience mappers</figcaption></figure></div><p>Lastly, make sure you enter the <strong>Redirect URI</strong>, which tells Keycloak which application URLs are allowed to use this client. For the time being, you can configure it with <code>http://*</code> and/or <code>https://*</code>, and narrow it to your application's actual URL later.</p><h3>3.2. Groups and Users</h3><p>How you organize groups and users depends on your own preferences and requirements. We do not prescribe a specific approach, but from a usability standpoint, having one or more dedicated groups linked to your Keycloak client enables more granular access levels.</p><p>For testing, it is advisable to create a group and two users: one who is part of the group and another who is not. To create a group in Keycloak, go to <strong>Groups</strong> and click <strong>Create group</strong>. Next, head to <strong>Users</strong> and click <strong>Create user</strong>. 
Once you've filled in all the necessary details, click on <strong>Join groups </strong>to add the user to the group created earlier, as illustrated in Figure 3.6.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g_3V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g_3V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 424w, https://substackcdn.com/image/fetch/$s_!g_3V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 848w, https://substackcdn.com/image/fetch/$s_!g_3V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!g_3V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g_3V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png" width="620" height="516.0989010989011" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1212,&quot;width&quot;:1456,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:118437,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!g_3V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 424w, https://substackcdn.com/image/fetch/$s_!g_3V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 848w, https://substackcdn.com/image/fetch/$s_!g_3V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!g_3V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5beeeebb-7a2d-4d6b-8cfd-5a6d7fd75ea0_1622x1350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 3.6. User creation</figcaption></figure></div><p>For the second test user, do not perform this final step of adding them to the group.</p><p>After the users are created, you can set up their passwords or request them to do so during their initial login. There are numerous other configuration options available, so feel free to explore them.</p><p></p><h4>3.2.1. <strong>Mapping Keycloak Group to Identity Provider Group (Optional)</strong></h4><p>This optional step is included because many individuals use Keycloak in conjunction with an Identity Provider. This means that if we want user groups configured in the Identity Provider to access our application, the appropriate mapping must be established. In our case, we use Microsoft Azure AD and map the AD group ID value to the desired group in Keycloak. 
To do this, go to <strong>Identity providers</strong>, select the desired provider, navigate to the <strong>Mappers </strong>tab, and create a new mapping as demonstrated in Figure 3.7.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JvKa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JvKa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 424w, https://substackcdn.com/image/fetch/$s_!JvKa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 848w, https://substackcdn.com/image/fetch/$s_!JvKa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 1272w, https://substackcdn.com/image/fetch/$s_!JvKa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JvKa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png" width="570" height="345.28846153846155" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:882,&quot;width&quot;:1456,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:179491,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!JvKa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 424w, https://substackcdn.com/image/fetch/$s_!JvKa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 848w, https://substackcdn.com/image/fetch/$s_!JvKa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 1272w, https://substackcdn.com/image/fetch/$s_!JvKa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9a55f3-31d8-4bce-baff-6622ac2f3c0c_2188x1326.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 3.7. Creating an Identity Provider mapper</figcaption></figure></div><p></p><h2>4. Simple Web App</h2><p>I have created a basic k8s application consisting of <code>Deployment</code> and <code>Service</code> resources. It's important to note that I am not using ingress, so SSL/TLS is not enabled. Since I am deploying on GKE, I am using the annotation<code> networking.gke.io/load-balancer-type: "Internal" </code>to obtain an internal IP. However, you may not need this for your environment.</p><p>To give you an idea of how Gatekeeper functions, we will first deploy the application without Gatekeeper and Keycloak. 
The figure below illustrates the app's deployment scheme without Gatekeeper in the scenario.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I96A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I96A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 424w, https://substackcdn.com/image/fetch/$s_!I96A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 848w, https://substackcdn.com/image/fetch/$s_!I96A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 1272w, https://substackcdn.com/image/fetch/$s_!I96A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I96A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png" width="394" height="318.57207207207205" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:888,&quot;resizeWidth&quot;:394,&quot;bytes&quot;:76715,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!I96A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 424w, https://substackcdn.com/image/fetch/$s_!I96A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 848w, https://substackcdn.com/image/fetch/$s_!I96A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 1272w, https://substackcdn.com/image/fetch/$s_!I96A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b68ec92-5f03-4763-b360-43fe2fc44fc9_888x718.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 4.1. Simple Web App Deployment Scheme</figcaption></figure></div><p>You can use the following YAML manifest to create <code>Deployment</code> and the <code>Service</code>:</p><pre><code><code>---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: api-test
  name: api-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-test
  template:
    metadata:
      labels:
        app: api-test
    spec:
      containers:
        - name: api-test
          image: yeasy/simple-web:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: "384Mi"
              cpu: "375m"
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: api-test
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
  labels:
    app: api-test
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: api-test</code></code></pre><p>If you prefer, you can remove the <code>Service</code> and use port forwarding instead. To apply the manifests, save them to a file (here assumed to be named <code>api-test.yaml</code>) and execute:</p><pre><code><code>kubectl apply -f api-test.yaml
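# The lines below are an optional, hedged addition to the original commands; they assume the same
# namespace and context that were used for kubectl apply.
# Wait for the Deployment to finish rolling out:
kubectl rollout status deployment/api-test
# Look up the Service IP to open in a browser (for a LoadBalancer Service it is typically listed under EXTERNAL-IP):
kubectl get service api-test
# If you removed the Service, port forwarding could look like this (local port 8080 is an arbitrary choice):
# kubectl port-forward deployment/api-test 8080:80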
kubectl get deployments,services,pods -l app=api-test</code></code></pre><p>After applying the manifest above, you should see the <code>Service</code>, <code>Deployment</code>, and <code>Pod</code> deployed. To access the simple web app via a browser, use the <code>Service</code> IP.</p><p>You should see a view similar to the one displayed in Figure 4.2.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tTWo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tTWo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 424w, https://substackcdn.com/image/fetch/$s_!tTWo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 848w, https://substackcdn.com/image/fetch/$s_!tTWo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 1272w, https://substackcdn.com/image/fetch/$s_!tTWo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tTWo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png" width="1456" height="219" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:219,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:145624,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tTWo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 424w, https://substackcdn.com/image/fetch/$s_!tTWo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 848w, https://substackcdn.com/image/fetch/$s_!tTWo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 1272w, https://substackcdn.com/image/fetch/$s_!tTWo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23b8f3d2-d9d0-48f2-8373-5d531d4f9dae_2342x352.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 4.2. Simple Web App Browser Access</figcaption></figure></div><p>Congratulations! The simple application is now up and running.</p><p></p><h2>5. Protect the Application with Gatekeeper</h2><p>As we learned earlier, Gatekeeper acts as an authentication proxy. 
In comparison to the setup in the previous section, the new configuration should resemble the one depicted in Figure 5.1.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZTx2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZTx2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 424w, https://substackcdn.com/image/fetch/$s_!ZTx2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 848w, https://substackcdn.com/image/fetch/$s_!ZTx2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 1272w, https://substackcdn.com/image/fetch/$s_!ZTx2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZTx2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png" width="496" height="394.14285714285717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:896,&quot;resizeWidth&quot;:496,&quot;bytes&quot;:115722,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ZTx2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 424w, https://substackcdn.com/image/fetch/$s_!ZTx2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 848w, https://substackcdn.com/image/fetch/$s_!ZTx2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 1272w, https://substackcdn.com/image/fetch/$s_!ZTx2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F907d8501-896b-4ffd-a5e8-b76023676ca8_896x712.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 5.1. Simple Web App Deployment Scheme with Gatekeeper and Keycloak</figcaption></figure></div><p>In the image above, the <code>Service</code> resource communicates with the Gatekeeper container rather than directly with the application. Gatekeeper performs authentication and authorization against Keycloak and then either grants or denies access to the application. The Gatekeeper configuration therefore contains both the redirection details and the authentication details.</p><p>The Gatekeeper configuration file is stored in a <code>ConfigMap</code> so that it can be mounted as a volume in the Gatekeeper container:</p><pre><code><code>apiVersion: v1
kind: ConfigMap
metadata:
  name: gatekeeper-config
data:
  keycloak-gatekeeper.conf: |+
    discovery-url: https://&lt;yourkeycloakFQDN&gt;/realms/&lt;your-realm&gt;
    enable-default-deny: false
    secure-cookie: false
    client-id: app-test
    client-secret: fi9cU5n5ytu2L0LF2J9u8FJNL5cNEHet
    listen: :3000
    encryption-key: AgXa7xRcoClDEU0ZDSH4X0XhL5Qy2Z2j
    redirection-url: http://10.164.0.42
    upstream-url: http://127.0.0.1:80
    resources:
    - uri: /*
      groups:
      - app-test-group
    - uri: /public/*
      white-listed: true
    - uri: /favicon
      white-listed: true
    - uri: /css/*
      white-listed: true
    - uri: /img/*
      white-listed: true</code></code></pre><p>The important parts of the config are:</p><ul><li><p><code>discovery-url</code> - The Keycloak server address, including the realm path.</p></li><li><p><code>client-id</code> - The client ID we used when creating the client.</p></li><li><p><code>client-secret</code> - Obtained from the <strong>Credentials</strong> tab of the Keycloak app-test client we created, as shown in Figure 5.2.</p></li><li><p><code>redirection-url</code> - The URL used by the application; in this case, the IP of the <code>Service</code>. This parameter is optional, since it defaults to the URL scheme and host, but it is worth setting explicitly when HTTPS is used, as you often want to specify where authentication should be directed.</p></li><li><p><code>upstream-url</code> - The URL to which Gatekeeper forwards traffic. The host is <code>127.0.0.1</code> because containers in the same pod share a network namespace and can reach each other over localhost, and the port is 80 because our test app listens on port 80.</p></li><li><p><code>secure-cookie</code> - Set to <code>false</code> here because we're using HTTP, not HTTPS.</p></li><li><p><code>enable-default-deny</code> - Controls whether all incoming requests are denied by default, so that anything allowed must be stated explicitly. It is set to <code>false</code> in this example.</p></li><li><p><code>resources</code> - The collection of resources we want to protect or expose, together with the groups that can access them.  
</p></li></ul><p>For more details about the config itself, you can check the&nbsp;<a href="https://gogatekeeper.github.io/gatekeeper/userguide/#example-of-usage-and-configuration-with-keycloak">GoGatekeeper Keycloak config</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lTI7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lTI7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 424w, https://substackcdn.com/image/fetch/$s_!lTI7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 848w, https://substackcdn.com/image/fetch/$s_!lTI7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!lTI7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lTI7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png" width="610" height="290.33653846153845" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:693,&quot;width&quot;:1456,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:147350,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!lTI7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 424w, https://substackcdn.com/image/fetch/$s_!lTI7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 848w, https://substackcdn.com/image/fetch/$s_!lTI7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!lTI7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7634e42c-6331-4c29-9fe0-b977eeba2a61_2262x1076.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 5.2. Obtaining Client Secret</figcaption></figure></div><p>To accommodate the changes, we need to update the deployment templates. The new&nbsp;YAML&nbsp;file should look like this:</p><pre><code><code>---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: api-test
  name: api-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api-test
  template:
    metadata:
      labels:
        app: api-test
    spec:
      imagePullSecrets:
        - name: gitlab-registry
      containers:
        - name: api-test
          image: yeasy/simple-web:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              memory: "384Mi"
              cpu: "375m"
        - name: gatekeeper
          image: quay.io/gogatekeeper/gatekeeper:2.3.1
          resources:
            requests:
              memory: "128Mi"
              cpu: "125m"
          args:
            - --config=/etc/keycloak-gatekeeper.conf
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: gatekeeper-config
              mountPath: /etc/keycloak-gatekeeper.conf
              subPath: keycloak-gatekeeper.conf
      volumes:
        - name: gatekeeper-config
          configMap:
            name: gatekeeper-config
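# The gatekeeper-config volume above expects a ConfigMap named gatekeeper-config
# with a key called keycloak-gatekeeper.conf. A minimal sketch of its shape follows.
# It is left commented out so that applying this file does not overwrite an
# existing ConfigMap. Every upper-case value is a placeholder, and the upstream
# port is an assumption (that the test app listens on port 80); none of these
# values are taken from this setup.
#
# apiVersion: v1
# kind: ConfigMap
# metadata:
#   name: gatekeeper-config
# data:
#   keycloak-gatekeeper.conf: |
#     discovery-url: https://KEYCLOAK_HOST/realms/REALM_NAME
#     client-id: CLIENT_ID
#     client-secret: CLIENT_SECRET
#     listen: 0.0.0.0:3000
#     redirection-url: http://SERVICE_IP_OR_HOSTNAME
#     upstream-url: http://127.0.0.1:80
#     resources:
#       - uri: /*
#         groups:
#           - app-test-group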
---
apiVersion: v1
kind: Service
metadata:
  name: api-test
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
  labels:
    app: api-test
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 3000
    protocol: TCP
  selector:
    app: api-test</code></code></pre><p>In this manifest, notice that the application itself has no ports exposed. The only port exposed is the Gatekeeper port. When you deploy the updated application with the Gatekeeper configuration, you should see two containers in the pod, as shown in Figure 5.3 and Figure 5.4.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZRiM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZRiM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 424w, https://substackcdn.com/image/fetch/$s_!ZRiM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 848w, https://substackcdn.com/image/fetch/$s_!ZRiM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 1272w, https://substackcdn.com/image/fetch/$s_!ZRiM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZRiM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png" width="1456" height="65" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:65,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67981,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ZRiM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 424w, https://substackcdn.com/image/fetch/$s_!ZRiM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 848w, https://substackcdn.com/image/fetch/$s_!ZRiM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 1272w, https://substackcdn.com/image/fetch/$s_!ZRiM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5372b7e-b8a8-474d-84c5-d86ad4ff1715_1828x82.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 5.3. 
Pod View</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oglU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oglU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 424w, https://substackcdn.com/image/fetch/$s_!oglU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 848w, https://substackcdn.com/image/fetch/$s_!oglU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 1272w, https://substackcdn.com/image/fetch/$s_!oglU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oglU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png" width="1456" height="108" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:108,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:97196,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!oglU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 424w, https://substackcdn.com/image/fetch/$s_!oglU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 848w, https://substackcdn.com/image/fetch/$s_!oglU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 1272w, https://substackcdn.com/image/fetch/$s_!oglU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c884131-33fa-4e39-8d6c-0b7a61a1acbc_1612x120.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 5.4. Container View</figcaption></figure></div><p>If you prefer to use a stricter redirect URI, you can replace <code>http://*</code> with <code>http://10.164.0.42/oauth/callback</code>. Note that in this case, the IP used is the test application's Service IP. 
Typically, you should use&nbsp;HTTPS&nbsp;and a fully qualified domain name instead of the IP.  </p><p></p><p>Now, when attempting to access the app, you will be redirected to the Keycloak authentication window before accessing the app, as shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!evqp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!evqp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 424w, https://substackcdn.com/image/fetch/$s_!evqp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 848w, https://substackcdn.com/image/fetch/$s_!evqp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!evqp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!evqp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png" width="618" height="280.9862637362637" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:662,&quot;width&quot;:1456,&quot;resizeWidth&quot;:618,&quot;bytes&quot;:248195,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!evqp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 424w, https://substackcdn.com/image/fetch/$s_!evqp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 848w, https://substackcdn.com/image/fetch/$s_!evqp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!evqp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9e03fc7-94f9-4ebb-a787-db2bf754b616_2562x1164.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 5.5. Keycloak login window that Gatekeeper redirects you to</figcaption></figure></div><p>After logging in, you will notice that the request originated from localhost (Figure 5.6). 
In this case, it means that the Gatekeeper container inside the pod we deployed has contacted the application within the same pod via the local network.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fr5V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fr5V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 424w, https://substackcdn.com/image/fetch/$s_!fr5V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 848w, https://substackcdn.com/image/fetch/$s_!fr5V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 1272w, https://substackcdn.com/image/fetch/$s_!fr5V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fr5V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png" width="1456" height="316" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:316,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91547,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!fr5V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 424w, https://substackcdn.com/image/fetch/$s_!fr5V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 848w, https://substackcdn.com/image/fetch/$s_!fr5V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 1272w, https://substackcdn.com/image/fetch/$s_!fr5V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea559ea-acd3-4cd1-aec3-c20571617d2b_2266x492.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Figure 5.6. Simple Web App browser printout after successful login</figcaption></figure></div><p>Upon checking the Gatekeeper logs, you can see that it first verifies if there is an existing session token. 
If not, it starts a new session once the user has authenticated successfully:</p><pre><code><code>2023-06-05T14:00:15.858Z    error    no session found in request, redirecting for authorization    {"error": "authentication session not found"}
2023-06-05T14:00:22.261Z    info    issuing access token for user    {"email": "test@doxray.com", "sub": "17c13f75-6c86-4aee-8d66-ce1775548967", "expires": "2023-06-05T14:15:22Z", "duration": "14m5</code></code></pre><p>If we attempt to log in with another user who does not belong to the group we created and assigned in Keycloak and Gatekeeper configurations, access will be denied, as illustrated in Figure 5.7.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JMsX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JMsX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 424w, https://substackcdn.com/image/fetch/$s_!JMsX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 848w, https://substackcdn.com/image/fetch/$s_!JMsX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!JMsX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!JMsX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png" width="570" height="242.3282967032967" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:104965,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!JMsX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 424w, https://substackcdn.com/image/fetch/$s_!JMsX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 848w, https://substackcdn.com/image/fetch/$s_!JMsX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 1272w, https://substackcdn.com/image/fetch/$s_!JMsX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F389e7af2-eaa5-4a84-be4f-09ad2dbda180_2362x1004.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 5.7. Access denied for a user not part of the required group</figcaption></figure></div><p>Upon examining the Gatekeeper logs, it becomes evident that the user is not a member of the necessary group:</p><pre><code><code>2023-06-05T14:02:46.267Z    error    no session found in request, redirecting for authorization    {"error": "authentication session not found"}
2023-06-05T14:02:52.051Z    info    issuing access token for user    {"email": "test2@doxray.com", "sub": "2317d31e-adfb-4bed-8ab0-e53a2ececa0d", "expires": "2023-06-05T14:17:52Z", "duration": "14m
2023-06-05T14:02:52.093Z    warn    access denied, invalid groups    {"access": "denied", "email": "test2@doxray.com", "resource": "/*", "groups": "app-test-group"}</code></code></pre><p>Note that Gatekeeper also lets you add custom forbidden pages, which may be necessary for specific use cases.</p><p>Furthermore, you can use an Identity Provider to authenticate users of your application (refer to <em>section 2.1</em>). You may have already noticed the additional login option for Microsoft Azure AD in Figure 5.5.</p><p></p><p></p><h2>6. Conclusion</h2><p>Gatekeeper is an effective authentication proxy that, in appropriate use cases, can significantly reduce development time. It is highly configurable and becomes easier to use as you familiarize yourself with it. As mentioned earlier, integrating Gatekeeper requires minimal changes to application authentication. You only need an additional&nbsp;<code>ConfigMap</code>&nbsp;containing the Keycloak configuration and an updated&nbsp;<code>Deployment</code>&nbsp;that adds the sidecar container responsible for handling authentication.</p><p>I hope you find this information valuable and applicable to your projects.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Data Extraction from Invoice Type Documents using Large Language Models]]></title><description><![CDATA[Towards tailor-made AI models that adapt to changes and match the needs and demands of a dynamic real-world environment.]]></description><link>https://blog.doxray.com/p/data-extraction-from-invoice-type</link><guid isPermaLink="false">https://blog.doxray.com/p/data-extraction-from-invoice-type</guid><dc:creator><![CDATA[Mihaela Kakarigi]]></dc:creator><pubDate>Tue, 23 May 2023 08:52:05 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/26d22f57-7579-4b23-ab1d-b76868a2d142_1025x643.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Extracting data from invoices is a crucial task for many businesses as it allows them to accurately track their expenses and payments. The information available in invoices, such as supplier names, account numbers, dates, and item descriptions, is important for any company that deals with purchasing, accounting, supply management, data analytics, and similar functions.</p><p></p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6150448-2f33-4b3c-8a22-c6ef72cdb384_634x515.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bc863b7-e06a-4255-8305-250379d578ac_1025x643.png&quot;}],&quot;caption&quot;:&quot;Figure 1. 
Examples of data in invoice-type documents&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09979fb1-71e3-44d8-903e-7fa478556379_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p>Manually extracting this data from paper or digital documents is often time-consuming, error-prone (typos or transposition errors), and costly, as it may require a team of people to perform the task. It may take a person 5 minutes on average to read and understand an invoice and then manually type the information, or copy and paste it, into the destination software (e.g. an ERP system).</p><p>By automating the process of data extraction from invoices, businesses can reduce the time and resources required for manual data entry. Having accurate and up-to-date information about expenses and payments can help businesses make informed decisions about their finances and operations. In addition, extracting data from invoices can help with:</p><ul><li><p>Improving cash flow and payment efficiency by reducing errors and speeding up the processing of invoices and payments</p></li><li><p>Detecting fraud and duplicate entries by matching invoice line items to purchase orders and contracts</p></li><li><p>Analyzing spending patterns to identify areas where businesses can reduce costs, cut waste, and improve efficiency</p></li><li><p>Calculating CO2 emissions or carbon footprint by analyzing expenses and payments related to energy consumption and transportation</p></li></ul><p>Traditionally, automated data extraction is done using rule-based methods, such as regular expressions or template matching. These methods are inflexible and can be labor-intensive and error-prone. 
Additionally, the right level of generality is hard to achieve: if the rules are too specific, they may miss some cases or exceptions; if they are too general, they may produce false positives or negatives.</p><p>Template matching and rule-based approaches require a manually defined template or rule for every single invoice format and layout, and rely on the data being consistent, clear, legible, and located in specific positions in the document.</p><p>However, invoice documents come from many different sources, industries, and businesses, their formats and appearance vary, and they are often scanned (see Figure 2). Scans can be low-resolution and may contain handwritten notes, annotations, typos, or errors in the data. The ever-changing nature and variations of real-world data present further challenges for rule-based methods. 
For example, changes in laws and regulations might affect what data is displayed in a document and how, or a company might change the layout of the data in their invoices, add more account numbers, change visual branding, etc.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6491a02-9829-4ea9-b4c9-29e9dc4eed0a_826x1169.jpeg&quot;},{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c0b5143-6135-4216-9b00-359c54f62e10_960x1280.jpeg&quot;},{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b71ca03-2c84-4dc6-bedf-1ef1b0b5c235_700x895.jpeg&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fa94231-fed0-4de0-81c5-e2cbe55c4755_630x883.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90fe9d07-84b1-4b83-8a9b-9088f5500188_750x1061.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/063120a7-623f-4dde-bfc4-122adef5a49b_856x1088.png&quot;}],&quot;caption&quot;:&quot;Figure 2. Examples of various shapes and formats in which invoice documents can come.&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6029fa06-64a8-4602-b428-f20b838ba118_1456x964.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p>In recent years, transformer-based large language models (LLMs) have revolutionized the field of natural language processing (from entity extraction to text generation). 
LLMs are trained on large amounts of data from various sources and formats, enabling them to recognize patterns in the data and make predictions based on these patterns, regardless of how complex the document format might be. LLMs can leverage their general knowledge and reasoning abilities to extract data (e.g. tax rates, discounts, payment terms), handle poor-quality documents, and deal with documents in various languages.</p><p>Unlike rule-based methods, LLMs are not limited by strict rules, which makes them more flexible and less prone to errors.</p><p>However, general models like <a href="https://openai.com/blog/chatgpt">OpenAI's ChatGPT</a> can be too broad to achieve the highest accuracy for a specific use case such as invoice data extraction, which requires the model to understand the structure and meaning of the data in the invoices.</p><p>Therefore, for the best results, it is necessary to fine-tune these models for the specific use case, as we do at <a href="https://doxray.com/">doXray</a>. Fine-tuning involves training the model on a smaller, specific dataset relevant to the use case. This dataset may contain examples of invoices and their corresponding data fields, or other related texts that can help the model learn about invoices. By fine-tuning the model on this dataset, we adjust the model's parameters and align it to the invoice data extraction use case. 
This allows the model to make more accurate predictions and extract the required data with higher precision.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 424w, https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 848w, https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 1272w, https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80" width="728" height="484.12" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:665,&quot;width&quot;:1000,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;red black and white 
round decor&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="red black and white round decor" title="red black and white round decor" srcset="https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 424w, https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 848w, https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 1272w, https://images.unsplash.com/photo-1600177691578-f5df70545e33?ixlib=rb-4.0.3&amp;ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&amp;auto=format&amp;fit=crop&amp;w=1000&amp;q=80 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At <a href="https://doxray.com/">doXray</a>, our research has shown that periodic revisions of the model and continual learning are particularly beneficial for real-world use cases (Figure 3). This is because invoice documents change over time. With continual learning, our model can adapt to new invoice formats and accurately extract data even from previously unseen invoice formats and data layouts. This ensures that our model remains up to date and effective in extracting data from invoices.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lqC5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lqC5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 424w, https://substackcdn.com/image/fetch/$s_!lqC5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!lqC5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 1272w, https://substackcdn.com/image/fetch/$s_!lqC5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lqC5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png" width="1323" height="354" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:1323,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lqC5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 424w, https://substackcdn.com/image/fetch/$s_!lqC5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!lqC5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 1272w, https://substackcdn.com/image/fetch/$s_!lqC5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec9f6f11-a72b-4768-85c4-4cc12c7ef11d_1323x354.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figure 3. 
An example of the periodical performance revision and continual learning of doXray AI models.</figcaption></figure></div><p></p><p>In summary, the combination of large language models, fine-tuning, and continual learning provides a powerful set of tools for accurately extracting data from invoice-type documents, ensuring the best possible value and experience for <a href="https://doxray.com/">doXray&#8217;s</a> clients.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Annotation Team Lead and Teamwork ]]></title><description><![CDATA[Document annotation is a crucial step in training AI models, particularly in the context of Natural Language Processing (NLP).]]></description><link>https://blog.doxray.com/p/annotation-team-lead-and-teamwork</link><guid isPermaLink="false">https://blog.doxray.com/p/annotation-team-lead-and-teamwork</guid><dc:creator><![CDATA[Dragan Kraljević]]></dc:creator><pubDate>Fri, 13 Jan 2023 16:10:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/h_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf112569-26fe-443f-a68d-174f51fe6471_731x402.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Document annotation is a crucial step in training AI models, particularly in the context of Natural Language Processing (NLP). 
It involves adding labels or "annotations" to text data within documents to help the AI understand specific elements, such as entities, sentiment, or categories. In simpler terms, it's like providing a highlighter and a set of instructions to help the AI recognize and categorize key information in the text.</p><p>For businesses, this process is essential because it allows the AI model to learn from real-world examples and develop a better understanding of the context and nuances of human language. Annotated data helps the AI make accurate predictions when processing new, unannotated documents, ultimately improving the efficiency and effectiveness of various tasks, such as document categorization, email classification, and information extraction.</p><p>My position at doXray is that of Annotation Team Lead, which involves carefully studying customer demands for a given use case together with the AI and business teams. The role is also responsible for communicating customer demands to the rest of the annotation team and monitoring the annotation process.</p><p>The Annotation Team Lead assigns annotators to documents provided by the customer and writes a set of rules to guide the annotation process. 
In case of ambiguities, a meeting is scheduled where all the issues are discussed, and the annotation rules are updated accordingly. After the annotation process is completed, the Annotation Team Lead double-checks all documents and makes sure the annotations are in accordance with customer demands.</p><p>The overall process of managing and completing customer demands regarding annotations can be lengthy, so a reliable and competitive team and an efficient teamwork strategy are crucial.</p><p></p><h3><strong>Our Core Values</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Zw4v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zw4v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 424w, https://substackcdn.com/image/fetch/$s_!Zw4v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 848w, https://substackcdn.com/image/fetch/$s_!Zw4v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 1272w, https://substackcdn.com/image/fetch/$s_!Zw4v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Zw4v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png" width="731" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:731,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:530755,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zw4v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 424w, https://substackcdn.com/image/fetch/$s_!Zw4v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 848w, https://substackcdn.com/image/fetch/$s_!Zw4v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 1272w, https://substackcdn.com/image/fetch/$s_!Zw4v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1a5a476-03f2-43c5-97a4-576cc08efcc7_731x402.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p><strong>Team spirit</strong>:&nbsp;When recruiting a new team member, we prioritize finding the best fit by outlining essential qualifications and conducting thorough interviews. Once we identify a suitable candidate, we hold a follow-up meeting to clarify their role and responsibilities. We ensure a warm welcome, introducing them to the team to foster a positive atmosphere that enhances morale and productivity for everyone involved.</p></li><li><p><strong>Communication</strong>:&nbsp;Our dedicated team prioritizes understanding customer needs and maintaining effective communication. This approach allows us to address challenges promptly and ensure all team members contribute, including newer members. 
We encourage open dialogue and value diverse opinions, which helps optimize deliverables for our clients. Our business prioritizes comprehending client requirements in depth when handling their documents. For complex demands, we conduct weekly meetings with our machine learning engineers to collaboratively address challenges and determine the most effective approach for various document types, ensuring high-quality results and a streamlined workflow for our team.</p></li><li><p><strong>Quality assurance:</strong> this is a priority for our team. We collaborate to ensure timely completion of tasks and address any delays in a dedicated meeting. After the annotation process, the team lead reviews the work for adherence to client requirements. If improvements are needed, we convene to discuss and implement solutions.</p></li></ul><p>As an Annotation Team Lead managing a small or medium-sized team, striking a balance between the rewarding and challenging aspects of the role is essential. Our team remains committed to meeting customer expectations and fostering a collaborative work environment to ensure success.</p>]]></content:encoded></item>
<item><title><![CDATA[Named Entity Recognition Using a Pre-Trained LayoutLMv2 Model]]></title><description><![CDATA[The amount of data generated by humans is ever-increasing, much of which is stored in documents and forms that need to be analyzed to extract relevant information.]]></description><link>https://blog.doxray.com/p/named-entity-recognition-using-a</link><guid isPermaLink="false">https://blog.doxray.com/p/named-entity-recognition-using-a</guid><dc:creator><![CDATA[Marko Cvjetko]]></dc:creator><pubDate>Wed, 11 Jan 2023 12:45:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rfG7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The amount of data generated by humans is ever-increasing, much of which is stored in documents and forms that need to be analyzed to extract relevant information. Today, many companies still extract data through manual effort, which is often a tedious and time-consuming task. Rule-based systems are sometimes used, but they require careful engineering and tend to struggle if the environment changes, leading to costly algorithm adjustments. Recently, with the rapid advancement of NLP fueled by the success of deep learning, neural networks have been recognized as a potential new approach for automatic document processing. 
Soon after, LSTMs and transformers began to outperform humans in document understanding and data extraction, especially when speed and price are taken into consideration. However, since standard language models work only with word sequences, they miss out on a lot of information stored in the visual aspects of documents, such as their layout. This makes sense to us as humans. For example, imagine reading C code purely as a sequence of symbols:</p><pre><code>void swap(int* xp, int* yp){int temp = *xp;*xp = *yp;*yp = temp;}void bubbleSort(int arr[], int n){int i, j; for (i = 0; i &lt; n - 1; i++) for (j = 0; j &lt; n -i -1;j++) if (arr[j] &gt; arr[j + 1]) swap(&amp;arr[j], &amp;arr[j + 1]);}</code></pre><p>A disaster! In contrast, here is the same code, properly formatted:</p><pre><code>void swap(int* xp, int* yp){
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}

void bubbleSort(int arr[], int n){
    int i, j;
    for (i = 0; i &lt; n - 1; i++)
        for (j = 0; j &lt; n - i - 1; j++)
            if (arr[j] &gt; arr[j + 1])
                swap(&amp;arr[j], &amp;arr[j + 1]);
}</code></pre><p>Obviously, understanding formatted code is much easier. With a similar line of thought in mind, researchers at Microsoft developed layout language models (LayoutLMs). LayoutLMs are transformer-based models which, alongside the text, additionally have access to the layout of the document, meaning access to the document's image and the bounding boxes of every word. At the time of writing, three major LayoutLM models had been released: LayoutLM, LayoutLMv2 and LayoutLMv3, and all of them achieved state-of-the-art performance on a variety of visually-rich document understanding tasks.</p><p></p><h2>The Model: LayoutLMv2/LayoutXLM</h2><p>In 2020, one year after the release of the original LayoutLM paper, a paper describing LayoutLMv2 was published. The two models have similar architectures, both using three types of embeddings: text, layout and visual embeddings. Depending on the model version, the embeddings come in different flavours, but generally, these are their descriptions:</p><ul><li><p>text embedding: fixed-length vectorized representation of tokenized text,</p></li><li><p>layout embedding: axis-aligned bounding boxes of each token and</p></li><li><p>visual embedding: page region image encoded into a fixed-length sequence by a CNN-based visual encoder. The regions correspond to token bounding boxes from the layout embeddings.</p></li></ul><p>For each token, the three embeddings are concatenated to a single vector and forwarded to the following layer.</p><p>The major difference between the two models is that LayoutLMv2 had pre-training objectives which used all embedding types, while LayoutLM didn't use visual embeddings until fine-tuning. This helped LayoutLMv2 to better learn cross-modality interactions between visual and textual information, leading to an increase in performance.</p><p>The original LayoutLMv2 was pre-trained on English documents and thus could not be used on the provided dataset of documents in German. 
Luckily, LayoutXLM, a multilingual version of LayoutLMv2, exists. This model has the same architecture and pre-training objectives but is pre-trained on multilingual documents.</p><p>More detailed descriptions of LayoutLMs can be found in their respective papers, linked in the references section.</p><p></p><h2>Dataset</h2><p>A dataset of invoice documents was used to fine-tune the model. As previously mentioned, the documents are in German. The raw dataset consists of PDFs and their OCR scans in JSON format. The scans include bounding box information for each recognized word. Since LayoutLMv2/XLM can only take one page at a time, multi-page documents were split so that each sample consists of one page. The OCR scan files contained a lot of data irrelevant to LayoutXLM, so a script was written to extract the important data and arrange it in CSV format. This simplifies data loading and reduces its duration. The PDFs were converted to PNG files and resized to 224x224 pixels (the input size stated in the paper). A dummy excerpt of a CSV and a document image follow:</p><pre><code>word, x0, y0, x1, y1, label

Bei, 45, 64, 65, 76, ZERO
der, 67, 64, 101, 76, ZERO
Westschmiede, 104, 64, 121, 76, SUPPLIER
in, 123, 64, 182, 76, ZERO
D&#252;sseldorf, 184, 64, 229, 76, CITY
</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rfG7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rfG7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 424w, https://substackcdn.com/image/fetch/$s_!rfG7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 848w, https://substackcdn.com/image/fetch/$s_!rfG7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 1272w, https://substackcdn.com/image/fetch/$s_!rfG7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rfG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png" width="409" height="577" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:577,&quot;width&quot;:409,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image 1: invoice sample 
(source: invoicehome.com)&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 1: invoice sample (source: invoicehome.com)" title="Image 1: invoice sample (source: invoicehome.com)" srcset="https://substackcdn.com/image/fetch/$s_!rfG7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 424w, https://substackcdn.com/image/fetch/$s_!rfG7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 848w, https://substackcdn.com/image/fetch/$s_!rfG7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 1272w, https://substackcdn.com/image/fetch/$s_!rfG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28aad84f-52b7-47a7-9f58-dbadc03efc98_409x577.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 
13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Image 1: invoice sample (source: <a href="https://invoicehome.com/">invoicehome.com</a>)</em></figcaption></figure></div><p>During preprocessing, several issues were detected:</p><ul><li><p>LayoutXLM was pre-trained using a maximum sequence length of 512. Longer examples were truncated.</p></li><li><p>Some bounding box coordinates were either missing or out of bounds. Since this was not a common occurrence, the corrupted coordinates were set to their corresponding page corner.</p></li><li><p>Some OCR scans could not be matched with PDF files, probably because of typos in file names or different naming conventions. Since a large majority of samples could be matched, removing non-matches was deemed to be the best solution.</p></li></ul><p>Preprocessing yielded 88.5k samples, with a predetermined 70.8k/17.7k train/evaluation split. The total number of tokens is 4.4 million. 
Excluding the "0" entity, there are 14 entity types, shown below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Whe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Whe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 424w, https://substackcdn.com/image/fetch/$s_!_Whe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 848w, https://substackcdn.com/image/fetch/$s_!_Whe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 1272w, https://substackcdn.com/image/fetch/$s_!_Whe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Whe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png" width="640" height="480" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image 2: Number of tokens per entity type&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 2: Number of tokens per entity type" title="Image 2: Number of tokens per entity type" srcset="https://substackcdn.com/image/fetch/$s_!_Whe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 424w, https://substackcdn.com/image/fetch/$s_!_Whe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 848w, https://substackcdn.com/image/fetch/$s_!_Whe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 1272w, https://substackcdn.com/image/fetch/$s_!_Whe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0f992b6-e9f5-4bef-affa-653427f740cf_640x480.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Image 2: Number of tokens per entity type</em></figcaption></figure></div><p></p><h2>Training and Error Evaluation</h2><p>The AdamW optimizer was used alongside a linear learning rate scheduler, with a learning rate of 5×10⁻⁵ and a warmup of 0.1. With 8 GB of GPU RAM, a batch size of 4 could be achieved. To compensate for the small batch size, gradients were accumulated over 8 steps before each parameter update, making the effective batch size 32.</p><p>The model was evaluated after each epoch. The best model was determined by measuring the overall macro F1-score. 
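</p><p>As a rough illustration of the training setup described above, the following Python sketch shows how gradient accumulation gives the effective batch size of 32 and what a linear schedule with warmup looks like. It assumes that the warmup of 0.1 means the fraction of update steps spent warming up and that the learning rate decays linearly to zero afterwards. The function names are made up for this example and are not the code used in the project.</p><pre><code>
# Hyperparameters quoted in the text.
PER_DEVICE_BATCH_SIZE = 4
GRAD_ACCUMULATION_STEPS = 8
PEAK_LEARNING_RATE = 5e-5
WARMUP_FRACTION = 0.1  # assumption: 10% of all update steps are warmup


def effective_batch_size(per_device, accumulation_steps):
    """Number of samples that contribute to one parameter update."""
    return per_device * accumulation_steps


def linear_schedule(step, total_steps, peak_lr=PEAK_LEARNING_RATE,
                    warmup_fraction=WARMUP_FRACTION):
    """Learning rate at an update step: linear warmup, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if warmup_steps > step:
        # Ramp up from 0 towards the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH_SIZE, GRAD_ACCUMULATION_STEPS))
    for step in (0, 50, 100, 550, 1000):
        print(step, linear_schedule(step, total_steps=1000))
</code></pre><p>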
The model was trained for a total of 12 epochs on a GeForce GTX 1080, amounting to 53 hours of training time.</p><p>The model gave the best results on the evaluation split after 6 epochs, reaching an F1-score of 0.88.</p><h3>LayoutXLM</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oyff!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oyff!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 424w, https://substackcdn.com/image/fetch/$s_!Oyff!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 848w, https://substackcdn.com/image/fetch/$s_!Oyff!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 1272w, https://substackcdn.com/image/fetch/$s_!Oyff!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oyff!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png" width="503" height="409" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:409,&quot;width&quot;:503,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Table 1: LayoutXLM Results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table 1: LayoutXLM Results" title="Table 1: LayoutXLM Results" srcset="https://substackcdn.com/image/fetch/$s_!Oyff!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 424w, https://substackcdn.com/image/fetch/$s_!Oyff!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 848w, https://substackcdn.com/image/fetch/$s_!Oyff!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 1272w, https://substackcdn.com/image/fetch/$s_!Oyff!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc218c6c7-417e-4cea-a3a0-0096410f340b_503x409.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Table 1: LayoutXLM Results</em></figcaption></figure></div><p>The model achieves slightly better results compared to a BERT baseline:</p><p></p><h3>BERT</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kp2D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kp2D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 424w, 
https://substackcdn.com/image/fetch/$s_!Kp2D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 848w, https://substackcdn.com/image/fetch/$s_!Kp2D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 1272w, https://substackcdn.com/image/fetch/$s_!Kp2D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kp2D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png" width="262" height="49" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:49,&quot;width&quot;:262,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Table 2: BERT results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table 2: BERT results" title="Table 2: BERT results" srcset="https://substackcdn.com/image/fetch/$s_!Kp2D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 424w, 
https://substackcdn.com/image/fetch/$s_!Kp2D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 848w, https://substackcdn.com/image/fetch/$s_!Kp2D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 1272w, https://substackcdn.com/image/fetch/$s_!Kp2D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06f3ac82-09fc-4bd0-8579-c60aa48fbe16_262x49.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Table 2: BERT results</em></p><p>A few things should be noted.</p><p>The similar performance of BERT and LayoutXLM may indicate that the visual information in this particular document dataset was insufficient to make a difference.</p><p>Since LayoutXLM works only with single-page samples, splitting multi-page documents might lead to a loss of valuable context. In contrast, BERT processed each multi-page document as a single sample. Perhaps LayoutXLM would compare more favourably to BERT if the dataset consisted only of single-page documents.</p><p>BERT's classification scores are the result of a grid-search hyperparameter optimization, whereas LayoutXLM was trained with a single hand-picked hyperparameter set. A more extensive hyperparameter search might lead to even better results.</p><p>8.7k tokens were misclassified with a confidence score of 0.98 or higher. In almost all of these instances, the model classified an entity token as a non-entity or vice versa, rather than mistaking one entity for another. It was suggested that these high-confidence misclassifications are the result of poor annotation. 
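</p><p>A check of this kind could be sketched roughly as follows. This is only an illustration, assuming each token prediction is available as a dictionary with hypothetical keys <code>predicted</code>, <code>gold</code> and <code>confidence</code>; it is not the analysis code used for the numbers above.</p><pre><code>
def high_confidence_errors(predictions, threshold=0.98):
    """Return tokens predicted wrongly with confidence at or above the threshold."""
    return [
        p for p in predictions
        if p["confidence"] >= threshold and p["predicted"] != p["gold"]
    ]


def split_by_error_kind(errors, outside_label="0"):
    """Separate entity/non-entity confusions from entity/entity confusions."""
    boundary, between_entities = [], []
    for error in errors:
        if outside_label in (error["predicted"], error["gold"]):
            # One side is the non-entity label: entity vs. non-entity confusion.
            boundary.append(error)
        else:
            # Both sides are entities: one entity mistaken for another.
            between_entities.append(error)
    return boundary, between_entities
</code></pre><p>Tokens that end up in the first group are natural candidates for a manual look at the annotations.</p><p>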
More work should be done to analyze annotation quality.</p><p></p><h2>Conclusion</h2><p>LayoutXLM, a multilingual version of the LayoutLMv2 model, was trained on an in-house dataset of German invoices. The model outperforms a BERT baseline by a slight margin, despite BERT being optimized with a robust hyperparameter search. Several directions could be pursued in future work:</p><ul><li><p>Analysis of the invoice dataset annotation quality and its impact on model performance.</p></li><li><p>Hyperparameter search optimization for LayoutXLM.</p></li><li><p>Comparison with BERT on different, perhaps more visually-rich document datasets.</p></li></ul>
<h2>References</h2><ul><li><p><a href="https://arxiv.org/abs/1912.13318">LayoutLM: Pre-training of Text and Layout for Document Image Understanding</a></p></li><li><p><a href="https://arxiv.org/abs/2012.14740">LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding</a></p></li><li><p><a href="https://arxiv.org/abs/2104.08836">LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding</a></p></li><li><p><a href="https://arxiv.org/abs/2204.08387">LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Named Entity Recognition using BERT on Subsampled Datasets]]></title><description><![CDATA[The Named-Entity Recognition (NER) system I worked on during my internship at doXray is part of a live production system.]]></description><link>https://blog.doxray.com/p/named-entity-recognition-using-bert</link><guid isPermaLink="false">https://blog.doxray.com/p/named-entity-recognition-using-bert</guid><dc:creator><![CDATA[Eva Jagodić]]></dc:creator><pubDate>Wed, 11 Jan 2023 12:43:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YrGY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Named-Entity Recognition (NER) system I worked on during my internship at doXray is part of a live production system. 
At doXray, models in production go through two main phases:</p><ol><li><p><strong>Initial training</strong>, which is done on a larger amount of labelled data before the model is deployed to production.</p></li><li><p><strong>Continual learning</strong>, which consists of retraining the model every month, once it is already in production, on new labelled data alongside the older data.</p></li></ol><p>However, this approach poses some challenges. As new data arrives each month, the total amount of training data keeps growing, making model training more and more expensive. Furthermore, as with any system in production, data drift is possible, and over time most of the dataset will consist of older data.</p><p><strong>Subsampling</strong> in machine learning is a method of reducing the amount of data by selecting a subset of the original data. It is commonly done to resolve class imbalances, but in our case, subsampling was used to address the challenges above. By subsampling older data and keeping all newer data, we want to reduce the amount of data and <strong>speed up training</strong>, while keeping the performance at the same level. Additionally, by prioritising newer, more relevant data, we want to <strong>account for data drift</strong>.</p><p></p><h2>Dataset</h2><p>The dataset I used during my internship contains OCR-scanned <strong>invoice-type</strong> documents. There are around 42000 documents in the full dataset, 10000 of which are recent data (less than 1 year old) and 32000 of which are older data (more than 1 year old). Each document has information about its supplier and the time it was supplied.</p><p>The documents are already split into tokens and each token has a corresponding label. <strong>There are 15 labels in total</strong>, including the neutral &#8216;0&#8217; label, which denotes that the token is outside of any named entity. 
The complete list of entities can be seen in Tables 2 and 3. Unlike the IOB tagging scheme that is widely used for NER tasks, separate tags for the beginning and inside of a named entity are not used, and each label corresponds to exactly one token.</p><p><strong>There are 32 unique document suppliers</strong>, 23 of which are present in the older data and 29 in the recent data. This means that some suppliers appear only in the recent data and are not represented at all in the older data, while others have been discontinued and no longer appear in the recent data.</p><p>Another important aspect of the dataset is the number of tokens per document. The shortest document has 48 tokens, the longest has 19599, and <strong>the average is 744</strong>. Document lengths vary greatly, but the average stays roughly the same for both older and recent data. Since many documents are longer than the BERT input limit of 512 BERT tokens, they have to be chunked to fit.</p><p></p><h4>Dataset Splits</h4><p>The chart in Figure 1 shows the distribution of labelled documents per month. There is a larger amount of initial training data from November 2020, followed by 1000-2000 new documents each month up to January 2022. After that, there is significantly less labelled data. If the model is trained on the full dataset, recent data is in the minority, which is a problem because it is likely to be more relevant in production.</p><p>To discover which combination of data works best, the training dataset was first split into <strong>recent</strong> and <strong>older</strong> <strong>data</strong>, with documents <strong>older than a year</strong> in the older category and everything newer in the recent category.</p><p>The dataset was then subsampled in different ways, with the model trained on the <strong>full dataset being the baseline model</strong>. 
Each of the subsampled datasets consists of all recent data, combined with a subsample of older data. Two different subsampling methods were used in combination with three different amounts of data, leading to 6 subsamples.</p><ol><li><p>Data amount</p><ol><li><p>half as many older documents are sampled as there are recent documents</p></li><li><p>the same number of older documents is sampled as there are recent documents</p></li><li><p>twice as many older documents are sampled as there are recent documents</p></li></ol></li><li><p>Subsampling method</p><ol><li><p>documents are chosen randomly</p></li><li><p>documents are chosen so that there is an approximately equal number of documents from each supplier</p></li></ol></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YrGY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YrGY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 424w, https://substackcdn.com/image/fetch/$s_!YrGY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 848w, https://substackcdn.com/image/fetch/$s_!YrGY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YrGY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YrGY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png" width="1024" height="563" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 1: Data distribution per month&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 1: Data distribution per month" title="Figure 1: Data distribution per month" srcset="https://substackcdn.com/image/fetch/$s_!YrGY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 424w, https://substackcdn.com/image/fetch/$s_!YrGY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 848w, https://substackcdn.com/image/fetch/$s_!YrGY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YrGY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e92a60-9d90-4afc-a67b-a9734563ca13_1024x563.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Figure 1: Data distribution per month</em></figcaption></figure></div><p>To ensure the validity of the results, all the datasets (including the baseline) were split into five equal parts in which 4/5 were used as a training set and the remaining 1/5 as the validation set (i.e., data was prepared for <strong>5-fold cross-validation</strong>). 
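</p><p>The two subsampling methods described above could look roughly like the sketch below. It is a minimal illustration rather than the code used in the experiments: documents are assumed to be dictionaries with a hypothetical <code>supplier</code> key, and the function names are invented for this example.</p><pre><code>
import random
from collections import defaultdict


def subsample_random(older_docs, n_target, seed=0):
    """Pick n_target older documents uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(older_docs, min(n_target, len(older_docs)))


def subsample_by_supplier(older_docs, n_target, seed=0):
    """Pick roughly n_target older documents, spread evenly across suppliers."""
    if not older_docs:
        return []
    rng = random.Random(seed)
    by_supplier = defaultdict(list)
    for doc in older_docs:
        by_supplier[doc["supplier"]].append(doc)
    quota = n_target // len(by_supplier)  # equal share per supplier
    chosen = []
    for docs in by_supplier.values():
        # A supplier with fewer documents than its quota contributes all of
        # them, so the result can be slightly smaller than n_target.
        chosen.extend(rng.sample(docs, min(quota, len(docs))))
    return chosen


def build_training_set(recent_docs, older_docs, ratio, method, seed=0):
    """All recent documents plus ratio * len(recent_docs) subsampled older ones."""
    n_target = int(ratio * len(recent_docs))
    return recent_docs + method(older_docs, n_target, seed)
</code></pre><p>Calling <code>build_training_set</code> with ratios of 0.5, 1 and 2 and each of the two methods gives the six subsampled datasets.</p><p>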
In total, the 6 subsampled datasets plus the baseline dataset give 7 datasets, each with 5 folds, for 35 different datasets to train on.</p><p></p><h2>Data Preprocessing</h2><p>As mentioned in the Dataset section, BERT accepts inputs of at most 512 BERT tokens. However, BERT doesn&#8217;t use word tokenization (which is how the data is currently tokenized), but <strong>wordpiece tokenization</strong>, which splits words into subwords. BERT also expects the special tokens [CLS] and [SEP] at the beginning and end of each input. This means that the input can effectively be only <strong>510 wordpieces long</strong>. To ensure this is the case, the documents were chunked after being subsampled in the way described in the Dataset Splits section.</p><p>The next step in data preprocessing is <strong>aligning the original token labels</strong>. The original labels align with the original tokens. However, these tokens have now been split into subwords. There are two common approaches:&nbsp;</p><ol><li><p>propagating the original label to each subword</p></li><li><p>aligning the label only to the first subword and giving the rest of the subwords the -100 label (which will be ignored when calculating CrossEntropyLoss).&nbsp;</p></li></ol><p>For this task, the second approach was used.</p><p></p><h2>Training and Evaluation</h2><p>The model used for the experiment was BertForTokenClassification from the <a href="https://huggingface.co/">HuggingFace</a> Python library, with the &#8216;bert-base-multilingual-cased&#8217; pretrained model checkpoint.</p><p>Training used a batch size of 16, a learning rate of 3e-05 with an Adam optimizer, a linear scheduler, and early stopping with a patience of 2. The maximum number of epochs was 15, but this limit was never reached.</p><p>To compare the results of models fine-tuned on different subsampled datasets against each other and the baseline, two evaluation datasets were prepared for each of the 5 folds. 
They consisted of the validation data described in the Dataset Splits section, filtered to contain only data from the last 6 months for the first set and the last 12 months for the second set. The metric used was the F1-score for each individual entity as well as the macro average of all entities excluding the neutral &#8216;0&#8217; label. To calculate the final metrics for every dataset, the scores for each of the 5 folds were averaged.</p><p></p><h2>Results</h2><p>Table 1 shows the average macro F1-score on the 6-month and 12-month evaluation data, as well as the average best epoch and the approximate time needed for one epoch in minutes.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NvOE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NvOE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 424w, https://substackcdn.com/image/fetch/$s_!NvOE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 848w, https://substackcdn.com/image/fetch/$s_!NvOE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 1272w, https://substackcdn.com/image/fetch/$s_!NvOE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NvOE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png" width="1024" height="203" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:203,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Table 1: Overall results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table 1: Overall results" title="Table 1: Overall results" srcset="https://substackcdn.com/image/fetch/$s_!NvOE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 424w, https://substackcdn.com/image/fetch/$s_!NvOE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 848w, https://substackcdn.com/image/fetch/$s_!NvOE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 1272w, https://substackcdn.com/image/fetch/$s_!NvOE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F423f4601-f7d6-4d61-9de4-2a1a5b2ae77c_1024x203.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><em>Table 1: 
Overall results</em></figcaption></figure></div><p>As Table 1 shows, all results are very similar. The general trend is that more data leads to very slightly better F1-scores, and that models score slightly better on the 6-month evaluation data than on the 12-month evaluation data. On the 6-month evaluation data, the two largest subsampled datasets outperform the baseline in terms of F1-score, but only by a very thin margin. On the 12-month evaluation data, the baseline remains the model with the highest F1-score. The biggest difference between the datasets is in training time: the model trained on the smallest dataset needs less than half the training time of the baseline.</p><p>It is important to note that the datasets subsampled by supplier are slightly smaller than the ones subsampled at random, because some suppliers had slightly fewer documents than were needed for a completely equal split between all suppliers. Their shorter training times are therefore to be expected.</p><p>Table 2 and Table 3 offer more detailed results, showing F1-scores for each individual entity. 
The scores are again very similar across all subsampled datasets, with no entity showing a significant drop or rise in performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VTvy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VTvy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 424w, https://substackcdn.com/image/fetch/$s_!VTvy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 848w, https://substackcdn.com/image/fetch/$s_!VTvy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 1272w, https://substackcdn.com/image/fetch/$s_!VTvy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VTvy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png" width="1024" height="307" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:307,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Table 2: F1-scores on 6-months-old data&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table 2: F1-scores on 6-months-old data" title="Table 2: F1-scores on 6-months-old data" srcset="https://substackcdn.com/image/fetch/$s_!VTvy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 424w, https://substackcdn.com/image/fetch/$s_!VTvy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 848w, https://substackcdn.com/image/fetch/$s_!VTvy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 1272w, https://substackcdn.com/image/fetch/$s_!VTvy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8910bd3c-f7de-44c5-b7db-c26275685902_1024x307.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Table 2: F1-scores on 6-months-old data</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CQ6f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CQ6f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 424w, 
https://substackcdn.com/image/fetch/$s_!CQ6f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 848w, https://substackcdn.com/image/fetch/$s_!CQ6f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 1272w, https://substackcdn.com/image/fetch/$s_!CQ6f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CQ6f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png" width="1024" height="311" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:311,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Table 3: F1-scores on 12-months-old data&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Table 3: F1-scores on 12-months-old data" title="Table 3: F1-scores on 12-months-old data" srcset="https://substackcdn.com/image/fetch/$s_!CQ6f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 424w, 
https://substackcdn.com/image/fetch/$s_!CQ6f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 848w, https://substackcdn.com/image/fetch/$s_!CQ6f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 1272w, https://substackcdn.com/image/fetch/$s_!CQ6f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe61bbf1-92b5-49ae-9326-3f09557b2b62_1024x311.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Table 3: F1-scores on 12-months-old data</em></figcaption></figure></div><p></p><h2>Conclusion</h2><p>The main task of this internship was to find a way to reduce the amount of older training data so that a model trained on the reduced data would maintain the F1-score of the baseline while lowering the training time. To do this, 6 different subsampled datasets were created, each was used to train a BERT model, and the resulting models were compared against each other and against a baseline. The results showed a very small decrease in F1-scores (at most 0.011) for much smaller datasets (the smallest is around 35% of the size of the full dataset). Similarly, no notable difference was found between subsamples created at random and subsamples created by sampling the same number of documents from each supplier.</p><p><strong>In conclusion, subsampling older data in any of the ways tested during this internship offers faster training at minimal cost to the F1-score.</strong></p>]]></content:encoded></item></channel></rss>