The increasing digitisation of information processes has made text data a central yet complex resource to exploit. Heterogeneous documents such as CVs, reports and e-mails, characterised by high linguistic and semantic variability, still pose considerable challenges for automated systems in data extraction, semantic comparison and integration into decision-making processes.
The new study AI4Cyber analyses an approach that employs Large Language Models (LLMs) for the semantic structuring of information. It proposes a pipeline capable of transforming unstructured documents into formal representations and introduces semantic comparison and scoring mechanisms to assess the relevance of information, ensuring control, transparency and reproducibility in local and secure environments.
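To make the idea of a formal representation with a reproducible relevance score more concrete, here is a minimal Python sketch. It is not the method used in the study: the `StructuredRecord` type, its fields and the function names are hypothetical, the LLM extraction step is assumed to have already produced the terms, and plain term overlap stands in for a true semantic comparison.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredRecord:
    """Hypothetical formal representation of one document (assumed schema)."""
    doc_id: str
    doc_type: str                       # e.g. "cv", "report", "email"
    terms: set = field(default_factory=set)  # terms assumed to come from an upstream LLM step


def normalise(terms):
    """Lower-case and strip terms so that comparisons are deterministic."""
    return {t.strip().lower() for t in terms if t.strip()}


def relevance_score(record, required_terms):
    """Share of required terms found in the record, in the range [0, 1].

    Exact term overlap is a simple stand-in for semantic comparison.
    """
    required = normalise(required_terms)
    if not required:
        return 0.0
    return len(normalise(record.terms) & required) / len(required)


def rank(records, required_terms):
    """Return (doc_id, score) pairs, highest score first, ties broken by doc_id."""
    scored = [(r.doc_id, relevance_score(r, required_terms)) for r in records]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Because the score depends only on the structured fields and a fixed rule, the same inputs always give the same ranking, which is the kind of transparency and reproducibility the paragraph above refers to.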
Looking forward, this analysis proposes an evolution towards models with more advanced reasoning capabilities and adaptive processes supported by human feedback, strengthening the role of LLMs in building structured and reliable knowledge bases.
If you wish to learn more, here is the link to our comprehensive study.
In addition, you can subscribe to the dedicated mailing list Cyber Studios by Tinexta Defence to receive updates on upcoming research:


