The use of Large Language Models (LLMs) as general-purpose assistants is becoming increasingly widespread. Despite this, the deployment of these models in high-risk scenarios remains controversial due to issues such as hallucinations, biases, or the lack of interpretability of their results. Many approaches striving to mitigate these issues aim to synergize LLMs with Knowledge Graphs (KGs), a type of Knowledge Base that captures extensive knowledge in a flexible yet structured way. On the one hand, allowing LLMs to access KGs empowers them to exploit out-of-domain knowledge and thus avoid hallucinations. On the other hand, LLMs can also benefit KGs by assisting in their construction or by enriching them with new knowledge.
In this work, we will perform an initial exploration of this synergy between LLMs and KGs in domain-specific texts, particularly biomedical scientific publications.
Firstly, we will explore the task of Relation Extraction with LLMs on such corpora.
This will be achieved by generating triples, the underlying knowledge representation of KGs, and by performing a thorough evaluation, culminating in the creation of an evaluation dataset for this domain. Secondly, we will analyze the use of these triples in reasoning- and knowledge-intensive tasks, specifically in a Question Answering scenario within the same biomedical corpora. By building an experimental Retrieval-Augmented Generation (RAG) system, we will analyze the effect of including such triples, which act as condensed and structured information, in combination with other traditional retrieval techniques.
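To make the idea concrete, the sketch below shows one possible way triples could serve as condensed, structured context alongside retrieved passages in a RAG prompt. The function names and the biomedical triples are illustrative assumptions, not the system described in this work:

```python
# Hypothetical sketch: serializing (subject, relation, object) triples
# into compact textual facts and combining them with retrieved passages.
# All names and the example triples below are illustrative only.

def triple_to_text(subject: str, relation: str, obj: str) -> str:
    """Linearize one triple into a short sentence-like fact."""
    return f"{subject} {relation.replace('_', ' ')} {obj}."

def build_context(triples: list[tuple[str, str, str]], passages: list[str]) -> str:
    """Concatenate structured facts and free-text passages into one prompt context."""
    facts = "\n".join(triple_to_text(*t) for t in triples)
    return "Facts:\n" + facts + "\n\nPassages:\n" + "\n".join(passages)

# Illustrative usage with made-up biomedical triples:
triples = [("aspirin", "inhibits", "COX-1"),
           ("COX-1", "produces", "thromboxane_A2")]
passages = ["Aspirin irreversibly acetylates cyclooxygenase enzymes."]
context = build_context(triples, passages)
```

The resulting `context` string would then be prepended to the user question before prompting the LLM, letting the model ground its answer in both the structured facts and the supporting text.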
These two explorations pave the way for the development of a system that allows LLMs to perform better on domain-specific corpora, while also providing an overview of state-of-the-art models and techniques in this scenario.