Knowledge graphs (KGs) play a central role in representing structured information about real-world entities and their relationships, and in supporting tasks such as reasoning, search, and question answering. Recent advances in natural language processing have made it feasible to build KGs directly from text, automating the extraction of entities and their semantic relations at scale. This work presents a complete pipeline for constructing a knowledge graph from natural language text and for using it to improve the accuracy and reliability of answers generated by large language models (LLMs).
The proposed system consists of four main stages. First, entities are identified using a domain-adapted Named Entity Recognition (NER) model trained with BIO tagging on a corpus of automatically generated sentences. Second, a relation extraction component generates subject–predicate–object triples using a generative Transformer-based model fine-tuned on domain-specific data.
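As a concrete illustration of these first two stages, the following sketch decodes BIO tags into entity spans and assembles a subject–predicate–object triple; the sentence, tag set, and hard-coded predicate are hypothetical placeholders, not output of the actual domain-adapted models.

```python
# Sketch of stages one and two: collapse BIO-tagged tokens into entity
# spans, then pair entities into a subject-predicate-object triple.
# Tokens, tags, and the predicate are illustrative placeholders.

def bio_to_spans(tokens, tags):
    """Collapse BIO tags into (entity_text, start, end) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((" ".join(tokens[start:i]), start, i))
            start = i if tag.startswith("B-") else None
    if start is not None:
        spans.append((" ".join(tokens[start:]), start, len(tokens)))
    return spans

tokens = ["Marie", "Curie", "discovered", "polonium"]
tags = ["B-PER", "I-PER", "O", "B-CHEM"]

entities = bio_to_spans(tokens, tags)
# -> [('Marie Curie', 0, 2), ('polonium', 3, 4)]

# In the real pipeline a generative relation-extraction model emits the
# predicate; here it is hard-coded purely for illustration.
triple = (entities[0][0], "discovered", entities[1][0])
print(triple)  # ('Marie Curie', 'discovered', 'polonium')
```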
Third, the extracted triples are encoded into dense vectors using semantic embeddings and indexed in a high-performance Hierarchical Navigable Small World (HNSW) search structure, which allows efficient retrieval of relevant facts at inference time. To evaluate the system, experiments were conducted across twelve semantic domains using the WikiDialogue dataset, focusing on entity detection accuracy and triple extraction precision. The results show that training on multiple domains significantly improves generalization, and that enforcing strict entity boundaries with span-based matching helps reduce false positives. In the fourth and final stage, the indexed triples serve as input to an LLM (LLaMA 3.3-70B Instruct) in a Retrieval-Augmented Generation (RAG) setup: given a user query, the system retrieves semantically similar triples and includes them in the model prompt.
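One way to realize this indexing and retrieval step is sketched below, assuming the sentence-transformers and hnswlib libraries; the embedding model, example triples, and index parameters are illustrative choices rather than those used in the reported experiments.

```python
# Sketch of stage three and RAG retrieval: embed verbalized triples,
# index them with HNSW, and fetch the nearest neighbors for a query.
# sentence-transformers and hnswlib are assumed stand-ins here.
import hnswlib
from sentence_transformers import SentenceTransformer

triples = [
    ("Marie Curie", "discovered", "polonium"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
]
texts = [" ".join(t) for t in triples]  # verbalize each triple for embedding

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
vectors = model.encode(texts)

index = hnswlib.Index(space="cosine", dim=vectors.shape[1])
index.init_index(max_elements=len(texts), ef_construction=200, M=16)
index.add_items(vectors, list(range(len(texts))))
index.set_ef(50)  # query-time accuracy/speed trade-off

# At inference time, retrieve the triples closest to the user query and
# splice them into the LLM prompt as grounding context.
query_vec = model.encode(["What did Marie Curie discover?"])
labels, _ = index.knn_query(query_vec, k=2)
retrieved = [triples[i] for i in labels[0]]
print(retrieved)
```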
Three prompting strategies are explored: no context (the model answers based solely on its internal knowledge), strict (the model answers only if the fact is explicitly stated in the retrieved context), and permissive (the model falls back on background knowledge when the context is insufficient); illustrative templates appear at the end of this section. Examples show that strict prompts reduce hallucinations, while permissive prompts produce fluent answers but may introduce unsupported information. Overall, the results confirm that a modular KG-based pipeline can enhance LLM output by grounding it in structured knowledge. The system is scalable and adaptable to multiple domains, and the analysis highlights the trade-offs between precision, recall, and fluency in generation tasks. This makes similar approaches attractive in settings where precision and clarity are essential, such as healthcare, finance, or enterprise data systems.
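To make the three strategies concrete, the sketch below gives one plausible set of prompt templates; the exact wording sent to LLaMA 3.3-70B Instruct is not specified in this summary, so the templates are assumptions.

```python
# Illustrative prompt templates for the three strategies; the wording is
# an assumption, not the prompts actually used with LLaMA 3.3-70B Instruct.

def build_prompt(question, context_triples, mode="strict"):
    context = "\n".join(" | ".join(t) for t in context_triples)
    if mode == "no_context":
        # The model answers from its internal knowledge alone.
        return f"Answer the question.\nQuestion: {question}"
    if mode == "strict":
        # Answer only from the retrieved facts; refuse otherwise.
        return (
            "Answer ONLY using the facts below. If they do not contain "
            "the answer, reply 'I don't know'.\n"
            f"Facts:\n{context}\nQuestion: {question}"
        )
    # permissive: prefer the retrieved facts, fall back to prior knowledge.
    return (
        "Use the facts below when relevant; otherwise answer from your "
        "own knowledge.\n"
        f"Facts:\n{context}\nQuestion: {question}"
    )

print(build_prompt("What did Marie Curie discover?",
                   [("Marie Curie", "discovered", "polonium")]))
```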