Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics

  • dc.contributor.author Ferreira Moreira, Pedro José
  • dc.date.accessioned 2025-11-04T17:38:20Z
  • dc.date.available 2025-11-04T17:38:20Z
  • dc.date.issued 2025
  • dc.description Master's thesis for: Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
  • dc.description Supervisors: Vicenç Gómez & Leo Anthony Celi. Academic Tutor: Vicenç Gómez
  • dc.description.abstract Transformer-scale language models can now ace many medical exams, but their frozen parametric memory risks propagating outdated guidelines and systemic bias to the bedside. To counter this, we re-imagine the diagnostic assistant as a navigator that plans, retrieves, executes code, and verifies evidence rather than guessing from memory. We introduce DeepMed, a 4B-parameter multi-agent framework that attempts to shift the paradigm of medical assistants from diagnostic oracles to information retrievers. Agents invoke external tools via the open Model Context Protocol (MCP), including M3, a natural-language gateway to the MIMIC-IV EHR, and a sandboxed Python REPL for on-the-fly calculations. Performance is audited on the newly proposed MedBrowseComp benchmark (1,089 quarterly-regenerating, multi-hop, oncology-related queries), legacy QA suites, the EquityMedQA counterfactual set, and the EHRSQL challenge. With just a 4-billion-parameter LLM as the cognitive engine, DeepMed achieves 26% single-pass accuracy on MedBrowseComp, outperforming larger enterprise-grade systems that rely on fine-tuned models 10 to 100 times larger, while running locally on a consumer laptop. On EquityMedQA it increases correctness from 50.8% to 57.4%, a 13% relative reduction in demographic disparity. Coupling MCP to the M3 EHR interface lifts pass@1 on EHRSQL from 2% to 9%. By fusing agentic planning, typed tool use, and evidence-first reporting, DeepMed shows that bias-aware, verifiable clinical AI can be achieved without frontier-scale models or costly GPU clusters. The open-sourced multi-agent framework, MCP server tool contributions such as M3, and the MedBrowseComp benchmark provide a reproducible path toward transparent, low-cost decision support in safety-critical healthcare settings.
  • dc.identifier.uri http://hdl.handle.net/10230/71770
  • dc.language.iso eng
  • dc.rights CC Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0)
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
  • dc.subject.other Multi-agent systems
  • dc.title Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics
  • dc.type info:eu-repo/semantics/masterThesis