Deep research agentic framework for mitigating bias in AI-driven healthcare diagnostics

Ferreira Moreira, Pedro José
2025-11-04
2025-11-04
2025
http://hdl.handle.net/10230/71770

Master's thesis for: Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
Supervisors: Vicenç Gómez & Leo Anthony Celi
Academic Tutor: Vicenç Gómez

Transformer-scale language models can now ace many medical exams, but their frozen parametric memory risks propagating outdated guidelines and systemic bias to the bedside. To counter this, we re-imagine the diagnostic assistant as a navigator that plans, retrieves, executes code, and verifies evidence rather than guessing from memory. We introduce DeepMed, a 4B-parameter multi-agent framework that attempts to shift the paradigm of medical assistants from diagnostic oracles to information retrievers. Agents invoke external tools via the open Model Context Protocol (MCP), including M3, a natural-language gateway to the MIMIC-IV EHR, and a sandboxed Python REPL for on-the-fly calculations. Performance is audited on the newly proposed MedBrowseComp benchmark (1,089 quarterly-regenerating, multi-hop, oncology-related queries), legacy QA suites, the EquityMedQA counterfactual set, and the EHRSQL challenge. With just a 4-billion-parameter LLM as the cognitive engine, DeepMed achieves 26% single-pass accuracy on MedBrowseComp, outperforming larger enterprise-grade systems that rely on fine-tuned models 10 to 100 times larger, while running locally on a consumer laptop. On EquityMedQA it increases correctness from 50.8% to 57.4%, a 13% relative reduction in demographic disparity. Coupling MCP to the M3 EHR interface lifts pass@1 on EHRSQL from 2% to 9%. By fusing agentic planning, typed tool use, and evidence-first reporting, DeepMed shows that bias-aware, verifiable clinical AI can be achieved without frontier-scale models or costly GPU clusters.
The open-sourced multi-agent framework, MCP server tool contributions such as M3, and the MedBrowseComp benchmark provide a reproducible path toward transparent, low-cost decision support in safety-critical healthcare settings.

eng
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Multi-agent systems
info:eu-repo/semantics/masterThesis
info:eu-repo/semantics/openAccess