This project is an end-to-end modelling pipeline for radiology report summarisation. The training data consists of XML reports with separate findings and impression sections, which are cleaned into paired examples for fine-tuning a T5-family model.
I explored t5-small, flan-t5-small, and an 8-bit quantised flan-t5-large model with LoRA. The smaller flan-t5-small model gave the best balance of quality and training speed on local hardware, reaching a test ROUGE-1 score of 0.6213.
| Model | Eval ROUGE-1 | Test ROUGE-1 |
|---|---|---|
| t5-small | 0.4941 | 0.3892 |
| flan-t5-small | 0.5928 | 0.6213 |
| flan-t5-large, 8-bit quantised | 0.5021 | 0.4012 |
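
For readers unfamiliar with the metric, the sketch below shows one common way to compute a ROUGE-1 F1 score from unigram overlap. It is an illustration only: the project does not state which ROUGE implementation or variant (precision, recall, or F1) produced the numbers above, and the function name `rouge1_f1` and the simple lowercase tokenisation (no stemming) are assumptions.

```python
import re
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a generated summary."""
    # Simplified tokenisation (assumption): lowercase alphanumeric runs, no stemming.
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    cand_tokens = re.findall(r"[a-z0-9]+", candidate.lower())
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("No acute cardiopulmonary abnormality.", "No acute abnormality.")
print(f"ROUGE-1 F1: {score:.4f}")
```

Scores from a library implementation may differ slightly because of tokenisation and stemming choices.
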
The raw dataset contains 3,999 XML report files. Records that lack either a findings or an impression section are excluded, leaving 1,809 paired examples for fine-tuning. Cleaning also removes reports with redacted personal information, normalises numbered impressions into sentence format, and tidies small punctuation artifacts.
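
The pairing and filtering step could be sketched roughly as follows. The XML layout is not described here, so the `AbstractText` element name, the `Label` attribute, and the `XXXX` redaction marker are all assumptions chosen for illustration, as is the helper name `extract_pair`.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumption: redacted personal information appears as this placeholder token.
REDACTION_MARKER = "XXXX"


def extract_pair(xml_text: str) -> Optional[Tuple[str, str]]:
    """Return (findings, impression) for one report, or None if it should be dropped."""
    root = ET.fromstring(xml_text)

    # Assumption: each section is an <AbstractText Label="..."> element.
    sections = {}
    for node in root.iter("AbstractText"):
        label = (node.get("Label") or "").upper()
        sections[label] = (node.text or "").strip()

    findings = sections.get("FINDINGS", "")
    impression = sections.get("IMPRESSION", "")

    # Drop records that are missing either half of the pair.
    if not findings or not impression:
        return None
    # Drop records that contain the assumed redaction marker.
    if REDACTION_MARKER in findings or REDACTION_MARKER in impression:
        return None
    return findings, impression
```

Applying a function like this to each file and keeping only the non-`None` results yields the paired examples used for fine-tuning.
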
Removing records with redacted information produced the largest improvement in ROUGE score. The later formatting steps made the data more consistent, but their score changes were smaller and may fall within run-to-run random variation.
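
As an illustration of the formatting steps, the sketch below turns a numbered impression into plain sentences and tidies stray spacing before punctuation. The exact rules used in the project are not listed, so the regular expressions and the function name `normalise_impression` are assumptions. A pattern this simple would also split on a bare number followed by a full stop inside a sentence, so real data may need stricter rules.

```python
import re


def normalise_impression(text: str) -> str:
    """Convert '1. ... 2. ...' style impressions into plain sentences."""
    # Split on list markers such as "1." or "2)" at the start or after whitespace.
    parts = re.split(r"(?:^|\s)\d+[.)]\s+", text.strip())

    sentences = []
    for part in parts:
        part = re.sub(r"\s+", " ", part).strip()    # collapse runs of whitespace
        part = re.sub(r"\s+([.,;:])", r"\1", part)  # remove space before punctuation
        part = re.sub(r"\.{2,}", ".", part)         # collapse repeated full stops
        part = part.rstrip(".").strip()
        if not part:
            continue
        # Capitalise the first letter and end with exactly one full stop.
        sentences.append(part[0].upper() + part[1:] + ".")
    return " ".join(sentences)


print(normalise_impression("1. No acute disease. 2. Stable cardiomegaly ."))
```
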
The trained model is exposed through a FastAPI app with a small web interface and an /impression endpoint. The deployed demo accepts findings text and returns the generated impression from the fine-tuned model.
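
A client call to the endpoint might look like the sketch below, which uses only the Python standard library. The request and response schema are not documented here, so the POST method, the `findings` and `impression` JSON field names, and the `http://localhost:8000` base URL are all assumptions to adjust against the actual app.

```python
import json
import urllib.request


def build_impression_request(
    findings: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a JSON POST request for the /impression endpoint."""
    # Assumption: the endpoint expects a JSON body with a "findings" field.
    payload = json.dumps({"findings": findings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/impression",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_impression(
    findings: str, base_url: str = "http://localhost:8000", timeout: float = 30.0
) -> str:
    """Send findings text to a running server and return the generated impression."""
    request = build_impression_request(findings, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the response JSON carries the summary under "impression".
    return body["impression"]
```

With the app running locally, `request_impression("The lungs are clear. Heart size is normal.")` would return the model's generated impression as a string.
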