---
title: RadExtract
emoji: 🗂️
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
header: mini
app_port: 7870
tags:
  - medical
  - nlp
  - radiology
  - langextract
  - gemini
  - structured-data
---

# RadExtract: Radiology Report Structuring Demo

[![🤗 Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/google/radextract)
[![LangExtract](https://img.shields.io/badge/Powered%20by-LangExtract-green)](https://github.com/google/langextract)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A demonstration application powered by [LangExtract](https://github.com/google/langextract) that structures radiology reports using Gemini models. Transform unstructured radiology text into organized, interactive segments with clinical significance annotations.

## Try the Demo

**[Launch RadExtract Demo](https://huggingface.co/spaces/google/radextract)**

Transform unstructured radiology reports into structured data with highlighted findings that are precisely mapped back to the original source text.

## Key Features

- **Structured Output**: Organizes reports into anatomical sections with clinical significance
- **Interactive Highlighting**: Click any finding to see its exact source in the original text
- **Clinical Significance**: Annotates findings as minor, significant, or grounding
- **Character-Level Mapping**: Precise attribution back to source text
- **Multi-Model Support**: Gemini 2.5 Flash (fast) and Pro (comprehensive)

## Quick Start

### Setup

```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp env.list.example env.list
# Edit env.list and set KEY=your_gemini_api_key_here
```

### Local Development

```bash
source venv/bin/activate
export KEY=your_gemini_api_key_here
python app.py
```

Access at: http://localhost:7870

## API Usage

### Example Request
```bash
curl -X POST \
  -H 'X-Model-ID: gemini-2.5-flash' \
  -H 'X-Use-Cache: true' \
  -d 'FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.' \
  http://localhost:7870/predict
```
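
For scripted use, the same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_predict_request` and `structure_report` are hypothetical, and sending `"false"` to disable caching is an assumption (the curl example only shows `true`). Like `curl -d`, `urllib` sends the body with a form-encoded content type by default.

```python
import json
import urllib.request


def build_predict_request(report_text, base_url="http://localhost:7870",
                          model_id="gemini-2.5-flash", use_cache=True):
    """Build the POST request from the curl example (nothing is sent yet)."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=report_text.encode("utf-8"),  # raw report text as the request body
        headers={
            "X-Model-ID": model_id,
            # Assumption: the server accepts "false" to bypass the cache.
            "X-Use-Cache": "true" if use_cache else "false",
        },
        method="POST",
    )


def structure_report(report_text, **kwargs):
    """Send the report to a running RadExtract server and decode the JSON reply."""
    request = build_predict_request(report_text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `structure_report("FINDINGS: Normal heart and lungs. IMPRESSION: Normal study.")` should return a dictionary in the response format described in this README.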

### Response Format
```json
{
  "segments": [{
    "type": "body",
    "label": "Chest", 
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor"
  }],
  "text": "Chest:\n- Normal heart and lungs",
  "annotated_document_json": {...}
}
```
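
The `intervals` field is what links each finding back to the source text. As a rough sketch, assuming `startPos` and `endPos` are character offsets into the submitted report with `endPos` exclusive (which matches the example above, but is not stated explicitly), the original span can be recovered by slicing. The helper name `source_spans` is hypothetical.

```python
def source_spans(source_text, segment):
    """Return the source substrings that a segment's intervals point to."""
    return [
        source_text[interval["startPos"]:interval["endPos"]]
        for interval in segment["intervals"]
    ]


# Report text from the example request and the segment from the example response.
report = "FINDINGS: Normal heart and lungs. IMPRESSION: Normal study."
segment = {
    "type": "body",
    "label": "Chest",
    "content": "Normal heart and lungs",
    "intervals": [{"startPos": 10, "endPos": 32}],
    "significance": "minor",
}

spans = source_spans(report, segment)
print(spans)  # ['Normal heart and lungs']
```

If the server normalizes the text before extraction, the offsets may refer to the normalized text instead of the raw input, so it is worth checking the recovered span against `content`.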

## Architecture

- **Backend**: Flask + Python 3.10+, type-annotated and checked with mypy
- **NLP Engine**: [LangExtract](https://github.com/google/langextract) for structured extraction
- **AI Models**: Google Gemini 2.5 (Flash/Pro)
- **Frontend**: Vanilla JavaScript with interactive UI
- **Deployment**: Docker + Hugging Face Spaces
- **Package Details**: See [pyproject.toml](https://huggingface.co/spaces/google/radextract/blob/main/pyproject.toml) for dependencies, metadata, and tooling

## Project Structure

```
radextract/
├── app.py                 # Flask API endpoints
├── structure_report.py    # Core structuring logic
├── sanitize.py            # Text preprocessing & normalization
├── prompt_instruction.py  # LangExtract prompt
├── cache_manager.py       # Response caching
├── static/                # Frontend assets
└── templates/             # HTML templates
```

## Development

### Setup
```bash
git clone https://huggingface.co/spaces/google/radextract
cd radextract
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
```

### Code Quality
```bash
# Format code
pyink . && isort .

# Type checking
mypy . --ignore-missing-imports

# Run tests
pytest
```

### Docker
```bash
# Build and run
docker build -t radextract .
docker run -p 7870:7870 --env-file env.list radextract
```

## License

Apache License 2.0 - see [LICENSE](LICENSE) for details.

## Related Projects

- **[LangExtract](https://github.com/google/langextract)**: Core NLP library

---

**Built for the medical AI community** | **Hosted on Hugging Face Spaces**

## Disclaimer

This is not an officially supported Google product. If you use RadExtract or LangExtract in production or publications, please cite accordingly and acknowledge usage. Use is subject to the [Apache 2.0 License](LICENSE). For health-related applications, use of LangExtract is also subject to the [Health AI Developer Foundations Terms of Use](https://developers.google.com/health-ai-foundations/terms).