CoreNLP XML to Brat .ann Converter
Overview
This Python script automates the conversion of CoreNLP XML files into the Brat annotation format, specifically handling Enhanced++ Dependencies. It extracts Part-of-Speech (POS) tags and syntactic dependency relations to generate annotation and visualization configuration files compatible with the Brat tool, streamlining linguistic annotation workflows.
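To make the target format concrete, here is a minimal sketch of how tokens and dependencies could be rendered as brat standoff lines. It assumes three things. Each token becomes a text-bound annotation (T) typed by its POS tag. Each dependency becomes a binary relation (R), with the governor as Arg1 and the dependent as Arg2. The tuple layouts shown are the input format. The function names token_line, relation_line, and to_brat are hypothetical and are not taken from the script. Because CoreNLP numbers tokens per sentence, the sketch keys annotation IDs on (sentence id, token id). It also skips relations whose governor is the artificial ROOT (index 0), which has no text span.

```python
def token_line(tid: int, pos: str, start: int, end: int, text: str) -> str:
    # Text-bound annotation: ID<TAB>TYPE START END<TAB>covered text
    return f"T{tid}\t{pos} {start} {end}\t{text}"


def relation_line(rid: int, label: str, arg1: int, arg2: int) -> str:
    # Binary relation: ID<TAB>TYPE Arg1:<T id> Arg2:<T id>
    return f"R{rid}\t{label} Arg1:T{arg1} Arg2:T{arg2}"


def to_brat(tokens, deps):
    """tokens: (sent_id, tok_id, word, pos, begin, end); deps: (sent_id, label, gov_idx, dep_idx)."""
    lines, tid_of = [], {}
    for n, (sid, tok, word, pos, begin, end) in enumerate(tokens, start=1):
        tid_of[(sid, tok)] = n  # CoreNLP token ids restart in each sentence
        lines.append(token_line(n, pos, begin, end, word))
    # Assumption: governor -> Arg1, dependent -> Arg2; ROOT (idx "0") has no span, so skip it
    rels = [d for d in deps if d[2] != "0"]
    for r, (sid, label, gov, dep) in enumerate(rels, start=1):
        lines.append(relation_line(r, label, tid_of[(sid, gov)], tid_of[(sid, dep)]))
    return lines


if __name__ == "__main__":
    toks = [("1", "1", "John", "NNP", 0, 4), ("1", "2", "runs", "VBZ", 5, 9)]
    deps = [("1", "root", "0", "2"), ("1", "nsubj", "2", "1")]
    print("\n".join(to_brat(toks, deps)))
```

POS tags such as "." or "," may need renaming before they work as brat type names. The same applies to Enhanced++ labels that contain a colon (for example nmod:poss). The sketch leaves both unchanged.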
Key Features
- Parses the CoreNLP XML structure using Python’s built-in xml.etree.ElementTree library.
- Extracts tokens, POS tags, and dependency relations, with support for Enhanced++ Dependencies.
- Generates Brat-compatible output files: .ann (annotations), annotation.conf, and visual.conf.
- Designed for ease of use: run from the command line with a single XML input file.
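The parsing step could look roughly like the sketch below, which uses only standard-library ElementTree calls. It assumes CoreNLP's usual XML layout: root/document/sentences/sentence, with token children carrying word, POS, and CharacterOffsetBegin/CharacterOffsetEnd, and one dependencies block per dependency type. It also assumes the Enhanced++ block is labeled type="enhanced-plus-plus-dependencies". The names parse_corenlp and SAMPLE are illustrative, and SAMPLE is a hand-written two-token fragment rather than real CoreNLP output.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root><document><sentences>
<sentence id="1">
  <tokens>
    <token id="1"><word>John</word><CharacterOffsetBegin>0</CharacterOffsetBegin><CharacterOffsetEnd>4</CharacterOffsetEnd><POS>NNP</POS></token>
    <token id="2"><word>runs</word><CharacterOffsetBegin>5</CharacterOffsetBegin><CharacterOffsetEnd>9</CharacterOffsetEnd><POS>VBZ</POS></token>
  </tokens>
  <dependencies type="basic-dependencies">
    <dep type="nsubj"><governor idx="2">runs</governor><dependent idx="1">John</dependent></dep>
  </dependencies>
  <dependencies type="enhanced-plus-plus-dependencies">
    <dep type="root"><governor idx="0">ROOT</governor><dependent idx="2">runs</dependent></dep>
    <dep type="nsubj"><governor idx="2">runs</governor><dependent idx="1">John</dependent></dep>
  </dependencies>
</sentence>
</sentences></document></root>"""


def parse_corenlp(xml_text: str, dep_type: str = "enhanced-plus-plus-dependencies"):
    """Return (tokens, deps) as tuples; the dependency block type name is an assumption."""
    root = ET.fromstring(xml_text)
    tokens, deps = [], []
    # Explicit path so only top-level sentence elements are visited
    for sent in root.findall("./document/sentences/sentence"):
        sid = sent.get("id")
        for tok in sent.findall("./tokens/token"):
            tokens.append((
                sid, tok.get("id"), tok.findtext("word"), tok.findtext("POS"),
                int(tok.findtext("CharacterOffsetBegin")),
                int(tok.findtext("CharacterOffsetEnd")),
            ))
        for block in sent.findall("dependencies"):
            if block.get("type") != dep_type:
                continue  # ignore basic/collapsed variants
            for dep in block.findall("dep"):
                deps.append((sid, dep.get("type"),
                             dep.find("governor").get("idx"),
                             dep.find("dependent").get("idx")))
    return tokens, deps


if __name__ == "__main__":
    toks, deps = parse_corenlp(SAMPLE)
    print(toks)
    print(deps)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring. The explicit path is used instead of root.iter("sentence") because CoreNLP's coreference output can also contain sentence elements (inside mention entries).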
Usage
Run the script in your terminal by specifying the input XML file:
python3 converter.py [fileToConvert.xml]
This produces three output files in the current directory:
- fileToConvert.ann: Brat annotation file
- annotation.conf: configuration for annotation labels
- visual.conf: visualization settings
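The output step could look like the hypothetical sketch below. It derives the .ann name from the input file's base name, so data/fileToConvert.xml yields fileToConvert.ann. It then builds minimal annotation.conf and visual.conf contents. Every observed POS tag is declared as an entity type, and every dependency label as a relation between any two entities via brat's <ENTITY> wildcard. The [drawing] defaults are modeled on brat's example configuration. The function names, the out_dir parameter, and the exact file contents are assumptions, not the script's actual output.

```python
import os


def ann_filename(xml_path: str) -> str:
    # "data/fileToConvert.xml" -> "fileToConvert.ann"
    return os.path.splitext(os.path.basename(xml_path))[0] + ".ann"


def annotation_conf(pos_tags, dep_labels) -> str:
    # POS tags -> entity types; dependency labels -> relations between any two entities
    lines = ["[entities]", *sorted(set(pos_tags)), "", "[relations]"]
    lines += [f"{lab}\tArg1:<ENTITY>, Arg2:<ENTITY>" for lab in sorted(set(dep_labels))]
    lines += ["", "[events]", "", "[attributes]", ""]
    return "\n".join(lines)


def visual_conf(pos_tags) -> str:
    # One display label per entity type, plus default span/arc styling
    lines = ["[labels]", *(f"{tag} | {tag}" for tag in sorted(set(pos_tags)))]
    lines += [
        "", "[drawing]",
        "SPAN_DEFAULT\tfgColor:black, bgColor:lightgreen, borderColor:darken",
        "ARC_DEFAULT\tcolor:black, arrowHead:triangle-5", "",
    ]
    return "\n".join(lines)


def write_outputs(xml_path, ann_lines, pos_tags, dep_labels, out_dir="."):
    # Writes the three files into out_dir (the current directory by default)
    files = {
        ann_filename(xml_path): "\n".join(ann_lines) + "\n",
        "annotation.conf": annotation_conf(pos_tags, dep_labels),
        "visual.conf": visual_conf(pos_tags),
    }
    for name, content in files.items():
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as fh:
            fh.write(content)
    return sorted(files)
```

Deriving the configuration from the labels actually seen in the input keeps the files small, at the cost of regenerating them for every document.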
Important Notes
- The script relies solely on Python Standard Library modules, requiring no external dependencies.
- Input XML files must follow CoreNLP’s structure, with <document>, <sentences>, <tokens>, and <dependencies> elements, for parsing to succeed.
- This tool processes one XML file at a time and does not support batch conversion.
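The following is a minimal sketch of a pre-flight check for the required elements. missing_elements is a hypothetical helper, not part of the script. It only tests that each element appears somewhere in the tree, not that the elements are nested correctly.

```python
import xml.etree.ElementTree as ET

# Element names the converter expects, per the note above
REQUIRED = ("document", "sentences", "tokens", "dependencies")


def missing_elements(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    # root.iter(tag) yields every matching element, including root itself
    return [tag for tag in REQUIRED if next(root.iter(tag), None) is None]


if __name__ == "__main__":
    print(missing_elements("<root><document><sentences/></document></root>"))
    # → ['tokens', 'dependencies']
```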
Development Details
- Language: Python
- Libraries: xml.etree.ElementTree (Standard Library)
- Focus: Natural Language Processing (NLP) annotation automation
