- Published on
A Cross-Document Coreference Resolution Approach to Low-Resource Languages
- Authors
- Name
- Pai Buabthong
- @paippb
Theptakob, N., Seneewong Na Ayutthaya, T., Saetia, C., Chalothorn, T., Buabthong, P., "A Cross-Document Coreference Resolution Approach to Low-Resource Languages." Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science vol 14118, (2023). DOI: 10.1007/978-3-031-40286-9_34
tl;dr
- The paper addresses cross-document coreference resolution for low-resource languages, focusing on Thai.
- An English model based on agglomerative clustering is adapted for Thai coreference resolution.
- A method for converting Thai text into datasets similar to the ECB+ benchmark for English is introduced.
- The study compares manual and automatic span detection methods for coreference resolution.
- A fine-tuned longformer model achieves the best performance with a CoNLL F1 score of 72.87.
- The approach demonstrates improved performance in Thai coreference resolution.
- The proposed framework can potentially be extended to other low-resource languages.
Comments
- The approach is specifically designed for Thai, a low-resource language, which might limit its applicability to languages with even fewer resources or different linguistic structures.
- The performance of the model heavily relies on the quality and size of the datasets converted to mimic the ECB+ benchmark, which may not fully capture the nuances of the Thai language.
- The study finds that automatic span detection methods, while effective, still lag behind manual methods in accuracy, which could impact the overall coreference resolution performance.
- Adapting an English-based model to a low-resource language like Thai involves significant challenges in terms of linguistic differences, which may not be fully addressed by the current methodology.