A Cross-Document Coreference Resolution Approach to Low-Resource Languages

Authors

Theptakob, N., Seneewong Na Ayutthaya, T., Saetia, C., Chalothorn, T., Buabthong, P. "A Cross-Document Coreference Resolution Approach to Low-Resource Languages." In: Knowledge Science, Engineering and Management (KSEM 2023). Lecture Notes in Computer Science, vol. 14118 (2023). DOI: 10.1007/978-3-031-40286-9_34

tl;dr

  • The paper addresses cross-document coreference resolution for low-resource languages, focusing on Thai.
  • An English model based on agglomerative clustering is adapted for Thai coreference resolution.
  • A method for converting Thai text into datasets similar to the ECB+ benchmark for English is introduced.
  • The study compares manual and automatic span detection methods for coreference resolution.
  • A fine-tuned Longformer model achieves the best performance, with a CoNLL F1 score of 72.87.
  • The approach demonstrates improved performance in Thai coreference resolution.
  • The proposed framework can potentially be extended to other low-resource languages.
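
To illustrate the clustering step mentioned above, here is a minimal sketch of agglomerative clustering over mention vectors. It is not the paper's implementation: the average-linkage rule, the cosine distance, the toy 2-dimensional vectors, and the `threshold` value are all assumptions chosen for illustration. In a real system the vectors (or pairwise scores) would come from a trained encoder such as the fine-tuned Longformer.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering with a distance cutoff.

    Starts with one cluster per mention and repeatedly merges the closest
    pair of clusters until no pair is closer than `threshold`.
    Returns a sorted list of clusters, each a sorted list of mention indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        (x, y), best = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best >= threshold:
            break  # remaining clusters are too far apart to corefer
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


# Hypothetical mention vectors: mentions 0 and 1 come from different documents
# but describe the same event, as do mentions 2 and 3.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative_clusters(toy_vectors, threshold=0.3))  # [[0, 1], [2, 3]]
```

The cutoff controls how aggressively mentions are merged: a lower `threshold` yields more, smaller coreference clusters, and a higher one yields fewer, larger clusters.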

Comments

  • The approach is specifically designed for Thai, a low-resource language, which might limit its applicability to languages with even fewer resources or different linguistic structures.
  • The performance of the model heavily relies on the quality and size of the datasets converted to mimic the ECB+ benchmark, which may not fully capture the nuances of the Thai language.
  • The study finds that automatic span detection methods, while effective, still lag behind manual methods in accuracy, which could impact the overall coreference resolution performance.
  • Adapting an English-based model to a low-resource language like Thai raises challenges from linguistic differences that the current methodology may not fully address.
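
For context on how such performance gaps are measured: the CoNLL F1 score reported for the paper is conventionally the average of the F1 scores of three metrics, MUC, B³, and CEAF-e. The sketch below computes only B³, and it makes a simplifying assumption that the gold and predicted clusterings cover exactly the same mentions (as with manually provided spans). Automatically detected spans would need extra handling for missing or spurious mentions, which is not shown here. The function name and toy clusters are hypothetical.

```python
def b_cubed(gold_clusters, predicted_clusters):
    """Return (precision, recall, f1) under the B-cubed metric.

    Assumes both clusterings cover exactly the same non-empty set of mentions.
    """
    gold_of = {m: frozenset(c) for c in gold_clusters for m in c}
    pred_of = {m: frozenset(c) for c in predicted_clusters for m in c}
    if gold_of.keys() != pred_of.keys():
        raise ValueError("gold and predicted clusterings must cover the same mentions")

    n = len(gold_of)
    # Per-mention precision: share of the predicted cluster that is truly coreferent.
    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in gold_of) / n
    # Per-mention recall: share of the gold cluster that was recovered.
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in gold_of) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the system splits mention "c" off from its gold cluster
# and wrongly attaches it to "d".
gold = [["a", "b", "c"], ["d"]]
predicted = [["a", "b"], ["c", "d"]]
precision, recall, f1 = b_cubed(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.75 0.667 0.706
```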