HLTCon 2021 • The Key to Cost-Efficient, Quality Text Annotation: Data Pre-Processing

Overview

Data annotation takes 20 times as much work as the engineering needed to train a model. Pre-processing the data can cut redundant manual work from an annotation process that is already labor-intensive and expensive. A veteran of managing the complicated training-data process will share how to prepare your data properly and head off headaches, from annotators tagging duplicate documents to machine learning models being confused by essentially identical characters encoded in different ways. The encoding problem is especially common in Chinese, Japanese, Korean, and Arabic-script languages, but it also affects accented European languages.
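The two problems named above, duplicate documents and equivalent characters with different encodings, can be illustrated with a minimal Python sketch. It is not the speaker's method: the function names `normalize` and `deduplicate` are hypothetical, and the choices of Unicode NFKC normalization and exact-match hashing are assumptions made for illustration only.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Fold equivalent encodings of the same characters into one form.

    NFKC is an assumption here; it maps, for example, full-width Latin
    letters and decomposed accents onto their standard composed forms.
    """
    return unicodedata.normalize("NFKC", text)


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first copy of each document, comparing normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        # Collapse whitespace runs so spacing differences do not hide duplicates.
        key_text = " ".join(normalize(doc).split())
        digest = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```

Running `deduplicate` before documents reach annotators would drop only exact duplicates after normalization. Near-duplicates, such as lightly edited copies, would need a fuzzier comparison that this sketch does not attempt.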

Zach Yocum

Linguistic Data Engineer

Basis Technology