ML data platform for accurate document extraction and mapping