Generic formats like JSON or XML are easier to version than forms. However, they were not originally intended to be ...
LiteParse is a standalone open-source PDF parsing tool focused exclusively on fast, lightweight parsing. It provides high-quality spatial text parsing with bounding boxes, without proprietary LLM features or ...
Swiftlet is a high-performance text-parsing library for Rust, inspired by Python’s Lark. It accepts a context-free grammar (CFG) as input and generates an efficient parser capable of analyzing and ...
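The idea of taking a context-free grammar as input and producing a parser from it can be illustrated with a small sketch. This is not Swiftlet's or Lark's API; the grammar format, `GRAMMAR`, `tokenize`, and `make_parser` are all hypothetical names chosen for illustration, and the sketch is written in Python with a naive backtracking strategy that assumes the grammar has no left recursion.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy grammar format: each nonterminal maps to a list of
# alternatives, and each alternative is a sequence of symbols.
# Any symbol that is not a key of the grammar is treated as a terminal.
Grammar = Dict[str, List[List[str]]]

GRAMMAR: Grammar = {
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["NUM", "*", "term"], ["NUM"]],
}


def tokenize(text: str) -> List[str]:
    """Split on whitespace; runs of digits become the terminal NUM."""
    return ["NUM" if tok.isdigit() else tok for tok in text.split()]


def make_parser(grammar: Grammar, start: str):
    """Build a parse function for `grammar` (assumed free of left recursion)."""

    def derive(symbol: str, tokens: List[str], pos: int) -> Iterator[Tuple[object, int]]:
        # Terminal: succeed only if the next token matches exactly.
        if symbol not in grammar:
            if pos < len(tokens) and tokens[pos] == symbol:
                yield symbol, pos + 1
            return
        # Nonterminal: try each alternative in order, backtracking on failure.
        for alternative in grammar[symbol]:
            for children, end in sequence(alternative, tokens, pos):
                yield (symbol, children), end

    def sequence(symbols: List[str], tokens: List[str], pos: int) -> Iterator[Tuple[list, int]]:
        # Match every symbol of one alternative, left to right.
        if not symbols:
            yield [], pos
            return
        for child, mid in derive(symbols[0], tokens, pos):
            for rest, end in sequence(symbols[1:], tokens, mid):
                yield [child] + rest, end

    def parse(text: str) -> Optional[object]:
        tokens = tokenize(text)
        # Accept the first derivation that consumes the whole input.
        for tree, end in derive(start, tokens, 0):
            if end == len(tokens):
                return tree
        return None

    return parse


parse = make_parser(GRAMMAR, "expr")
tree = parse("1 + 2 * 3")   # nested tuples rooted at "expr"
bad = parse("1 +")          # None: the input is not in the language
```

A production parser generator would typically use an algorithm such as Earley or LALR rather than backtracking; the sketch only shows the grammar-in, parser-out shape of the interface.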
Abstract: Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding and the emergence of large language models ...