New AI tool can help computers read documents better

On Monday, DeepSeek released DeepSeek-OCR, a new open-source AI tool that changes how computers read documents.
Instead of the usual text tokenization, it turns text into "vision tokens" using optical 2D mapping, so huge files get processed with far fewer tokens and less computing power.
It's up on GitHub now under the MIT license for anyone to use, whether you're a student or building something commercial.

It can process PDFs at about 2,500 tokens per second

DeepSeek-OCR comes with different modes (like "Gundam" for super high-res docs) and handles OCR tasks, accurately turning images into text.
On a standard A100-40G GPU, it can process PDFs at about 2,500 tokens per second and crank out over 200K pages of training data every day.
Plus, it's lightweight and works smoothly with vLLM and Transformers frameworks.
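
To put those two throughput figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the GPU runs around the clock at the quoted rate and that both numbers describe the same single-GPU setup, neither of which the announcement spells out. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted throughput figures.
# Assumption (not stated in the article): one GPU running 24 hours a day
# at a steady 2,500 tokens per second.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tokens_per_day(tokens_per_second: float) -> float:
    """Total tokens processed in one day at a constant rate."""
    return tokens_per_second * SECONDS_PER_DAY


def implied_tokens_per_page(tokens_per_second: float, pages_per_day: float) -> float:
    """Average token budget per page implied by the two figures."""
    return tokens_per_day(tokens_per_second) / pages_per_day


daily_tokens = tokens_per_day(2_500)
per_page = implied_tokens_per_page(2_500, 200_000)

print(f"Tokens per day: {daily_tokens:,.0f}")        # 216,000,000
print(f"Implied tokens per page: {per_page:,.0f}")   # 1,080
```

Under those assumptions, the numbers work out to roughly 216 million tokens a day. Since the claim is "over" 200K pages, that means an average of at most about 1,080 tokens per page.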

It beats other popular models in benchmarks

What makes this model special? It beats other popular models in benchmarks by using fewer visual tokens while keeping accuracy high.
By embedding visual info straight into AI workflows (instead of relying on old-school tokenizers), it makes document processing more efficient and could really simplify future AI tools.