One Line of Code to Translate arXiv Papers into Chinese · English (unofficial) translations of posts at kexue.fm

Long-time readers might know that I am a fairly staunch "old-school programming enthusiast": to this day I don't use IDEs or code completion, and all I ask of an editor is syntax highlighting. I even write the blog posts on Scientific Space character by character in raw HTML (partly for historical reasons), and I hand-coded the entire Cool Papers site, front end and back end alike.

But even a "traditionalist" like me has to admit that AI Agents have unparalleled advantages for certain tasks: the ones that feel daunting before you have written a single line by hand. This article introduces a classic example: using a single line of code to translate arXiv papers.

The Method First

First, deploy kimi-cli locally and configure it to use kimi-k2.5. You also need to install a LaTeX compilation environment (I use MacTeX). Finally, download the source code of a paper from arXiv, unzip it, enter the source directory, and execute:

kimi --print --prompt "The current directory contains the LaTeX source of an English paper. Your task is to translate the paper into a Chinese version. All English text content must be translated into Chinese, while formulas remain unchanged. For tables and figures, only translate the necessary captions. For code and algorithm blocks, only translate necessary comments. Names should not be translated. Save the new translated source in a directory named 'paper_cn' within the current directory. Ensure the source remains compilable. After translation, check carefully for any missing sections or files. Finally, use xelatex to recompile the translation into a PDF and return the path to the generated PDF. If the content is extensive, use multiple subagents for parallel translation."

Then, you can wait for the paper translation to complete. During this process, Kimi will automatically analyze the directory structure, find the files to be translated, attempt to compile them after translation, and further adjust the translation based on compilation error messages until it successfully generates a PDF. If you want to save the trouble of even downloading the source code, you can try:

kimi --print --prompt "Download the source code of the paper with arXiv ID 2502.16982 from arXiv, then unzip and translate it into a Chinese version. All English text content must be translated into Chinese, while formulas remain unchanged. For tables and figures, only translate the necessary captions. For code and algorithm blocks, only translate necessary comments. Names should not be translated. Save the new translated source in a directory named 'paper_cn' within the unzipped directory. Ensure the source remains compilable. After translation, check carefully for any missing sections or files. Finally, use xelatex to recompile the translation into a PDF and return the path to the generated PDF. If the content is extensive, use multiple subagents for parallel translation."
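If you would rather fetch and unpack the source yourself before running the first command, the manual step might look like the sketch below. It assumes the source bundle is served at https://arxiv.org/e-print/<id> and that it is either a gzipped tarball or, for single-file submissions, a gzipped .tex file; the function names and the main.tex fallback name are my own illustrative choices, not part of any tool mentioned here.

```python
import gzip
import io
import tarfile
import urllib.request
from pathlib import Path


def fetch_source(arxiv_id: str) -> bytes:
    """Download the raw source bundle (assumed endpoint: arxiv.org/e-print/<id>)."""
    url = f"https://arxiv.org/e-print/{arxiv_id}"
    request = urllib.request.Request(url, headers={"User-Agent": "paper-translate-sketch"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


def unpack_source(data: bytes, dest: Path) -> list[str]:
    """Unpack a source bundle into dest and return the relative file names."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
            for member in archive.getmembers():
                target = (root / member.name).resolve()
                # Skip directories, links, and anything that would escape dest.
                if not member.isfile() or not target.is_relative_to(root):
                    continue
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(archive.extractfile(member).read())
    except tarfile.ReadError:
        # Not a tarball: assume a single .tex file, gzipped or plain.
        try:
            raw = gzip.decompress(data)
        except OSError:
            raw = data
        (root / "main.tex").write_bytes(raw)  # hypothetical file name
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    files = unpack_source(fetch_source("2502.16982"), Path("paper_src"))
    print("\n".join(files))
```

After this, enter the unpacked directory and run the first command as before.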

I use kimi-cli as the example not entirely to advertise it, but simply because it is the only one I have used. Even that was a test I ran in passing for work, not something I went looking for because I wanted an AI Agent's help, which only further proves my "old-school" nature. If you are comfortable with another AI agent, feel free to try it yourself.

Comparison of Effects

Among the existing paper translation websites on the market, I believe the best one is "Hallucination Translation (hjfy.top)." Below are the results of Kimi and HJFY translating two different papers:

You can compare them yourselves:

A Few More Words

Regarding the principles behind Hallucination Translation, the author once shared them in a Zhihu article titled "Consumed 300 Million Tokens: I Translated 10,000 arXiv Papers Using Large Models." In fact, most people can probably guess the basic idea: start from the LaTeX source code of the paper (which almost all arXiv papers provide), extract the paragraphs that need translation, and translate them section by section.

In practice, however, this is exactly where that "daunting" feeling sets in. LaTeX is a syntax with a very high degree of freedom; it is not just a document syntax but has, in effect, become a programming tool. In this context, manually decomposing "extraction-translation" into a finite number of steps is extremely difficult. As the author mentioned, he still fixes bad cases of translation errors every day, which is a long-term, tedious, and never-ending task.
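To make the difficulty concrete, here is a deliberately naive sketch of the rule-based route: hide the math behind placeholders, translate what is left, then restore the math. This is my own illustration, not the method used by Hallucination Translation; the pattern, the placeholder format, and the function names are all assumptions.

```python
import re

# Naive pattern for math spans: $$...$$, $...$, \[...\],
# and the equation/align environments (starred or not).
MATH_PATTERN = re.compile(
    r"\$\$.*?\$\$"
    r"|\$.*?\$"
    r"|\\\[.*?\\\]"
    r"|\\begin\{(equation|align)\*?\}.*?\\end\{\1\*?\}",
    re.DOTALL,
)


def mask_math(tex: str) -> tuple[str, list[str]]:
    """Replace each math span with a placeholder; return masked text and the spans."""
    spans: list[str] = []

    def stash(match: re.Match) -> str:
        spans.append(match.group(0))
        return f"<MATH{len(spans) - 1}>"

    return MATH_PATTERN.sub(stash, tex), spans


def unmask_math(text: str, spans: list[str]) -> str:
    """Put the original math spans back in place of their placeholders."""
    return re.sub(r"<MATH(\d+)>", lambda m: spans[int(m.group(1))], text)


if __name__ == "__main__":
    sample = r"The loss $L = x^2$ is minimized. \begin{equation}a = b\end{equation} Done."
    masked, spans = mask_math(sample)
    print(masked)  # the text a translator would see
    assert unmask_math(masked, spans) == sample
```

Even this toy version breaks quickly: it treats the escaped dollar signs in `\$5 and \$6` as math delimiters, and it knows nothing about custom environments, macros defined with \newcommand, or verbatim code. Every such case needs another rule, which is the never-ending part.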

These tasks that are "not easily decomposed into enumerable steps by humans" are exactly what AI Agents excel at! The AI automatically finds and judges the files to be translated, then translates them one by one. Even better, it automatically attempts to compile them and adjusts the document based on error messages until compilation passes. If one were to write rules manually, handling compilation error messages alone would involve endless work; its inherent difficulty might be no less than rewriting a LaTeX compiler. One can imagine the hardships involved.
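The compile-and-fix loop can be pictured with a small sketch. It assumes xelatex is on the PATH and relies on the TeX log convention that error lines start with "! " and are followed by an "l.<number>" line giving the source position; the function names are hypothetical. The easy part is collecting the errors. Deciding how to edit the source in response is the part that resists hand-written rules and is what the agent takes over.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional


def extract_errors(log_text: str) -> list[tuple[str, Optional[int]]]:
    """Collect (message, source line) pairs from a TeX log."""
    errors: list[tuple[str, Optional[int]]] = []
    for line in log_text.splitlines():
        if line.startswith("! "):
            # A new error; its line number, if any, appears a few lines later.
            errors.append((line[2:].strip(), None))
        elif errors and errors[-1][1] is None:
            match = re.match(r"l\.(\d+)", line)
            if match:
                errors[-1] = (errors[-1][0], int(match.group(1)))
    return errors


def compile_once(main_tex: Path) -> list[tuple[str, Optional[int]]]:
    """Run xelatex once on main_tex and return the errors found in its log."""
    subprocess.run(
        ["xelatex", "-interaction=nonstopmode", "-halt-on-error", main_tex.name],
        cwd=main_tex.parent,
        capture_output=True,
        check=False,  # a failed compile is expected; the log is what matters
    )
    log_file = main_tex.with_suffix(".log")
    if not log_file.exists():
        return []
    return extract_errors(log_file.read_text(errors="replace"))
```

An agent effectively repeats this loop: compile, read the errors, edit the offending lines, and compile again until the list comes back empty.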

Besides automation, another benefit of using kimi-cli is the translation quality. Compared to translating paragraph by paragraph or even sentence by sentence, kimi-cli, which possesses global context information, offers much better overall translation quality. Of course, there are problems, namely the common ailment of large models—hallucinations. This manifests as potential omissions or mistranslations. In an interactive environment, one can repeatedly ask Kimi to check for missing parts until satisfied, but for mistranslations, there might not be a good solution. Additionally, there is the issue of speed; going "all in" with kimi-cli is somewhat slower.
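For omissions, a cheap mechanical check can complement asking Kimi to re-check its own work. The sketch below is only an assumption about what such a check might look like: it counts a handful of structural commands in the original and in the translation and reports any that have gone missing. The command list and function names are illustrative, and the check says nothing about dropped sentences or mistranslations.

```python
import re
from collections import Counter

# Structural commands that a faithful translation should carry over unchanged.
STRUCTURE = re.compile(
    r"\\(section|subsection|subsubsection|begin|label|cite|ref|includegraphics)\b"
)


def structure_counts(tex: str) -> Counter:
    """Count how often each tracked command appears in a LaTeX source string."""
    return Counter(STRUCTURE.findall(tex))


def missing_structure(original: str, translated: str) -> dict[str, int]:
    """Return commands that appear fewer times in the translation, with the shortfall."""
    before = structure_counts(original)
    after = structure_counts(translated)
    return {name: before[name] - after[name] for name in before if after[name] < before[name]}


if __name__ == "__main__":
    original = r"\section{Intro} Text \cite{a}. \section{Method} \label{sec:m}"
    translated = r"\section{引言} 文本 \cite{a}."
    print(missing_structure(original, translated))
```

A non-empty result is a hint about where to look, for example a section that was silently skipped; an empty result does not prove the translation is complete.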

Summary

This article presents a basic way to use kimi-cli to translate arXiv papers. Readers are welcome to share more refined use cases.

Please include the original link when reprinting: https://kexue.fm/archives/11578

For more details on reprinting, please refer to: Scientific Space FAQ