Last week, in the post "Created an Auxiliary Website for Browsing Papers: Cool Papers", I shared a paper-browsing website I developed called Cool Papers, which received recognition from several users. However, "the more people use it, the more problems are exposed." Once the user volume increased, I realized how imprecise my previous code was. Consequently, I spent the entire past week continuously fixing bugs, even discovering and fixing one as late as this afternoon. This article briefly summarizes my thoughts during the development and bug-fixing process.
Cool Papers: https://papers.cool
Technology
In fact, the domain name "papers.cool" has been registered for over four years. This shows that I had planned to create a website like Cool Papers a long time ago and had even built some prototypes. However, the fundamental reason why this website was only officially born four years later is simple: my technical skills were insufficient.
On one hand, my web development skills were lacking. I don’t actually know how to build websites; at most, I can perform simple patches in specific areas. Although this blog, "Scientific Space," has been running for over ten years, it is based on an existing blog system installed like software, and the website template is a modified version of someone else’s open-source theme. Developing a complete website involves various technical aspects that I, as an outsider, simply couldn’t manage. On the other hand, model technology wasn’t advanced enough. Without a sufficiently intelligent model to assist in browsing papers, even if I had cobbled a website together, what would be its highlight? How could I make browsing papers truly "Cool"?
Fortunately, the emergence of Large Language Models (LLMs) has, to some extent, solved these two problems. Regarding web development, if there is anything I don’t know, I can directly ask GPT-4 or Kimi. As long as one has patience, ideas, and a basic foundation in programming/web pages, a website can be developed. It must be said that LLMs are a powerful productivity tool for programming. Almost all the source code for Cool Papers was written with the help of GPT-4 and Kimi. Regarding the models, Kimi supports a maximum length of 128k, which is enough to feed an entire paper directly for accurate FAQs. This is undoubtedly a very "Cool" way to quickly understand a paper, providing the necessary highlight. Thus, in the context of LLMs, Cool Papers seemed "ready to emerge."
Art
However, things were not that simple. LLMs solved the "technical" problems but have not yet solved the "artistic" ones. For example, with the assistance of an LLM and an existing website as a reference, I might manage to copy about 70-80% of it, but writing one from scratch completely overwhelms me. This is a matter of "art" or "aesthetics," known in web development as the "front-end." Many readers might find Cool Papers a bit ugly, and for that I am truly sorry; I have done my best. The current look is the result of repeatedly fine-tuning templates written by GPT-4. LLMs can compensate for my technical deficiencies, but they cannot make up for my nonexistent artistic sense.
To make matters worse, I often fall into obsessive-compulsive tendencies regarding details. For instance, I might spend half a day unable to write a single line of code because I haven’t decided how to name a variable. Or I might spend ages adjusting a margin or padding by half a pixel. For website development, which values workload and throughput, this kind of obsession is clearly disadvantageous. For front-end development, which focuses more on overall aesthetics than local details, it is even worse. I simply wasn’t born for this line of work. Throughout the development process, I had to constantly tell myself to "just make do," and I can only ask users to "make do" along with me. If any front-end experts are willing to help beautify it, I would be extremely grateful.
Backend
Having finished talking about the front-end, let’s discuss the backend. Simply put, "Website = Front-end + Backend," and "Front-end = HTML + CSS + JS." These are universal web programming components. The backend is more diverse; for example, this blog uses PHP, once hailed as the "best language." For me, the only programming language I am familiar with is Python, so I naturally chose Python for development. There are many frameworks for Python web development, such as Django, Flask, and Tornado, but I chose a very niche option—Bottle.
The reason for using Bottle is simple: it was the first Python web development framework I encountered years ago, so I stuck with it. Most frameworks are similar; for Cool Papers, something lightweight is better. What matters more is the working logic behind the entire website.
The biggest difference between Cool Papers and a typical website is that it has no content (papers) of its own; its content comes from other websites (currently arXiv). The backend therefore has to download content from those sites. Initially, when the site was used internally with few users, this download code was written directly into the page routing, meaning content was fetched in real time when a user visited a page. Although arXiv allows data retrieval via an API, the API is rate-limited. Once the user volume increased, requests arrived at very short intervals, and the resulting download frequency risked getting all of my download activity banned by arXiv.
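For the curious, the arXiv API returns Atom XML that the standard library alone can fetch and parse. The query parameters and field extraction below are a hedged sketch of this kind of download step, not Cool Papers' actual pipeline:

```python
# Sketch of fetching and parsing an arXiv API listing. The query string is an
# illustrative example of the public export.arxiv.org endpoint.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'

def fetch_listing(category, max_results=10):
    """Download one page of recent papers in `category` from the arXiv API."""
    url = ('http://export.arxiv.org/api/query?search_query=cat:%s'
           '&sortBy=submittedDate&sortOrder=descending&max_results=%d'
           % (category, max_results))
    with urllib.request.urlopen(url) as resp:
        return parse_listing(resp.read())

def parse_listing(xml_bytes):
    """Extract (id, title) pairs from an arXiv API Atom response."""
    root = ET.fromstring(xml_bytes)
    papers = []
    for entry in root.findall(ATOM + 'entry'):
        title = ' '.join(entry.findtext(ATOM + 'title').split())  # unwrap lines
        papers.append((entry.findtext(ATOM + 'id'), title))
    return papers
```

Separating the network call (`fetch_listing`) from the parsing (`parse_listing`) keeps the parsing testable offline, which matters once requests have to be throttled.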
Therefore, for stability, every operation involving network communication must go through a queue with a stable access interval. Specifically, three parts need queues: 1. fetching the daily paper list from arXiv; 2. downloading paper PDFs from arXiv (for Kimi); 3. conversing with Kimi to generate the FAQ. Designing these three queues so that they run and interact stably without dragging each other down took a significant amount of my time. In particular, many errors only show up under high traffic, so I have been working around the clock recently to fix bugs. Finally, since any network operation carries some risk of failure no matter how carefully it is designed, the queue processes must include a watchdog that automatically restarts them after an interruption.
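The queue-plus-watchdog pattern described above can be sketched with the standard library alone. The interval value and handlers here are illustrative assumptions, not Cool Papers' actual configuration:

```python
# Sketch of a rate-limited task queue with a simple restart wrapper.
import time

def rate_limited_worker(tasks, handler, interval, results):
    """Consume tasks one at a time, sleeping `interval` seconds between them
    so the upstream site (arXiv, Kimi) is never hit in rapid succession."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: shut the worker down
            return
        try:
            results.append(handler(task))
        except Exception:
            pass  # one failed download must not kill the whole queue
        time.sleep(interval)

def supervise(fn, restarts=3, retry_wait=0.0):
    """Minimal watchdog: re-run `fn` after a crash, up to `restarts` times."""
    failures = 0
    while True:
        try:
            return fn()
        except Exception:
            failures += 1
            if failures > restarts:
                raise
            time.sleep(retry_wait)
```

In practice each of the three queues would run `rate_limited_worker` in its own thread or process, wrapped by something like `supervise` so a dropped connection only delays the queue rather than stopping it.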
Updates
After last week’s release, Cool Papers was fortunate enough to receive recognition from many readers, who also provided many suggestions for improvement. Some of these have already been integrated into the latest version:
Opening All Categories: At launch last week, only a few categories on arXiv were supported. Subsequently, many readers requested their preferred categories. I have now attempted to open all categories and allowed users to select which categories to display on the homepage.
Feed Subscription Support: Many readers have RSS subscription habits and suggested adding RSS links. These have been added, though I used the more standard Atom format instead of RSS (almost all aggregators support both).
Markdown Parsing: The FAQs generated by Kimi are essentially in Markdown format. Parsing them provides a better reading experience.
Click Count Display: A number has been added after [PDF] and [Kimi], representing the number of clicks on those buttons. To some extent, this represents the popularity of the paper.
Other Detail Optimizations: Such as mobile experience optimization and [Kimi] stability improvements.
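On the Atom choice above: a minimal Atom document is simple enough to emit by hand with the standard library. The helper below is a hedged sketch (its field names and structure are my assumptions, not taken from Cool Papers' source):

```python
# Sketch of rendering a minimal Atom feed from a list of entries.
from xml.sax.saxutils import escape, quoteattr

def atom_feed(title, feed_id, updated, entries):
    """Render an Atom feed. `entries` is a list of dicts with 'title',
    'link', and 'updated' (an RFC 3339 timestamp) keys."""
    parts = ['<?xml version="1.0" encoding="utf-8"?>',
             '<feed xmlns="http://www.w3.org/2005/Atom">',
             '<title>%s</title><id>%s</id><updated>%s</updated>'
             % (escape(title), escape(feed_id), updated)]
    for e in entries:
        parts.append('<entry><title>%s</title><link href=%s/>'
                     '<id>%s</id><updated>%s</updated></entry>'
                     % (escape(e['title']), quoteattr(e['link']),
                        escape(e['link']), e['updated']))
    parts.append('</feed>')
    return ''.join(parts)
```

Since Atom is itself XML with a fixed namespace, escaping titles and links correctly (as `escape` and `quoteattr` do here) is most of the work; aggregators that read RSS almost always read this format too.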
Additionally, a new GitHub project has been created to record future update logs and collect user feedback:
There are still some valuable suggestions that have not yet been implemented in Cool Papers. Some are under development or design, while others may not fit the positioning of Cool Papers. As it stands, Cool Papers is positioned for "browsing (screening)" papers, not for "reading" them, so it tries to offer features for quick browsing that paper-reading websites lack. The focus of "browsing" is timeliness and comprehensiveness, so changes that might introduce delays or risk missing papers may not be adopted.
Finally, some readers hope to access historical papers. In fact, historical papers already in the database can be accessed via https://papers.cool/arxiv/<paper_id>. For papers not yet in the database, the access pressure is still being evaluated (mainly the risk of crawlers abusing [Kimi]); testing may be opened tentatively later.
Summary
That concludes the summary. Although I call it a summary, it is really just a journal of a website development novice, hardly fit for a grand stage. I hope the experts will simply enjoy the read. Finally, I wish everyone a Happy New Year. May everything go smoothly in the new year, may your technical skills soar, and may errors and omissions vanish!