But the list shows that research articles can be just as influential as books, notes Acharya. In that list, available at Google Scholar Top 100, economics papers have more prominence. And at the top of both Google’s and Thomson Reuters’ rankings are the same three research articles — albeit in different order. It lists papers published from 1900 to the present day. Paul Wouters, director of the Centre for Science and Technology Studies in Leiden, the Netherlands, says that many methods papers “become a standard reference that one cites in order to make clear to other scientists what kind of work one is doing”. The search covered all of Thomson Reuters’ Web of Science, an online version of the SCI that also includes databases covering the social sciences, arts and humanities, conference proceedings and some books. At number 4, the most-cited book is the manual Molecular Cloning, a mainstay of molecular-biology laboratories. Hence the theory’s name. The Web of Science is not the only index of citations available. Yet none of the papers that announced them comes anywhere close to ranking among the 100 most highly cited papers of all time. So popular is BLAST that versions of the program (refs 8, 9) feature twice on the list, at spots 12 and 14. But among the science papers, many of the same titles show up. Citations, in which one paper refers to earlier works, are the standard means by which authors acknowledge the source of their methods, ideas and findings, and are often used as a rough measure of a paper’s importance. But over 40 years, his work gave rise to the regularly updated SHELX suite of computer programs, which has become one of the most popular tools for analysing the scattering patterns of X-rays that are shot through a crystal — thereby revealing the atomic structure. Nor is Thomson Reuters’ list the only ranking system available.
Modern bibliometricians therefore recoil from methods as crude as simply counting citations when they want to measure a paper’s value: instead, they prefer to compare counts for papers of similar age, and in comparable fields. Another common practice in science ensures that truly foundational discoveries — Einstein’s special theory of relativity, for instance — get fewer citations than they might deserve: they are so important that they quickly enter the textbooks or are incorporated into the main text of papers as terms deemed so familiar that they do not need a citation. The technique made it possible to devise trees from large data sets without eating up computer resources. (See the full list at Web of Science Top 100.) "My job was to teach chemistry, and I wrote the programs as a hobby in my spare time." "It was a program written by biologists; I'm trying to find a nice way to say that," says Thompson, who is now at the Institute of Genetics and Molecular and Cellular Biology in Strasbourg, France. Nobody fully understands what distinguishes the sliver at the top from papers that are merely very well known — but researchers' customs explain some of it. The second (number 24) was British statistician David Cox's 1972 paper (ref. 16) that expanded these survival analyses to include factors such as gender and age. Thomson Reuters' Web of Science holds some 58 million items. And not all fields produce the same number of publications. A prime example is BLAST (Basic Local Alignment Search Tool), which for two decades has been a household name for biologists wanting to work out what genes and proteins do. George Sheldrick, a chemist at the University of Göttingen in Germany, began to write software to help solve crystal structures in the 1970s. The colossal size of the scholarly literature means that the top-100 papers are extreme outliers.
Within seconds, they will be shown related sequences from thousands of organisms — along with information about the function of those sequences and even links to relevant literature. The team that developed ClustalW, at the European Molecular Biology Laboratory in Heidelberg, Germany, had created the program to work on a personal computer, rather than a mainframe. Clustal allows researchers to describe the evolutionary relationships between sequences from different organisms, to find matches among seemingly unrelated sequences and to predict how a change at a specific point in a gene or protein might affect its function. Kohn realized that he could calculate a system's properties, such as its lowest energy state, by assuming that each electron reacts to all the others not as individuals, but as a smeared-out average. Much of this crossover success stems from the ever-expanding stream of data coming out of biomedical labs. But a few decades passed before researchers found ways to implement the idea for real materials, says Giustino. One (number 8) is by Axel Becke, a theoretical chemist at Dalhousie University in Halifax, Canada, and the other (number 7) is by US-based theoretical chemists Chengteh Lee, Weitao Yang and Robert Parr. Biologists tend to cite one another’s work more frequently than, say, physicists. “Folks have focused on journals, but there is this other world of books out there,” says Anurag Acharya, a software engineer who leads the Google Scholar team in Mountain View, California. Google Scholar compiled its own top-100 list for Nature (see ‘An alternative ranking’). Meanwhile, the foothills comprise works that have been cited only once, if at all — a group that encompasses roughly half of the items. In those days, he says, “you couldn’t get grant money for that kind of project.” A 1997 paper (ref. 11) on a later version called ClustalX is number 28.
The volume of citations has increased, for example — yet older papers have had more time to accrue citations. Citation counts are riddled with other confounding factors. “We physical anthropologists were facing kind of the big data of that time,” says Saitou, now at Japan’s National Institute of Genetics in Mishima. Two top-100 papers (refs 22, 23) are technical recipes on which the most popular DFT methods and software packages are built. In 1992, computational chemist John Pople (who would share the 1998 Nobel prize with Kohn) included a form of DFT in his popular Gaussian software package. A 1994 paper (ref. 10) describing ClustalW, a user-friendly version of the software, is currently number 10 on the list. If that corpus were scaled to Mount Kilimanjaro, then the 100 most-cited papers would represent just 1 centimetre at the peak. But the software was transformed when Julie Thompson, a computer scientist from the private sector, joined the lab in 1991. To mark the anniversary, Nature asked Thomson Reuters, which now owns the SCI, to list the 100 most highly cited papers of all time. Theoretical physicist Walter Kohn led the development of DFT half a century ago in papers (refs 20, 21) that now rank as numbers 34 and 39. The discovery of high-temperature superconductors, the determination of DNA's double-helix structure, the first observations that the expansion of the Universe is accelerating — all of these breakthroughs won Nobel prizes and international acclaim. In principle, the mathematics are straightforward: the system behaves like a continuous fluid with a density that varies from point to point. Users simply have to open the program in a web browser and plug in a DNA, RNA or protein sequence. The rapid expansion of genetic sequencing since Sanger's contribution has helped to boost the ranking of papers describing ways to analyse the sequences. (And, in a nice cross-fertilization within the top-100, Clustal's algorithms use the same strategy.)
Google Scholar's list also features books, which Thomson Reuters did not analyse. It is based on many more citations because the search engine culls references from a much greater (although poorly characterized) literature base, including from a large range of books. Thompson rewrote the program to ready it for the volume and complexity of the genome data being generated at the time, while also making it easier to use. Only 14,499 papers — roughly a metre and a half's worth — have more than 1,000 citations (see 'The paper mountain'). For example, the most frequently cited statistics paper (number 11) is a 1958 publication (ref. 15) by US statisticians Edward Kaplan and Paul Meier that helps researchers to find survival patterns for a population, such as participants in clinical trials. That introduced what is now known as the Kaplan–Meier estimate. Fifty years ago, Eugene Garfield published the Science Citation Index (SCI), the first systematic effort to track citations in the scientific literature. Google Scholar has also generated a list of the 'most-cited' articles of all time for Nature (Google Scholar Top 100). But owing to the vagaries of citation habits, BLAST has been bumped down the list by Clustal, a complementary program for aligning multiple sequences at once. Two-thirds of the entries are books, which Thomson Reuters did not include.
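The paper-mountain comparison can be checked with a little arithmetic. The sketch below scales paper counts to a height on the mountain, using the article's figures of 58 million items and 14,499 papers with more than 1,000 citations. The summit height of 5,895 metres is an assumption brought in for illustration, since the article does not state it, and the function name is hypothetical.

```python
# Scale a number of papers to a height on the "paper mountain".
TOTAL_ITEMS = 58_000_000   # items in the Web of Science, per the article
KILIMANJARO_M = 5_895      # assumed summit height in metres (not stated in the article)


def height_on_mountain(n_papers, total=TOTAL_ITEMS, mountain_m=KILIMANJARO_M):
    """Return the height in metres that n_papers occupy when the whole
    corpus is scaled linearly to the height of the mountain."""
    return n_papers / total * mountain_m


top_100_cm = height_on_mountain(100) * 100       # convert metres to centimetres
over_1000_cites_m = height_on_mountain(14_499)   # papers with >1,000 citations

print(f"Top 100 papers: {top_100_cm:.2f} cm")
print(f"Papers with more than 1,000 citations: {over_1000_cites_m:.2f} m")
```

Under these assumptions the two results come out close to the rounded figures quoted in the text: about 1 centimetre for the top 100, and about a metre and a half for the papers with more than 1,000 citations.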