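A while ago I read hedong's article on studying the PageRank algorithm (http://hedong.3322.org/archives/000199.html) and went through the original English material on PageRank. I felt hedong's write-up was a little thin, and that a translation of the original would be even better! I Googled around and found nothing... so I had to open Kingsoft PowerWord and read it myself -.-
Since I was reading it anyway, I figured I might as well spend a bit more time writing it down for those who come after me. But... sob, my English really is too poor, so I have attached the original text paragraph by paragraph. If you spot serious errors, please leave a comment and correct me!
This is the first part, translated from: Google PageRank Introduction - http://pr.efactory.de/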
Within the past few years, Google has become the far most utilized search engine worldwide. A decisive factor therefore was, besides high performance and ease of use, the superior quality of search results compared to other search engines. This quality of search results is substantially based on PageRank, a sophisticated method to rank web documents.
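Over the past few years, Google has become the most widely used search engine in the world. Besides high performance and ease of use, a decisive factor was the superior quality of its search results compared with other search engines. That quality rests largely on PageRank, a sophisticated method for ranking web documents.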
The aim of these pages is to provide a broad survey of all aspects of PageRank. The contents of these pages primarily rest upon papers by Google founders Lawrence Page and Sergey Brin from their time as graduate students at Stanford University.
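The aim of this article is to give a broad survey of every aspect of PageRank. Its content rests mainly on papers that Google founders Lawrence Page and Sergey Brin wrote as graduate students at Stanford University.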
It is often argued that, especially considering the dynamic of the internet, too much time has passed since the scientific work on PageRank, as that it still could be the basis for the ranking methods of the Google search engine. There is no doubt that within the past years most likely many changes, adjustments and modifications regarding the ranking methods of Google have taken place, but PageRank was absolutely crucial for Google's success, so that at least the fundamental concept behind PageRank should still be constitutive.
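It is often argued that, especially given how dynamic the internet is, too much time has passed since the scientific work on PageRank for it still to be the basis of the Google search engine's ranking methods. Undoubtedly, Google's ranking methods have very likely seen many changes and adjustments over the past few years, but PageRank was absolutely crucial to Google's success, so at least the fundamental concept behind it should still be at the core.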
The PageRank Concept
Since the early stages of the world wide web, search engines have developed different methods to rank web pages. Until today, the occurrence of a search phrase within a document is one major factor within ranking techniques of virtually any search engine. The occurrence of a search phrase can thereby be weighted by the length of a document (ranking by keyword density) or by its accentuation within a document by HTML tags.
Since the early days of the World Wide Web, search engines have developed different methods to rank web pages. To this day, how often a search phrase occurs in a document remains a major ranking factor for virtually every search engine. That count can be weighted by the document's length (ranking by keyword density) or by the emphasis that HTML tags give the phrase within the document.
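To make "ranking by keyword density" concrete for myself, here is a minimal sketch. The source only says that the occurrence count is weighted by document length; the exact definition used below (phrase occurrences divided by total word count), the function name `keyword_density` and the sample texts are my own assumptions, not how any real search engine computes it.

```python
import re

def keyword_density(text: str, phrase: str) -> float:
    """Assumed definition: occurrences of `phrase` divided by the word count of `text`."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of len(target) words over the document and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits / len(words)

# Hypothetical documents: the phrase occurs once in each, but the second is longer.
short_doc = "cheap flights to Rome"
long_doc = "cheap flights to Rome and many other unrelated words that pad the page out"

print(keyword_density(short_doc, "cheap flights"))  # 0.25 (1 hit / 4 words)
print(keyword_density(long_doc, "cheap flights"))   # 1/14, about 0.071
```

The same single occurrence counts for less in the longer document, which is the length weighting the paragraph describes.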
For the purpose of better search results and especially to make search engines resistant against automatically generated web pages based upon the analysis of content specific ranking criteria (doorway pages), the concept of link popularity was developed. Following this concept, the number of inbound links for a document measures its general importance. Hence, a web page is generally more important, if many other web pages link to it. The concept of link popularity often avoids good rankings for pages which are only created to deceive search engines and which don't have any significance within the web, but numerous webmasters elude it by creating masses of inbound links for doorway pages from just as insignificant other web pages.
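To get better search results, and especially to make search engines resistant to automatically generated web pages built from an analysis of content-specific ranking criteria (doorway pages), the concept of link popularity was developed. Under this concept, the number of inbound links to a document measures its general importance. So, in general, the more other web pages link to a page, the more important it is. Link popularity often keeps pages that exist only to deceive search engines, and that have no real significance on the web, from ranking well. However, many webmasters get around it by creating masses of inbound links to their doorway pages from other, equally insignificant web pages.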
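Link popularity as described here is just a count of inbound links, so it can be sketched in a few lines. The function name `link_popularity`, the page names and the little "link farm" are hypothetical, made up only to show why the measure is easy to game.

```python
from collections import Counter

def link_popularity(links):
    """Count inbound links per page.

    `links` is an iterable of (source, target) pairs; duplicate pairs and
    self-links are ignored (an assumption of this sketch).
    """
    return Counter(target for source, target in set(links) if source != target)

# A tiny hypothetical web: A and B link to C, and C links back to A.
links = [("A", "C"), ("B", "C"), ("C", "A")]

# A webmaster creates five insignificant pages that all link to a doorway page.
farm = [(f"spam{i}", "doorway") for i in range(5)]

scores = link_popularity(links + farm)
print(scores.most_common())
```

With a plain inbound-link count the doorway page comes out on top (5 inbound links against 2 for C), even though nothing of any significance links to it.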
Contrary to the concept of link popularity, PageRank is not simply based upon the total number of inbound links. The basic approach of PageRank is that a document is in fact considered the more important the more other documents link to it, but those inbound links do not count equally. First of all, a document ranks high in terms of PageRank, if other high ranking documents link to it.
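In contrast to link popularity, PageRank is not based simply on the total number of inbound links. The basic approach of PageRank is that a document is indeed considered more important the more other documents link to it, but those inbound links do not all count equally. First of all, a document ranks high in terms of PageRank if other high-ranking documents link to it.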
So, within the PageRank concept, the rank of a document is given by the rank of those documents which link to it. Their rank again is given by the rank of documents which link to them. Hence, the PageRank of a document is always determined recursively by the PageRank of other documents. Since - even if marginal and via many links - the rank of any document influences the rank of any other, PageRank is, in the end, based on the linking structure of the whole web. Although this approach seems to be very broad and complex, Page and Brin were able to put it into practice by a relatively trivial algorithm.
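So, within the PageRank concept, a document's rank is given by the rank of the documents that link to it. Their rank in turn is given by the rank of the documents that link to them. Hence a document's PageRank is always determined recursively by the PageRank of other documents. Because the rank of any document influences the rank of every other document, even if only marginally and through many links, PageRank is ultimately based on the link structure of the whole web. Although this approach seems very broad and complex, Page and Brin were able to put it into practice with a relatively trivial algorithm.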
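This part of the text does not spell out the algorithm, so the following is only a sketch of the recursive idea. It assumes the formula from Page and Brin's paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of outbound links on T, and d is a damping factor. The function name `pagerank`, the default `d=0.85`, the fixed iteration count and the three-page graph are my own choices for illustration, and the sketch assumes that no page lists the same target twice.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively approximate PageRank.

    `links` maps each page to the list of pages it links to.
    Every page starts at rank 1.0 and the formula is applied repeatedly.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Each page q linking to p passes on its own rank,
            # split evenly across all of q's outbound links.
            incoming = sum(
                rank[q] / len(links[q])
                for q in pages
                if p in links.get(q, ())
            )
            new_rank[p] = (1 - d) + d * incoming
        rank = new_rank
    return rank

# Hypothetical graph: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(web).items()):
    print(page, round(score, 3))
```

The recursion in the text shows up as the loop: each pass recomputes every rank from the previous ranks of the linking pages, and after enough passes the values stop changing. In this graph C ends up ahead of B even though both are linked from A, because C also receives all of B's rank.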
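Personal summary: PageRank is definitely a very scientific little idea. Scientific, because in my later articles you will see how Google applies mathematical theory (mostly statistics, to be specific) to search technology to the fullest. "Little", because to anyone who does mathematics these theories are really trivial; even someone with a bit of college-level science and math can follow them.
I have always thought that a search engine is to the internet what a desktop operating system is to the computer. After Microsoft indisputably took over the PC desktop, the fight for the internet's "desktop" has been exceptionally fierce ever since the Internet was born, and Yahoo! later won an interim victory by being the earliest to get online. But search engines back then were like a toilet to us... you had to use them, and you felt sick every time you did. Back then, whether it was Yahoo!, AltaVista, AllTheWeb or Lycos, almost everything they returned was crap.
For me, the day search engines truly entered my life was the day a British classmate of a classmate of mine told me to give www.google.com a try.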
Source: Blue Idea (藍色理想)
Editor: Blue