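For a search engine, once the index size and the query volume grow past a certain point, index updates become steadily less efficient and server load steadily rises, so the overall utilization of the engine keeps dropping. Add the difficulties of storing massive amounts of data, and a well-designed distributed architecture becomes a key factor in whether a search engine can keep up with future growth.
So what are the core problems of a distributed search engine?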
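1. Distributed information acquisition and computation, and unifying the resulting data
This covers distributing the crawlers (or whatever data acquisition mechanism is used) and centrally managing how the information is processed.
2. Distributed storage and management of the processed data
This is mainly about locating files accurately, and about the mechanisms for updating, adding, deleting, and moving them.
3. Distribution of the front-end search service
This is mainly about the dispatch mechanism for handling large numbers of concurrent requests.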
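Based on these 3 basic requirements, roughly 4 types of distributed search engine can be built:
1. Distributed meta search engine
2. Hash-distributed search engine
3. P2P distributed search engine
4. Local traversal search engine
Each of these 4 types of scalable search engine is described in turn below: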
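1. Distributed meta search engine
There are multiple individual search engines, and a central search engine merges the results from these distributed units into one complete result.
This design requires every unit to use the same ranking algorithm and essentially the same output structure, so that the central search engine can collate the results.
The key design requirement for this type is that the indexes held by the units do not overlap. Data collection (crawling), however, can be done by an independent system, with the data then distributed to the units according to a set of rules.
Pros: simple design, fast, and any unit can be taken offline at any time without much impact.
Cons: not a good solution for large-scale concurrency.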
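The merge step of the central search engine can be sketched as follows. This is a minimal illustration, assuming each unit returns a list of `(score, doc_id)` pairs already sorted by descending score, and that scores are comparable across units because every unit uses the same ranking algorithm. The function name `merge_results` and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def merge_results(unit_results, k):
    """Merge per-unit result lists (each sorted by descending score) into the top k hits."""
    # heapq.merge consumes already-sorted inputs lazily; negating the score
    # turns its ascending order into descending order by score.
    merged = heapq.merge(*unit_results, key=lambda hit: -hit[0])
    return list(islice(merged, k))


# Hypothetical results from three units that hold non-overlapping indexes.
unit_a = [(0.9, "doc-a1"), (0.4, "doc-a2")]
unit_b = [(0.7, "doc-b1"), (0.6, "doc-b2")]
unit_c = []  # a unit that has been taken offline contributes nothing

top = merge_results([unit_a, unit_b, unit_c], k=3)
print(top)
```

Because the unit indexes do not overlap, the merge needs no de-duplication, and a missing unit only shrinks the result set rather than breaking the query.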
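2. Hash-distributed search engine
The index servers and document servers are hashed according to the query, so that any index term can be mapped precisely to a specific index server, and from there to the correct document server.
Pros: handles heavy load well, simple design.
Cons: it is hard to dynamically adjust things like the capacity of an individual index server or document server.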
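A minimal sketch of this routing, assuming plain modulo hashing over a fixed list of servers. The server names, `shard_for`, and `moved_fraction` are hypothetical and only illustrate the idea; a real deployment might use a different hash or a scheme such as consistent hashing.

```python
import hashlib

# Hypothetical server pools.
INDEX_SERVERS = ["index-0", "index-1", "index-2", "index-3"]
DOC_SERVERS = ["docs-0", "docs-1", "docs-2"]


def shard_for(key, num_servers):
    """Map a key to a server slot with a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def locate_index_server(term):
    """Every index term maps to exactly one index server."""
    return INDEX_SERVERS[shard_for(term, len(INDEX_SERVERS))]


def locate_doc_server(doc_id):
    """Every document id maps to exactly one document server."""
    return DOC_SERVERS[shard_for(doc_id, len(DOC_SERVERS))]


def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose server changes when the pool is resized."""
    moved = sum(1 for key in keys if shard_for(key, old_n) != shard_for(key, new_n))
    return moved / len(keys)


terms = [f"term-{i}" for i in range(1000)]
print(locate_index_server("banjo"), locate_doc_server("doc-42"))
print(moved_fraction(terms, 4, 5))
```

The last line illustrates the drawback listed above: with modulo hashing, adding a single server reassigns most keys, which is why resizing one server pool is hard to do on the fly.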
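3. Peer-to-peer search engine
The famous Napster used this kind of design: a centralized index, combined with file sources made up of individual computers spread all over the world, formed one of the largest P2P search engines in the world.
In this design the central index server records only the relatively important information, such as the location (IP, serial number), the song title, and the artist. Everything else can be fetched from any computer that is online and holds the complete information for that entry. A P2P system can also build caches along intermediate routes based on past searches, that is, store some search results on a single node or on nearby nodes to speed up searching.
Pros: can grow extremely large, with essentially no maintenance cost.
Cons: updates to the central server are very inefficient, and the information sources are unstable.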
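The split between a small central index and the peers that hold the actual files can be sketched like this. It is a toy model under stated assumptions, not Napster's actual protocol: the class `CentralIndex`, its methods, and the peer addresses are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index: stores only title, artist, and which peers hold the file."""

    def __init__(self):
        # (title, artist) -> set of peer addresses that claim to hold the file
        self._entries = defaultdict(set)

    def register(self, peer, title, artist):
        """A peer announces a file it can serve."""
        self._entries[(title.lower(), artist.lower())].add(peer)

    def unregister(self, peer):
        """A peer goes offline; its files are no longer reachable through it."""
        for peers in self._entries.values():
            peers.discard(peer)

    def search(self, keyword):
        """Return matching entries with the peers currently holding them."""
        keyword = keyword.lower()
        hits = {}
        for (title, artist), peers in self._entries.items():
            if peers and (keyword in title or keyword in artist):
                hits[(title, artist)] = sorted(peers)
        return hits


index = CentralIndex()
index.register("10.0.0.5:6699", "Song of the Traveling Daughter", "Abigail Washburn")
index.register("10.0.0.9:6699", "Song of the Traveling Daughter", "Abigail Washburn")
print(index.search("traveling"))

index.unregister("10.0.0.5:6699")
print(index.search("washburn"))
```

The `unregister` call hints at the instability noted above: results depend on which peers happen to be online, and the central index has to be kept in step with them.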
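4. Local traversal search engine
This type of search engine can itself be designed in several ways. One of the more feasible is to cluster the information and build an information tree, so that a search only has to traverse down one branch of the tree. The local traversal should follow definite rules, and from the very start of the design every index that is added needs to be assigned a reasonably accurate position, so that it sits on a suitable node and search stays efficient.
Pros: handles heavy load easily, high search precision, high search efficiency.
Cons: complex design, and it is hard to change which node an index lives on.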
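One way to picture the single-branch traversal is the sketch below. It assumes each cluster is summarized by a set of representative terms and that the search follows, at every level, the child whose terms overlap most with the query. The `Node` structure, the term-overlap rule, and the sample tree are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    terms: set                                     # terms that summarize this cluster
    children: list = field(default_factory=list)  # sub-clusters
    docs: list = field(default_factory=list)      # (doc_id, terms) pairs stored at a leaf


def search(root, query_terms):
    """Descend one branch only, then scan the documents at the leaf reached."""
    node = root
    path = [node.label]
    while node.children:
        # Follow the child whose cluster terms overlap most with the query.
        node = max(node.children, key=lambda child: len(child.terms & query_terms))
        path.append(node.label)
    hits = [doc_id for doc_id, terms in node.docs if terms & query_terms]
    return path, hits


# Hypothetical two-cluster tree.
root = Node("root", set(), children=[
    Node("music", {"banjo", "song", "guitar"},
         docs=[("doc-1", {"banjo", "bluegrass"}), ("doc-2", {"guitar"})]),
    Node("search", {"index", "crawler", "query"},
         docs=[("doc-3", {"index", "hash"})]),
])

path, hits = search(root, {"banjo"})
print(path, hits)
```

Only the documents under the chosen branch are scanned, which is where the efficiency comes from. It also shows why placement matters: a document filed under the wrong cluster is never reached by this traversal.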
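Overall, there are many ways to design a search engine. This post is only meant to get the discussion started, and I expect more ingenious designs to appear in the future.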
Every female student had an issue of Ling Long magazine in hand during the 1930s. On the one hand, Ling Long imparted the beauty secrets of movie stars, and on the other hand instructed "beautified" and "made up" girls how to keep close guard against the attacks of men, because all men harbor bad intentions. True dating is dangerous, but marriage is even more dangerous, because marriage is the tomb of dating. - Female Shanghai author Zhang Ailing, "Talking About Women" (1944)
More about Ling Long, from the Columbia University website:
Between 1931 and 1937, the Sanhe publishing company, located on Nanjing Road in Shanghai, published Ling long magazine, which they called Linloon magazine in English. This pocket-sized weekly stood only 13 centimeters high.
According to the first issue, the magazine cost seven fen (7/100ths) of a foreign ounce of silver or 21 copper coins and an extra two fen (2/100ths) of a foreign ounce of silver in other cities. Mr. Lin Zecang was the main backer of the magazine. The editorial board included Mr. Zhou Shexun (entertainment), Ms. Chen Zhenling (women's features), and Mr. Lin Zemin (photography). Both men and women contributed photographs and articles, though the majority of articles appear to have been written by women as indicated by the title nushi (lady) placed next to their name.
The goal of the magazine was "to promote the exquisite life of women, and encourage lofty entertainment in society." The magazine was divided into two parts, indicated by the front and back covers. The front cover usually featured a photograph of a woman who represented the magazine's ideal of the modern woman, while content on the back cover was usually related to the cinema. The magazine was read in both directions. The articles that read from front to back were usually more instructional and related to women's issues. Articles and photographs that read from the back cover were often concerned with entertainment or unusual feature stories.
The word ling long (elegant and fine) has an etymology that reaches back to a collection of onomatopoetic words from the Ming dynasty (1368–1644) signifying the sounds of pieces of jade clinking together.[1] The classical meaning of ling long also connoted delicate female handiwork. The editors of Ling long magazine redefined this word to mean modern female style. Just like the onomatopoetic sound of the word ling long, articles and photographs on the magazine's pages reverberated like clinking jade. Although certain columns on movies, child-rearing, and legal advice appeared with some regularity, the magazine did not maintain a standard format, and articles often contradicted one another. For example, one article might have showcased the latest movies from Hollywood, while another article attempted to drum up xenophobic patriotism. These different viewpoints came together like clinking pieces of jade in the cacophony that was Ling long magazine.
[1] Craig Clunas, Superfluous Things (Urbana: University of Illinois Press, 1991), 85.
Two other noteworthy sites that have extensive archives of 20th-century Chinese historical materials are:
The Return of the Travelling Daughter: Abigail Washburn Brings her Banjo Back to Beijing
Those of you around last November might recall the brief Chinese-spiced bluegrass and old-time invasion of a four-piece outfit led by banjo player and singer-songwriter (and former Beijing resident) Abigail Washburn. Those of you who weren’t may have heard of her imminent arrival. On this, her second China tour in just over a year, she’s backed up by a new full-length album (Song of the Traveling Daughter, on Nettwerk Records) and a flurry of American (and Chinese) media attention.
Her original take on traditional folk and old-time American music has an added twist: a melding of the musical and linguistic traditions of the Middle Kingdom. Two songs on her solo debut are original compositions with Chinese lyrics, including the title track, a riff on the classic Meng Jiao poem 游子吟 (‘Song of the Traveling Son’).
This year, she'll be appearing with a very special guest, eight-time Grammy winner Bela Fleck, for a limited number of shows; at all four shows, Abigail will be joined by fiddler Casey Driessen (who also came last year) and cellist Ben Sollee.
Gig schedule and links:
Friday, November 18
With Iz
South Gate Space
4 Jiuxianqiao Lu (798 Art District south gate) 6437 9737
RMB50 (40 for students)
8.30pm
Soundcheck/rehearsal w/Iz: 3.30pm
Saturday, November 19
With Hanggai
Yu Gong Yi Shan
1 Gongti Beilu (at rear of Gongti Parking Lot) 6415 0687
RMB50
9.30pm
Soundcheck/rehearsal w/Hanggai: 4.30pm
Wednesday, November 23
4pm
Master’s Class/performance
Midi School of Modern Music
Rui wang fen, Haidian District 6259 0101/0007/0008
RMB30 (Midi Students free)
Pick-up at 2.30
Saturday November 26
Peking University Hall (South side of library); www.pku-hall.com, 6275 2279, 6275 8452
RMB20-150
For tickets: Piaowutong or 6406 8888/9999;
Fusheng Record Shop SE corner of Ping'anli intersection (Ping'an Dadao and Xinjiekou): 6613 6182;
Yu Gong Yi Shan, 1 Gongti Beilu (north end of parking lot opposite north gate Workers' Stadium): 6415 0687
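It's not that I don't understand; the world just changes fast. Two years ago I used EurekSter's search engine, and a year ago I used an online service called SWiki, which went down a while back and redirected to somewhere else. Today I saw over at WebLeOn that it has turned into SWicki. How strange: how did a search engine based on social networks (where you could see your friends' search recommendations) turn into a search engine aimed at individuals?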