How Big Data Technology Applies to Traditional Information Systems
6/17/2014

Outline
- Research background and problems of big data technology
- Open-source software: Apache Hadoop
- Key technologies of big data processing systems
- Big data technology moving from the Internet to traditional applications

Background: large-scale data computing
- Rapid advances in communications, networking, storage, sensors, and other electronic information technologies have greatly increased data volume (Big Data).
- Traditional techniques for storing and processing this data have hit a bottleneck.
- Search Engine
- Data Warehousing
- Log Processing / User Behavior Analyzing
- Online / Realtime / Streaming Data Analysis
- Data is king.
- Processing 100 TB datasets:
  - One node scanning at 50 MB/s = 35,000 min
  - 1000 nodes scanning at 50 MB/s = 35 min

Background: the big problems of big data
- 2020: data volume will reach 35 ZB, a 44-fold increase over 2009. Source: IDC Digital Universe Study, May 2010.
- 2020: more than 60% of the data created will be lost because it cannot be stored.
- Facebook: users upload 300 million photos per day, more than 500 TB of data growth; 100 PB of storage capacity in a single cluster.
- Google: indexed online data was 5 EB in 2002 and had grown to 280 EB by 2009.
- Taobao: 370 million registered users, 900 million items online, 14 PB of data in storage.
- The data explosion challenges the efficiency of both data storage and data processing.
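The scan times quoted on the background slide (35,000 min on one node, 35 min on 1000 nodes) can be reproduced with simple arithmetic. The sketch below is illustrative only: the function name `scan_minutes` is hypothetical, it assumes binary units (1 TB = 1024 × 1024 MB), and it assumes perfectly linear scaling with no coordination or network overhead, which a real cluster will not achieve.

```python
def scan_minutes(dataset_tb: float, mb_per_s_per_node: float, nodes: int) -> float:
    """Minutes needed to scan a dataset once, assuming ideal linear scaling."""
    total_mb = dataset_tb * 1024 * 1024           # assumption: binary units
    aggregate_mb_per_s = mb_per_s_per_node * nodes
    return total_mb / aggregate_mb_per_s / 60


one_node = scan_minutes(100, 50, 1)       # about 34,953 min, i.e. roughly 35,000 min
cluster = scan_minutes(100, 50, 1000)     # about 34.95 min, i.e. roughly 35 min

print(f"1 node:     {one_node:,.0f} min")
print(f"1000 nodes: {cluster:,.1f} min")
```

The thousandfold speed-up comes entirely from reading in parallel, which is why the approaches that follow pair distributed storage with distributed computation.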
Challenges this growth creates:
- Fast-growing, data-driven Internet companies need continuous system scalability.
- Data grows quickly, but data-center expansion cycles are slow.
- Maintaining a low cost curve together with a high performance curve is a practical problem.
- Data services keep deepening, while current data processing performance lags behind.

Background: approaches to solving big data problems
- Massive data storage
- Massive data computing

Hadoop
- Apache Nutch (2002)
- NDFS + MapReduce (2004)
- Hadoop (2006)
- Apache Hadoop (2008)
- http://hadoop.apache.org/
- Book: http:/
- Large-scale Web pages
- Runs on Linux, Windows, and more
- Commodity hardware with high failure rate
- Doug Cutting, chairman of the Apache Software Foundation
- "Hadoop is the most successful open source software after Linux."

Hadoop components
- MapReduce
- HDFS
- HBase
- Hive

Hadoop HDFS architecture
- Scale: 10K nodes, 100 million files, 10 PB
- Characteristics: suited to batch data processing; maximizes throughput; allows computation to move to the data
- Optimizations: block replication, block placement policy, caching policy, and so on
- Sanjay Ghemawat et al., The Google File System, SOSP03

Hadoop MapReduce processing flow
- Dean & Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004

Hybrid row-column data storage: RCFile
- Original relational table
- Row-oriented storage (Apache Hive, SequenceFile)
- Column-oriented storage (Apache Pig, Zebra)
- Hybrid row-column storage (RCFile)
- Relational data is rebuilt in row order at run time.
- Processing flow: Job Launch, Job Schedule, Task Launch (parallel tasks), Read Data from Distributed File System, Row Construction / Row Reconstruction, Subsequent Processing
- RCFile splits relational data horizontally into blocks and stores each block column by column, optimizing the logical structure at the file level.

Complementary clustered index: CCIndex
- CCIndex uses the redundant replica data blocks to build clustered indexes over multiple data columns without adding extra storage space, optimizing layout at the data-block level and enabling distributed real-time query and statistics.
- CCIndex keeps the high scalability and high throughput of the BigTable data model while gaining the query and statistics capabilities of the relational data model.
- CCIndex turns secondary indexes into primary indexes, so range queries and statistics run directly without accessing the original table.
- Systems built around CCIndex support real-time multi-dimensional range queries and statistics.
- Example: select count(cl1) from TAB where cl1B

[Figure: CCIndex compared with MySQL Cluster on scans and multi-dimensional range queries]

Bank: banknote serial-number query
- Test conclusion: with 200 concurrent queries the system still delivers data access in about a second, so it can be expected to fully meet the performance requirements of serial-number lookups.
- Data rule: simulated from banknote serial-number records
- Test records: 8.1 billion
- Test file size: 610 GB
- Size after import: 2.4 TB
- Size after indexing: 4.7 TB
- Test concurrency: 200
- Query modes: single-field query, combined query

Test environment
IP            Configuration
172.16.5.50   2 × Intel Xeon E5-2650 (8 cores, 2.00 GHz), 64 GB memory, 52 TB
172.16.5.51   2 × Intel Xeon E5-2650 (8 cores, 2.00 GHz), 64 GB memory, 52 TB
172.16.5.52   2 × Intel Xeon E5-2650 (8 cores, 2.00 GHz), 64 GB memory, 52 TB
172.16.5.53   2 × Intel Xeon E5-2650 (8 cores, 2.00 GHz), 64 GB memory, 52 TB
Network       6G network

Query performance (8.1 billion records, 200 concurrent queries; completion times in ms)
Query content                                                      Average  Fastest  Slowest
Region code, single field                                          834      11       1665
Branch code, single field                                          816      17       1609
Error code, single field                                           604      2        1390
Serial number, single field                                        1149     3        2069
Region code = x, error code = y, type = z; single record           924      15       1779
Region code = x, error code = y, type = z; 100 records             1763     320      3939
Region code = x, error code < y, type = z; single record           887      19       1740
Region code = x, error code < y, type = z; 100 records             2077     66       4625

Differences between Internet applications and traditional information systems

Internet applications
- Systems are developed in-house, with rapid iteration, continuous delivery, and continuous maintenance.
- They reach the client directly and serve mass users.
- Business logic is simple (straightforward), and consistency requirements are relatively low.
- Internal modules need not follow formal standards (REST); performance and user experience come first (KISS).

Traditional information systems
- Multiple parties develop the system, integrators are relied upon, and delivery and maintenance are version-based.
- Vendors deal with integrators, and integrators deal with end users.
- Business logic is complex; the system must be stable, reliable, and secure