Processing data with Hive TRANSFORM in Python
A simple example I wrote for deduplicating topic descriptions: the desc field in the table holds values like "a-b-a-b-b-c" that need deduplication. The Python code:

#!/usr/bin/python
import sys
reload(sys)
sys.setdefaultencoding('utf8')

def quchong(desc):
    a = desc.split('-')
    return '-'.join(set(a))

while True:
    line = sys.stdin.readline()
    if line == "":
        break
    line = line.rstrip('\n')
    # your process code here
    parts = line.split('\t')
    parts[2] = quchong(parts[2])
    print "\t".join(parts)

The following, reposted from elsewhere, is a more detailed walkthrough.

II. Incrementing a field inside a Hive map (reposted)

1. Create the table:

hive> CREATE TABLE t3 (foo STRING, bar MAP<STRING,INT>)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > COLLECTION ITEMS TERMINATED BY ','
    > MAP KEYS TERMINATED BY ':'
    > STORED AS TEXTFILE;
OK

2. The resulting schema:

hive> describe t3;
OK
foo     string
bar     map<string,int>

3. Create test.txt:

jeffgeng        click:13,uid:15

4. Load test.txt into the table:

hive> LOAD DATA LOCAL INPATH 'test.txt' OVERWRITE INTO TABLE t3;
Copying data from file:/root/src/hadoop/hadoop-0.20.2/contrib/hive-0.5.0-bin/bin/test.txt
Loading data to table t3
OK

After loading:

hive> select * from t3;
OK
jeffgeng        {"click":13,"uid":15}

5. Map values can be queried like this:

hive> select bar['click'] from t3;

...a series of MapReduce jobs...

OK
13

6. Write add_mapper.py:

#!/usr/bin/python
import sys

for line in sys.stdin:
    line = line.strip()
    foo, bar = line.split('\t')
    d = eval(bar)
    d['click'] += 1
    print '\t'.join([foo, str(d)])

7. Run it in Hive:

hive> CREATE TABLE t4 (foo STRING, bar MAP<STRING,INT>)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > COLLECTION ITEMS TERMINATED BY ','
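For reference, the deduplication mapper above can be written in Python 3, where the `print` statement and `sys.setdefaultencoding` hack are no longer needed. This is a sketch, not the original author's script; it additionally uses `dict.fromkeys` instead of `set` so the surviving tokens keep their first-seen order, which the original `set`-based version does not guarantee:

```python
#!/usr/bin/env python3
import sys

def dedupe(desc):
    # dict.fromkeys drops duplicate tokens while preserving
    # first-seen order, e.g. "a-b-a-b-b-c" -> "a-b-c"
    return '-'.join(dict.fromkeys(desc.split('-')))

def main():
    for line in sys.stdin:
        parts = line.rstrip('\n').split('\t')
        parts[2] = dedupe(parts[2])   # desc is the third column
        print('\t'.join(parts))

if __name__ == '__main__':
    main()
```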
    > MAP KEYS TERMINATED BY ':'
    > STORED AS TEXTFILE;

hive> add FILE add_mapper.py;

hive> INSERT OVERWRITE TABLE t4
    > SELECT
    >   TRANSFORM (foo, bar)
    >   USING 'python add_mapper.py'
    >   AS (foo, bar)
    > FROM t3;
FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table because column number/types are different t4: Cannot convert column 1 from string to map<string,int>.

8. Why the error? The output of add_mapper.py is a plain string, and Hive cannot recognize it as a map. It turns out the AS clause can force a type on each output field:

INSERT OVERWRITE TABLE t4
SELECT
  TRANSFORM (foo, bar)
  USING 'python add_mapper.py'
  AS (foo string, bar map<string,int>)
FROM t3;

9. The Python script must also strip the spaces and quotes left over from converting the dict back to a string:

#!/usr/bin/python
import sys

for line in sys.stdin:
    line = line.strip()
    foo, bar = line.split('\t')
    d = eval(bar)
    d['click'] += 1
    d['uid'] += 1
    strmap = ''
    for x in str(d):
        if x in (' ', "'"):
            continue
        strmap += x
    print '\t'.join([foo, strmap])

10. The result:

hive> select * from t4;
OK
jeffgeng        {"click":14,"uid":null}
Time taken: 0.146 seconds
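Instead of filtering characters out of `str(d)` one by one as in step 9, the dict can be serialized directly into the delimited text the DDL above declares (COLLECTION ITEMS ',' and MAP KEYS ':'). A minimal Python 3 sketch, assuming the map still arrives as a dict-literal string like `{"click":13,"uid":15}` as in the original script; `ast.literal_eval` replaces the unsafe `eval`, and this has not been verified against the old Hive 0.5 used in the post:

```python
#!/usr/bin/env python3
import ast
import sys

def to_hive_map(d):
    # Emit "click:14,uid:16", matching MAP KEYS ':' and
    # COLLECTION ITEMS ',' in the table DDL
    return ','.join('%s:%s' % (k, v) for k, v in d.items())

def main():
    for line in sys.stdin:
        foo, bar = line.rstrip('\n').split('\t')
        # Parse the dict-literal string safely instead of eval()
        d = ast.literal_eval(bar)
        d['click'] += 1
        d['uid'] += 1
        print('\t'.join([foo, to_hive_map(d)]))

if __name__ == '__main__':
    main()
```

This avoids emitting stray braces, which is a likely cause of the `"uid":null` seen in the final output of step 10.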
總結
以上是生活随笔為你收集整理的python通过hive transform处理数据的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: java 四则运算 栈的实现
- 下一篇: 从HBase中移除WAL?3D XPoi