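When you open the Hue importer, the page calls the `/indexer/api/indexer/guess_field_types` endpoint and passes the format parameters selected on the page, for example `fieldSeparator: ','`, `recordSeparator: '\n'`, `quoteChar: '"'`, `hasHeader: true`. The handler is in `hue/desktop/libs/indexer/src/indexer/api3.py`:

```python
# hue/desktop/libs/indexer/src/indexer/api3.py (excerpt)
def guess_field_types(request):
  # file_format: the JSON format description posted by the importer page
  # (how it is read from the request is not shown in this excerpt)

  if file_format['inputFormat'] == 'localfile':
    path = urllib_unquote(file_format['path'])

    with open(path, 'r') as local_file:
      reader = csv.reader(local_file)
      csv_data = list(reader)

      if file_format['format']['hasHeader']:
        # Header present: column names come from the first row, samples from the rows after it
        # (the exact slice and the clean-up applied to each header name are truncated in the source)
        sample = csv_data[1:5]
        column_row = [col for col in csv_data[0]]
      else:
        # No header: take the first four rows as samples and name the columns field_1, field_2, ...
        sample = csv_data[:4]
        column_row = ['field_' + str(count + 1) for count, col in enumerate(sample[0])]

      field_type_guesses = []
      for count, col in enumerate(column_row):
        # This column's value from every sample row that is long enough to have it
        column_samples = [sample_row[count] for sample_row in sample if len(sample_row) > count]
        field_type_guess = guess_field_type_from_samples(column_samples)
        field_type_guesses.append(field_type_guess)

      columns = [
        Field(column_row[count], field_type_guesses[count]).to_dict()
        for count, col in enumerate(column_row)
      ]

      format_ = {
        'columns': columns,
        'sample': sample
      }

  elif file_format['inputFormat'] == 'file':
    indexer = MorphlineIndexer(request.user, request.fs)
    path = urllib_unquote(file_format['path'])
    stream = request.fs.open(path)
    encoding = check_encoding(...)  # argument truncated in the source
    LOG.debug('File %s encoding is %s' % (path, encoding))
    stream.seek(0)
    _convert_format(file_format['format'], inverse=True)

    format_ = indexer.guess_field_types({
      'file': {
        'stream': stream,
        'name': path
      },
      'format': file_format['format']
    })

    # Note: Would also need to set charset to table (only supported in Hive)
    if 'sample' in format_ and format_['sample']:
      format_['sample'] = escape_rows(format_['sample'], encoding=encoding)
    for col in format_['columns']:
      col['name'] = smart_unicode(col['name'], errors='replace', encoding=encoding)

  elif file_format['inputFormat'] == 'table':
    ...
  elif file_format['inputFormat'] == 'query':
    ...
  elif file_format['inputFormat'] == 'rdbms':
    ...
  elif file_format['inputFormat'] == 'stream':
    ...
  elif file_format['inputFormat'] == 'connector':
    ...

  return JsonResponse(format_)
```

How the file is read and how data types are handled: the handler branches on the input format. For a local file it reads `file_format['format']['hasHeader']` to decide whether the file has a header row; if it does, it takes the header row as the column names and the following rows as sample data.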
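To see the `localfile` branch in isolation, here is a minimal standalone sketch using only the standard library. It is not Hue's code: `guess_local_csv`, `guess_field_type` and `pick_best_field` are hypothetical names, the pattern list is a simplified copy of the `FIELD_TYPES` regexes from `fields.py`, the header branch assumes row 0 holds the column names and the next four rows are samples, and the "most general type in the column wins" rule for combining per-value guesses is an assumption, since `_pick_best_field` is not shown in the source.

```python
import csv
import io
import re

# Simplified copy of the FIELD_TYPES patterns, ordered from most general to
# most specific. Each value is tested against the list in reverse order.
FIELD_TYPES = [
    ("text_general", r"^[\s\S]{101,}$"),
    ("string", r"^[\s\S]{1,100}$"),
    ("double", r"^([+-]?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?)$"),
    ("long", r"^(?:[+-]?(?:[0-9]+))$"),
    ("date", r"^([0-9]+-[0-9]+-[0-9]+(\s+|T)[0-9]+:[0-9]+:[0-9]+(\.[0-9]*)?Z?)$"),
    ("boolean", r"^(true|false|t|f|0|1)$"),
]


def guess_field_type(value):
    """Return the most specific type whose pattern matches one value, or None for ''."""
    if value == "":
        return None
    for name, pattern in reversed(FIELD_TYPES):
        if re.match(pattern, value, flags=re.IGNORECASE):
            return name
    return None


def pick_best_field(guesses):
    """Assumed rule: the most general type seen in the column wins."""
    seen = set(guesses)
    for name, _ in FIELD_TYPES:
        if name in seen:
            return name
    return "string"


def guess_local_csv(text, has_header=True):
    """Hypothetical stand-in for the localfile branch: names, types and sample rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        # Assumption: row 0 is the header, the next four rows are samples.
        column_row, sample = rows[0], rows[1:5]
    else:
        sample = rows[:4]
        width = len(sample[0]) if sample else 0
        column_row = ["field_" + str(count + 1) for count in range(width)]

    columns = []
    for count, name in enumerate(column_row):
        # Skip sample rows that are too short to contain this column.
        column_samples = [row[count] for row in sample if len(row) > count]
        guesses = [guess_field_type(v) for v in column_samples]
        columns.append({"name": name, "type": pick_best_field(guesses)})
    return {"columns": columns, "sample": sample}
```

For a file with the header `id,price,active,created` and the rows `1,9.5,true,2021-01-02 03:04:05` and `2,10,false,2021-02-03T04:05:06Z`, the sketch guesses `long`, `double`, `boolean` and `date`. Because the boolean pattern is tried first, a column containing only `0` and `1` comes out as `boolean`, not `long`.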
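The handler then loops over `column_row` to get each column's data type, using `from indexer.fields import Field, guess_field_type_from_samples`. `guess_field_type_from_samples` is the entry point; `_guess_field_type` guesses the type of each individual value by testing it against a predefined list of field types. `fields.py` looks like this: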
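```python
# indexer/fields.py (excerpt)
import re


class FieldType(object):
  def __init__(self, name, regex, heuristic_regex=None):
    self._name = name
    self._regex = regex
    self._heuristic_regex = heuristic_regex

  @property
  def heuristic_regex(self):
    # Fall back to the strict regex when no looser heuristic pattern is given
    return self._heuristic_regex if self._heuristic_regex else self.regex

  @property
  def name(self):
    return self._name

  @property
  def regex(self):
    return self._regex

  def heuristic_match(self, field):
    pattern = re.compile(self.heuristic_regex, flags=re.IGNORECASE)
    return pattern.match(field)


class Field(object):
  def __init__(self, name="new_field", field_type_name="string", operations=None, multi_valued=False, unique=False):
    self.name = name
    self.field_type_name = field_type_name
    self.keep = True
    self.operations = operations if operations else []
    self.required = False
    self.unique = unique
    self.multi_valued = multi_valued
    self.show_properties = False

  def to_dict(self):
    # Column description sent to the front end
    return {
      'name': self.name,
      'type': self.field_type_name,
      'unique': self.unique,
      'keep': self.keep,
      'operations': self.operations,
      'required': self.required,
      'multiValued': self.multi_valued,
      'showProperties': self.show_properties,
      'nested': [],
      'level': 0,
      'length': 100,
      'keyType': 'string',
      'isPartition': False,
      'partitionValue': '',
      'comment': '',
      'scale': 0,
      'precision': 10
    }


# Ordered from the most general type to the most specific one
FIELD_TYPES = [
  FieldType('text_general', r"^[\s\S]*$", heuristic_regex=r"^[\s\S]{101,}$"),
  FieldType('string', r"^[\s\S]*$", heuristic_regex=r"^[\s\S]{1,100}$"),
  FieldType('double', r"^([+-]?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?)$"),
  FieldType('long', r"^(?:[+-]?(?:[0-9]+))$"),
  FieldType('date', r"^([0-9]+-[0-9]+-[0-9]+(\s+|T)[0-9]+:[0-9]+:[0-9]+(\.[0-9]*)?Z?)$"),
  FieldType('boolean', r"^(true|false|t|f|0|1)$")
]


def get_field_type(type_name):
  return [file_type for file_type in FIELD_TYPES if file_type.name in type_name][0]


def guess_field_type_from_samples(samples):
  guesses = [_guess_field_type(sample) for sample in samples]
  return _pick_best_field(guesses)


def _guess_field_type(field_val):
  if field_val == "":
    return None

  # Walk the list in reverse, from the most specific type (boolean) to the most general one
  for field_type in FIELD_TYPES[::-1]:
    if field_type.heuristic_match(field_val):
      return field_type.name

# _pick_best_field(), which combines the per-value guesses, is not shown in this excerpt.
```

Finally, the data is returned. The `table` branch, for example, builds `format_` like this:

```python
format_ = {
  'sample': sample['rows'][:4],
  'columns': [
    Field(col.name, HiveFormat.FIELD_TYPE_TRANSLATE.get(col.type, 'string')).to_dict()
    for col in table_metadata.cols
  ]
}
```

How the preview format is handled (field separator, record separator, quote character):

1. Call the conversion function, passing `format_`: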
```python
format_ = {
  'quoteChar': '"',
  'recordSeparator': '\\n',
  'type': 'csv',
  'hasHeader': True,
  'fieldSeparator': ','
}
```
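```python
def _convert_format(format_dict, inverse=False):
  for field in format_dict:
    # Only string values (separators, quote character, ...) are converted; booleans such as hasHeader are left alone.
    # basestring is the Python 2 string base type; the Python 3 equivalent check is isinstance(..., str).
    if isinstance(format_dict[field], basestring):
      format_dict[field] = _escape_white_space_characters(format_dict[field], inverse)
```

2. Escape whitespace characters: each character in `MAPPINGS` is swapped with its escaped form using `s.replace()`:

```python
MAPPINGS = {
  "\n": "\\n",
  "\t": "\\t",
  "\r": "\\r",
  " ": "\\s"
}
```

Finally, the result is returned to the front end, which applies these parameters and renders the preview on the page.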
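The source shows only `MAPPINGS` and the use of `s.replace()`, so the escaping step can be illustrated with a standalone Python 3 sketch under stated assumptions: `escape_white_space_characters` and `convert_format` are hypothetical names, and the rule that `inverse=True` turns escaped text such as `\\n` back into the real character is an assumption, chosen because the front end sends separators like `'\\n'` as escaped text and the `file` branch calls `_convert_format(..., inverse=True)` before handing the format to the indexer.

```python
# Escaped text form used for each whitespace character.
MAPPINGS = {"\n": "\\n", "\t": "\\t", "\r": "\\r", " ": "\\s"}


def escape_white_space_characters(s, inverse=False):
    """inverse=False: real character -> escaped text; inverse=True: the reverse (assumed direction)."""
    for real, escaped in MAPPINGS.items():
        s = s.replace(escaped, real) if inverse else s.replace(real, escaped)
    return s


def convert_format(format_dict, inverse=False):
    """Convert every string value in place; non-strings (e.g. hasHeader) are untouched."""
    for field, value in format_dict.items():
        if isinstance(value, str):
            # Reassigning an existing key is safe while iterating over the dict.
            format_dict[field] = escape_white_space_characters(value, inverse)
    return format_dict
```

Applied to the `format_` shown above with `inverse=True`, `recordSeparator` becomes a real newline while `fieldSeparator`, `quoteChar` and `hasHeader` stay unchanged; calling it again without `inverse` restores `'\\n'`. Plain `str.replace` is enough here because separators are single characters; free text containing a literal backslash followed by `s` would also be rewritten on the inverse pass.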