Found 4 patents matching the query "郭立【EN】Ye Kai" (search time: 0.3125011 s).
Invention patents: 4 | Utility models: 0 | Designs: 0
4 results, showing 1-4
Application No.: 202010038141.8 | Publication No.: CN111210873A | Main classification: G16B20/10
Applicant: Xi'an Jiaotong University (西安交通大学) | Filing date: 2020.01.14 | Publication date: 2020.05.29
Abstract: The invention discloses a method and system for detecting copy number variation from exon sequencing data, together with a terminal and a storage medium. The method comprises: cleaning the exon sequencing data of normal samples and then standardizing the data to obtain a normal-sample data matrix; dividing the exon regions into stable and unstable regions according to the dispersion of each exon region across all samples; correcting batch effects in the normal-sample data matrix within the stable exon regions to construct a reference data matrix; processing the reference data matrix with PCA, reconstructing the original data from the principal components so that the reference data matrix is transformed into another space and new parameters are obtained; and transforming the test data into the PCA-transformed space of the reference data matrix, then using the Z-score method to measure how far the test data deviate from the reference data matrix in that space, which completes the detection of copy number variation in the test sample. The method can reduce cost and achieve accurate and effective detection of copy number variation from exon sequencing data.
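The PCA and Z-score steps in this abstract can be illustrated with a minimal, dependency-free sketch. It is not the patented procedure: the abstract does not say how many principal components are used or how the reconstruction is applied, so the sketch assumes that the leading component of the reference matrix captures shared technical variation and is subtracted from both reference and test data before per-region Z-scores are computed. All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def first_component(rows, iters=50):
    """Leading principal axis of mean-centered rows, found by power iteration."""
    n = len(rows[0])
    axis = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        scores = [dot(row, axis) for row in rows]
        # Multiply by X^T X without forming the covariance matrix.
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(n)]
        norm = math.sqrt(dot(w, w))
        axis = [x / norm for x in w]
    return axis


def remove_component(row, axis):
    """Subtract the projection of a centered row onto the given axis."""
    score = dot(row, axis)
    return [x - score * a for x, a in zip(row, axis)]


def cnv_z_scores(reference, test):
    """Per-region Z-scores of a test sample against PCA-denoised references.

    Assumption: only the first principal component is removed.
    """
    means = [statistics.mean(col) for col in zip(*reference)]
    centered = [[x - m for x, m in zip(row, means)] for row in reference]
    axis = first_component(centered)
    ref_res = [remove_component(row, axis) for row in centered]
    test_res = remove_component([x - m for x, m in zip(test, means)], axis)
    z = []
    for j, col in enumerate(zip(*ref_res)):
        z.append((test_res[j] - statistics.mean(col)) / statistics.stdev(col))
    return z


# Synthetic log2 depth ratios for 12 exon regions (illustrative data only).
rng = random.Random(0)
loading = [0.5 + 0.1 * j for j in range(12)]  # hypothetical batch-effect pattern


def simulate(batch):
    return [batch * w + rng.gauss(0.0, 0.05) for w in loading]


reference = [simulate(rng.uniform(-1.0, 1.0)) for _ in range(20)]
test = simulate(0.8)
test[4] -= 1.0  # simulate a one-copy loss in region 4: log2(1/2) = -1

z_scores = cnv_z_scores(reference, test)
# Index of the region with the most negative Z-score.
print(min(range(len(z_scores)), key=lambda j: z_scores[j]))
```

A strongly negative Z-score marks a candidate deletion and a strongly positive one a candidate duplication; the calling threshold is a design choice that the abstract does not specify.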

Application No.: 202010121579.2 | Publication No.: CN111243663A | Main classification: G16B20/20
Applicant: Xi'an Jiaotong University (西安交通大学) | Filing date: 2020.02.26 | Publication date: 2020.06.05
Abstract: A gene variation detection method based on a pattern growth algorithm. Data carrying variation signals are extracted from the preprocessed sequencing alignment data and clustered. Within each cluster, every short read is split according to its alignment state into two segments, one in state S and one in state M; all S segments are compressed into one consensus sequence, and all M segments into another. The data within each cluster are then summarized to form a super-item, and the super-items are stored in a variation signal database in order of their genomic positions. A pattern growth algorithm establishes the alignment relationships between breakpoints, and a variation model is built to determine the variation type. Because the invention aligns all breakpoints in the filtered database against one another directly, it obtains global alignment information, achieves more accurate variation detection, and performs well on both small and large variations.
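The read-splitting and consensus step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the S and M states, so the sketch reads them as the soft-clip (S) and match (M) operations of a SAM CIGAR string, handles only two-operation CIGARs, and builds each consensus by a per-column majority vote anchored at the breakpoint. The function names and the example reads are hypothetical.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def split_soft_clipped(seq, cigar):
    """Split a read whose CIGAR is 'xSyM' or 'xMyS' into (S segment, M segment)."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if len(ops) != 2 or {op for _, op in ops} != {"S", "M"}:
        raise ValueError(f"unsupported CIGAR for this sketch: {cigar}")
    if sum(n for n, _ in ops) != len(seq):
        raise ValueError("CIGAR length does not match read length")
    first_len, first_op = ops[0]
    head, tail = seq[:first_len], seq[first_len:]
    return (head, tail) if first_op == "S" else (tail, head)


def consensus(segments, anchor="left"):
    """Majority-vote consensus of segments aligned at their left or right end."""
    if anchor == "right":
        segments = [s[::-1] for s in segments]
    width = max(len(s) for s in segments)
    columns = []
    for i in range(width):
        votes = Counter(s[i] for s in segments if i < len(s))
        columns.append(votes.most_common(1)[0][0])
    result = "".join(columns)
    return result[::-1] if anchor == "right" else result


# Three reads clipped at the same breakpoint (illustrative data only).
reads = [("ACGTTTGCA", "4S5M"), ("CGTTTGCAG", "3S6M"), ("ACGATTGC", "4S4M")]
parts = [split_soft_clipped(seq, cigar) for seq, cigar in reads]

# The clipped part precedes the breakpoint, so S segments share their right
# end and M segments share their left end.
s_consensus = consensus([s for s, _ in parts], anchor="right")
m_consensus = consensus([m for _, m in parts], anchor="left")
print(s_consensus, m_consensus)
```

Anchoring at the breakpoint keeps reads of different lengths in register; reads clipped on the other side ('xMyS') would need the opposite anchors.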

Application No.: 201911347153.2 | Publication No.: CN111199777A | Main classification: G16B50/00
Applicant: Xi'an Jiaotong University (西安交通大学) | Filing date: 2019.12.24 | Publication date: 2020.05.26
Abstract: A system and method for streaming transmission and real-time variation mining for biological big data. The transmission layer reads the sequencing data files in the data layer, generates a sequencing data stream with a biological data streaming algorithm, and sends the stream to the computing layer. The computing layer receives the real-time sequencing read data from the transmission layer, uses a MapReduce-based real-time deletion mining algorithm to determine in real time whether the local sequencing region contains a deletion variation, outputs the left and right endpoints of the deletion, and passes them to the user layer. With this algorithm, the presence of a deletion in a local region can be judged in real time from the local sequencing data stream as it arrives, without context from the rest of the genome. This decouples the sequencing data and reduces the heavy demand for, and dependence on, computing resources typical of traditional sequencing data processing algorithms.

Application No.: 202010081979.5 | Publication No.: CN111261225A | Main classification: G16B20/20
Applicant: Xi'an Jiaotong University (西安交通大学) | Filing date: 2020.02.06 | Publication date: 2020.06.09
Abstract: A method for detecting inversion-related complex variations from second-generation sequencing data. Within a sliding window, the given bam file is compared with the selected reference genome to obtain Read Pair signals; guided by the Read Pair signals, Split Read signal analysis is performed on the reads that cannot be fully matched, yielding the corresponding breakpoint matches. Theoretical models of the Split Read signal are established, and the breakpoint matches are checked against them; when the matches fit a model, the corresponding variation type and position are recorded, and the variation is then assessed for credibility. Because the variation models are derived from theoretical signals, the invention can report the variation type accurately; because it uses Split Read signals and a pattern growth algorithm to find the maximum and minimum unique substrings of a string, it can pinpoint the position of the variation precisely.
