orthogene: one package for gene conversion across 760 species
Published: 2026/6/12 13:28:10
1. Preface

We previously covered ortholog conversion with biomart, but it keeps running into network bugs (see our earlier post, "A fix for HTTP 404 Not Found in biomart ortholog conversion"). Recently we came across a new tool described in a preprint: orthogene [1], an R package that simplifies gene mapping across 760 species.

Cross-species ortholog conversion is far from a simple one-to-one correspondence. orthogene combines automated standardization of species names and gene identifiers, ortholog inference across multiple databases (HomoloGene, gProfiler, OrtholoGene), flexible strategies for ambiguous ortholog relationships, and conversion of gene lists, tables, and high-dimensional matrices into analysis-ready formats. You also no longer need to learn each database's identifier scheme (Ensembl transcripts, Ensembl genes, HGNC gene symbols, Entrez, UniProt, and so on).

In other words, with orthogene we can easily convert gene names and expression matrices between species. Analyses whose tools do not support non-model species (virtual gene knockout, drug sensitivity estimation, cell-cell communication, transcription factor analysis, and the like) can therefore, to some extent, be made workable through ortholog conversion.

2. Hands-on workflow

The tutorial on GitHub is also straightforward [2]:

    # orthogene is installed through BiocManager
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    # orthogene is only available on Bioconductor >= 3.14;
    # if your Bioconductor version is older, update it first
    if (BiocManager::version() < "3.14") BiocManager::install(update = TRUE, ask = FALSE)
    # Install orthogene
    if (!requireNamespace("orthogene", quietly = TRUE)) BiocManager::install("orthogene")

Test data:

    # Load the package
    library(orthogene)
    # Load the built-in test data
    data("exp_mouse")
    # Inspect the built-in test data
    exp_mouse[1:4, 1:4]
    ## 4 x 4 sparse Matrix of class "dgCMatrix"
    ##          astrocytes_ependymal endothelial-mural interneurons microglia
    ## Tspan12           0.330357100        0.58723400    0.6413793 0.1428571
    ## Tshz1             0.428571430        0.44680851    1.1551724 0.4387755
    ## Fnbp1l            0.397321400        0.71914890    2.3758621 0.3367347
    ## Adamts15          0.008928571        0.09787234    0.2206897 .

    # Use homologene as the ortholog database (the default is gprofiler)
    method <- "homologene"

[Table: comparison of the three methods by supported species, gene mapping, update frequency, ortholog database, data location, connection mode, and speed]

    orthogene::map_species()
    ## Retrieving all organisms available in homologene.
    ## Returning table with all species.
    ##                  scientific_name taxonomy_id     source            id
    ## 1                   Mus musculus       10090 homologene     mmusculus
    ## 2              Rattus norvegicus       10116 homologene   rnorvegicus
    ## 3           Kluyveromyces lactis       28985 homologene       klactis
    ## 4             Magnaporthe oryzae      318829 homologene       moryzae
    ## 5          Eremothecium gossypii       33169 homologene     egossypii
    ## 6           Arabidopsis thaliana        3702 homologene     athaliana
    ## 7                   Oryza sativa        4530 homologene       osativa
    ## 8      Schizosaccharomyces pombe        4896 homologene        spombe
    ## 9       Saccharomyces cerevisiae        4932 homologene   scerevisiae
    ## 10             Neurospora crassa        5141 homologene       ncrassa
    ## 11        Caenorhabditis elegans        6239 homologene      celegans
    ## 12             Anopheles gambiae        7165 homologene      agambiae
    ## 13       Drosophila melanogaster        7227 homologene dmelanogaster
    ## 14                   Danio rerio        7955 homologene        drerio
    ## 15 Xenopus (Silurana) tropicalis        8364 homologene   xtropicalis
    ## 16                 Gallus gallus        9031 homologene       ggallus
    ## 17                Macaca mulatta        9544 homologene      mmulatta
    ## 18               Pan troglodytes        9598 homologene  ptroglodytes
    ## 19                  Homo sapiens        9606 homologene      hsapiens
    ## 20        Canis lupus familiaris        9615 homologene  clfamiliaris
    ## 21                    Bos taurus        9913 homologene       btaurus
    ##    scientific_name_formatted
    ## 1               mus musculus
    ## 2          rattus norvegicus
    ## 3       kluyveromyces lactis
    ## 4         magnaporthe oryzae
    ## 5      eremothecium gossypii
    ## 6       arabidopsis thaliana
    ## 7               oryza sativa
    ## 8  schizosaccharomyces pombe
    ## 9   saccharomyces cerevisiae
    ## 10         neurospora crassa
    ## 11    caenorhabditis elegans
    ## 12         anopheles gambiae
    ## 13   drosophila melanogaster
    ## 14               danio rerio
    ## 15        xenopus tropicalis
    ## 16             gallus gallus
    ## 17            macaca mulatta
    ## 18           pan troglodytes
    ## 19              homo sapiens
    ## 20    canis lupus familiaris
    ## 21                bos taurus

The main function of orthogene is convert_orthologs, which supports many input formats, including data frames, tables, tibbles, sparse matrices, lists, and vectors:

    gene_df <- orthogene::convert_orthologs(
      gene_df = exp_mouse,
      gene_input = "rownames",    # input gene names are taken from the row names
      gene_output = "rownames",   # output gene names are also written to the row names
      input_species = "mouse",    # species of the input data
      output_species = "human",   # species of the output data
      non121_strategy = "drop_both_species",
      method = method
    )
    ## Preparing gene_df.
    ## sparseMatrix format detected.
    ## Extracting genes from rownames.
    ## 15,259 genes extracted.
    ## Converting mouse ==> human orthologs using: homologene
    ## Retrieving all organisms available in homologene.
    ## Mapping species name: mouse
    ## Common name mapping found for mouse
    ## 1 organism identified from search: 10090
    ## Retrieving all organisms available in homologene.
    ## Mapping species name: human
    ## Common name mapping found for human
    ## 1 organism identified from search: 9606
    ## Checking for genes without orthologs in human.
    ## Extracting genes from input_gene.
    ## 13,416 genes extracted.
    ## Extracting genes from ortholog_gene.
    ## 13,416 genes extracted.
    ## Checking for genes without 1:1 orthologs.
    ## Dropping 46 genes that have multiple input_gene per ortholog_gene (many:1).
    ## Dropping 56 genes that have multiple ortholog_gene per input_gene (1:many).
    ## Filtering gene_df with gene_map
    ## Setting ortholog_gene to rownames.
    ##
    ## =========== REPORT SUMMARY ===========
    ## Total genes dropped after convert_orthologs :
    ##    2,016 / 15,259 (13%)
    ## Total genes remaining after convert_orthologs :
    ##    13,243 / 15,259 (87%)

As you can see, the row names of the matrix were successfully converted from mouse gene symbols to human gene symbols:

    gene_df[1:4, 1:4]
    ## 4 x 4 sparse Matrix of class "dgCMatrix"
    ##          astrocytes_ependymal endothelial-mural interneurons  microglia
    ## TSPAN12           0.330357100        0.58723400    0.6413793 0.14285710
    ## TSHZ1             0.428571430        0.44680851    1.1551724 0.43877551
    ## ADAMTS15          0.008928571        0.09787234    0.2206897 .
    ## CLDN12            0.223214290        0.11489362    0.5517241 0.05102041

Pay attention to the non121_strategy option, which selects how orthologs that do not map one-to-one (1:1) are handled. The strategies are:

- "drop_both_species", "dbs", or 1: drop genes with duplicate mappings in either the input species or the output species.
- "drop_input_species", "dis", or 2: drop genes with duplicate mappings in the input species.
- "drop_output_species", "dos", or 3: drop genes with duplicate mappings in the output species.
- "keep_both_species", "kbs", or 4: keep all genes in both species, whether or not they have duplicate mappings.
- "keep_popular", "kp", or 5: keep the most commonly used ortholog mapping between the two species. This usually returns more genes, at the cost that many of them are not true biological 1:1 orthologs.
- "sum", "mean", "median", "min", or "max": when the input is an object with expression values, such as a matrix or data frame, genes in a many:1 relationship are aggregated with the named function. For example, "sum" adds up the expression of all genes that map to the same ortholog.

Give it a try; the experience is much better than with biomart.

3. References

[1] Brian M. Schilder, Alan E. et al. (2026). orthogene: a Bioconductor package to easily map genes within and across hundreds of species. bioRxiv, https://doi.org/10.64898/2026.01.17.700094
[2] https://github.com/neurogenomics/orthogene
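
To make the non121_strategy discussion above more concrete, here is a minimal base R sketch of what "keep only strict 1:1 pairs" and "sum many:1 genes" mean on a toy mapping. It does not call orthogene and is not how the package implements these strategies internally. The gene names, the gene_map table, and the expression values are all made up for illustration, and dropping the 1:many gene before summing is a simplification chosen for this sketch.

```r
# Toy ortholog map (hypothetical names, not real HomoloGene data):
#   A1 -> A        1:1
#   B1, B2 -> B    many:1 (several input genes share one ortholog)
#   C1 -> C, C2    1:many (one input gene has several orthologs)
#   D1 -> D        1:1
gene_map <- data.frame(
  input_gene    = c("A1", "B1", "B2", "C1", "C1", "D1"),
  ortholog_gene = c("A",  "B",  "B",  "C",  "C2", "D"),
  stringsAsFactors = FALSE
)

# Toy expression matrix with input-species genes as row names
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8,
    9, 10),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("A1", "B1", "B2", "C1", "D1"), c("cell_x", "cell_y"))
)

# Flag rows involved in 1:many (input gene repeated) or many:1 (ortholog repeated)
dup_input  <- gene_map$input_gene %in%
  gene_map$input_gene[duplicated(gene_map$input_gene)]
dup_output <- gene_map$ortholog_gene %in%
  gene_map$ortholog_gene[duplicated(gene_map$ortholog_gene)]

# Strict 1:1 filtering: keep only pairs that are unique on both sides,
# then rename the remaining rows to the ortholog symbol
one2one  <- gene_map[!dup_input & !dup_output, ]
expr_121 <- expr[one2one$input_gene, , drop = FALSE]
rownames(expr_121) <- one2one$ortholog_gene

# Sum-style aggregation: drop the 1:many gene (sketch assumption), then add up
# the rows of all input genes that share an ortholog
many2one <- gene_map[!dup_input, ]
expr_sum <- rowsum(expr[many2one$input_gene, , drop = FALSE],
                   group = many2one$ortholog_gene)

print(expr_121)  # rows A and D only
print(expr_sum)  # rows A, B, D; row B is B1 + B2
```

The strict filter loses B entirely, while the sum-style result keeps it as a single row, which is the trade-off between the drop_* strategies and the aggregation strategies.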