① Hybrid sequencing reveals insight into heat sensing and signaling of bread wheat
Journal: The Plant Journal. DOI: 10.1111/tpj.14299
Wheat (Triticum aestivum L.), a globally important crop, is challenged by increasing temperatures (heat stress, HS); however, its polyploid nature, the incompleteness of its genome sequences and annotation, the lack of comprehensive HS-responsive transcriptomes and the unexplored heat sensing and signaling of wheat hinder our full understanding of its adaptations to HS. The recently released genome sequences of wheat, as well as the emerging single-molecule sequencing technologies, provide an opportunity to thoroughly investigate the molecular mechanisms of the wheat response to HS. We generated a high-resolution spatio-temporal transcriptome map of wheat flag leaves and filling grain under HS at 0 min, 5 min, 10 min, 30 min, 1 h and 4 h by combining full-length single-molecule sequencing and Illumina short-read sequencing. This hybrid sequencing newly discovered 4,947 loci and 70,285 transcripts, generating a comprehensive and dynamic list of HS-responsive full-length transcripts and complementing the recently released wheat reference genome. Large-scale analysis revealed a global landscape of heat adaptations, uncovering unexpectedly rapid heat sensing and signaling, significant changes in more than half of HS-responsive genes within 30 min, heat shock factor (HSF)-dependent and -independent heat signaling, and metabolic alterations in early HS responses. Integrated analysis also demonstrated the differential responses and partitioned functions between organs and subgenomes, and suggested a differential pattern of transcriptional and alternative splicing regulation in the HS response. This study provided comprehensive data for dissecting molecular mechanisms of early HS responses in wheat and highlighted the genomic plasticity and evolutionary divergence of polyploid wheat.
②Comprehensive identification of the full-length transcripts and alternative splicing related to the secondary metabolism pathways in the tea plant (Camellia sinensis)
Journal: Sci Rep. DOI: 10.1038/s41598-019-39286-z
Flavonoids, theanine and caffeine are the main secondary metabolites of the tea plant (Camellia sinensis), and they account for the tea's unique flavor quality and health benefits. The biosynthesis pathways of these metabolites have been extensively studied at the transcriptional level, but the regulatory mechanisms are still unclear. In this study, to explore the transcriptome diversity and complexity of the tea plant, PacBio Iso-Seq and RNA-seq analyses were combined to obtain full-length transcripts and to profile the changes in gene expression during leaf development. A total of 1,388,066 reads of insert (ROI) were generated with an average length of 1,762 bp, and more than 54% (755,716) of the ROIs were full-length non-chimeric (FLNC) reads. The Benchmarking Universal Single-Copy Orthologue (BUSCO) completeness was 92.7%. A total of 93,883 non-redundant transcripts were obtained, and 87,395 (93.1%) were new alternatively spliced isoforms. Meanwhile, 7,650 differentially expressed transcripts (DETs) were identified. A total of 28,980 alternative splicing (AS) events were predicted, including 1,297 differential AS (DAS) events. The transcript isoforms of the key genes involved in the flavonoid, theanine and caffeine biosynthesis pathways were characterized. Additionally, 5,777 fusion transcripts and 9,052 long non-coding RNAs (lncRNAs) were predicted. Our results revealed that AS potentially plays a crucial role in the regulation of the secondary metabolism of the tea plant. These findings enhanced our understanding of the complexity of the secondary metabolic regulation of tea plants and provided a basis for the subsequent exploration of the regulatory mechanisms of flavonoid, theanine and caffeine biosynthesis in tea plants.
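The proportions quoted in this abstract can be re-derived from the counts it reports. A minimal sketch follows; the helper name `percent` is illustrative only and does not come from the paper, and the counts are copied from the abstract text above.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)

# Counts as stated in the abstract.
total_roi = 1_388_066        # reads of insert
flnc_reads = 755_716         # full-length non-chimeric reads
total_transcripts = 93_883   # non-redundant transcripts
novel_isoforms = 87_395      # new alternatively spliced isoforms

flnc_pct = percent(flnc_reads, total_roi)
novel_pct = percent(novel_isoforms, total_transcripts)

print(f"FLNC share of ROIs: {flnc_pct}%")
print(f"Novel isoform share: {novel_pct}%")
```

Both values agree with the figures given in the abstract ("more than 54%" and 93.1%).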
③Study of the whole genome, methylome and transcriptome of Cordyceps militaris
Journal: Sci Rep. DOI: 10.1038/s41598-018-38021-4
The complete genome of Cordyceps militaris was sequenced using single-molecule real-time (SMRT) sequencing technology at a coverage over 300×. The genome size was 32.57 Mb, and 14 contigs ranging from 0.35 to 4.58 Mb with an N50 of 2.86 Mb were assembled, including 4 contigs with telomeric sequences on both ends and an additional 8 contigs with telomeric sequences on either the 5' or 3' end. A methylome database of the genome was constructed using SMRT and m4C and m6A methylated nucleotides, and many unknown modification types were identified. The major m6A methylation motif is GA and GGAG, and the major m4C methylation motif is GC or CG/GC. In the C. militaris genome DNA, there were four types of methylated nucleotides that we confirmed using high-resolution LCMS-IT-TOF. Using PacBio Iso-Seq, a total of 31,133 complete cDNA sequences were obtained in the fruiting body. The conserved domains of the nontranscribed regions of the genome include TATA boxes, which are the initial regions of genome replication. There were 406 structural variants between the HN and CM01 strains, and there were 1,114 structural variants between the HN and ATCC strains.
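For readers unfamiliar with the N50 statistic quoted above: it is the contig length at which the contigs of that length or longer together cover at least half of the assembly. A minimal sketch of the calculation follows; the contig lengths are hypothetical, since the abstract does not list the individual C. militaris contig sizes.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length

# Hypothetical contig lengths in kb (NOT the real assembly).
example_contigs = [4000, 3000, 2000, 1000]
example_n50 = n50(example_contigs)
print(example_n50)
```

With these made-up lengths the total is 10,000 kb, and the two longest contigs (4,000 + 3,000 kb) already exceed half of it, so the N50 is 3,000 kb.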
④The interplay between microRNA and alternative splicing of linear and circular RNAs in eleven plant species
Journal: Bioinformatics. DOI: 10.1093/bioinformatics/btz038
MicroRNA (miRNA) and alternative splicing (AS)-mediated post-transcriptional regulation has been extensively studied in most eukaryotes. However, the interplay between AS and miRNAs has not been explored in plants. To our knowledge, the overall profile of miRNA target sites in circular RNAs (circRNA) generated by alternative back splicing has never been reported previously. To address the challenge, we identified miRNA target sites located in alternatively spliced regions of the linear and circular splice isoforms using the up-to-date single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) and Illumina sequencing data in eleven plant species.
⑤Iso-Seq Allows Genome-Independent Transcriptome Profiling of Grape Berry Development
Journal: G3 (Bethesda). DOI: 10.1534/g3.118.201008
Transcriptomics has been widely applied to study grape berry development. With few exceptions, transcriptomic studies in grape are performed using the available genome sequence, PN40024, as reference. However, differences in gene content among grape accessions, which contribute to phenotypic differences among cultivars, suggest that a single reference genome does not represent the species' entire gene space. Though whole genome assembly and annotation can reveal the relatively unique or "private" gene space of any particular cultivar, transcriptome reconstruction is a more rapid, less costly, and less computationally intensive strategy to accomplish the same goal. In this study, we used single molecule-real time sequencing (SMRT) to sequence full-length cDNA (Iso-Seq) and reconstruct the transcriptome of Cabernet Sauvignon berries during berry ripening. In addition, short reads from ripening berries were used to error-correct low-expression isoforms and to profile isoform expression. By comparing the annotated gene space of Cabernet Sauvignon to other grape cultivars, we demonstrate that the transcriptome reference built with Iso-Seq data represents most of the expressed genes in the grape berries and includes 1,501 cultivar-specific genes. Iso-Seq produced transcriptome profiles similar to those obtained after mapping on a complete genome reference. Together, these results justify the application of Iso-Seq to identify cultivar-specific genes and build a comprehensive reference for transcriptional profiling that circumvents the necessity of a genome reference with its associated costs and computational weight.
⑥The genome of the soybean cyst nematode (Heterodera glycines) reveals complex patterns of duplications involved in the evolution of parasitism genes
Journal: BMC Genomics. DOI: 10.1186/s12864-019-5485-8
Heterodera glycines, commonly referred to as the soybean cyst nematode (SCN), is an obligatory and sedentary plant parasite that causes over a billion-dollar yield loss to soybean production annually. Although there are genetic determinants that render soybean plants resistant to certain nematode genotypes, resistant soybean cultivars are increasingly ineffective because their multi-year usage has selected for virulent H. glycines populations. The parasitic success of H. glycines relies on the comprehensive re-engineering of an infection site into a syncytium, as well as the long-term suppression of host defense to ensure syncytial viability. At the forefront of these complex molecular interactions are effectors, the proteins secreted by H. glycines into host root tissues. The mechanisms of effector acquisition, diversification, and selection need to be understood before effective control strategies can be developed, but the lack of an annotated genome has been a major roadblock. This advance provides a glimpse into the host and parasite interplay by revealing a diversity of mechanisms that give rise to virulence genes in the soybean cyst nematode, including: tandem duplications containing over a fifth of the total gene count, virulence genes hitchhiking in transposons, and 107 horizontal gene transfers not reported in other plant parasitic nematodes thus far. Through extensive characterization of the H. glycines genome, we provide new insights into H. glycines biology and shed light onto the mystery underlying complex host-parasite interactions. This genome sequence is an important prerequisite to enable work towards generating new resistance or control measures against H. glycines.
① Full-length transcript sequencing and comparative transcriptomic analysis to evaluate the contribution of osmotic and ionic stress components towards salinity tolerance in the roots of cultivated alfalfa (Medicago sativa L.)
Journal: BMC Plant Biology. DOI: 10.1186/s12870-019-1630-4
Alfalfa is the most extensively cultivated forage legume. Salinity is a major environmental factor that affects alfalfa productivity. However, little is known about the molecular mechanisms underlying alfalfa responses to salinity, especially the relative contribution of the two important components of osmotic and ionic stress. In this study, we constructed the first full-length transcriptome database for alfalfa root tips under continuous NaCl and mannitol treatments for 1, 3, 6, 12, and 24 h (three biological replicates for each time point, including the control group) via PacBio Iso-Seq. This resulted in the identification of 52,787 full-length transcripts, with an average length of 2551 bp. Global transcriptional changes in the same 33 stressed samples were then analyzed via BGISEQ-500 RNA-Seq. Totals of 8861 NaCl-regulated and 8016 mannitol-regulated differentially expressed genes (DEGs) were identified. Metabolic analyses revealed that these DEGs overlapped or diverged in the cascades of molecular networks involved in signal perception, signal transduction, transcriptional regulation, and antioxidative defense. Notably, several well-characterized signalling pathways, such as CDPK, MAPK, CIPK, and PYL-PP2C-SnRK2, were shown to be involved in osmotic stress, while the SOS core pathway was activated by ionic stress. Moreover, the physiological shifts in catalase and peroxidase activity and in glutathione and proline content were in accordance with the dynamic transcript profiles of the relevant genes, indicating that the antioxidative defense system plays critical roles in the response to salinity stress.
②Full-length transcriptome analysis of Litopenaeus vannamei reveals transcript variants involved in the innate immune system
Journal: Fish Shellfish Immunol. DOI: 10.1016/j.fsi.2019.01.023
To better understand the immune system of shrimp, this study combined PacBio isoform sequencing (Iso-Seq) and Illumina paired-end short-read sequencing to discover full-length immune-related molecules of the Pacific white shrimp, Litopenaeus vannamei. A total of 72,648 nonredundant full-length transcripts (unigenes) were generated with an average length of 2545 bp from five main tissues: the hepatopancreas, cardiac stomach, heart, muscle, and pyloric stomach. These unigenes exhibited a high annotation rate (62,164, 85.57%) when compared against the NR, NT, Swiss-Prot, Pfam, GO, KEGG and COG databases. A total of 7544 putative long noncoding RNAs (lncRNAs) were detected, and 1164 nonredundant full-length transcripts (449 UniTransModels) participated in alternative splicing (AS) events. Importantly, a total of 5279 nonredundant full-length unigenes involved in the innate immune system were successfully identified, covering 9 immune-related processes, 19 immune-related pathways and 10 other immune-related systems. We also found extensive transcript variants, which increased the number and functional complexity of immune molecules; examples include toll-like receptors (TLRs) and interferon regulatory factors (IRFs). A total of 480 differentially expressed genes (DEGs) showed significantly higher or tissue-specific expression in the hepatopancreas compared with the other four tested tissues (FDR < 0.05). Furthermore, the expression levels of six selected immune-related DEGs and putative IRFs were validated using real-time PCR, substantiating the reliability of the PacBio Iso-Seq results. In conclusion, our results provide new genetic resources of long-read full-length transcript data and information for identifying immune-related genes, which constitute an invaluable transcriptomic resource as a genomic reference, especially for further exploration of the innate immune and defense mechanisms of shrimp.
Dong L, Liu H, Zhang J, et al. Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research. BMC Genomics, 2015, 16(1): 1-13.