Epigenetic Transfiguration of H3K4me2 to H3K4me3 During Differentiation of Embryonic Stem Cell into Non-embryonic Cells
Smarajit Das1, *, Pijush Das2, Sanga Mitra3, Medhanjali Dasgupta4, Jayprokas Chakrabarti3, 5, Eric Larsson6
1Department of Genetics, University of Georgia, Athens GA, USA
2Cancer Biology & Inflammatory Disorder Division, Indian Institute of Chemical Biology, Kolkata, India
3Computational Biology Group, Indian Association for the Cultivation of Science, Kolkata, India
4Department of Chemical Engineering (Bioprocess Engineering), Jadavpur University, Kolkata, India
5Gyanxet, Salt Lake, Kolkata, India
6Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
To cite this article:
Smarajit Das, Pijush Das, Sanga Mitra, Medhanjali Dasgupta, Jayprokas Chakrabarti, Eric Larsson. Epigenetic Transfiguration of H3K4me2 to H3K4me3 During Differentiation of Embryonic Stem Cell into Non-embryonic Cells. Biomedical Sciences. Vol. 1, No. 3, 2015, pp. 18-33. doi: 10.11648/j.bs.20150103.11
Abstract: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is used to investigate the genome-wide distribution of histone modifications. Di- and tri-methylation of lysine residues within histones has been studied earlier in Saccharomyces cerevisiae. Tri-methylation of Lys 4 of histone H3 (H3K4me3) correlates with transcriptional activity, but little is known about this methylation state in human. It was also previously shown that deposition of the H3K4me2 modification at the TSS is associated with gene repression in the yeast cell, and that overlapping non-coding RNA (ncRNA) transcripts play a crucial role in this repression. Here, we examine the H3K4me2 and H3K4me3 methylation dynamics at the TSS region of human genes across 8 cell lines from the ENCODE Consortium (https://www.encodeproject.org/): GM12878, H1-hESC, HeLa-S3, HepG2, HSMM, HUVEC, K562 and NHEK. We identified a clear divergence of histone modification profiles in H1-hESC with respect to the others. While H3K4me2 modifications were associated with the vast majority of genes in H1-hESC, with a significantly decreased number in the differentiated cell lines, H3K4me3 modifications showed the reverse trend. During differentiation, a distinct set of genes loses H3K4me2 in H1-hESC and gains H3K4me3 in the differentiated cell, thereby enhancing the expression level of the corresponding genes. At the level of gene ontology molecular function classification, these genes are mostly associated with protein binding, nucleotide binding, DNA binding and ATP binding. In addition, signaling and receptor activity, metal ion binding and phosphorylation-dephosphorylation activity can be correlated with these genes. We expect a crosstalk between the change of methylation status and gene functionality, as all these functions can be linked to transcriptional regulation and gene activation, which in turn is linked to the H3K4me3 mark.
Keywords: Epigenetic, H3K4me2, H3K4me3, RNA-Seq, ChIP-Seq, UCSC, Methylation Dynamics
A nucleosome consists of two copies each of the four core histones, namely H2A, H2B, H3, and H4, wrapped by 147 bp of DNA. The N-terminal tails of these histones carry different types of post-translational modifications such as acetylation, methylation, phosphorylation, ubiquitination, glycosylation, and sumoylation. These modifications correlate with the transcriptional state of the gene, i.e., expression or repression. The dynamic addition and removal of these modifications is known to change the chromatin structure, which provides binding sites for proteins and thereby regulates cellular processes such as transcription, repair, replication, and genome stability [3-6].
Genome-wide approaches for profiling histone modifications, initiated by tiling array analysis (ChIP-chip) and later followed by the next-generation sequencing technique of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq), have revealed the characteristic genomic distribution of these modifications and their association with gene functions and activities in various model organisms [7-10]. It has emerged from these analyses that there are six classes of histone H3 modifications that are subjected to epigenome profiling by the International Human Epigenome Consortium (http://ihec-epigenomes.org/). In general, the histone methylation H3K4me3 and many histone acetylations are enriched at the transcription start site (TSS) and positively correlate with gene expression. In fact, active enhancers can be identified by the enrichment of both H3K4me1 and H3K27ac modifications. However, many silent gene promoters carry active marks; such promoters are found in embryonic stem cells (H1-hESC, or simply ESC) and T cells, and active transcription can be identified by an additional modification, H3K36me3, over the transcribed gene body [12-14]. Gene repression can be mediated through two distinct mechanisms that involve tri-methylated H3K9 (H3K9me3) and tri-methylated H3K27 (H3K27me3). Interestingly, H3K4me2 plays multiple roles: sometimes it is associated with activation, sometimes with repression, and sometimes with a combination of both [15-17]. Moreover, H3K4me2 marking both precedes transcription and persists through it.
The availability of numerous histone modification data sets from different human cell lines facilitates the discovery of functionally significant sequence stretches via a comparative genomics approach. Besides evolutionarily conserved sequences, many novel elements can be found by examining chromatin accessibility and histone modification or DNA methylation patterns [18–22]. A representative international project aiming to find all the functional elements in the human genome, the Encyclopedia of DNA Elements (ENCODE) pilot project, has examined human genomic sequences using a number of existing techniques. Many functional elements examined by the ENCODE project are likely unconstrained across mammalian evolution and comprise a large reservoir of functionally conserved but non-orthologous elements between species, as well as lineage-specific elements. Histone modification mapping could provide highly informative signatures for estimating the presence and activity of gene promoters and distal regulatory sites.
The main objective of our ChIP-Seq analysis project is to examine histone modification patterns and their dynamics as the embryonic stem cell differentiates into other normal and cancer cell lines, and their correlation with gene expression levels in these specific cell lines. It was previously shown that the H3K4me2 modification is associated with gene repression in the yeast cell, and that overlapping non-coding RNA transcripts play a crucial role in this repression. Our aim is to study ChIP-Seq data to investigate the genome-wide distribution of di- and tri-methylation of H3K4 (H3K4me2 and H3K4me3) in different normal, embryonic and cancer cell lines in human. By analyzing published ChIP-Seq data from the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway), this study confirmed that there is a clear divergence of H3K4me2 distribution in embryonic versus differentiated cell lines. During differentiation, a distinct set of genes loses H3K4me2 in H1-hESC and gains H3K4me3 in differentiated cells, thereby enhancing the expression level of the corresponding genes. To describe this change of histone mark, along with the change in expression from the stem cell line to the differentiated cell lines, we use the term transfiguration. The histone marks, which appear mainly in genic regions, were studied around the transcription start sites (TSSs) of the genes. This analysis demonstrates that H3K4me2 depositions around TSSs are associated with gene repression in human H1-hESC.
2.1. The Epigenetic Landscape of H3K4me2 and H3K4me3 Modifications
Epigenetic mechanisms are emerging as one of the major factors in the dynamics of gene expression in different human cells. To elucidate the role of chromatin remodeling in transcriptional regulation associated with gene expression, we mapped the spatial pattern of chromosomal association with histone H3 modifications using ChIP-Seq. Here, we concentrated on the epigenetic map of two histone modifications, namely H3K4me2 and H3K4me3, at the protein-coding genes of 8 different cell lines, namely H1-hESC, GM12878, HeLa, HepG2, HUVEC, HSMM, NHEK and K562 [Fig. 1].
While H3K4me2 modifications were associated with the vast majority of genes in H1-hESC, with a significantly decreased number in the differentiated cell lines, H3K4me3 modifications showed the reverse trend. A 2×2 Fisher's exact test showed that the association between groups (H1-hESC and each other cell line) and outcomes (me2 exclusive and me3 exclusive) is statistically significant: the two-tailed P values are less than 0.01 for all comparisons, with one exception between H1-hESC and HepG2. These data display the dynamic distribution that is the focus of our study [Table 1]. More importantly, the H3K4me2 and H3K4me3 modifications, both displaying tight correlations with transcript levels, show differential affinity for distinct genomic regions while predominantly occupying the transcription start site (TSS). These promoter occupancies of H3K4me2 at different loci indicate the repression of specific DNA elements in H1-hESC, which is ultimately relieved by the loss of H3K4me2 in differentiated cell lines. The repressed genes become hyperactive when H3K4me3 is introduced exclusively in place of H3K4me2. In addition, we highlight the effect of multivalent domains, focusing on the importance of combinatorial effects on transcription: TSSs carrying both H3K4me2 and H3K4me3 show an effect intermediate between those of H3K4me2 and H3K4me3. Overall, our work portrays a substantial association between the chromosomal locations of these two epigenetic markers, transcriptional activity and cell type-specific transitions in the epigenome.
Exclusive H3K4me2 [me2 (EX)] and exclusive H3K4me3 [me3 (EX)] methylations in the 8 cell lines chosen for this study are shown here. Out of 19,999 protein-coding genes, 867 genes are exclusively me2 modified in the ESC line. Only 49 of the protein-coding genes are exclusively me3 modified, and a further 14,066 genes have both me2 and me3 modifications at the TSS within the range of +/- 2 kb. Nearly 5,000 genes have neither me2 nor me3 modification. The same analysis was performed for the other cell lines (GM, HeLa, HUVEC, HepG2, HSMM, K562 and NHEK), and the corresponding Venn diagrams are shown in Fig. 1.
|K562||782||128||12293||6. 10|
|HeLa S3||750||300||12001||2. 5000|
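The 2×2 comparison behind Table 1 can be illustrated with a small, dependency-free sketch of a two-tailed Fisher's exact test. This is not the authors' script. It assumes that the first two numeric columns of each surviving table row are the me2 (EX) and me3 (EX) gene counts, in the same order as the H1-hESC counts quoted in the text (867 and 49); the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Rows: cell line; columns: me2 (EX), me3 (EX).
# H1-hESC counts come from the text; K562 counts are read from the table row
# under the column-order assumption stated above.
p_k562 = fisher_exact_two_sided(867, 49, 782, 128)
print(f"H1-hESC vs K562: two-tailed P = {p_k562:.3g}")
```

Under that column assumption the resulting P value falls well below the 0.01 threshold quoted in the text.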
2.2. Methylation Dynamics from Embryonic to Differentiated Cell
We studied gene methylation profiles using ChIP-Seq to identify pairwise histone dynamics [Fig. 2]. For example, 867 genes in H1-hESC were exclusively H3K4me2 modified, whereas 339 genes in the GM cell line carried H3K4me3 exclusively. The overlap of these two sets indicates that 30 genes lose their H3K4me2 modification and gain the H3K4me3 modification exclusively as the ESC differentiates into GM.
A total of 30 genes with H3K4me2 methylation in the ESC line are converted to H3K4me3 methylation when H1-hESC differentiates into the GM12878 cell line.
One of these 30 genes, namely GH2, is shown in [Fig. 3]. The UCSC Genome Browser image of GH2 distinctly shows the presence of the H3K4me2 modification in the +/- 2 kb region of the TSS in H1-hESC, while no such peak or signal is observed for the H3K4me3 modification. For the GM cell line, in contrast, a clear presence of H3K4me3 methylation is observed in the +/- 2 kb region of the TSS, while no such peak or signal is noted for H3K4me2 methylation. This clearly shows the loss of H3K4me2 methylation at GH2 in the embryonic stem cell line and the incorporation of H3K4me3 as it differentiates into the GM cell line. We recorded these differentiation dynamics and calculated the epigenetic dissociation of H3K4me2 for the other 6 pairs of cell lines [Table S1 and Fig. S1].
The UCSC Genome Browser image of GH2 shows a clear peak and signal of H3K4me2 methylation in the +/- 2 kb region of the TSS in the embryonic cell line. As the ES cell differentiates into the GM cell line, a clear peak and signal of H3K4me3 methylation is seen in the +/- 2 kb region of the TSS, while no such peak or signal of H3K4me2 methylation is seen in the GM cell line.
After identifying the dynamic genes, i.e. those genes that have H3K4me2 only in H1-hESC, we examined the consequences for these genes when they lose H3K4me2 and incorporate H3K4me3 in other differentiated normal or cancer cell types.
Whether these genes are repressed or highly expressed in the differentiated cell was analyzed by pairwise comparisons. We calculated gene expression profiles using RNA-Seq to identify pairwise differential expression. In addition, we showed that the presence of both modifications in common domains has important combinatorial effects on transcription. Using CDF plots, we show that TSSs carrying both H3K4me2 and H3K4me3 have an effect intermediate between those of H3K4me2 and H3K4me3 [Fig. 4 and Fig. S2]. Indeed, their overall expression was significantly higher in the non-embryonic cell type [Fig. S3]. The gene lists per cell line where mixed me2-me3 modifications were converted into exclusive me3 are tabulated in [Table S2]. CDF plot analysis yields four conclusive results in human cell lines. The presence of the me2 modification, especially in H1-hESC, makes most gene sets repressive. On the other hand, the switch from the me2 modification to me3 increases the expression of these sets of genes. Conversion of the me2 modification to the me3 modification up-regulates gene expression [blue line in CDF plot] to a relatively greater extent than the transfiguration of me2+3 into me3 [green line in CDF plot].
A total of 143 genes with me2+3 methylation in the H1-hESC cell line are converted to me3 methylation when H1-hESC differentiates into the GM12878 cell line. The expression values were noted for the exclusively me2 modified genes, as well as for the genes modified with both me2 and me3, that lost their me2 modification and gained the me3 modification during the conversion of the ESC line into the GM cell line. For the genes with both me2 and me3 methylation in the ESC line, the log10 of the ratio of the expression value in the GM cell line (exclusive me3 methylation) to that of the corresponding gene in the ESC line was calculated. The same ratio of FPKM values was calculated for all 30 genes that lost me2 and gained me3 during differentiation of H1-hESC into the GM cell line. Both sets of FPKM ratios were plotted together, generating a CDF plot.
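The pairwise comparison described above amounts to a set intersection. A minimal sketch follows, assuming the exclusive gene lists for each cell line are available as collections of gene symbols. The function name is hypothetical, and every gene symbol except GH2 (which is named in the text) is a placeholder.

```python
def transfigured_genes(esc_me2_exclusive, diff_me3_exclusive):
    """Genes exclusively H3K4me2-marked in the ESC line that are
    exclusively H3K4me3-marked in the differentiated line."""
    return set(esc_me2_exclusive) & set(diff_me3_exclusive)

# Toy input: GENE_A..GENE_D are hypothetical placeholders, not real results.
esc_me2_only = {"GH2", "GENE_A", "GENE_B"}
gm_me3_only = {"GH2", "GENE_C", "GENE_D"}

dynamic = transfigured_genes(esc_me2_only, gm_me3_only)
print(sorted(dynamic))
```

Applied to the real lists (867 me2-exclusive genes in H1-hESC and 339 me3-exclusive genes in GM), the same operation would be expected to return the 30 genes reported in the text.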
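The log-ratio and CDF comparison described above can be sketched as follows. This is an illustration rather than the authors' pipeline: the FPKM values are invented, the pseudocount that guards against division by zero is an assumption not stated in the text, and only the Kolmogorov-Smirnov statistic (the largest vertical gap between the two empirical CDFs) is computed, not its P value.

```python
from bisect import bisect_right
from math import log10

def log10_ratios(fpkm_diff, fpkm_esc, pseudocount=1.0):
    """Sorted log10(differentiated / ESC) FPKM ratios, one per gene.
    The pseudocount is an assumption made here to avoid dividing by zero."""
    return sorted(log10((d + pseudocount) / (e + pseudocount))
                  for d, e in zip(fpkm_diff, fpkm_esc))

def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

# Invented FPKM values for illustration only.
me2_to_me3 = log10_ratios([40.0, 55.0, 90.0], [1.0, 2.0, 0.5])        # me2 -> me3 genes
mixed_to_me3 = log10_ratios([12.0, 20.0, 18.0, 9.0], [6.0, 8.0, 10.0, 7.0])  # me2+3 -> me3 genes

print("KS statistic:", ks_statistic(me2_to_me3, mixed_to_me3))
```

A curve shifted to the right in such a plot corresponds to a gene set with larger expression gains after differentiation.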
|CELL LINES||UCSC GENE NAME||KEGG PATHWAY ASSOCIATED WITH GENE|
|Protein Binding||ATP Binding||GENE NAME||PATHWAYS|
|GM12878||ZRANB1||RFC5||F8||Complement & coagulation cascades|
|HELA||DLG5||SRXN1 GCK||FGF13||MAPK Signaling Pathway|
|DNMT3L||Regulation of actin cytoskeleton|
|Starch & Sucrose Metabolism|
|Insulin Signalling Pathway|
|Type 2 Diabetes Mellitus|
|Maturity onset diabetes of young|
|HUVEC||GSN||GSN||Regulation of actin cytoskeleton|
|HSMM||MYBPH||RFC5||PVPL1||Cell Adhesion molecules|
Gene ontology analysis revealed that the genes undergoing me2 to me3 modification are mostly protein binding and ATP binding genes. Genes for which pathways can be detected from KEGG are also shown.
2.3. Gene Ontology Analysis
To determine the function of the genes that undergo transfiguration, we performed gene ontology analysis. We also retrieved the related pathways from the KEGG database. The genes and their related functions and pathways for all seven cell lines are given in Table 2. At the level of gene ontology molecular function classification, these genes are mostly associated with protein binding, nucleotide binding, DNA binding and ATP binding. In addition, signaling and receptor activity, metal ion binding and phosphorylation-dephosphorylation activity can be correlated with these genes. Their products can be either cellular (nucleus/cytoplasm) or part of the extracellular environment. The GCK (glucokinase) gene is the most common gene, shared by almost all the cell lines.
In this work, we investigated the dynamic relationship of histone modifications and genomic sequence contexts to DNA methylation patterns in 8 cell lines, based on the marks at the TSS. Although previous studies have found that histone modifications are correlated with DNA methylation, our work provides genome-wide insight into their genomic region-specific and cell type-specific relationships. Recently, many whole-genome DNA methylation profiles have been produced, and the ENCODE project has provided a wealth of histone modification profiles by ChIP-Seq. Comparison of H1-hESC with the other cell lines demonstrates that the DNA methylation landscapes of these two cell types differ dramatically. Venn diagrams generated to identify the exclusively H3K4me2 and H3K4me3 modified genes in all the above-mentioned cell lines showed that H1-hESC has more exclusively H3K4me2 modified genes than most of the non-embryonic cells [Fig. 1, Table 1]. Further, most of the exclusively H3K4me2 modified genes, as well as the genes modified with both H3K4me2 and H3K4me3 (me2+3) in H1-hESC, lose their me2 methylation and gain me3 methylation during differentiation of stem cells into non-embryonic cells [Fig. 2, Table S1]. However, the reverse process, that is, a switch from me3 to me2 or to both me2+3 during differentiation, was seen to be negligible [Fig. 5, Table S3]. This loss of me2 methylation and gain of me3 methylation during differentiation ultimately leads to an up-regulation of the genes in the differentiated cell lines. We show here the prominent change of histone modification status, particularly Lysine 4 methylation, that accompanies the transition from the embryonic stem cell to seven different cell lines.
We observe here that when H1-hESC diverges into normal and cancer cell lines, the number of Lysine 4 tri-methylated genes increases compared to Lysine 4 di-methylated genes, along with the overall expression level of genes. It is expected that, with the transition from the embryonic state to the differentiated state, more genes start functioning and thus need to be active. It is therefore reasonable that H3K4me3 predominates in differentiated cell lines, as it is well known that H3K4me3 is associated only with the active state of genes, whereas H3K4me2 marks both active and inactive genes. On the other hand, genes responsible for maintaining the stemness of the cell are expected to show a prevalence of H3K4me3 in H1-hESC in comparison to other cell lines. To confirm this, we checked the expression status and modification pattern of SOX2 and NANOG. We observed that SOX2 and NANOG are highly expressed in H1-hESC compared to the other seven cell lines. Moreover, Lysine 4 at both these genes is abundantly tri-methylated in H1-hESC but not in the others. Strikingly, for NANOG, the repressive marker H3K27me3 is present in all cell lines except H1-hESC. This scenario reinforces that H3K4me3 is required for gene activation [28,29]. Along with the association of the Lysine 4 methylation (di- and tri-) state with gene expression status, we also correlate our observations with the gene ontology classification at the individual gene level. In differentiated cell lines, although H3K4me2 mostly persists along with H3K4me3, there are genes that completely lose the di-methylation status and become tri-methylated when the cell line identity changes from H1-hESC to other cell lines. Although the phenomenon observed in these seven cell lines is the same, the gene sets involved are mostly non-overlapping. Applying this method to find the contra-variation relevant genes between embryonic, normal and cancer cell types may help to obtain potential cancer-related marks.
Therefore, it is essential to identify the contra-variations of paired cell types to gain new understanding of biological processes from the large amounts of data that are now publicly available. We expect a crosstalk between the change of methylation status and gene functionality, as all these functions can be linked to transcriptional regulation and gene activation, which in turn is linked to the H3K4me3 mark.
5a: Venn diagram showing no exclusively H3K4me3 methylated ESC genes converted to exclusive H3K4me2 methylation during the differentiation of ESC into GM cell line.
5b: Venn diagram showing only 8 exclusively H3K4me3 methylated ESC genes converted to H3K4me2+3 methylation during the differentiation of H1-hESC into GM cell line.
In accordance with the assumption made earlier in this study, that is, that a gene with at least one protein-coding transcript is always considered protein coding, the ENCODE transcript annotation for hg19 revealed 19,999 protein-coding genes and 10,419 non-protein-coding genes in the human genome. For the purpose of this study, only the protein-coding genes were taken into consideration. The lengths of these 19,999 protein-coding genes were determined by the second assumption made in this study, that is, if a protein-coding gene has multiple transcripts, the gene length is taken as the span from the extreme 5' coordinate to the extreme 3' coordinate among those transcripts.
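The two assumptions above can be expressed as a short sketch. The input layout (gene, start, end, protein-coding flag) and the use of 0-based half-open coordinates are assumptions made for this illustration; the source does not specify a file format, and the gene names are placeholders.

```python
def gene_spans(transcripts):
    """transcripts: iterable of (gene, start, end, is_protein_coding).
    Returns {gene: (start, end, length)} for genes with at least one
    protein-coding transcript, spanning the outermost transcript coordinates."""
    bounds = {}
    coding = set()
    for gene, start, end, is_coding in transcripts:
        lo, hi = bounds.get(gene, (start, end))
        bounds[gene] = (min(lo, start), max(hi, end))  # extend to the extreme ends
        if is_coding:
            coding.add(gene)  # one coding transcript makes the gene protein coding
    return {g: (lo, hi, hi - lo) for g, (lo, hi) in bounds.items() if g in coding}

# Hypothetical transcripts: GENE_A has one coding and one non-coding isoform,
# GENE_B has only a non-coding transcript and is therefore excluded.
example = [
    ("GENE_A", 100, 500, True),
    ("GENE_A", 50, 400, False),
    ("GENE_B", 10, 20, False),
]
spans = gene_spans(example)
print(spans)
```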
UCSC provides ChIP-Seq data for H3K4me2 and H3K4me3 in 8 different cell lines, mapped to the human genome hg19. Computationally, we selected those H3K4me2 and H3K4me3 marks that fall within the +/- 2 kb region of the TSS of the protein-coding genes. The genes with H3K4me2 modifications at their TSS are termed "me2" modified, those with H3K4me3 modifications at their TSS are termed "me3" modified, and those with both me2 and me3 modifications at their TSS are termed "me2+3" modified. Such genes were identified for all 8 cell lines used in this study, namely H1-hESC, GM12878, HELA, HepG2, HUVEC, HSMM, NHEK and K562.
For quantification of mRNA expression, the reads from each cell line were polyA-trimmed. Mapping of reads to the human genome (hg19) was performed with TopHat (https://ccb.jhu.edu/software/tophat/index.shtml). The mapping coordinates of each read were overlapped with the RefSeq annotation track from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start) to quantify mature mRNA expression. Normalization and testing for differential expression were performed with cufflinks (http://cole-trapnell-lab.github.io/cufflinks/), cuffmerge and cuffdiff, together with the statistical programming language R (www.r-project.org). GO-enrichment analysis was performed using the GO-enrichment toolkit from http://genxpro.ath.cx. The divergence of the CDF plots was assessed with the Kolmogorov-Smirnov (KS) test among the dynamic genes, using a P value threshold of 0.05.
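The TSS-window classification described above can be sketched as follows. This is a simplified illustration under stated assumptions: peaks are (chromosome, start, end) tuples with half-open coordinates, each gene has a single TSS, and all names and coordinates are hypothetical. A genome-wide analysis would normally use an indexed interval tool rather than this linear scan.

```python
def classify_genes(tss_by_gene, me2_peaks, me3_peaks, flank=2000):
    """Label each gene by the marks overlapping its TSS +/- flank window.
    tss_by_gene: {gene: (chrom, tss)}; peaks: list of (chrom, start, end)."""
    def marked(chrom, tss, peaks):
        # A peak counts if it overlaps the window [tss - flank, tss + flank]
        return any(c == chrom and s < tss + flank and e > tss - flank
                   for c, s, e in peaks)

    labels = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        has_me2 = marked(chrom, tss, me2_peaks)
        has_me3 = marked(chrom, tss, me3_peaks)
        if has_me2 and has_me3:
            labels[gene] = "me2+3"
        elif has_me2:
            labels[gene] = "me2 (EX)"
        elif has_me3:
            labels[gene] = "me3 (EX)"
        else:
            labels[gene] = "none"
    return labels

# Hypothetical genes and peaks for illustration only.
tss = {"G1": ("chr1", 10000), "G2": ("chr1", 50000),
       "G3": ("chr2", 10000), "G4": ("chr2", 90000)}
me2 = [("chr1", 9000, 9500), ("chr2", 11000, 11800)]
me3 = [("chr1", 10500, 11000), ("chr1", 51900, 52500)]

labels = classify_genes(tss, me2, me3)
print(labels)
```

Counting the genes under each label for one cell line gives the figures that enter the Venn diagrams, such as the me2 (EX), me3 (EX) and me2+3 counts reported for H1-hESC.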