Commit 76afdaf

Merge pull request #410 from deeptools/develop
Develop
2 parents 67c2369 + cad3ad4 commit 76afdaf

33 files changed: +152102 -173 lines changed

bin/hicCompartmentsPolarization

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+from hicexplorer.hicCompartmentsPolarization import main
+
+if __name__ == "__main__":
+    main()

bin/hicValidateLocations

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+from hicexplorer.hicValidateLocations import main
+
+if __name__ == "__main__":
+    main()
+

docs/content/News.rst

Lines changed: 33 additions & 2 deletions
@@ -1,6 +1,37 @@
 News and Developments
 =====================

+Release 3.1
+-----------
+**9 July 2019**
+
+- KR correction improvements: It is now able to process larger data sets like GM12878 primary+replicate on 10kb resolution.
+- Adding script for validation of loop locations with protein peak locations
+- Adding script hicCompartmentsPolarization: Rearrange the average interaction frequencies using the first PC values to represent the global compartmentalisation signal
+
+
+Release 3.0.2
+-------------
+**28 June 2019**
+
+- Pinning dependencies to:
+
+    - hicmatrix version 9: API changes in version 10
+    - krbalancing version 0.0.4: API changes in version 0.0.5
+    - matplotlib version 3.0: Version 3.1 raises 'Not implemented error' for unknown reasons.
+
+- Set fit_nbinom to version 1.1: Version 1.0 had a deprecated function call with scipy > 1.2.
+- Small documentation fixes and improvements.
+
+
+Release 3.0.1
+-------------
+**5 April 2019**
+
+- Fixes KR balancing correction factors
+- Deactivates log.debug
+
+
 Release 3.0
 -----------
 **3 April 2019**
@@ -13,14 +44,14 @@ Release 3.0


 Release 2.2.3
----------------------
+-------------
 **22 March 2019**

 - This bug fix release patches an issue with cooler files, hicBuildMatrix and the usage of a restriction sequence file instead of fixed bin size.


 Release 2.2.2
----------------------
+--------------
 **27 February 2019**

 - This bug fix release removes reference to hicExport that were forgotten to delete in 2.2. Thanks @BioGeek for this contribution.

docs/content/example_usage.rst

Lines changed: 4 additions & 4 deletions
@@ -111,7 +111,7 @@ diagnostic plot as follows:

 .. code-block:: bash

-    $ hicCorrectMatrix diagnostic_plot -m hic_matrix.h5 -o hic_corrected.h5
+    $ hicCorrectMatrix diagnostic_plot -m hic_matrix.h5 -o hic_corrected.png


 The plot should look like this:
@@ -235,7 +235,7 @@ The A / B compartments can be plotted with :ref:`hicPlotMatrix`.

     $ hicPlotMatrix -m pearson_all.h5 --outFileName pca1.png --perChr --bigwig pca1.bw

-//.. figure:: ../images/eigenvector1_lieberman.png
-// :scale: 90 %
-// :align: center
+.. figure:: ../images/eigenvector1_lieberman.png
+   :scale: 60 %
+   :align: center

docs/content/installation.rst

Lines changed: 17 additions & 17 deletions
@@ -8,26 +8,26 @@ Requirements
 -------------

 * Python 3.6
-* numpy >= 1.15
-* scipy >= 1.1
-* matplotlib >= 3.0
-* pysam >= 0.14
-* intervaltree >= 2.1
-* biopython >= 1.72
-* pytables >= 3.4
+* numpy >= 1.16
+* scipy >= 1.2
+* matplotlib == 3.0
+* pysam >= 0.15
+* intervaltree >= 3.0
+* biopython >= 1.73
+* pytables >= 3.5
 * pyBigWig >= 0.3
 * future >= 0.17
-* six >= 1.11
+* six >= 1.12
 * jinja2 >= 2.10
-* pandas >= 0.23
-* unidecode >= 1.0
-* hicmatrix = 9
-* pygenometracks >= 2.1
-* psutil >= 5.4.8
-* hic2cool >= 0.5
-* cooler >= 0.8.3
-* krbalancing >= 0.0.3 (Needs the library eigen; openmp is recommended for linux users. No openmp support on macOS.)
-* fit_nbinom >= 1.0
+* pandas >= 0.24
+* unidecode >= 1.1
+* hicmatrix = 10
+* pygenometracks >= 3.0
+* psutil >= 5.6
+* hic2cool >= 0.7
+* cooler >= 0.8.5
+* krbalancing >= 0.0.5 (Needs the library eigen; openmp is recommended for linux users. No openmp support on macOS.)
+* fit_nbinom >= 1.1


 **Warning:** Python 2.7 support is discontinued. Moreover, the support for pip is discontinued too.
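
The raised minimum versions in the requirements hunk above can be checked against a local environment with a few lines of Python. This is only a sketch under stated assumptions: it relies on `importlib.metadata` (Python 3.8 or newer, while the list itself names Python 3.6), it covers a hand-picked subset of the list, the helper names `as_tuple`, `meets_minimum` and `report` are hypothetical, and the installed distribution name can differ from the name used in the list.

```python
from importlib import metadata

# Subset of the minimum versions from the updated requirements list.
MINIMUMS = {"numpy": "1.16", "scipy": "1.2", "pysam": "0.15", "pandas": "0.24"}


def as_tuple(version):
    """Turn '1.16.4' into (1, 16, 4); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically, padding with zeros."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report(minimums=MINIMUMS):
    """Map each package to True/False, or None if it is not installed."""
    result = {}
    for name, minimum in minimums.items():
        try:
            result[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ok in report().items():
        print(name, ok)
```

The numeric comparison matters here: a plain string comparison would rank `1.2` above `1.16`.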

docs/content/list-of-tools.rst

Lines changed: 65 additions & 55 deletions
Large diffs are not rendered by default.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+.. _hicCompartmentsPolarization:
+
+hicCompartmentsPolarization
+============================
+
+.. argparse::
+   :ref: hicexplorer.hicCompartmentsPolarization.parse_arguments
+   :prog: hicCompartmentsPolarization

docs/content/tools/hicCorrectMatrix.rst

Lines changed: 18 additions & 1 deletion
@@ -24,4 +24,21 @@ The iterative correction can be used via:

 .. code:: bash

-    $ hicCorrectMatrix correct --matrix matrix.cool --correctionMethod ICE --chromosomes chrUextra chr3LHet --iterNum 500 --outFileName corrected_ICE.cool --filterThreshold -1.5 5.0
+    $ hicCorrectMatrix correct --matrix matrix.cool --correctionMethod ICE --chromosomes chrUextra chr3LHet --iterNum 500 --outFileName corrected_ICE.cool --filterThreshold -1.5 5.0
+
+
+HiCExplorer version 3.1 changes the way data is transferred from Python to C++ for the KR correction algorithm. With these changes
+the following runtime and peak memory usage on Rao 2014 GM12878 primary + replicate data is possible:
+
+- KR on 25kb: 165 GB, 1:08 h
+- ICE on 25kb: 224 GB, 3:10 h
+- KR on 10kb: 228 GB, 1:42 h
+- ICE on 10kb: 323 GB, 4:51 h
+
+- KR on 1kb: 454 GB, 16:50 h
+- ICE on 1kb: >600 GB, > 2.5 d (we interrupted the computation and strongly recommend to use KR on this resolution)
+
+For HiCExplorer versions <= 3.0 KR performs as follows:
+
+- KR on 25kb: 159 GB, 57:11 min
+- KR on 10kb: >980 GB, -- (out of memory on 1TB node, we do not have access to a node with more memory on our cluster)

docs/content/tools/hicFindTADs.rst

Lines changed: 4 additions & 4 deletions
@@ -52,7 +52,7 @@ The ``zscore_matrix.h5`` file contain a z-score matrix that is useful to quickly

     $ hicFindTADs -m myHiCmatrix.h5 \
     --outPrefix myHiCmatrix_min10000_max40000_step1500_thres0.01_delta0.01_fdr \
-    --TAD_sep_score_prefix myHiCmatrix_min10000_max40000_step1500_thres0.001_delta0.01_fdr_zscore_matrix.h5
+    --TAD_sep_score_prefix myHiCmatrix_min10000_max40000_step1500_thres0.001_delta0.01_fdr
     --thresholdComparisons 0.01 \
     --delta 0.01 \
     --correctForMultipleTesting fdr \
@@ -180,14 +180,14 @@ The process to identify boundaries is as follows:
 * everything between 2 consecutive boundaries is a TAD

 For the computation of the p-values, the distribution of the z-scores at the 'diamond' above the local minimum is compared
-with the distribution of z-scores that are `min_depth` downstream using the Wilcoxon rank-sum test. Simarlty, the
-distribution of z-scores is computed with the z-scores `min_dep` upstream of the local mininum. The smallest of the
+with the distribution of z-scores that are `min_depth` downstream using the Wilcoxon rank-sum test. Similarly, the
+distribution of z-scores is computed with the z-scores `min_depth` upstream of the local minimum. The smallest of the
 two p-values is assigned to the local minimum.

 If `min_depth` is not given, this is computed as bin size * 30
 (if the bins are smaller than 1000), as bin size * 10 if the bins are between
 1000 and 20.000 and as bin size * 5 if the bin size is bigger than 20.000.

-If `min_depth` is not given, this is computed as bin size * 60
+If `max_depth` is not given, this is computed as bin size * 60
 (if the bins are smaller than 1000), as bin size * 40 if the bins are between
 1000 and 20.000 and as bin size * 10 if the bin size is bigger than 20.000.
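
The default-depth rules in the hicFindTADs hunk above can be written as a small function. This is a minimal sketch, not the code from hicexplorer: the function name `default_depths` is hypothetical, `20.000` is read as 20,000, and placing both range ends (1000 and 20000) in the middle tier is an assumption, since the text does not say on which side the boundaries fall.

```python
def default_depths(bin_size):
    """Return (min_depth, max_depth) for a bin size given in base pairs.

    Factors follow the documentation text:
      bins smaller than 1000         -> min = 30 * bin, max = 60 * bin
      bins between 1000 and 20000    -> min = 10 * bin, max = 40 * bin
      bins bigger than 20000         -> min =  5 * bin, max = 10 * bin
    Treating 1000 and 20000 as part of the middle tier is an assumption.
    """
    if bin_size < 1000:
        min_factor, max_factor = 30, 60
    elif bin_size <= 20000:
        min_factor, max_factor = 10, 40
    else:
        min_factor, max_factor = 5, 10
    return bin_size * min_factor, bin_size * max_factor


if __name__ == "__main__":
    for size in (500, 10000, 50000):
        print(size, default_depths(size))
```

With this reading, the corrected second paragraph (`max_depth` rather than a second `min_depth`) gives a search window that is always wider than the minimum depth.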
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+.. _hicValidateLocations:
+
+hicValidateLocations
+=====================
+
+.. argparse::
+   :ref: hicexplorer.hicValidateLocations.parse_arguments
+   :prog: hicValidateLocations
+
+
+hicValidateLocations is a tool to compare the detected loops from hicDetectLoops (or from any other software as long as the data format is followed, see below)
+with known peak protein locations to validate if the computed loops do have the expected anchor points. Loops are usually bound by CTCF or Cohesin,
+therefore it is important to know if the detected loops have protein peaks at their X and Y position.
+
+.. figure:: ../../images/loops_bonev_cavalli.png
+
+   Loops in Hi-C, graphic from Bonev & Cavalli, Nature Reviews Genetics 2016
+
+
+Data format
+===========
+
+The data format of hicDetectLoops output is:
+
+chr_x start_x end_x chr_y start_y end_y p-value
+
+As protein input narrowPeak or broadPeak files are tested. However, as long as the protein data contains in the first three columns the
+chromosome, start and end it should work too.
+
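
The data format described above is enough to sketch the validation idea in a few lines. The code below is an illustration only and not the hicValidateLocations implementation: it assumes whitespace-separated columns, half-open intervals, and a plain overlap test for each anchor (the real tool may apply a different criterion), and the names `Loop`, `parse_loop`, `parse_peak`, `overlaps` and `validated` are hypothetical.

```python
from collections import namedtuple

# One loop record: chr_x start_x end_x chr_y start_y end_y p-value
Loop = namedtuple("Loop", "chr_x start_x end_x chr_y start_y end_y p_value")


def parse_loop(line):
    """Parse one loop line in the seven-column format described above."""
    f = line.split()
    return Loop(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]), float(f[6]))


def parse_peak(line):
    """Read chromosome, start, end from the first three columns of a peak line."""
    f = line.split()
    return f[0], int(f[1]), int(f[2])


def overlaps(chrom, start, end, peaks):
    """True if any peak on the same chromosome overlaps [start, end)."""
    return any(c == chrom and s < end and start < e for c, s, e in peaks)


def validated(loop, peaks):
    """A loop counts as validated here if both anchors overlap a peak."""
    return (overlaps(loop.chr_x, loop.start_x, loop.end_x, peaks)
            and overlaps(loop.chr_y, loop.start_y, loop.end_y, peaks))


if __name__ == "__main__":
    loop = parse_loop("chr1 1000 2000 chr1 9000 10000 0.001")
    peaks = [parse_peak("chr1\t1500\t1600\tpeak1\t100"),
             parse_peak("chr1 9500 9700")]
    print(validated(loop, peaks))
```

Because only the first three peak columns are read, the same parser accepts narrowPeak, broadPeak or any BED-like file, which matches the note on protein input above.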
