• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

sequana / sequana / 16568275773

28 Jul 2025 11:51AM UTC coverage: 69.115% (-0.6%) from 69.722%
16568275773

push

github

web-flow
Merge pull request #870 from cokelaer/dev

add threshold parameter in G4 module

1 of 2 new or added lines in 1 file covered. (50.0%)

725 existing lines in 17 files now uncovered.

14389 of 20819 relevant lines covered (69.11%)

2.07 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

62.79
/sequana/vcftools.py
1
#
2
#  This file is part of Sequana software
3
#
4
#  Copyright (c) 2016-2021 - Sequana Development Team
5
#
6
#  Distributed under the terms of the 3-clause BSD license.
7
#  The full license is in the LICENSE file, distributed with this software.
8
#
9
#  website: https://github.com/sequana/sequana
10
#  documentation: http://sequana.readthedocs.io
11
#
12
##############################################################################
13
"""
3✔
14
Python script to filter a VCF file
15
"""
16

17
import colorlog
3✔
18

19
from sequana.lazy import pylab
3✔
20
from sequana.utils.fisher import fisher_exact
3✔
21

22
logger = colorlog.getLogger(__name__)
3✔
23

24

25
__all__ = [
3✔
26
    "strand_ratio",
27
    "compute_frequency",
28
    "compute_strand_balance",
29
    "compute_fisher_strand_filter",
30
]
31

32

33
def strand_ratio(number1, number2):
3✔
34
    """Compute ratio between two number. Return result between [0:0.5]."""
35
    try:
3✔
36
        division = float(number1) / (number1 + number2)
3✔
37
        if division > 0.5:
3✔
38
            division = 1 - division
3✔
39
    except ZeroDivisionError:
×
40
        return 0
×
41
    return division
3✔
42

43

44
def compute_frequency(record):
3✔
45
    """Compute frequency of alternate allele.
46
        alt_freq = Count Alternate Allele / Depth
47

48
    :param record: variant record
49
    """
50
    try:
3✔
51
        info = record.info
3✔
52
    except:
×
53
        info = record.INFO
×
54

55
    # For freebayes:
56
    try:
3✔
57
        return [float(count) / info["DP"] for count in info["AO"]]
3✔
UNCOV
58
    except:
×
59
        # sniffles
UNCOV
60
        if "VAF" in info:
×
UNCOV
61
            return [float(info["VAF"])]
×
62
        else:
UNCOV
63
            return []
×
64

65

66
def compute_fisher_strand_filter(record):
3✔
67
    """Fisher strand filter (FS).
68

69
    Sites where the numbers of reference/non-reference reads are highly correlated
70
    with the strands of the reads. Counting the number of reference reads on the forward
71
    strand and on the reverse strand, and the number of alternate reads on the forward and
72
    reverse strand should be equivalent. With these four numbers, we
73
    construct a 2 x 2 contingency table and used the P-value from a Fisher’s exact test
74
    to evaluate the correlation.
75

76
        from sequana import freebayes_vcf_filter
77
        v = freebayes_vcf_filter.VCF_freebayes("data/JB409847.vcf")
78
        r = next(v)
79
        compute_fisher_strand_filter(r)
80

81
    If the pvalue is less than 0.05 we should reject the variant since the alternate and reference
82
    do not behave in the same way. Typically found if the alternate has a poor strand balance. Used with
83
    care if frequency of alternate or reference is low or deth of coverage is low.
84

85

86
    .. note:: fisher terst with two-sided hypothesis"""
87
    try:
3✔
88
        info = record.info
3✔
UNCOV
89
    except:
×
UNCOV
90
        info = record.INFO
×
91

92
    try:
3✔
93
        return [
3✔
94
            fisher_exact([[x, y], [info["SRF"], info["SRR"]]], "two-sided") for x, y in zip(info["SAF"], info["SAR"])
95
        ]
UNCOV
96
    except KeyError:
×
UNCOV
97
        return [1]
×
98

99

100
def compute_strand_balance(record):
3✔
101
    """Compute strand balance of alternate allele include in [0,0.5].
102
    strand_bal = Alt Forward / (Alt Forward + Alt Reverse)
103

104
    :param record: variant record
105

106

107
    FYI: in freebayes, the allele balance (reported under AB), strand
108
    bias counts (SRF, SRR, SAF, SAR) and bias estimate (SAP)
109
    can be used as well for filtering. Here, we use the strand balance
110
    computed as SAF / (SAF + SAR)
111

112

113
    """
114
    try:
3✔
115
        info = record.info
3✔
UNCOV
116
    except:
×
UNCOV
117
        info = record.INFO
×
118

119
    try:
3✔
120
        return [strand_ratio(x, y) for x, y in zip(info["SAF"], info["SAR"])]
3✔
UNCOV
121
    except KeyError:
×
UNCOV
122
        return [0.5]
×
STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2026 Coveralls, Inc