AnnoTier
-
class epitator.annotier.AnnoTier(spans=None, presorted=False)
Bases: object
A group of AnnoSpans stored sorted by start offset.
-
chains(at_least=1, at_most=None, max_dist=1)
Create a new tier from all chains of spans within max_dist of each other.
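The description above does not spell out what counts as a chain, so the following is only a sketch of one plausible reading. It does not use EpiTator: spans are plain (start, end) tuples, and the rule that a span may follow another when it starts at or after the other's end and no more than max_dist characters later is an assumption.

```python
def chains(spans, at_least=1, at_most=None, max_dist=1):
    """Enumerate chains over plain (start, end) tuples (not EpiTator objects).

    Assumption: a span can follow another when it starts at or after the
    other's end and at most max_dist characters later.
    """
    spans = sorted(spans)
    results = []

    def extend(chain, last_index):
        # Record the chain once it is long enough.
        if len(chain) >= at_least:
            results.append(tuple(chain))
        # Stop growing once the upper bound on chain length is reached.
        if at_most is not None and len(chain) >= at_most:
            return
        # Only spans later in sorted order can continue the chain.
        for j in range(last_index + 1, len(spans)):
            gap = spans[j][0] - chain[-1][1]
            if 0 <= gap <= max_dist:
                extend(chain + [spans[j]], j)

    for i, span in enumerate(spans):
        extend([span], i)
    return results


# Offsets of "one", "two" and "three" in 'one two three'.
words = [(0, 3), (4, 7), (8, 13)]
pairs_or_longer = chains(words, at_least=2)
exactly_two = chains(words, at_least=2, at_most=2)
```

Under this reading, at_least and at_most bound the number of spans in each chain, so `exactly_two` keeps only the two-word chains.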
-
combined_adjacent_spans(max_dist=1)
Create a new tier from groups of spans within max_dist of each other.
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc),
...                  AnnoSpan(8, 13, doc),
...                  AnnoSpan(14, 18, doc)])
>>> tier.combined_adjacent_spans()
AnnoTier([SpanGroup(text=one, label=None, AnnoSpan(0-3, one)), SpanGroup(text=three four, label=None, AnnoSpan(8-13, three), AnnoSpan(14-18, four))])
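The grouping rule can be approximated without EpiTator. The sketch below works on plain (start, end) tuples; the helper name `group_adjacent` is hypothetical, and measuring the gap from the furthest end seen so far in the current group is an assumption about how distance is computed.

```python
def group_adjacent(spans, max_dist=1):
    """Group (start, end) tuples whose gaps are at most max_dist characters."""
    groups = []
    group_end = None
    for span in sorted(spans):
        if groups and span[0] - group_end <= max_dist:
            # Close enough to the current group: extend it.
            groups[-1].append(span)
            group_end = max(group_end, span[1])
        else:
            # Too far away (or first span): start a new group.
            groups.append([span])
            group_end = span[1]
    return groups


# Same offsets as the doctest above: "one", "three" and "four".
groups = group_adjacent([(0, 3), (8, 13), (14, 18)])
```

Here "one" stays on its own because "three" starts five characters after it ends, while "three" and "four" are one character apart and are grouped.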
-
group_spans_by_containing_span(other_tier, allow_partial_containment=False)
Group spans in other_tier by the spans that contain them in this one.
Parameters:
- other_tier (AnnoTier) – The spans to be grouped together.
- allow_partial_containment – If true, a span from other_tier is also included in the group of any span in this tier that only partially overlaps it.
Returns: An iterator over pairs of values. The first value is the containing span from this tier; the second is a list of the spans from other_tier that it contains.
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three')
>>> tier_a = AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])
>>> tier_b = AnnoTier([AnnoSpan(0, 1, doc)])
>>> list(tier_a.group_spans_by_containing_span(tier_b))
[(AnnoSpan(0-3, one), [AnnoSpan(0-1, o)]), (AnnoSpan(4-7, two), [])]
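To make the effect of allow_partial_containment concrete, here is a minimal sketch on plain (start, end) tuples rather than EpiTator objects. The containment and overlap tests are assumptions about the intended semantics, and the quadratic scan is chosen for clarity, not efficiency.

```python
def group_spans_by_containing_span(spans, other_spans,
                                   allow_partial_containment=False):
    """Yield (span, group) pairs for plain (start, end) tuples."""
    for start, end in spans:
        if allow_partial_containment:
            # Any overlap at all is enough.
            group = [o for o in other_spans if o[0] < end and o[1] > start]
        else:
            # The other span must lie entirely inside this one.
            group = [o for o in other_spans if start <= o[0] and o[1] <= end]
        yield (start, end), group


tier_a = [(0, 3), (4, 7)]
tier_b = [(0, 1), (2, 5)]
strict = list(group_spans_by_containing_span(tier_a, tier_b))
partial = list(group_spans_by_containing_span(
    tier_a, tier_b, allow_partial_containment=True))
```

The span (2, 5) straddles both spans of `tier_a`, so it appears in no group in `strict` and in both groups in `partial`.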
-
label_spans(label)
Create a new tier based on this one with labeled spans that can be looked up by groupdict.
-
match_subspans(regex)
Create a new tier from the components of spans matching the given regular expression.
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc),
...                  AnnoSpan(4, 13, doc),
...                  AnnoSpan(14, 18, doc)])
>>> tier.match_subspans(r"two")
AnnoTier([AnnoSpan(4-7, two)])
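The offset arithmetic behind the example can be sketched with the standard `re` module alone. This version takes the document text and plain (start, end) tuples instead of EpiTator objects, and the use of `finditer` (every match inside a span, not just the first) is an assumption.

```python
import re


def match_subspans(text, spans, regex):
    """Return (start, end) offsets of regex matches inside each span."""
    pattern = re.compile(regex)
    found = []
    for start, end in spans:
        # Match against the span's own text, then shift the match offsets
        # back into document coordinates.
        for match in pattern.finditer(text[start:end]):
            found.append((start + match.start(), start + match.end()))
    return found


text = 'one two three four'
subspans = match_subspans(text, [(0, 3), (4, 13), (14, 18)], r"two")
```

Only the span covering "two three" contains a match, and the match is reported at document offsets 4 to 7, as in the doctest.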
-
optimal_span_set(prefer='text_length')
Parameters: prefer – A function that takes a span and returns a numeric tuple score. The following predefined functions may be specified via string: text_length, text_length_min_spans, num_spans, and num_spans_and_no_linebreaks.
Returns: A tier with the set of non-overlapping spans from this tier that maximizes the prefer function.
Return type: AnnoTier
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc, 'odd'),
...                  AnnoSpan(4, 7, doc, 'even'),
...                  AnnoSpan(3, 13, doc, 'long_span'),
...                  AnnoSpan(8, 13, doc, 'odd')])
>>> tier.optimal_span_set()
AnnoTier([AnnoSpan(0-3, odd), AnnoSpan(3-13, long_span)])
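Choosing a best-scoring set of non-overlapping spans is an instance of weighted interval scheduling. The sketch below shows the textbook dynamic-programming approach on plain (start, end) tuples; it is not taken from EpiTator, and it simplifies the score to a single number (span length by default) rather than a numeric tuple.

```python
from bisect import bisect_right


def optimal_span_set(spans, prefer=lambda span: span[1] - span[0]):
    """Pick non-overlapping (start, end) tuples with the highest total score."""
    spans = sorted(spans, key=lambda s: (s[1], s[0]))
    ends = [end for _, end in spans]
    # best[i] holds (score, chosen spans) using only the first i spans.
    best = [(0, [])]
    for i, span in enumerate(spans):
        # k = number of earlier spans that end at or before this one starts.
        k = bisect_right(ends, span[0], 0, i)
        take_score = best[k][0] + prefer(span)
        if take_score > best[i][0]:
            best.append((take_score, best[k][1] + [span]))
        else:
            best.append(best[i])
    return best[-1][1]


# Same offsets as the doctest above.
chosen = optimal_span_set([(0, 3), (4, 7), (3, 13), (8, 13)])
```

With length as the score, taking (0, 3) and (3, 13) covers 13 characters, which beats the 11 characters covered by the three short spans.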
-
search_spans(regex, label=None)
Search spans for ones matching the given regular expression.
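No example is given for this method, so the following is only a rough sketch of the idea using the standard `re` module and plain (start, end) tuples. Whether the real method searches anywhere in the span text or requires a full match is not stated here; `re.search` is an assumption, and the label parameter is not modelled.

```python
import re


def search_spans(text, spans, regex):
    """Keep the (start, end) tuples whose text matches the regex."""
    pattern = re.compile(regex)
    return [(start, end) for start, end in spans
            if pattern.search(text[start:end])]


text = 'one two three four'
words = [(0, 3), (4, 7), (8, 13), (14, 18)]
starting_with_t = search_spans(text, words, r"^t")
```

Unlike match_subspans, which returns the matching portion of a span, this keeps each matching span whole: "two" and "three" are returned unchanged.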
-
span_before(target_span, allow_overlap=True)
Find the nearest span that comes before the target span.
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc),
...                  AnnoSpan(8, 13, doc),
...                  AnnoSpan(14, 18, doc)])
>>> tier.span_before(AnnoSpan(4, 7, doc))
AnnoSpan(0-3, one)
-
spans_contained_by_span(selector_span)
Return a tier of the spans that are contained by a “selector span”.
>>> from epitator.annospan import AnnoSpan
>>> from epitator.annodoc import AnnoDoc
>>> from epitator.annotier import AnnoTier
>>> doc = AnnoDoc('one two three')
>>> tier1 = AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])
>>> span1 = AnnoSpan(3, 9, doc)
>>> tier1.spans_contained_by_span(span1)
AnnoTier([AnnoSpan(4-7, two)])
-
spans_overlapped_by_span(selector_span)
Return a tier of the spans that overlap a “selector span”.
>>> from epitator.annospan import AnnoSpan
>>> from epitator.annodoc import AnnoDoc
>>> from epitator.annotier import AnnoTier
>>> doc = AnnoDoc('one two three')
>>> tier1 = AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])
>>> span1 = AnnoSpan(0, 1, doc)
>>> tier1.spans_overlapped_by_span(span1)
AnnoTier([AnnoSpan(0-3, one)])
-
subtract_overlaps(other_tier)
Parameters: other_tier (AnnoTier) – The spans to be removed from the territory of this tier.
Returns: A copy of this tier with spans truncated and split so that none of the new spans overlap a span in other_tier.
Return type: AnnoTier
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier_a = AnnoTier([AnnoSpan(0, 18, doc)])
>>> tier_b = AnnoTier([AnnoSpan(3, 8, doc), AnnoSpan(13, 18, doc)])
>>> tier_a.subtract_overlaps(tier_b)
AnnoTier([AnnoSpan(0-3, one), AnnoSpan(8-13, three)])
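The truncate-and-split behaviour can be illustrated with a small interval-subtraction sketch on plain (start, end) tuples. It is written from the description above rather than from EpiTator's source, so details such as how zero-length leftovers are handled are assumptions.

```python
def subtract_overlaps(spans, other_spans):
    """Cut every stretch covered by other_spans out of spans."""
    others = sorted(other_spans)
    result = []
    for start, end in sorted(spans):
        cursor = start  # left edge of the part not yet emitted or removed
        for o_start, o_end in others:
            if o_end <= cursor or o_start >= end:
                continue  # no overlap with the remaining part
            if o_start > cursor:
                # Keep the piece in front of the overlapping span.
                result.append((cursor, o_start))
            cursor = max(cursor, o_end)
            if cursor >= end:
                break
        if cursor < end:
            # Keep whatever is left after the last overlap.
            result.append((cursor, end))
    return result


# Same offsets as the doctest above.
remaining = subtract_overlaps([(0, 18)], [(3, 8), (13, 18)])
```

The single span covering the whole document is split into the two stretches that neither span of the other tier touches.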
-
with_contained_spans_from(other_tier, allow_partial_containment=False)
Create a new tier from pairs of spans, one from this tier and one from the other tier, where the span in this tier contains the span in the other tier.
-
with_following_spans_from(other_tier, max_dist=1, allow_overlap=False)
Create a new tier from pairs of spans where the one in the other tier follows a span from this tier.
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier1 = AnnoTier([AnnoSpan(0, 3, doc),
...                   AnnoSpan(8, 13, doc)])
>>> tier2 = AnnoTier([AnnoSpan(14, 18, doc)])
>>> tier1.with_following_spans_from(tier2)
AnnoTier([SpanGroup(text=three four, label=None, AnnoSpan(8-13, three), AnnoSpan(14-18, four))])
-
with_label(label)
Create a tier from the spans that have the given label.
>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc, 'odd'),
...                  AnnoSpan(4, 7, doc, 'even'),
...                  AnnoSpan(8, 13, doc, 'odd')])
>>> tier.with_label("odd")
AnnoTier([AnnoSpan(0-3, odd), AnnoSpan(8-13, odd)])
-