AnnoTier

class epitator.annotier.AnnoTier(spans=None, presorted=False)[source]

Bases: object

A group of AnnoSpans stored sorted by start offset.

chains(at_least=1, at_most=None, max_dist=1)[source]

Create a new tier from all chains of spans within max_dist of each other.

combined_adjacent_spans(max_dist=1)[source]

Create a new tier from groups of spans within max_dist of each other.

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc),
...                  AnnoSpan(8, 13, doc),
...                  AnnoSpan(14, 18, doc)])
>>> tier.combined_adjacent_spans()
AnnoTier([SpanGroup(text=one, label=None, AnnoSpan(0-3, one)), SpanGroup(text=three four, label=None, AnnoSpan(8-13, three), AnnoSpan(14-18, four))])
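The grouping shown in the example above can be sketched on plain (start, end) tuples. This is only an illustration, not EpiTator's implementation: the helper name group_adjacent is hypothetical, and it assumes the distance is measured from the end of the current group to the start of the next span.

```python
def group_adjacent(spans, max_dist=1):
    """Group (start, end) tuples whose gaps are at most max_dist characters.

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    groups = []
    group_end = None
    for start, end in sorted(spans):
        if groups and start - group_end <= max_dist:
            # Close enough to the current group: extend it.
            groups[-1].append((start, end))
            group_end = max(group_end, end)
        else:
            # Too far away (or first span): start a new group.
            groups.append([(start, end)])
            group_end = end
    return groups


# Same offsets as the example above: 'one', 'three', 'four'.
print(group_adjacent([(0, 3), (8, 13), (14, 18)]))
# → [[(0, 3)], [(8, 13), (14, 18)]]
```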

group_spans_by_containing_span(other_tier, allow_partial_containment=False)[source]

Group spans in other_tier by the spans that contain them in this one.

Parameters:
  • other_tier (AnnoTier) – The spans to be grouped together.
  • allow_partial_containment – If True, spans from other_tier that only partially overlap a span in this tier are also included in that span's group.
Returns:

An iterator over pairs of values: the first is the containing span from this tier, and the second is a list of the spans from other_tier that it contains.

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three')
>>> tier_a = AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])
>>> tier_b = AnnoTier([AnnoSpan(0, 1, doc)])
>>> list(tier_a.group_spans_by_containing_span(tier_b))
[(AnnoSpan(0-3, one), [AnnoSpan(0-1, o)]), (AnnoSpan(4-7, two), [])]
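As a rough sketch of the containment grouping described above, the following works on plain (start, end) tuples rather than AnnoSpan objects. The function name group_by_container and its allow_partial flag are hypothetical stand-ins, and the overlap test assumes half-open offsets.

```python
def group_by_container(containers, others, allow_partial=False):
    """Pair each container span with the other spans it contains.

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    def contains(c, o):
        return c[0] <= o[0] and o[1] <= c[1]

    def overlaps(c, o):
        # Half-open offsets assumed: spans that only touch do not overlap.
        return c[0] < o[1] and o[0] < c[1]

    matches = overlaps if allow_partial else contains
    return [(c, [o for o in others if matches(c, o)]) for c in containers]


# Same offsets as the example above.
print(group_by_container([(0, 3), (4, 7)], [(0, 1)]))
# → [((0, 3), [(0, 1)]), ((4, 7), [])]
```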

label_spans(label)[source]

Create a new tier based on this one with labeled spans that can be looked up by groupdict.

match_subspans(regex)[source]

Create a new tier from the components of spans matching the given regular expression.

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc),
...                  AnnoSpan(4, 13, doc),
...                  AnnoSpan(14, 18, doc)])
>>> tier.match_subspans(r"two")
AnnoTier([AnnoSpan(4-7, two)])
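One way to picture the offset arithmetic in the example above is to run the regular expression over each span's text and shift the match offsets back into document coordinates. The sketch below does this on plain strings and tuples; match_within is a hypothetical name, and the use of re.finditer is an assumption about how such matching could be done, not a statement about EpiTator's internals.

```python
import re


def match_within(text, spans, pattern):
    """Return document-level (start, end) offsets of matches inside each span.

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    regex = re.compile(pattern)
    found = []
    for start, end in spans:
        for m in regex.finditer(text[start:end]):
            # Match offsets are relative to the span, so add the span start.
            found.append((start + m.start(), start + m.end()))
    return found


print(match_within('one two three four', [(0, 3), (4, 13), (14, 18)], r"two"))
# → [(4, 7)]
```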

nearest_to(target_span)[source]

Find the nearest span to the target span.

optimal_span_set(prefer='text_length')[source]

Parameters:
  • prefer – A function that takes a span and returns a numeric tuple score. The following predefined functions may be specified by string: text_length, text_length_min_spans, num_spans, and num_spans_and_no_linebreaks.
Returns:

A tier with the set of non-overlapping spans from this tier that maximizes the prefer function.

Return type:

AnnoTier

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc, 'odd'),
...                  AnnoSpan(4, 7, doc, 'even'),
...                  AnnoSpan(3, 13, doc, 'long_span'),
...                  AnnoSpan(8, 13, doc, 'odd')])
>>> tier.optimal_span_set()
AnnoTier([AnnoSpan(0-3, odd), AnnoSpan(3-13, long_span)])
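Choosing a best-scoring set of non-overlapping spans is an instance of weighted interval scheduling. The sketch below shows that general technique on (start, end) tuples, scoring each span by its length as a scalar stand-in for text_length. The name best_non_overlapping is hypothetical and this is not EpiTator's implementation, which scores spans with the tuple-valued prefer function.

```python
from bisect import bisect_right


def best_non_overlapping(spans):
    """Pick non-overlapping (start, end) tuples that maximize total length.

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    spans = sorted(spans, key=lambda s: s[1])  # order by end offset
    ends = [end for _, end in spans]
    # best[i] holds (score, chosen spans) using only the first i spans.
    best = [(0, [])]
    for i, (start, end) in enumerate(spans):
        # Number of earlier spans ending at or before this start.
        # Spans that only touch are treated as compatible.
        k = bisect_right(ends, start, 0, i)
        take_score = best[k][0] + (end - start)
        if take_score > best[i][0]:
            best.append((take_score, best[k][1] + [(start, end)]))
        else:
            best.append(best[i])
    return best[-1][1]


# Same offsets as the example above.
print(best_non_overlapping([(0, 3), (4, 7), (3, 13), (8, 13)]))
# → [(0, 3), (3, 13)]
```

As in the example above, the long span wins because covering offsets 3 to 13 scores more than the two shorter spans it overlaps.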

search_spans(regex, label=None)[source]

Search spans for ones matching the given regular expression.

span_after(target_span)[source]

Find the nearest span that comes after the target span.

span_before(target_span, allow_overlap=True)[source]

Find the nearest span that comes before the target span.

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc),
...                  AnnoSpan(8, 13, doc),
...                  AnnoSpan(14, 18, doc)])
>>> tier.span_before(AnnoSpan(4, 7, doc))
AnnoSpan(0-3, one)

spans_contained_by_span(selector_span)[source]

Return a tier of the spans that are contained by a “selector span”.

>>> from epitator.annospan import AnnoSpan
>>> from epitator.annodoc import AnnoDoc
>>> from epitator.annotier import AnnoTier
>>> doc = AnnoDoc('one two three')
>>> tier1 = AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])
>>> span1 = AnnoSpan(3, 9, doc)
>>> tier1.spans_contained_by_span(span1)
AnnoTier([AnnoSpan(4-7, two)])

spans_overlapped_by_span(selector_span)[source]

Return a tier of the spans that overlap a “selector span”.

>>> from epitator.annospan import AnnoSpan
>>> from epitator.annodoc import AnnoDoc
>>> from epitator.annotier import AnnoTier
>>> doc = AnnoDoc('one two three')
>>> tier1 = AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])
>>> span1 = AnnoSpan(0, 1, doc)
>>> tier1.spans_overlapped_by_span(span1)
AnnoTier([AnnoSpan(0-3, one)])

subtract_overlaps(other_tier)[source]

Parameters:
  • other_tier (AnnoTier) – The spans to be removed from the territory of this tier.
Returns:

A copy of this tier with spans truncated and split so that none of the new spans overlap a span in other_tier.

Return type:

AnnoTier

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier_a = AnnoTier([AnnoSpan(0, 18, doc)])
>>> tier_b = AnnoTier([AnnoSpan(3, 8, doc), AnnoSpan(13, 18, doc)])
>>> tier_a.subtract_overlaps(tier_b)
AnnoTier([AnnoSpan(0-3, one), AnnoSpan(8-13, three)])
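The truncating and splitting in the example above amounts to interval subtraction. A minimal sketch on (start, end) tuples follows; subtract_intervals is a hypothetical name and the code illustrates the idea rather than EpiTator's own implementation.

```python
def subtract_intervals(spans, removals):
    """Cut every removal (start, end) out of every span (start, end).

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    result = []
    for start, end in spans:
        cursor = start  # left edge of the part not yet emitted or removed
        for r_start, r_end in sorted(removals):
            if r_end <= cursor or r_start >= end:
                continue  # this removal does not touch what is left
            if r_start > cursor:
                # Keep the piece that lies before the removal.
                result.append((cursor, r_start))
            cursor = max(cursor, r_end)
        if cursor < end:
            # Keep whatever remains after the last removal.
            result.append((cursor, end))
    return result


# Same offsets as the example above.
print(subtract_intervals([(0, 18)], [(3, 8), (13, 18)]))
# → [(0, 3), (8, 13)]
```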

with_contained_spans_from(other_tier, allow_partial_containment=False)[source]

Create a new tier from pairs of spans in this tier and the other tier where the span in this tier contains one in the other tier.

with_following_spans_from(other_tier, max_dist=1, allow_overlap=False)[source]

Create a new tier from pairs of spans where the one in the other tier follows a span from this tier.

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three four')
>>> tier1 = AnnoTier([AnnoSpan(0, 3, doc),
...                   AnnoSpan(8, 13, doc)])
>>> tier2 = AnnoTier([AnnoSpan(14, 18, doc)])
>>> tier1.with_following_spans_from(tier2)
AnnoTier([SpanGroup(text=three four, label=None, AnnoSpan(8-13, three), AnnoSpan(14-18, four))])
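The pairing rule in the example above can be approximated on (start, end) tuples as follows. The name pair_with_following is hypothetical, and the sketch assumes that "follows" means the second span starts at most max_dist characters after the first one ends, with no overlap (the allow_overlap=False case).

```python
def pair_with_following(first, second, max_dist=1):
    """Pair each span in first with spans in second that start just after it.

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    pairs = []
    for a in first:
        for b in second:
            gap = b[0] - a[1]  # characters between the end of a and the start of b
            if 0 <= gap <= max_dist:
                pairs.append((a, b))
    return pairs


# Same offsets as the example above: only 'three' is directly followed by 'four'.
print(pair_with_following([(0, 3), (8, 13)], [(14, 18)]))
# → [((8, 13), (14, 18))]
```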

with_label(label)[source]

Create a tier from the spans that have the given label.

>>> from .annospan import AnnoSpan
>>> from .annodoc import AnnoDoc
>>> doc = AnnoDoc('one two three')
>>> tier = AnnoTier([AnnoSpan(0, 3, doc, 'odd'),
...                  AnnoSpan(4, 7, doc, 'even'),
...                  AnnoSpan(8, 13, doc, 'odd')])
>>> tier.with_label("odd")
AnnoTier([AnnoSpan(0-3, odd), AnnoSpan(8-13, odd)])

with_nearby_spans_from(other_tier, max_dist=100)[source]

Create a new tier from pairs of spans in this tier and the other tier that are near each other.

without_overlaps(other_tier)[source]

Create a copy of this tier without spans that overlap a span in the other tier.
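
Unlike subtract_overlaps, which trims spans, this method drops an overlapping span entirely. A minimal sketch of that filtering on (start, end) tuples is given below; drop_overlapping is a hypothetical name, and the overlap test assumes half-open offsets, so spans that merely touch are kept.

```python
def drop_overlapping(spans, others):
    """Keep only the spans that overlap none of the other spans.

    Hypothetical helper for illustration; it is not part of EpiTator.
    """
    def overlaps(a, b):
        # Half-open offsets assumed: (0, 3) and (3, 5) do not overlap.
        return a[0] < b[1] and b[0] < a[1]

    return [s for s in spans if not any(overlaps(s, o) for o in others)]


print(drop_overlapping([(0, 3), (4, 7), (8, 13)], [(5, 6)]))
# → [(0, 3), (8, 13)]
```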