AnnoDoc

class epitator.annodoc.AnnoDoc(text=None, date=None)

Bases: object

A document to be annotated. The tiers property links to the annotations applied to it.

add_tier(annotator, **kwargs)
add_tiers(annotator, **kwargs)
create_regex_tier(regex, label=None)

Create an AnnoTier from all the spans of text that match the regex.
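The idea behind a regex tier can be sketched with the standard library alone. This is a minimal illustration, not epitator's implementation: `regex_spans` is a hypothetical helper, and plain tuples stand in for the AnnoSpan objects an AnnoTier would hold.

```python
import re


def regex_spans(text, regex, label=None):
    """Return one (start, end, label, matched_text) tuple per regex match.

    Hypothetical stand-in: the real method returns an AnnoTier of spans.
    """
    return [
        (match.start(), match.end(), label, match.group(0))
        for match in re.finditer(regex, text)
    ]


# Offsets are character positions in the text; the end offset is exclusive.
spans = regex_spans('one two three', r'\w+', label='word')
print(spans)
```

Each tuple records where a match starts and ends in the document text, which is the same offset information the `textOffsets` entries in `to_dict()` expose.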

filter_overlapping_spans(tiers=None, tier_names=None, score_func=None)

Remove the smaller of any overlapping spans.
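One way to picture this is a greedy pass over plain `(start, end)` tuples. The sketch below is an assumption-laden illustration, not the library's algorithm: `filter_overlapping` is a hypothetical function, and defaulting `score_func` to span length is a guess based on the word "smaller" in the description.

```python
def filter_overlapping(spans, score_func=None):
    """Keep the higher-scoring span wherever two spans overlap.

    spans is a list of (start, end) tuples with an exclusive end offset.
    Hypothetical helper; the default score (span length) is an assumption.
    """
    if score_func is None:
        score_func = lambda span: span[1] - span[0]
    kept = []
    # Visit the best-scoring spans first so they claim their range
    # before any lower-scoring span that overlaps them is considered.
    for span in sorted(spans, key=score_func, reverse=True):
        if all(span[1] <= other[0] or span[0] >= other[1] for other in kept):
            kept.append(span)
    return sorted(kept)


result = filter_overlapping([(0, 3), (0, 7), (8, 13), (10, 13)])
print(result)
```

Here `(0, 3)` is dropped because it overlaps the longer `(0, 7)`, and `(10, 13)` is dropped in favor of `(8, 13)`. A greedy pass like this is simple but is not guaranteed to maximize the total score across all spans.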

require_tiers(*tier_names, **kwargs)

Return the specified tiers, adding any that are missing with the annotator passed as the via keyword argument.

to_dict()

Convert the document into a JSON-serializable dictionary. This does not store all of the document’s data. For a complete serialization, use pickle.

>>> from epitator.annodoc import AnnoDoc
>>> from epitator.annospan import AnnoSpan
>>> from epitator.annotier import AnnoTier
>>> import datetime
>>> doc = AnnoDoc('one two three', date=datetime.datetime(2011, 11, 11))
>>> doc.tiers = {
...     'test': AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])}
>>> d = doc.to_dict()
>>> str(d['text'])
'one two three'
>>> str(d['date'])
'2011-11-11T00:00:00Z'
>>> sorted(d['tiers']['test'][0].items())
[('label', None), ('textOffsets', [[0, 3]])]
>>> sorted(d['tiers']['test'][1].items())
[('label', None), ('textOffsets', [[4, 7]])]
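As a rough illustration of what "JSON-serializable" means here, the sketch below builds a dictionary by hand with the same shape and values as the doctest output above and round-trips it through the standard `json` module. It includes only the keys the doctest shows; the dictionary returned by the real method may contain others.

```python
import json

# Hand-built dictionary mirroring the doctest output; not produced by epitator.
doc_dict = {
    'text': 'one two three',
    'date': '2011-11-11T00:00:00Z',
    'tiers': {
        'test': [
            {'label': None, 'textOffsets': [[0, 3]]},
            {'label': None, 'textOffsets': [[4, 7]]},
        ]
    },
}

# Serialize to a JSON string and load it back.
serialized = json.dumps(doc_dict, sort_keys=True)
restored = json.loads(serialized)

# The offsets index into the document text, so the span text can be recovered.
first_start, first_end = restored['tiers']['test'][0]['textOffsets'][0]
first_text = restored['text'][first_start:first_end]
print(first_text)
```

Because the date is already a string and the offsets are nested lists, the structure survives the round trip unchanged.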