AnnoDoc
class epitator.annodoc.AnnoDoc(text=None, date=None)
Bases: object
A document to be annotated. The tiers property links to the annotations applied to it.

create_regex_tier(regex, label=None)
Create an AnnoTier from all the spans of text that match the regex.
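
To illustrate what "all the spans of text that match the regex" means, here is a minimal standalone sketch built on the standard `re` module. It is not EpiTator's implementation: the helper name `regex_spans` and the `(start, end, label)` tuples are assumptions made for this example, standing in for the AnnoSpan objects a real tier would hold.

```python
import re


def regex_spans(text, regex, label=None):
    """Return a (start, end, label) tuple for every non-overlapping match.

    Hypothetical helper for illustration only; a real tier would wrap
    these offsets in span objects instead of plain tuples.
    """
    return [(m.start(), m.end(), label) for m in re.finditer(regex, text)]


# Each word in the text becomes one span, identified by character offsets.
word_spans = regex_spans("one two three", r"\w+", label="word")
```

The offsets are half-open, so slicing the text with `text[start:end]` recovers the matched string.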

filter_overlapping_spans(tiers=None, tier_names=None, score_func=None)
Remove the smaller of any overlapping spans.
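
The idea of dropping the smaller of two overlapping spans can be sketched with plain `(start, end)` tuples. The greedy strategy below, the helper name `filter_overlapping`, and the default score (span length) are all assumptions chosen for illustration; the library's own method may resolve overlaps differently.

```python
def filter_overlapping(spans, score_func=None):
    """Greedy sketch: keep high-scoring spans, drop spans that overlap a kept one.

    `spans` is a list of half-open (start, end) tuples. This is a
    hypothetical helper, not EpiTator's algorithm.
    """
    if score_func is None:
        # Assumed default: longer spans win.
        score_func = lambda span: span[1] - span[0]
    kept = []
    # Visit spans from highest to lowest score; sorted() is stable, so
    # spans with equal scores keep their input order.
    for span in sorted(spans, key=score_func, reverse=True):
        # Two half-open spans are disjoint when one ends before the other starts.
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)


filtered = filter_overlapping([(0, 3), (2, 10), (9, 12), (12, 15)])
```

Here `(0, 3)` and `(9, 12)` both overlap the longer `(2, 10)` and are removed, while `(12, 15)` only touches its end point and is kept. Passing a different `score_func` changes which span survives an overlap.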

require_tiers(*tier_names, **kwargs)
Return the specified tiers, or add them using the annotator passed as the via keyword argument.
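
One way to read this description is as a get-or-create lookup. The sketch below models that pattern with a plain dictionary of tiers. Everything in it is hypothetical: the function name `get_or_add_tiers`, the assumption that `via` is a callable returning a name-to-tier mapping, and the choice to always return a list are made up for the example and do not describe EpiTator's actual behavior.

```python
def get_or_add_tiers(tiers, *tier_names, via=None):
    """Return the named tiers, creating missing ones with `via` first.

    `tiers` is a plain dict standing in for a document's tiers.
    `via` is assumed to be a zero-argument callable that returns a
    {name: tier} mapping (a hypothetical stand-in for an annotator).
    """
    missing = [name for name in tier_names if name not in tiers]
    if missing:
        if via is None:
            raise KeyError("missing tiers and no annotator given: %r" % missing)
        tiers.update(via())
    return [tiers[name] for name in tier_names]


annotator_calls = []


def number_annotator():
    # Record each call so the caching behavior can be observed.
    annotator_calls.append("called")
    return {"numbers": [1, 2]}


doc_tiers = {"tokens": ["a"]}
first = get_or_add_tiers(doc_tiers, "tokens", "numbers", via=number_annotator)
second = get_or_add_tiers(doc_tiers, "numbers", via=number_annotator)
```

The annotator runs only on the first call, because the second call finds the "numbers" tier already present.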

to_dict()
Convert the document into a JSON-serializable dictionary. This does not store all of the document's data. For a complete serialization, use pickle.
>>> from .annospan import AnnoSpan
>>> from .annotier import AnnoTier
>>> import datetime
>>> doc = AnnoDoc('one two three', date=datetime.datetime(2011, 11, 11))
>>> doc.tiers = {
...     'test': AnnoTier([AnnoSpan(0, 3, doc), AnnoSpan(4, 7, doc)])}
>>> d = doc.to_dict()
>>> str(d['text'])
'one two three'
>>> str(d['date'])
'2011-11-11T00:00:00Z'
>>> sorted(d['tiers']['test'][0].items())
[('label', None), ('textOffsets', [[0, 3]])]
>>> sorted(d['tiers']['test'][1].items())
[('label', None), ('textOffsets', [[4, 7]])]
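
As a rough illustration of what "JSON-serializable" buys you, the sketch below hand-builds a dictionary with the same shape as the doctest output and round-trips it through the standard `json` module. The dictionary is written out by hand rather than produced by `to_dict()`, and it contains only the keys visible in the doctest; whether the real output carries further keys is not assumed here.

```python
import json

# Hand-written stand-in mirroring the values shown in the doctest.
doc_dict = {
    "text": "one two three",
    "date": "2011-11-11T00:00:00Z",
    "tiers": {
        "test": [
            {"label": None, "textOffsets": [[0, 3]]},
            {"label": None, "textOffsets": [[4, 7]]},
        ]
    },
}

# Plain dicts, lists, strings, ints and None survive a JSON round trip.
encoded = json.dumps(doc_dict, sort_keys=True)
decoded = json.loads(encoded)

# The stored offsets can be used to recover each span's text.
span_texts = [
    decoded["text"][start:end]
    for span in decoded["tiers"]["test"]
    for start, end in span["textOffsets"]
]
```

Because only text, date, and span offsets with labels appear in this form, richer object state would need pickle, as the description notes.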