Configure an Entity form campaign
You can create Entity form campaigns to annotate entities directly on documents, or to correct existing entities imported from the project's provider.
In this campaign mode, contributors see an image with a form next to it, which they fill out with data read from the image.
Configuration parameters
After filling out the generic configuration parameters, including the optional Context ancestor type parameter, you need to specify the entities to be annotated.
Each specified entity corresponds to a dedicated field in the annotation form, on the contributor's side.
You can add, edit or remove these fields from your campaign's configuration using the Add a new field button, the blue pencil button and the red trash button, respectively.
Required parameters
For each entity, you need to define:
- the Entity type: for example first name, last name, occupation, etc.,
- an Instruction text that will be displayed to the contributor as the field label.
The Entity type is not displayed to contributors when they annotate: it only serves as a unique identifier for each entity. Contributors only see the Instruction text.
If your campaign's parent project is linked to an object from an external provider, and entity types were imported from that provider, these entity types will appear as suggestions when you fill in the Entity type field. You can use these existing types, input new ones, or both.
Correction of imported entities
If the purpose of your campaign is to correct existing imported entities (see the dedicated section in the import documentation), you need to use the imported entity types. Contributors can only correct the imported entities whose type has been added to the campaign configuration: the corresponding fields in the annotation form will be pre-filled with their values.
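The pre-filling rule above can be sketched in Python as follows. This is a minimal illustration, not the platform's implementation: the `ImportedEntity` class, the `prefill_form` function and the data shapes are hypothetical, and the sketch assumes at most one imported entity per type for a given document (if there are several, the last one wins).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ImportedEntity:
    # Hypothetical shape of an entity imported from the project's provider.
    entity_type: str
    value: str
    confidence: Optional[float] = None


def prefill_form(configured_types: List[str], imported: List[ImportedEntity]) -> Dict[str, str]:
    """Build the initial form values, one entry per configured entity type.

    Fields keep the configuration order; fields with no matching imported
    entity start empty. Imported entities whose type is not configured are ignored.
    Assumption: at most one imported entity per type, otherwise the last one wins.
    """
    prefilled = {entity_type: "" for entity_type in configured_types}
    for entity in imported:
        if entity.entity_type in prefilled:
            prefilled[entity.entity_type] = entity.value
    return prefilled


imported = [
    ImportedEntity("first name", "Franz"),
    ImportedEntity("last name", "Ferdinand"),
    ImportedEntity("birth place", "Graz"),  # type not configured: cannot be corrected
]
print(prefill_form(["first name", "last name", "occupation"], imported))
# {'first name': 'Franz', 'last name': 'Ferdinand', 'occupation': ''}
```

The "birth place" entity is dropped because its type is missing from the configuration, which is why the imported entity types must be reused when configuring a correction campaign.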
Optional parameters
For each entity, you can also set:
- optional Further instructions that will be displayed upon hovering over a "help" icon, to give additional indications to the contributor: see the screenshot above,
- an optional Confidence threshold for imported entities: see details below,
- an optional Regular expression pattern to validate contributor inputs: see details below,
- an optional set of Predefined annotations to restrict the contributor's input: if predefined choices are set, the contributor cannot enter text freely into the corresponding field, and can only select one of the configured options.
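To make the required and optional parameters concrete, here is a hypothetical Python model of a single form field. The `FormField` class, its attribute names and the `accepts` method are illustrative assumptions, not the platform's API; in particular, the sketch assumes that a value must pass every configured constraint and that a regular expression must match the whole value.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormField:
    # Required parameters
    entity_type: str  # unique identifier, never shown to contributors
    instruction: str  # displayed as the field label
    # Optional parameters
    further_instructions: str = ""  # shown when hovering over the "help" icon
    confidence_threshold: Optional[float] = None
    regex: Optional[str] = None
    predefined_annotations: List[str] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        """Return True if a contributor value satisfies every configured constraint."""
        if self.predefined_annotations and value not in self.predefined_annotations:
            # Predefined choices: only one of the configured options can be selected.
            return False
        if self.regex is not None and re.fullmatch(self.regex, value) is None:
            # Assumption: the whole value must match the pattern.
            return False
        return True


last_name = FormField(
    entity_type="last name",
    instruction="Last name of the person",
    predefined_annotations=["Habsburg", "Hohenberg"],
)
print(last_name.accepts("Habsburg"))  # True
print(last_name.accepts("Smith"))  # False
```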
Note
You can drag the entity rows of the Form fields table up and down to reorder them, using the icon on their left. The order of the fields in the campaign configuration is the order in which they will be displayed to contributors in the annotation form.
Confidence threshold to highlight imported entities needing validation
The Confidence threshold parameter is only relevant if your campaign is designed to correct existing imported entities. If these entities have a confidence score (this may be the case if they were produced by a machine learning tool), setting a Confidence threshold highlights in red the entities whose confidence score is lower than this threshold, signalling to contributors that they probably need to be corrected.
For example, if you set the Confidence threshold to 0.6 for an entity field, all imported entities for that field with a confidence score below 0.6 will be highlighted in red.
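The highlighting rule boils down to a single comparison. In this hypothetical Python sketch, `needs_review` is an illustrative name, and treating entities without a score (or fields without a threshold) as never highlighted is an assumption.

```python
from typing import Optional


def needs_review(confidence: Optional[float], threshold: Optional[float]) -> bool:
    """Return True when an imported entity should be highlighted in red."""
    if threshold is None or confidence is None:
        # Assumption: no threshold configured, or no score available, means no highlight.
        return False
    # Strictly below the threshold, as in "a confidence score below 0.6".
    return confidence < threshold


for score in (0.45, 0.6, 0.92):
    print(score, needs_review(score, threshold=0.6))
```

With a threshold of 0.6, only the 0.45 score is flagged: a score of exactly 0.6 is not below the threshold.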
Regular expression to use to validate the input
If you want to restrict what your contributors can input in a given field, you can use a Regular expression. When a contributor tries to submit their annotation with an invalid value, they will be prevented from doing so, and an error message will appear. You may want to use the Further instructions field to explain the expected format of the contributor input.
For example, if you want contributors to input dates in the DD/MM/YYYY format, you can use the following regular expression: [0-9]{2}\/[0-9]{2}\/[0-9]{4}. This, however, does not prevent contributors from entering something that is shaped like a date but is in fact incorrect: for example, 41/15/2598 would match this regular expression and be accepted. You can use a more complex regular expression to limit contributor input further: [0-3][0-9]\/[0-1][0-9]\/1[8-9][0-9]{2} limits contributors to years between 1800 and 1999, but still validates aberrant values like 35/15/1989. The following regular expression:
^(0[1-9]|1\d|2\d|3[01])/(0[1-9]|1[0-2])/([1-9]\d{3})$
only accepts days from 01 to 31, months from 01 to 12 and four-digit years from 1000 to 9999. It does not check the number of days in each month, so a value such as 31/02/2020 would still be accepted.
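The three date patterns above can be compared on a few sample values with Python's `re` module before you paste one into a field. This is only a testing aid: it assumes the field validates the whole input (modeled here with `re.fullmatch`), and the platform's own regular expression engine may differ from Python's in edge cases. The constant names and the `is_accepted` helper are hypothetical.

```python
import re

# The three patterns discussed above, from the most permissive to the strictest.
SHAPE_ONLY = r"[0-9]{2}\/[0-9]{2}\/[0-9]{4}"
YEARS_1800_1999 = r"[0-3][0-9]\/[0-1][0-9]\/1[8-9][0-9]{2}"
DAY_MONTH_RANGES = r"^(0[1-9]|1\d|2\d|3[01])/(0[1-9]|1[0-2])/([1-9]\d{3})$"


def is_accepted(pattern: str, value: str) -> bool:
    # Assumption: the whole input must match, as with re.fullmatch.
    return re.fullmatch(pattern, value) is not None


for value in ("14/07/1889", "41/15/2598", "35/15/1989", "31/02/2020"):
    results = [is_accepted(p, value) for p in (SHAPE_ONLY, YEARS_1800_1999, DAY_MONTH_RANGES)]
    print(value, results)
```

Only the strictest pattern rejects 35/15/1989, and it still accepts 31/02/2020, which is outside the 1800 to 1999 range of the second pattern but is not a real calendar date.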
Here are a few examples of simple regular expressions that can be useful for Entity form annotation campaigns:
- ^[A-Z][a-z]+$: only accept a single capitalized word,
- ^[A-Z][a-z]+(-[A-Z][a-z]+)?$: only accept a single capitalized word, or a single hyphenated and capitalized word (e.g. Ferdinand, or Franz-Ferdinand),
- ^[0-9]+$: only accept a sequence of digits. You can use ^[0-9]{i}$ to accept only a sequence of exactly i digits.
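These example patterns can be checked against sample inputs in the same way. The sketch below uses Python's `re.fullmatch`, assuming whole-input validation, and substitutes i = 4 in the last pattern; the `EXAMPLES` table and `accepts` helper are hypothetical.

```python
import re

# Each pattern from the list above, with a few sample inputs to try.
EXAMPLES = {
    r"^[A-Z][a-z]+$": ["Ferdinand", "ferdinand", "Franz-Ferdinand", "Élise"],
    r"^[A-Z][a-z]+(-[A-Z][a-z]+)?$": ["Ferdinand", "Franz-Ferdinand", "Franz-ferdinand"],
    r"^[0-9]+$": ["1914", "19a4", ""],
    r"^[0-9]{4}$": ["1914", "914", "19140"],  # i = 4
}


def accepts(pattern: str, value: str) -> bool:
    # Assumption: the whole input must match, as with re.fullmatch.
    return re.fullmatch(pattern, value) is not None


for pattern, values in EXAMPLES.items():
    for value in values:
        verdict = "accepted" if accepts(pattern, value) else "rejected"
        print(f"{pattern!r:35} {value!r:20} {verdict}")
```

Note that [A-Z] and [a-z] only cover unaccented ASCII letters, so a name such as Élise is rejected by the first two patterns.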
You can use an online validator to write and test your regular expressions.
Example annotation task
With the above campaign configuration, an annotation task on a table row would look like the image below. The form fields are in the same order as in the campaign configuration, and the last name entity has a select with predefined annotations instead of a free input field.
Once your campaign is configured, you can create tasks to start gathering annotations on your documents.