From: Sentences, entities, and keyphrases extraction from consumer health forums using multi-task learning
Feature | Explanation | SR | MER | KE |
---|---|---|---|---|
token | current token | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
token_before | previous token | \(\checkmark\) | \(\checkmark\) | \(\times\) |
token_after | next token | \(\checkmark\) | \(\checkmark\) | \(\times\) |
is_digit | whether current token is a digit | \(\checkmark\) | \(\times\) | \(\times\) |
is_begin | whether current token is the beginning of input text | \(\checkmark\) | \(\times\) | \(\times\) |
is_end | whether current token is the end of input text | \(\checkmark\) | \(\times\) | \(\times\) |
token_before_is_closure | whether previous token is one of the following characters: `.` (period), `!` (exclamation mark), `?` (question mark), or `,` (comma) | \(\checkmark\) | \(\times\) | \(\times\) |
token_length | length of current token | \(\times\) | \(\times\) | \(\checkmark\) |
absolute_position | index of current token, starting from 0 | \(\times\) | \(\times\) | \(\checkmark\) |
relative_position | index of current token divided by total number of tokens in the input | \(\checkmark\) | \(\times\) | \(\times\) |
site_code | code of the website from which the input text was obtained | \(\checkmark\) | \(\times\) | \(\times\) |
pos_tag | part-of-speech tag for current token | \(\checkmark\) | \(\times\) | \(\checkmark\) |
is_np | whether current token is part of a noun phrase | \(\times\) | \(\checkmark\) | \(\times\) |
is_vp | whether current token is part of a verb phrase | \(\times\) | \(\checkmark\) | \(\times\) |
is_stopword | whether current token is a stopword | \(\times\) | \(\checkmark\) | \(\times\) |
is_abbreviation | whether current token is an abbreviation | \(\times\) | \(\times\) | \(\checkmark\) |
abbreviation_inverse | full form of current token if it is an abbreviation | \(\checkmark\) | \(\times\) | \(\times\) |
is_disease | whether current token is in disease dictionary | \(\times\) | \(\checkmark\) | \(\times\) |
is_symptom | whether current token is in symptom dictionary | \(\times\) | \(\checkmark\) | \(\times\) |
is_treatment | whether current token is in treatment dictionary | \(\times\) | \(\checkmark\) | \(\times\) |
in_medical_dict | whether current token is in one of the disease, drug, symptom, or treatment dictionaries | \(\times\) | \(\times\) | \(\checkmark\) |
is_medical_entity | whether current token is part of a medical entity | \(\times\) | \(\times\) | \(\checkmark\) |
max_lcs_disease | maximum ratio of longest common substring between current token and entries in disease dictionary | \(\checkmark\) | \(\times\) | \(\times\) |
max_lcs_symptom | maximum ratio of longest common substring between current token and entries in symptom dictionary | \(\checkmark\) | \(\times\) | \(\times\) |
max_lcs_treatment | maximum ratio of longest common substring between current token and entries in treatment dictionary | \(\checkmark\) | \(\times\) | \(\times\) |
stickiness | stickiness value between current token and its preceding and succeeding tokens, computed using pointwise mutual information (PMI) | \(\times\) | \(\times\) | \(\checkmark\) |
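The surface and positional features in the table (token, neighbours, digit and boundary flags, closure check, length, absolute and relative position) can be computed directly from a tokenised input. The sketch below is a minimal illustration, not the authors' code. It assumes tokenisation is already done, uses hypothetical `<BOS>`/`<EOS>` placeholders for missing neighbours, and treats `str.isdigit()` as the digit test. The `TASK_FEATURES` mapping copies the check marks from the table for these features only.

```python
from typing import Dict, List, Union

Feature = Union[str, int, float, bool]

# Characters listed for token_before_is_closure in the table.
CLOSURE_CHARS = {".", "!", "?", ","}

# Per-task subsets, taken from the check marks in the table (surface features only).
TASK_FEATURES = {
    "SR": {"token", "token_before", "token_after", "is_digit", "is_begin",
           "is_end", "token_before_is_closure", "relative_position"},
    "MER": {"token", "token_before", "token_after"},
    "KE": {"token", "token_length", "absolute_position"},
}


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Surface/positional features for tokens[i].

    Assumptions (not from the source): "<BOS>"/"<EOS>" placeholders and
    str.isdigit() as the digit test.
    """
    n = len(tokens)
    token = tokens[i]
    before = tokens[i - 1] if i > 0 else "<BOS>"
    after = tokens[i + 1] if i < n - 1 else "<EOS>"
    return {
        "token": token,
        "token_before": before,
        "token_after": after,
        "is_digit": token.isdigit(),
        "is_begin": i == 0,
        "is_end": i == n - 1,
        "token_before_is_closure": before in CLOSURE_CHARS,
        "token_length": len(token),
        "absolute_position": i,      # 0-based index
        "relative_position": i / n,  # index divided by number of tokens
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """Feature dict for every token in the input."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def select_for_task(feats: Dict[str, Feature], task: str) -> Dict[str, Feature]:
    """Keep only the features the table marks for the given task."""
    return {k: v for k, v in feats.items() if k in TASK_FEATURES[task]}
```

For example, `sentence_features(["I", "have", "a", "fever", ".", "Help", "!"])` marks `token_before_is_closure` as true for `"Help"`, because it follows a period. The resulting dicts are in the shape typically fed to a CRF-style sequence labeller.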
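The `max_lcs_disease`, `max_lcs_symptom`, and `max_lcs_treatment` features score how closely a token matches any entry in a dictionary, which helps with misspelled or partial medical terms. The table does not say what the longest common substring is divided by. The sketch below assumes the length of the longer of the two strings, and also assumes case-insensitive matching; both are illustrative choices, not the authors' definition. The dictionary in the usage note is hypothetical.

```python
from typing import Iterable


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1], b[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def max_lcs_ratio(token: str, dictionary: Iterable[str]) -> float:
    """Maximum LCS ratio between token and any dictionary entry.

    Assumption: ratio = LCS length / max(len(token), len(entry)), case-insensitive.
    Returns 0.0 for an empty dictionary.
    """
    token = token.lower()
    best = 0.0
    for entry in dictionary:
        entry = entry.lower()
        denom = max(len(token), len(entry))
        if denom == 0:
            continue
        best = max(best, longest_common_substring(token, entry) / denom)
    return best
```

With a hypothetical symptom dictionary `["feverish", "cough"]`, the token `"fever"` shares a 5-character substring with `"feverish"` and gets a ratio of 5/8. The quadratic dynamic program is simple but scans every entry, so large dictionaries would likely need an index or candidate pruning.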
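The `stickiness` feature is described as a PMI-based score between a token and its neighbours. With unigram and bigram probabilities estimated from a corpus, PMI is \(\log \frac{p(x, y)}{p(x)\,p(y)}\). How the authors combine the left and right scores is not stated. This sketch assumes the mean of PMI(previous, current) and PMI(current, next), using whichever neighbours exist, and assigns 0.0 to bigrams never seen in the corpus. Both choices are assumptions, and `PMIModel` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List, Sequence


class PMIModel:
    """Unigram/bigram counts over a tokenised corpus for adjacent-token PMI."""

    def __init__(self, corpus: Sequence[Sequence[str]]):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sent in corpus:
            self.unigrams.update(sent)
            self.bigrams.update(zip(sent, sent[1:]))  # adjacent pairs only
        self.n_uni = sum(self.unigrams.values())
        self.n_bi = sum(self.bigrams.values())

    def pmi(self, x: str, y: str) -> float:
        """log p(x,y) / (p(x) p(y)); 0.0 for unseen pairs (an assumption)."""
        joint = self.bigrams[(x, y)]
        if joint == 0:
            return 0.0
        p_xy = joint / self.n_bi
        p_x = self.unigrams[x] / self.n_uni
        p_y = self.unigrams[y] / self.n_uni
        return math.log(p_xy / (p_x * p_y))


def stickiness(model: PMIModel, tokens: List[str], i: int) -> float:
    """Assumed definition: mean PMI of tokens[i] with its existing neighbours."""
    scores = []
    if i > 0:
        scores.append(model.pmi(tokens[i - 1], tokens[i]))
    if i < len(tokens) - 1:
        scores.append(model.pmi(tokens[i], tokens[i + 1]))
    return sum(scores) / len(scores) if scores else 0.0
```

On a toy corpus such as `[["high", "blood", "pressure"], ["high", "blood", "sugar"], ["low", "blood", "pressure"]]`, the token `"blood"` in `"high blood pressure"` is strongly associated with both neighbours and gets a positive score. This is the kind of signal that suggests a multi-word keyphrase.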