SHADE: Semantic Hypernym Annotator for
Domain-specific Entities - DnD Domain Use Case
Akila Peiris, Nisansa de Silva
Department of Computer Science & Engineering,
University of Moratuwa, Sri Lanka
{akila.21,nisansadds}@cse.mrt.ac.lk
Abstract—Manual data annotation is an important NLP task,
but one that takes a considerable amount of resources and effort. In
spite of the costs, labeling and categorizing entities is essential for
NLP tasks such as semantic evaluation. Even though annotation
can be done by non-experts in most cases, the process is costly
because it requires human labor. Another major
challenge encountered in data annotation is maintaining
annotation consistency. Annotation efforts are typically carried
out by teams of multiple annotators. The annotations need to
remain consistent with respect to both the domain truth
and the annotation format while reducing human errors. Annotating
a specialized domain that deviates significantly from the general
domain, such as fantasy literature, will see a lot of human error
and annotator disagreement. It is therefore vital that proper guidelines
and error reduction mechanisms are enforced. One way to
enforce these constraints is to use a specialized application. Such
an application can ensure that the annotations are consistent, and the labels
can be pre-defined or restricted, reducing the room for errors. In
this paper, we present SHADE, an annotation application that can be
used to annotate entities in the high fantasy literature domain,
specifically in Dungeons and Dragons lore extracted from the
Forgotten Realms Fandom Wiki.
Keywords—data annotation, data extraction, natural language
processing, fantasy literature, dungeons and dragons
I. INTRODUCTION
Dungeons and Dragons (also known as D&D or DnD) is
a turn-based tabletop role-playing game which has gained
immense popularity in the last five decades. In this game,
typically set in a fantasy setting, a group of players role-play
characters and go on adventures¹ conducted by one player
in the special role of the Dungeon Master. It is an
open-ended game, meaning that there is no single correct path to
play the game [1]. It is up to the player’s interpretation of the
game world (sandbox) and its underlying rules to progress in
the game. In D&D, the rules applicable for gameplay come
from the official resources such as the Player’s Handbook [2]
and the Dungeon Master’s Guide [3] which are categorized
as rule books. Apart from the main rule books, there can be
campaign² rules as well as non-official rules implemented by
the Dungeon Master. These game world rules not only refer to
the restrictions on player actions but are also intertwined
with the setting³.
¹ Adventure or adventure module refers to a game guide that manages player knowledge and activities for a specific scenario, typically with a cohesive narrative.
² Campaign refers to a game guide for an overarching storyline across multiple adventures, typically with the same set of characters.
The setting includes the lore (history and current status of
the setting), inhabitants, character classes, weapons, artifacts,
magical spells, potions, and much more to support the gameplay.
Almost all of these come with their own statuses and other
measurements. Compared to the rule books, the setting in-
cludes most of the domain-specific named entities which may
differ from the general domain. Identifying the category of
a given entity is an essential part of the gameplay in order
to understand the semantics in relation to the game domain.
For example, while the Merriam-Webster online dictionary defines
monstrosity⁴ and monster⁵ as synonyms, in D&D they have a
hierarchical relationship. In D&D, Monster⁶ is the semantic
hypernym [4] (the superclass) of Monstrosity. Knowing the
semantic relationship between the two is a major part of
disambiguation rules. For example, a restriction or a benefit
applicable to monstrosity (sub-class) type does not necessarily
apply to monster (superclass) type. However, a condition that
is applicable to monster (superclass) is typically applicable
to monstrosity (sub-class). Another close example would be
that in D&D, beasts also fall under the category of monsters⁶
even though in the general domain a beast would refer to an animal,
typically four-footed⁷, as opposed to a monster whose
defining trait would be its abnormality or terrifying
nature⁵.
Data annotation, which is typically done manually, has the
possibility of generating inconsistent annotations. Inconsis-
tency in data annotation can occur in two ways. The first is
non-expert annotation. If the annotator is not an expert in the
domain, the annotations may not be as accurate as those done
by an expert. The second type of inconsistency can occur due
to human error. Typos in the annotations, omissions
of parts of the label text, or even unwanted additions to the
label can all lead to inconsistency in annotations. The second
type of error can easily occur when annotating data in the
fantasy domain, which has a considerable amount of deviation
from the general domain. For example, there can be different
spellings for words⁸. There can also be an inordinate amount
of accented words such as Faerûn⁹ as well as fantasy-esque
words violating spelling norms (e.g. Eilistraee¹⁰). These factors
should be considered when annotating such a data set.
³ The setting generally is the world of the game. In some instances, however, a setting may incorporate multiple worlds.
⁴ https://www.merriam-webster.com/dictionary/monstrosity
⁵ https://www.merriam-webster.com/dictionary/monster
⁶ https://bit.ly/DnDBmonster
⁷ https://www.merriam-webster.com/dictionary/beast
We present SHADE, a web-based annotation application
to annotate entities with the category label in the D&D
domain. As the most popular of all the D&D settings,
and the de facto default setting in D&D 5e, the Forgotten
Realms setting has a vast collection of resources, including the
Forgotten Realms Wikia¹¹ which has over 47,800 articles as
of February 2023. Hence, we have chosen this as our lexical
resource. SHADE populates two different lists of potential labels
for a given entity and is capable of capturing the annotation
tag along with a weight depending on the source of the label,
on a three-point scale with 1 being the most important source (1:
from links, 2: from noun phrases, 3: manually typed in). By
limiting the manually typed inputs and instead extracting
the tags from the lexical resource itself, we can minimize most
of the human errors mentioned above.
Apart from the entity name and the multiple label tagging
option, the UI also renders the formatted clean text version
of the first paragraph of the wiki article associated with the
entity being tagged. This way, the non-expert annotators can
get the context of the entity in question without being distracted
by the formatting, miscellaneous information in the wiki page,
and the full article content. This is done to improve the efficiency
of the annotators and as a small remedy to bridge the gap
between expert annotations and non-expert annotations.
II. RELATED WORK
Domain-specific data sets [6, 7] are useful in a num-
ber of NLP tasks such as text generation and abstractive
summarization. Entity classification data sets can be used in
semantic similarity comparison evaluations in text generation
and abstractive summarization tasks where the generated text
must adhere to the semantics of the given domain.
The biggest challenge in manual annotation is the cost and
time required for the task. To address this, researchers have
explored other avenues. One such alternative is the use of
active learning techniques to reduce the amount of annotation
required [8]. This involves selecting examples for annotation
based on the current state of the classifier, with the goal of
minimizing the overall annotation effort while still ensuring
that the data is annotated accurately. The learning process takes
in advice from the user on more complex queries.
Another alternative is crowd-sourcing [9, 10]. Crowd-
sourcing can offer several advantages, including increased
annotation efficiency and scalability, as well as access to a
wider range of annotators with different levels of expertise.
However, it is important to consider the potential biases and
limitations of crowd-sourced annotations, and to carefully
design and evaluate the annotation process to ensure that the
data is annotated accurately and consistently. Snow et al. [9]
observe that it takes at least four non-expert annotations per item
to match the quality of an expert-level annotation.
⁸ For example, the linguistic arguments on the plural of dwarf [5].
⁹ https://forgottenrealms.fandom.com/wiki/Faerûn
¹⁰ https://forgottenrealms.fandom.com/wiki/Eilistraee
¹¹ https://forgottenrealms.fandom.com
A. Annotation consistency
When considering manual annotation, which is the most
common way of annotating data, whether via crowd-
sourcing or by dedicated annotators, multiple annotators
are needed to annotate a reasonably sized data set. Therein
lies another major challenge encountered in manual
data annotation: the consistency of the annotations.
Inter-annotator agreement (IAA) refers to the consistency
of annotations produced by different annotators. Several stud-
ies have investigated the factors that affect IAA, such as anno-
tator expertise, annotation guidelines, and annotation complex-
ity [11, 12]. The results of these studies suggest that providing
clear and detailed annotation guidelines and ensuring that
annotators have adequate training and expertise can improve
IAA.
B. FRW data set
The FRW data set¹² by Peiris and de Silva [7] is of great
relevance to our work. It focuses on the creation and evaluation
of a large domain-specific data set for D&D. The authors
describe the process of collecting, cleaning, and synthesizing
data from the Forgotten Realms Wikia¹¹ which had over 45,200
articles at the time of the paper's publication. The authors use
this Wikia from Fandom, Inc.¹³ to create a comprehensive
data set for use in various domain-specific Natural Language
Processing tasks [13–16] in the D&D domain. The data set is
composed of multiple sub-data sets catering to different NLP
tasks and needs [17–19].
The authors have also evaluated semantic similarity scores
on multiple metrics based on the hierarchy of first links. The
paper defines first links as “[the] first internal reference link
(refers to another article in the same wikia) found in an article
that is not a broken link or a miscellaneous link such as the
pronunciation guide” [7].
Although this is still an excellent method of automatically
extracting a hierarchical structure from the data, we believe
that things could be improved with a manually annotated data
set. To give an example as to why, consider the
article on Tiamat¹⁴, whose first paragraph, or lead
section, is shown in Fig. 1. In the paragraph, the internal links,
in order, are lawful evil, dragon, evil dragons, greater gods,
Bane, Asmodeus, Faerûnian pantheon, Draconic pantheon,
and Untheric pantheon. If we were to simply select the first
link, it would point to the article Lawful evil, when in fact the
defining trait of the subject of this article is either dragon,
evil dragon, or even goddess/god, which is not even properly
linked. This type of information extraction can be improved
with manual annotation.
¹² https://huggingface.co/datasets/Akila/ForgottenRealmsWikiDataset
¹³ https://www.fandom.com
¹⁴ https://forgottenrealms.fandom.com/wiki/Tiamat
Tiamat was the lawful evil dragon goddess of greed, queen of evil dragons and,
for a time, reluctant servant of the greater gods Bane and later Asmodeus. Before
entering the Faerûnian pantheon, she was a member of the Draconic pantheon,
and for some time she was also a member of the Untheric pantheon.
Fig. 1: Tiamat: Lead section (links are underlined)
III. WIKIPEDIA LEAD SECTION
In a Wikipedia article, one of the most important and
information-rich portions is the lead section. This refers to the
first paragraph, or sometimes more, which appears at the top
of the page before the table of contents. Compared to the rest
of the content found in a Wikipedia¹⁵ or any Wikipedia-esque
website (one that uses the MediaWiki stack and guidelines) article,
this section has its own set of specific guidelines¹⁶. According
to these guidelines, the lead section should be a summary of
the entire article, and it should try to place the subject matter
of the article in context with other concepts, preferably by
linking to the articles dedicated to the said higher concepts.
As a result, the first link in a wiki article
dedicated to a particular topic typically points to another wiki article
dedicated to a higher concept or broader category of the said
topic. Traversed iteratively, these first links can lead
to higher and higher (more abstract) concepts as the articles
that follow the guidelines try to put each of the topics in
context. When first link traversal paths are extracted from
the entire Wikia, they form a graph (or multiple disjoint
graphs) depicting the hierarchy of all the topics in the Wikia.
As far as Wikipedia itself is concerned, about 97% of first link
traversals lead to a cycle containing the page Philosophy¹⁷.
In comparison, only around 30% of first link traversals in the
Forgotten Realms wiki lead to a single article/a specific
traversal cycle [7]. This may be caused by first links
in articles pointing to an article dedicated to an associated, yet
not directly higher or more abstract, concept.
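To make the traversal concrete, the short sketch below (our own illustration, not part of SHADE or [7]) follows first links through a hypothetical mapping from each article title to the title its first internal link points to, stopping when the chain dies out or revisits a page; the latter case is how the traversal cycles mentioned above arise.

def first_link_path(start, first_link):
    # Chain of articles visited by repeatedly following first links.
    # first_link is a hypothetical dict: article title -> title of the
    # article that its first internal link points to.
    path, seen = [start], {start}
    current = start
    while current in first_link:
        current = first_link[current]
        path.append(current)
        if current in seen:  # a traversal cycle has been reached
            break
        seen.add(current)
    return path

# Example with made-up links:
#   first_link_path("Tiamat", {"Tiamat": "Lawful evil", "Lawful evil": "Alignment"})
#   returns ["Tiamat", "Lawful evil", "Alignment"]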
IV. METHODOLOGY
The lexical resource for SHADE, as mentioned before, was
the Forgotten Realms Wikia¹⁸. Although, as part of the Me-
diaWiki stack, there is an option to export the pages in the
Wiki¹⁹, this feature is limited when exporting a large number
of pages. Due to this, we had to invoke the API endpoint of
the same to export the articles. The exports are in XML format
containing tags including title, redirect, revision, and text for
a given article. The text tag contains the content of the page.
The MediaWiki stack uses Markdown text as the formatting
option for the article contents. So the content in the text tag
contains Markdown text formatting.
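For concreteness, the following is a minimal sketch of reading such an export; it is our own illustration assuming the standard MediaWiki export schema, and the file name frw_export.xml is hypothetical.

import xml.etree.ElementTree as ET

def local_name(tag):
    # Strip the XML namespace, e.g. '{...}title' -> 'title'.
    return tag.rsplit('}', 1)[-1]

def iter_articles(export_path):
    # Yield (title, raw article text) pairs from a MediaWiki XML export.
    for _, elem in ET.iterparse(export_path, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            if local_name(child.tag) == "title":
                title = child.text
            elif local_name(child.tag) == "text":
                text = child.text or ""
        if title is not None:
            yield title, text
        elem.clear()  # free memory when processing large exports

# Example usage (hypothetical file name):
#   for title, text in iter_articles("frw_export.xml"):
#       ...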
To enforce IAA and reduce human error, we propose an
annotation application (SHADE) that lists out the possible
annotation tags/labels for the annotators to choose from. The
labels are all selected from the first paragraph (lead section)
because it sets the context of the subject under
discussion in the said article by explaining it in terms of broader
subject matter, in most cases referring/linking to the articles on
the broader subject matter itself. This way we can narrow
the suggestions down to a list of highly probable labels. If for some
reason the lead section is missing, we take the first paragraph
under the first section and extract the labels from that. The
annotation labels that the system provides are split into two
different lists. The first list contains internal links, i.e., links
that refer to other articles from the same Wikia. The second list
contains noun phrases extracted from the same first paragraph.
¹⁵ https://en.wikipedia.org
¹⁶ https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Lead_section
¹⁷ https://en.wikipedia.org/wiki/Philosophy
¹⁸ https://forgottenrealms.fandom.com
¹⁹ https://forgottenrealms.fandom.com/wiki/Special:Export
A. Escaping the infobox
To extract the internal links in the first paragraph to populate
the first list, we need some additional preprocessing on the
Markdown text. Namely, we need to isolate the first paragraph.
For this, we need to remove all preceding and
following content. Once the beginning of the first paragraph
is identified, i.e., the preceding content has been removed, isolating
the first paragraph by removing the following text is easy
enough by splitting at the first newline (\n). Removing
the preceding content, however, can be tricky as it typically
includes two structures: infoboxes²⁰ and DEFAULTSORT.
An infobox is the table-like structure that can be found at the
top right corner of almost all Wikipedia or Wikia articles.
Although in any other context this is one of the most valuable
resources, for this task it is considered unwanted content.
The infobox is contained between double curly braces (i.e.
between {{ and }}). Apart from plain text content, infoboxes
also contain links. Infoboxes and DEFAULTSORT are typically
at the very top of the Markdown content. These structures are
wrapped in two sets of curly braces. In any other scenario, an
existing MediaWiki parser such as MWParserFromHell could
be used to remove such structures; it is not applicable
for this situation since we would lose the link information
along with the structures, leaving only the plain text.
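As a quick illustration of this (a made-up snippet on our part, reusing the piped link from Fig. 3), stripping the markup with MWParserFromHell keeps only the rendered text, so both the infobox and the link target that we need are lost:

import mwparserfromhell

sample = "{{Infobox person|name=Example}}Languages, such as [[Druidic language|Druidic]]."
print(mwparserfromhell.parse(sample).strip_code())
# prints: Languages, such as Druidic.
# The infobox is gone (as desired), but so is the link target
# "Druidic language", which is exactly what the label list needs.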
Due to their diverse nature, infoboxes can include
other types of structures, including grids. These structures
are also typically wrapped in the same double curly
brace format, creating complex nested structures in the
Markdown code. Hence, using regex to identify and remove
these structures is not a viable solution. For this we de-
vised a simple algorithm shown in Code 1. We iterate
through the text counting the number of double opening
curly braces opening_curls and double closing curly
braces closing_curls until we encounter a link at a point where
opening_curls == closing_curls. This means that
all the structures that were opened up to that point have
been closed and we have encountered a link outside of the
structures. This way we can find the first instance of a link
outside of a structure. Using the last_curl_index
variable, we can also keep track of where the structures end.
²⁰ https://en.wikipedia.org/wiki/Help:Infobox
Fig. 2: Annotation workflow
Code 1: Algorithm to remove structures
def escape_infobox(s):
    opening_curls = 0    # count of "{{" seen so far
    closing_curls = 0    # count of "}}" seen so far
    last_curl_index = 0  # index of the last "}}" before the first outside link
    for i in range(1, len(s)):
        if s[i - 1] == "{" and s[i] == "{":
            opening_curls = opening_curls + 1
        if s[i - 1] == "}" and s[i] == "}":
            closing_curls = closing_curls + 1
            last_curl_index = i
        if s[i - 1] == "[" and s[i] == "[" \
                and opening_curls == closing_curls:
            # first link outside any {{...}} structure
            return s[i - 1:], last_curl_index
    return s, last_curl_index
B. Internal links list
Internal links in the MediaWiki format are enclosed between
double square brackets (i.e. between [[ and ]]). In some cases,
there is a vertical line character | or a “pipe” separating the
content in the link annotation into two sections. The first half is
the referenced page title; the second is the text that will appear
in the rendered version in place of the referred page title. This
latter text is what will remain when using a MediaWiki parser. Figure 3
shows how a piped link appears in Markdown text and in the
rendered version.
Languages, such as [[Druidic language|Druidic]].
(a) Markdown
Languages, such as Druidic.
(b) Rendered
Fig. 3: Example link. Markdown text vs rendered version.
Once the text items wrapped in double square brackets have
been extracted, the next step involves extracting only the
referenced page title in cases where the link is “piped”. In such cases,
we split the text on the | character and retrieve only the first
half. This list is then saved in the database along with a weight
of 1.
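The sketch below illustrates this step; it assumes the lead paragraph has already been isolated, and the regex-based extraction is our illustrative choice rather than the exact SHADE implementation.

import re

LINK_PATTERN = re.compile(r"\[\[(.+?)\]\]")  # matches [[...]] internal links

def internal_link_labels(paragraph):
    # Return the referenced page titles of all internal links, in order.
    labels = []
    for target in LINK_PATTERN.findall(paragraph):
        # For piped links such as [[Druidic language|Druidic]], keep only
        # the referenced page title (the part before the pipe).
        title = target.split("|", 1)[0].strip()
        if title:
            labels.append(title)
    return labels

# internal_link_labels("Languages, such as [[Druidic language|Druidic]].")
# returns ['Druidic language']

Each title in this list would then be stored with a weight of 1, per the scale described earlier.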
Fig. 4: First link list containing correct label
This method does not always guarantee that all the articles
will produce results. The statistics in [7] show that not all
of the pages have links, and even fewer have what can be
considered a first link in the Forgotten Realms Wiki. Moreover,
our algorithm will not consider any links that are found
after the first newline, even though the lead section may
span multiple paragraphs. For this there
is a base assumption and a fail-safe mechanism. The base
assumption is that the lead section adheres to the lead section
guidelines discussed above. In such a case, the most important
links would be at the forefront explaining the subject matter
with reference to the more generalized concepts. The fail-safe
mechanism is the second list itself. If the first link list does not
contain the most suitable label, the annotator can use the “Not
in this list” button (the blue button in Fig. 4) to populate
the next list.
C. Noun phrases list
The second list consists of noun phrases extracted from the
first paragraph/lead section. The rationale for choosing this
is that even when the pages are not linked properly, the text
would still reflect the intended higher classification. The article on
Tiamat shown in Fig. 1 is a good example of such a
case. In this example, the word “goddess” could be linked to
the article on Deity²¹ but it has no such link configured.
On the other hand, if a human annotator were to look at this,
they could tag this to an article that reflects god or deity.
²¹ https://forgottenrealms.fandom.com/wiki/Deity
Fig. 5: Noun phrase list containing correct label
Because the entities are described using nouns
and noun phrases, extracting these lets us create a list of
potential labels that can act as a second alternative for the en-
tity labels. To extract the noun phrases, we first extract the first
paragraph. This is done using the last_curl_index.
The algorithm returns the last index at which it observed a
curly brace in the text before encountering the first internal link
outside of a structure. Here we make the assumption that the
next segment is the first paragraph. Once this starting point is
identified, we split the text at that index; then, to remove
the Markdown annotations, we use the MWParserFromHell
Python library. Once the text has been cleaned, we split it yet
again in the same way as when processing the link list: at the
first newline. The same assumptions apply here. Then,
to extract the noun phrase list, we use the TextBlob Python
library with supplementary corpora. The list of extracted
labels is given a weight of 2 according to the scale discussed
previously. As with the first list, there is a dedicated
button to indicate that the label is not found in this list.
Figure 5 showcases a situation where the correct label is found
in the second list.
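The sketch below strings these steps together under our reading of the pipeline; it reuses escape_infobox from Code 1, and the function boundaries and names are ours rather than SHADE's. TextBlob additionally requires its corpora to be downloaded (python -m textblob.download_corpora).

import mwparserfromhell
from textblob import TextBlob

def noun_phrase_labels(raw_wikitext):
    # Candidate labels (noun phrases) extracted from the lead paragraph.
    _, last_curl_index = escape_infobox(raw_wikitext)     # from Code 1
    lead = raw_wikitext[last_curl_index + 1:]             # text after the structures
    plain = mwparserfromhell.parse(lead).strip_code()     # drop the wiki markup
    first_paragraph = plain.split("\n", 1)[0]             # keep only the lead paragraph
    return [str(np) for np in TextBlob(first_paragraph).noun_phrases]

Each phrase in this list would then be stored with a weight of 2.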
D. Manual input labels
This is the last fail-safe mechanism that we use to capture
the entity labels. By allowing the annotators to manually enter
the labels, we can ensure that the probability of annotating the
correct label for an entity while using the app is never 0. If
the first two lists fail to provide the correct label
or the complete label (for example, for the entity Aarakocra,
the correct annotation would be avian humanoid, but the lists
suggest only humanoid), the annotator can manually define
the correct and complete label. These labels are captured with
a weight of 3.
The manual input field is unlocked only when the annotator
has confirmed that both of the lists do not contain the correct
label as shown in Fig. 6. If the annotator needs to go back to
the previous list, they can do so by refreshing the page. Once
an annotator has been assigned an entity, they will always be
redirected to the same entity until they complete the annotation
for the entity. If the annotator is unsure about the annotation of
a particular entity, they are given the option to skip that entity
as well.
Fig. 6: Manually entering the correct label
Fig. 7: ER diagram
V. ASSIGNING AND KEEPING TRACK OF ANNOTATIONS
Figure 7 depicts the entity relationship diagram for the
database which contains a list of entities (entity_page).
Aside from the entity_name and the first_paragraph
extracted from the wiki, this also maintains an m:1 mapping
with the annotators (assignee) and several states, including
whether the annotation was completed or was marked as
skipped. By separating the completed and skipped flags, we
can distinguish the entities that the annotators
have deliberately marked with the “Skip Page” button (Figure 6)
from unintentionally skipped entities, for example ones skipped by
reloading the page.
To make sure that the entities are not skipped unintention-
ally, we have devised a mechanism that presents the
user with the topmost result that has been assigned to them
(typically one result) and has the completed and skipped
flags set to False. This way the annotator cannot skip a
given entity without explicitly declaring their intention to do
so. By filtering the entities where entity_page.skipped ==
TRUE, we can identify the entities the annotators have
trouble labeling.
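As a sketch of this selection logic (our own, assuming a relational table named entity_page with the columns shown in Fig. 7; the use of SQLite and the rowid ordering for “topmost” are illustrative assumptions):

import sqlite3

def next_entity_for(conn, annotator_id):
    # Topmost assigned entity that is neither completed nor skipped.
    return conn.execute(
        """
        SELECT entity_name, first_paragraph
        FROM entity_page
        WHERE assignee = ? AND completed = 0 AND skipped = 0
        ORDER BY rowid
        LIMIT 1
        """,
        (annotator_id,),
    ).fetchone()  # None when the annotator has nothing pending

def skipped_entities(conn):
    # Entities the annotators explicitly skipped (entity_page.skipped == TRUE).
    return conn.execute(
        "SELECT entity_name FROM entity_page WHERE skipped = 1"
    ).fetchall()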
VI. DISCUSSION
We have employed annotators to annotate the D&D entities
using the SHADE application. We have currently annotated
a total of 3,984 entities, of which 2,964 refer to articles on the
fictional timeline. The remaining 1,020 are named entities that refer
to other types of concepts. Out of the 1,020, 399 were picked from
the links list, 242 were picked from the noun phrase list, and 379
were typed in. Figure 8 showcases the breakdown of the currently
annotated entities by the label source. With the two lists providing
nearly two-thirds of the labels, this system can significantly reduce
the manual text inputs during annotation. It also provides
evidence that not all the correct annotations are properly linked
in the wikia, so an automatically extracted data set such as the
FRW-FL sub-data set of the FRW data set [7] can be enriched
via manual annotation.
Fig. 8: Current annotation breakdown by label source
VII. CONCLUSION AND FUTURE WORK
This project provides a glimpse of how we can structure
an application that is used for data annotation, especially
when there are text corpora associated with the entities to be
annotated. Although this was designed for a specific Fandom
Wikia, thanks to the MediaWiki guidelines, we can port
this not only to other Fandom Wikis but also to any Wiki-esque
data source that uses the MediaWiki stack, including Wikipedia
itself. Further improvements to the SHADE system include
allowing multiple annotations of the same entity with different
priorities, and including has-a relationship annotations.
REFERENCES
[1] K. Squire, Open-ended video games: A model for de-
veloping learning for the interactive age. MacArthur
Foundation Digital Media and Learning Initiative, 2007.
[2] J. Crawford, J. Wyatt, R. J. Schwalb, and B. R. Cordell,
Player’s Handbook. Wizards of the Coast LLC, 2014.
[3] J. Crawford, C. Perkins, and J. Wyatt, Dungeon Master’s
Guide. Wizards of the Coast LLC, 2014.
[4] N. H. N. D. De Silva, A. S. Perera, and M. K. D. T.
Maldeniya, “Semi-supervised algorithm for concept on-
tology based word set expansion,” in 2013 International
Conference on Advances in ICT for Emerging Regions
(ICTer). IEEE, 2013, pp. 125–131.
[5] M. Liberman, “Language Log: Dwarves vs. Dwarfs,”
http://itre.cis.upenn.edu/~myl/languagelog/archives/
000293.html, January 2004, (Accessed on 03/14/2023).
[6] R. Rameshkumar and P. Bailey, “Storytelling with dia-
logue: A critical role dungeons and dragons dataset,” in
58th ACL Meeting Proceedings, 2020, pp. 5121–5134.
[7] A. Peiris and N. de Silva, “Synthesis and Evaluation
of a Domain-specific Large Data Set for Dungeons &
Dragons,” arXiv preprint arXiv:2212.09080, 2022.
[8] F. Olsson, “A literature survey of active machine learning
in the context of natural language processing,” 2009.
[9] R. Snow, B. O’Connor, D. Jurafsky, and A. Ng, “Cheap
and fast but is it good? evaluating non-expert an-
notations for natural language tasks,” in 2008 EMNLP
Conference Proceedings, Honolulu, Hawaii, Oct. 2008,
pp. 254–263.
[10] A. Dumitrache, L. Aroyo, and C. Welty, “Achieving
expert-level annotation quality with crowdtruth,” in Proc.
of BDM2I Workshop, ISWC, 2015.
[11] R. Artstein and M. Poesio, “Inter-coder agreement
for computational linguistics,” Computational Linguistics,
vol. 34, no. 4, pp. 555–596, 2008.
[12] E. Ouyang, Y. Li, L. Jin, Z. Li, and X. Zhang, “Exploring
n-gram character presentation in bidirectional rnn-crf for
chinese clinical named entity recognition,” in CEUR
workshop proceedings, vol. 1976, 2017, pp. 37–42.
[13] H. Chen, H. Takamura, and H. Nakayama, “SciXGen:
A scientific paper dataset for context-aware text genera-
tion,” in Findings of ACL: EMNLP 2021, Nov. 2021, pp.
1483–1492.
[14] Y. Gu et al., “Domain-specific language model pretrain-
ing for biomedical natural language processing,” ACM
Transactions on Computing for Healthcare (HEALTH),
vol. 3, no. 1, pp. 1–23, 2021.
[15] A. Amin-Nejad, J. Ive, and S. Velupillai, “Exploring
transformer text generation for medical dataset augmen-
tation,” in Proceedings of the Twelfth Language Re-
sources and Evaluation Conference. Marseille, France:
European Language Resources Association, May 2020,
pp. 4699–4708.
[16] A. Ferrari, B. Donati, and S. Gnesi, “Detecting domain-
specific ambiguities: an nlp approach based on wikipedia
crawling and word embeddings,” in 2017 IEEE 25th In-
ternational Requirements Engineering Conference Work-
shops (REW). IEEE, 2017, pp. 393–399.
[17] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and
Y. Artzi, “BERTScore: Evaluating Text Generation with
BERT,” in International Conference on Learning Repre-
sentations, 2019.
[18] N. de Silva and D. Dou, “Semantic oppositeness assisted
deep contextual modeling for automatic rumor detection
in social networks,” in 16th EACL Conference Proceed-
ings: Main Volume, Apr. 2021, pp. 405–415.
[19] K. Sugathadasa et al., “Legal Document Retrieval using
Document Vector Embeddings and Deep Learning,” in
Science and information conference. Springer, 2018,
pp. 160–175.