Article

Semantic Scene Graph Generation Using RDF Model and Deep Learning

Seongyong Kim, Tae Hyeon Jeon, Ilsun Rhiu, Jinhyun Ahn and Dong-Hyuk Im

1 Department of Big Data and AI, Hoseo University, Asan 31499, Korea
2 Department of Computer Engineering, Hoseo University, Asan 31499, Korea
3 Division of Future Convergence (HCI Science Major), Dongduk Women’s University, Seoul 02748, Korea
4 Department of Management Information Systems, Jeju National University, Jeju 63243, Korea
5 School of Information Convergence, Kwangwoon University, Seoul 01890, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(2), 826; https://doi.org/10.3390/app11020826
Submission received: 26 November 2020 / Revised: 5 January 2021 / Accepted: 14 January 2021 / Published: 17 January 2021
(This article belongs to the Special Issue Advances in Deep Learning Ⅱ)

Abstract

Over the last several years, in parallel with the global advancement of mobile technology and the rise in social media content consumption, multimedia content production and reproduction have increased exponentially. Enabled by recent rapid advances in deep learning technology, research on scene graph generation is therefore being actively conducted to more efficiently search for and classify the images users want within large amounts of content. This approach lets users accurately find the images they are searching for by expressing meaningful information about image content as the nodes and edges of a graph. In this study, we propose a scene graph generation method based on the Resource Description Framework (RDF) model to clarify semantic relations. Furthermore, we use convolutional neural network (CNN) and recurrent neural network (RNN) deep learning models to generate a scene graph expressed in the controlled vocabulary of the RDF model and to capture the relations between image object tags. Finally, we demonstrate experimentally that the proposed technique expresses semantic content more effectively than existing approaches.

1. Introduction

The total amount of digital image content has increased exponentially in recent years, owing to advances in mobile technology, while content consumption has simultaneously grown on social media platforms. It has thus become increasingly important to store and manage these large amounts of images effectively. Image annotation techniques that enable efficient search through large amounts of image content have been proposed [1,2,3,4,5]. These techniques represent images in several ways by storing semantic information describing the image content along with the images themselves, enabling users to accurately search for and classify a desired image within a large amount of image data. Recently, with advances in deep learning technology, scene graph generation methods that detect objects in images and express their relations have been actively studied [6,7]. Scene graph generation involves detecting the objects in an image and the relations between these objects. Generally, objects are detected first, and the relations between them are then predicted. For this prediction, relation detection has been performed using language modules based on pretrained word vectors [6] and through message passing between object detection and relation detection [7]. Although a scene graph addresses the ambiguity of image captions expressed in natural language, it does not effectively express semantic information describing an image. Therefore, this study proposes a method of adding semantic information by incorporating the Resource Description Framework (RDF) model [8] into conventional scene graph generation. Although several works in the relevant literature [2,3,4,9,10] also applied the RDF model to image content, these studies did not use deep learning-based models. The proposed method detects image objects and relations using deep learning and expresses them using an RDF model, as shown in Figure 1.
The contributions of the present study are as follows. First, an RDF model for generating a scene graph is proposed. Expressing the existing scene graph using an RDF model makes semantic expression in a controlled vocabulary possible. Specifically, scene graph sentences that are logically contradictory can be filtered out using an RDF schema (RDFS). Furthermore, an image can be queried using SPARQL [11], an RDF query language. Second, deep learning technology is applied to the RDF model-based scene graphs. Although image content was also described using the RDF model in [3,10], the user had to manually input the relation in [3], and a machine learning method was applied only to detect image tags in [10]. In this study, relations between image tags are detected using a deep learning model during the generation of image scene graphs. Specifically, subjects and objects are detected using a convolutional neural network (CNN) model in the RDF-based scene graph, and the predicate relation is learned using a recurrent neural network (RNN) model. Third, our results indicate that the expressiveness of scene graphs can be improved significantly using the proposed inference approach with RDF-based scene graphs.
The remainder of this paper proceeds as follows. Related work on image annotation and image search is reviewed in Section 2, and the proposed method is described in Section 3. Section 4 presents the implementation and test results, including a comparison with the conventional method, and Section 5 concludes the paper.

2. Related Work

Ontology-based image annotation has been studied extensively [1,2,3,4,5,8]. Image annotation using tags is mainly used for image search by identifying the rankings and meanings of tags. I-TagRanker [5] is a system that selects the tag that best represents the content of a given image from among the several tags attached to it. In this system, a tag extension step is first performed to find images similar to the given image, after which their tags are attached to it. The next step ranks the tags using WordNet (https://wordnet.princeton.edu/) in the order in which they are judged to be related to the image while representing detailed content. The tag with the highest ranking is the one that best represents the image. A semi-automated semantic photo annotation method was suggested in [12]. For annotation suggestion, [12] proposed an effective combination of context-based methods, where contextual concepts are organized into ontologies that include locations, events, people, things and time.
An image annotation and search method based on an object-relation network was proposed in [13]. The method finds objects within images using a probability model over image parts, referred to as segments, and searches images using ontologies that represent the relations between objects. Compared to other approaches, this method can distinguish images that are much more semantically similar.
There have also been extensive studies on image processing and tagging on mobile devices [14]. In [14], images were searched using context information provided by smart devices, including time and location as well as social and personal information. Context information was annotated when capturing a camera image; the annotated information then enables a user to find the desired image through a search. However, only the query words provided by the system can be used because the annotation form is standardized.
A model that represents the semantic relations between tags in an image using a knowledge graph was proposed in [15]. In [15], a relational regularized regression CNN (R3CNN) model with five convolutional layers, three pooling layers and two fully connected layers was initialized with AlexNet weights pretrained on ImageNet. First, tags were extracted from an image; the image tags were then vectorized using the equations proposed in that study, the distances between vectors were calculated, and the vectors were used for training. Although the method in [15] is useful for detecting relations between image tags, it requires a well-established knowledge base.

3. Proposed System

3.1. Semantic Scene Graph

In this study, we constructed semantic scene graphs using RDF. RDF was established by the W3C (World Wide Web Consortium) and is designed to describe additional information about the resources being described, as well as hierarchical and similarity relations between data. In other words, it provides a way to define data and describe their relations.
The RDF is usually described using a triple model with a subject, predicate and object. The subject denotes the resource data to be represented, and the predicate denotes characteristics of the subject and indicates a relation between the subject and the object. The object, then, is the content or value of a description. Each value can be described using a uniform resource identifier (URI). The RDF not only describes actual data but also supports the RDFS, a schema that describes the types of terms used in the data and relations between them.
If the scene graph is described using the RDF(S) model, it can be utilized as shown in Figure 2. The upper and lower parts of Figure 2 represent the RDF model and the RDFS, respectively. Our proposed scene graph generation method creates a representation of the relations between objects in an image and can prevent semantically incorrect scene graphs, such as “John wears banana” or “Jane eats shirt,” from being generated. For instance, the RDFS can set “person” as the subject and “fruit” as the object for the predicate “eats.” Likewise, “person” can be used as the subject and “cloth” as the object of the predicate “wears,” to generate a semantically correct scene graph. Although image annotations were represented using the RDF model in [3,4], in these works the user manually input the subject, predicate and object. The uniqueness of the present study lies in applying the RDF model to scene graph generation using a machine learning algorithm.
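To make this domain/range filtering concrete, the following is a minimal sketch using the Python rdflib library; rdflib, the example namespace, the class names and the is_consistent helper are illustrative assumptions rather than the paper's implementation.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/scene#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# RDFS part: domain/range constraints on the predicates.
g.add((EX.eats, RDFS.domain, EX.Person))
g.add((EX.eats, RDFS.range, EX.Fruit))
g.add((EX.wears, RDFS.domain, EX.Person))
g.add((EX.wears, RDFS.range, EX.Cloth))

# RDF part: class assertions for the detected tags.
g.add((EX.John, RDF.type, EX.Person))
g.add((EX.banana, RDF.type, EX.Fruit))
g.add((EX.shirt, RDF.type, EX.Cloth))

def is_consistent(subj, pred, obj):
    """Accept a candidate triple only if the subject and object types
    match the declared rdfs:domain and rdfs:range of the predicate."""
    domains = set(g.objects(pred, RDFS.domain))
    ranges = set(g.objects(pred, RDFS.range))
    subj_types = set(g.objects(subj, RDF.type))
    obj_types = set(g.objects(obj, RDF.type))
    return (not domains or bool(domains & subj_types)) and \
           (not ranges or bool(ranges & obj_types))

print(is_consistent(EX.John, EX.eats, EX.banana))  # True
print(is_consistent(EX.John, EX.eats, EX.shirt))   # False: shirt is not a Fruit
```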

3.2. Creating a Deep Learning-Based Scene Graph

Image tagging was performed using a CNN for image annotation in a previous study [10]. However, training images using a CNN alone makes it impossible to understand the semantics of the detected objects. Furthermore, because the previous method issued a direct SPARQL query to DBPedia (https://wiki.dbpedia.org/) to detect relations between image tags, it could not find relations that were not already stored in DBPedia. Therefore, although we also perform image tagging using a CNN, our approach vectorizes the tags and then predicts the relations between them using RNN models, namely a long short-term memory (LSTM) network and gated recurrent units (GRU). The overall process of the proposed method is shown in Figure 3.
First, in the proposed method, tags are created for image objects using a CNN model. In particular, we used the ResNet-101 model [16] as the state-of-the-art (SOTA) CNN model for image tagging, because it uses skip connections and performs well on ImageNet. The image tagging process using ResNet-101 is as follows. First, the CNN extracts image features using convolutional filters and stores them in feature maps. The fully connected (FC) layer projects the extracted feature map onto one value per class (tag) to classify the final image. In the output layer, the predicted value of the image tag t_i is output using the Sigmoid activation function, shown in Equation (1) below:
\sigma(x) = \frac{1}{1 + e^{-x}}    (1)
where the Sigmoid function σ maps each class score to a value between 0 and 1. The predicted value of tag t_i is then defined as follows in Equation (2):
t_i = \sigma(W x_i + b)    (2)
where W is the weight matrix of the FC layer and b is the bias term. We then adopt a binary cross-entropy loss function, given as follows:
\mathcal{L}_1 = -\sum_{i=1}^{N} \sum_{j=1}^{M} \left[ y_{ij} \log(t_{ij}) + (1 - y_{ij}) \log(1 - t_{ij}) \right]    (3)
where x_i is the input image, N is the number of images in the dataset, M is the number of tags, t_ij is the predicted value of tag j for image x_i, and y_ij is the correct answer for image x_i. For example, y_ij = 1 indicates that image x_i contains tag j. The process of detecting an object in an image using the CNN model is shown in Figure 4.
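As an illustration of this tagging step, the sketch below fine-tunes a ResNet-101 backbone for multi-label tag prediction with a sigmoid output and the binary cross-entropy loss of Equations (1)–(3). It is a minimal PyTorch sketch, not the authors' code: the tag count, the 0.5 prediction threshold and the helper functions are assumptions for illustration, while the learning rate schedule follows the settings reported in Section 4.

```python
import torch
import torch.nn as nn
from torchvision import models

num_tags = 100  # assumed number of object tags (M in Equation (3))

# ResNet-101 backbone with ImageNet weights; replace the last FC layer
# so that it outputs one score per tag.
backbone = models.resnet101(pretrained=True)
backbone.fc = nn.Linear(backbone.fc.in_features, num_tags)

# BCEWithLogitsLoss applies the sigmoid of Equation (1) and the
# binary cross-entropy of Equation (3) in one numerically stable step.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

def training_step(images, labels):
    """images: (B, 3, H, W) tensor; labels: (B, num_tags) multi-hot y_ij."""
    logits = backbone(images)          # FC outputs, one logit per tag
    loss = criterion(logits, labels)   # Equation (3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()                 # call scheduler.step() once per epoch

def predict_tags(image, threshold=0.5):
    """Return indices of tags whose score t_i (Equation (2)) exceeds the threshold."""
    backbone.eval()
    with torch.no_grad():
        scores = torch.sigmoid(backbone(image.unsqueeze(0)))[0]
    return (scores > threshold).nonzero(as_tuple=True)[0].tolist()
```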
The disadvantage of processing images using a CNN is that the semantics of the objects (tag words) cannot be understood. To understand the meaning of words, techniques such as Word2Vec [17], which represents words as vectors, have been developed. Furthermore, resources such as ConceptNet [18] have become available to help computers understand the relationships between words. ConceptNet is a semantic network designed to enable computers to understand the semantics of words used by people. It provides a 300-dimensional representation vector w_i for each embedded word. Here, w_i is the 300-dimensional vector obtained by passing the image tag t_i through the embedding layer, represented as follows:
w_i = \mathrm{word2vec}(t_i), \quad w_i \in \mathbb{R}^{300}    (4)
In this paper, we propose an RDF model-based scene graph describing an image using triples of the form <Subject, Predicate, Object> (for convenience, the <Subject-Predicate-Object> triple is denoted as <Subj, Pred, Obj>, and the <Subject, Object> pair as <Subj, Obj>). Thus, we predict the predicate p_j using t_subj and t_obj selected from the image tag set {t_1, t_2, …, t_i}. We use an RNN model, which can store previous data in a hidden state, to predict the sequence relationship of <Subj, Pred, Obj>. When a value x_t is input at time step t, h_t is obtained and recorded by combining it with the previous hidden state h_{t-1}. In particular, we used the LSTM [19] and GRU [20] architectures, which are RNN models with excellent performance. The LSTM learns by controlling memory cells with three gates (input, forget and output) in the hidden state. It also converts the repeated multiplications of RNN backpropagation into additions, thereby alleviating the vanishing and exploding gradient problems. GRUs reduce the three gates of the LSTM to two gates (update and reset) and have the advantage of being relatively simpler than the LSTM.
In this study, t_subj and t_obj are not ordered as a sequence. For example, we cannot know which t_obj comes after the word ‘person.’ Thus, our method uses the pair <Subj, Obj>, combining t_subj and t_obj. Because a general RNN model flows in one direction, we have <Subj, Obj> ≠ <Obj, Subj>. t_subj and t_obj are transformed into w_subj and w_obj, respectively. w_{subj,obj} is then defined in Equation (5) as follows:
w_{subj,obj} = \mathrm{concat}[\, w_{subj}, w_{obj} \,], \quad w_{subj,obj} \in \mathbb{R}^{600}    (5)
From Equation (5), we can calculate the predicate p j as follows:
p_j = \sigma(\mathrm{LSTM}(w_{subj,obj}))    (6)
Finally, the loss function is defined as:
\mathcal{L}_2 = -\sum_{subj,obj} \sum_{j=1}^{M} \left[ y_j \log(p_j) + (1 - y_j) \log(1 - p_j) \right]    (7)
The relation prediction process is illustrated in Figure 5. We summarize the algorithm of the semantic scene graph, as described in Algorithm 1.
Algorithm 1 Training Algorithm.
Require: image tags {t_1, t_2, …, t_i} ∈ T
Procedure
1: Fine-tune the image tagging model on images using ResNet (Equations (1)–(3))
2: for epoch = 1, 2, … do
3:   for t_subj in T do
4:     for t_obj in T do
5:       Convert the tags t_subj, t_obj to w_{subj,obj} (Equations (4) and (5))
6:       Predict p_j using w_{subj,obj} (Equation (6))
7:       Update the LSTM by backpropagation with stochastic gradient descent
         on the loss function (Equation (7))
End procedure
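A minimal PyTorch sketch of the predicate prediction step in Algorithm 1 is given below. It concatenates the two 300-dimensional tag embeddings into the 600-dimensional vector of Equation (5), feeds it to an LSTM and scores each of the 70 predicates with a sigmoid (Equations (6) and (7)). The embedding layer, class name and example indices are assumptions for illustration; only the embedding size (300), hidden size (128), single layer and learning rate (0.005) follow the settings reported in Section 4.

```python
import torch
import torch.nn as nn

class PredicatePredictor(nn.Module):
    """Scores M predicates for one <Subj, Obj> tag pair (Equations (5)-(6))."""
    def __init__(self, num_tags=100, num_predicates=70, embed_dim=300, hidden_dim=128):
        super().__init__()
        # In the paper the 300-d vectors come from pretrained word embeddings
        # (Word2Vec / ConceptNet); a trainable embedding layer stands in here.
        self.embed = nn.Embedding(num_tags, embed_dim)
        self.lstm = nn.LSTM(2 * embed_dim, hidden_dim, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_predicates)

    def forward(self, subj_idx, obj_idx):
        # Equation (5): w_{subj,obj} = concat[w_subj, w_obj], a 600-d vector.
        pair = torch.cat([self.embed(subj_idx), self.embed(obj_idx)], dim=-1)
        _, (h_n, _) = self.lstm(pair.unsqueeze(1))  # order matters: <Subj,Obj> != <Obj,Subj>
        return self.fc(h_n[-1])                     # one logit per predicate p_j

model = PredicatePredictor()
criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy (Equations (6)-(7))
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

subj = torch.tensor([3])                        # hypothetical index of tag "person"
obj = torch.tensor([17])                        # hypothetical index of tag "shirt"
target = torch.zeros(1, 70); target[0, 42] = 1  # hypothetical index of predicate "wears"

loss = criterion(model(subj, obj), target)      # Equation (7) for one <Subj, Obj> pair
optimizer.zero_grad(); loss.backward(); optimizer.step()
```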

3.3. Scene Graph Expansion Using Inference

The existing RDF graph introduced in the RDFS has the RDF entailment rules [21] as constraints. Essentially, each rule infers a statement from the presence of other statements. The entailment rules considered in this study are the RDF entailment rules derived from the RDF semantics, as shown in Table 1. For example, Rule 11 shows that the subClassOf relation is a transitive closure: if resource U is a subclass of resource V, and resource V is a subclass of resource X, then resource U is automatically a subclass of resource X. Although extended entailment rules and datatype entailment rules are also provided, additional entailment rules are not considered in this study.
In this study, we exploited the RDFS entailment rules to generate a semantic scene graph. Additionally, user-defined rules were applied for inference. Although such rules are not part of the RDFS entailment rules, they enable users to create their own rules and apply them to image annotations. Inference based only on the existing RDFS entailment rules is not well suited to context-specific inference because the predefined rules are limited. Therefore, the user can define an inference rule suitable to the situation in advance and then infer triples that represent an image annotation. For example, if the annotation information on the location of an image taken with a mobile device is “Seoul” and the triple <Seoul, isPartOf, Korea> exists, the location of the image also includes “Korea.” User-defined rules that represent general situations were applied in this study, as shown in Table 2. Semantic inference rules were also proposed in [22] for the domain of fuel cell microscopy, where users are assisted in constructing rules based on domain-specific features such as colors and shapes.
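As a rough sketch of how such a user-defined rule could be applied (again using the rdflib library, which the paper does not mention), the snippet below expands an image's location annotation along isPartOf links until no new triples are added; the namespace and the property names takenAt and isPartOf are illustrative assumptions.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/scene#")  # hypothetical namespace
g = Graph()

# Annotation produced for an image plus a small background fact base.
g.add((EX.img001, EX.takenAt, EX.Seoul))
g.add((EX.Seoul, EX.isPartOf, EX.Korea))

def apply_location_rule(graph):
    """User-defined rule: if an image is taken at X and X isPartOf Y,
    then the image was also taken in Y. Repeat until a fixed point."""
    changed = True
    while changed:
        changed = False
        for img, place in list(graph.subject_objects(EX.takenAt)):
            for region in graph.objects(place, EX.isPartOf):
                if (img, EX.takenAt, region) not in graph:
                    graph.add((img, EX.takenAt, region))
                    changed = True

apply_location_rule(g)
print((EX.img001, EX.takenAt, EX.Korea) in g)  # True: inferred from <Seoul, isPartOf, Korea>
```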

4. System Implementation and Testing

The dataset used in this study was the visual relationship detection dataset [6], composed of 5000 images. The dataset contained a total of 37,993 relations, 100 objects and 70 predicates. Duplicate relations with the same subject-predicate-object were removed, regardless of the image. As a result, 4000 images and 6672 relations were used for training, and 1000 images and 2747 relations were used as test data. Eighty-six <Subject, Object> pairs that were not used in training were used to predict the relations.
The experiments were run on Windows 10 with an Nvidia GTX 2060 6 GB GPU, using Python 3.7.4 and PyTorch 1.5.1 for training. All training processes took place in the same environment.
We combined CNN and RNN models to implement the proposed method. The ResNet-101 architecture was used as the CNN model, with weights initialized to values pre-trained on ImageNet. Table 3 shows the Top-K error rates measured on ImageNet.
We compared our method with [16], the SOTA image recognition model, which improved performance by using a skip-connection technique. In our model, we replaced the last layer of ResNet-101, the FC layer, with one output per tag t_i and performed fine-tuning. For the ResNet-101 model, the learning rate was 0.001 and was reduced by 0.1 every 20 epochs. The Top-1 and Top-5 error rates were used to evaluate image classification performance. The Top-1 error is the rate at which the model's top prediction is not the correct answer, and the Top-5 error is the rate at which the correct answer is not among the top five categories predicted by the model. Although our performance was lower than that of [16] for image recognition, our focus was on predicting the relationships between image tags.
The results of predicting the predicate from the <Subject, Object> pairs after extracting the tags using the CNN are shown in Figure 6. For the LSTM model, the learning rate was 0.005, the embedding size was 300, the hidden size was 128 and the number of layers was set to 1. In the test, a relation was predicted when its probability exceeded 0.05 after passing through the Sigmoid activation function. Precision, recall and F1-score were used as evaluation criteria. As shown in Figure 6, the proposed LSTM method had higher precision than the GRU method. Although the recall of the proposed method was lower than that of the GRU, its F1-score was higher.
An example of the proposed method is shown in Figure 7. The first image shows the triples <Subject, Predicate, Object> obtained through the LSTM. However, some cases, such as the triple <chair, touch, table>, are not semantically correct because the “touch” predicate uses “person” as its domain type. The <person, wear, horse> triple in the third picture is also a semantically incorrect expression because the range of “wear” is limited to “cloth.” Predicates that are not related to the actual image can also be created, because all possible predicates were generated from the <Subject, Object> pairs in the dataset. For example, the output tags {person, shirt, hat, horse, phone, street} were created for the third picture. We then predicted the predicate for all pairs of output tags, such as <person, shirt>, <person, hat>, …, <street, phone>.
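To illustrate how triples are produced for all tag pairs in this setting, the sketch below enumerates every ordered <Subject, Object> pair from the detected tags, scores the predicates with the hypothetical PredicatePredictor from the earlier sketch, and keeps those above the 0.05 threshold; the tag and predicate index maps are illustrative assumptions. Triples such as <person, wear, horse> could then be filtered out with the RDFS domain/range check sketched in Section 3.1.

```python
from itertools import permutations
import torch

# Hypothetical vocabularies; in practice these come from the dataset labels.
tag_to_idx = {"person": 3, "shirt": 17, "hat": 21, "horse": 54}
idx_to_predicate = {42: "wear", 7: "ride", 13: "touch"}

def build_scene_graph(output_tags, model, threshold=0.05):
    """Predict predicates for every ordered pair of detected tags."""
    triples = []
    for subj, obj in permutations(output_tags, 2):   # <Subj, Obj> != <Obj, Subj>
        s = torch.tensor([tag_to_idx[subj]])
        o = torch.tensor([tag_to_idx[obj]])
        with torch.no_grad():
            scores = torch.sigmoid(model(s, o))[0]   # p_j for all predicates
        for j, score in enumerate(scores.tolist()):
            if score > threshold and j in idx_to_predicate:
                triples.append((subj, idx_to_predicate[j], obj))
    return triples

# Example: build_scene_graph(["person", "shirt", "hat", "horse"], model)
```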

5. Conclusions

With recent advances in the mobile environment and the exponential growth of image content, semantic image search has become critical. In this study, an RDF model was incorporated into the proposed scene graph generation method, as the semantics of a scene graph can be made more meaningful by describing it using the RDF model. Furthermore, the generated scene graphs can be described logically because they use a controlled vocabulary. CNN and RNN models were also applied to the RDF-based scene graph generation method to learn the semantic information of images. RDF model inference rules and user-defined rules were used to enrich the annotation information of the scene graph.
In future work, we will build on the present study to further specify semantic image information using a deep learning model based on image context information in a semantic scene graph in RDF format.

Author Contributions

S.K. and D.-H.I. conceived the problem and supervised the overall research; T.H.J. implemented the algorithm and performed the experiments; I.R. and J.A. clarified some points that helped D.-H.I. write the algorithm; S.K., I.R., and D.-H.I. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2021-2018-0-01417) supervised by the Institute for Information & Communications Technology Promotion (IITP). This work was also supported by the Research Resettlement Fund for the new faculty of Kwangwoon University in 2020. Further, this research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2019R1D1A3A03103802).

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Marques, O.; Barman, N. Semi-automatic Semantic Annotation of Images using Machine Learning Techniques. In Proceedings of the ISWC: International Semantic Web Conference, Sanibel Island, FL, USA, 20–23 October 2003. [Google Scholar]
  2. Gayo, J.E.L.; De Pablos, P.O.; Lovelle, J.M.C. WESONet: Applying semantic web technologies and collaborative tagging to multimedia web information systems. Comput. Hum. Behav. 2010, 26, 205–209. [Google Scholar] [CrossRef]
  3. Im, D.-H.; Park, G.-D. Linked tag: Image annotation using semantic relationships between image tags. Multimed. Tools Appl. 2014, 74, 2273–2287. [Google Scholar] [CrossRef]
  4. Im, D.-H.; Park, G.-D. STAG: Semantic Image Annotation Using Relationships between Tags. In Proceedings of the 2013 International Conference on Information Science and Applications (ICISA), Pattaya, Thailand, 24–26 June 2013; pp. 1–2. [Google Scholar]
  5. Jeong, J.; Hong, H.; Lee, D. i-TagRanker: An Efficient Tag Ranking System for Image Sharing and Retrieval Using the Semantic Relationships between Tags. Multimed. Tools Appl. 2013, 62, 451–478. [Google Scholar] [CrossRef]
  6. Lu, C.; Krishna, R.; Bernstein, M.; Fei-Fei, L. Visual Relationship Detection with Language Priors. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
  7. Li, Y.; Ouyang, W.; Zhou, B.; Wang, K.; Wang, X. Scene Graph Generation from Objects, Phrases and Region Captions. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1270–1279. [Google Scholar]
  8. Resource Description Framework (RDF): Concepts and Abstract Syntax. Available online: http://w3.org/TR/2014/REC-rdf11-concepts-20140225 (accessed on 17 November 2020).
  9. Chen, H.; Guo, A.; Ni, W.; Cheng, Y. Improving the representation of image descriptions for semantic image retrieval with RDF. J. Vis. Commun. Image Represent. 2020, 73, 102934. [Google Scholar] [CrossRef]
  10. Shin, Y.; Seo, K.; Ahn, J.; Im, D.-H. Deep-learning-based image tagging for semantic image annotation. In Advances in Computer Science and Ubiquitous Computing; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
  11. SPARQL Query Language for RDF. Available online: http://w3c.org/TR/rdf-sparql-query/ (accessed on 17 November 2020).
  12. Elliott, B.; Ozsoyoglu, M. A comparison of methods for semantic photo annotation suggestion. In Proceedings of the 22nd International Symposium on Computer and Information Sciences, Ankara, Turkey, 7–9 November 2007. [Google Scholar]
  13. Chen, N.; Zhou, Q.; Prasanna, V. Understanding Web Image by Object Relation Network. In Proceedings of the International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010. [Google Scholar]
  14. Xia, S.; Gong, X.; Wang, W.; Tian, Y.; Yang, X.; Ma, J. Context-Aware Image Annotation and Retrieval on Mobile Device. In Proceedings of the 2010 Second International Conference on Multimedia and Information Technology, Kaifeng, China, 24–25 April 2010. [Google Scholar]
  15. Cui, P.; Liu, S.; Zhu, W. General Knowledge Embedded Image Representation Learning. IEEE Trans. Multimed. 2017, 20, 198–207. [Google Scholar] [CrossRef]
  16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
  17. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the ICLR, Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
  18. Speer, R.; Chin, J.; Havasi, C. ConceptNet5.5: An Open Multilingual Graph of General Knowledge; AAAI: Menlo Park, CA, USA, 2017. [Google Scholar]
  19. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  20. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In Proceedings of the NIPS, Montréal, QC, Canada, 8–13 December 2014. [Google Scholar]
  21. Hayes, P. RDF Model Theory. W3C Working Draft. Available online: http://www.w3.org/TR/2001/WD-rdf-mt-20010925/ (accessed on 17 November 2020).
  22. Hollink, L.; Little, S.; Hunter, J. Evaluating the application of semantic inferencing rules to image annotation. In Proceedings of the 3rd International Conference on Knowledge Capture (K-CAP ’05), Banff, AB, Canada, 2–5 October 2005; Association for Computing Machinery (ACM): New York, NY, USA, 2005; p. 91. [Google Scholar]
Figure 1. Overview of semantic scene graph generation.
Figure 2. Resource description framework (RDF) and RDF schema for semantics.
Figure 3. Proposed system.
Figure 4. Convolutional neural network (CNN) model for image tagging.
Figure 5. Relation prediction process.
Figure 6. Performance evaluation. Long short-term memory (LSTM) vs. gated recurrent units (GRU).
Figure 7. Semantic scene graph.
Table 1. RDFS entailment rules.
Rule 1. If E contains X A Y, then add: A rdf:type rdf:Property.
Rule 2. If E contains A rdfs:domain X and U A Y, then add: U rdf:type X.
Rule 3. If E contains A rdfs:range X and Y A V, then add: V rdf:type X.
Rule 4. If E contains U A B or B A U, then add: U rdf:type rdfs:Resource.
Rule 5. If E contains U rdfs:subPropertyOf V and V rdfs:subPropertyOf X, then add: U rdfs:subPropertyOf X.
Rule 6. If E contains U rdf:type rdf:Property, then add: U rdfs:subPropertyOf U.
Rule 7. If E contains A rdfs:subPropertyOf B and U A Y, then add: U B Y.
Rule 8. If E contains U rdf:type rdfs:Class, then add: U rdfs:subClassOf rdfs:Resource.
Rule 9. If E contains U rdfs:subClassOf X and V rdf:type U, then add: V rdf:type X.
Rule 10. If E contains U rdf:type rdfs:Class, then add: U rdfs:subClassOf U.
Rule 11. If E contains U rdfs:subClassOf V and V rdfs:subClassOf X, then add: U rdfs:subClassOf X.
Rule 12. If E contains U rdf:type rdfs:ContainerMembershipProperty, then add: U rdfs:subPropertyOf rdfs:member.
Rule 13. If E contains U rdf:type rdfs:Datatype, then add: U rdfs:subClassOf rdfs:Literal.
Table 2. Example of User-Defined Rules.
Season Inference: Between March and May it is spring, from June to August it is summer, from September to November it is autumn, and from December to February it is winter.
Location Inference: If the location is part of Seoul, the location is also part of South Korea.
Time Inference: If the time is between eight and twelve o’clock, it is morning.
Table 3. CNN backbone Top-K Error.
Architecture: Top-1 Error / Top-5 Error
Ours: 22.73 / 6.44
ResNet-101 [16]: 21.75 / 6.05
