Framework for an ontology-based web search engine
This framework consists of an object-attribute-value (O-A-V) extraction procedure applied to natural English language queries and a lightweight ontology-based search engine design [34]. Because most of the information available on the web is expressed in natural language and is not machine understandable, machines cannot interpret the data or draw semantic inferences from it. Ontologies can be used to model this information so that it can be easily interpreted by machines.
Sentence structure
A typical clause consists of a subject and a predicate, where the predicate is typically a verb phrase together with any objects or other modifiers, as shown in Fig. 1. The parse tree for a sample statement clause is shown in Fig. 2.
Object-attribute-value extraction procedure
When text is passed through the proposed model shown in Fig. 3, it is broken down into clauses, which are then tokenized and passed through the WordNet analyzer. The WordNet analyzer provides characteristic properties for each lemma, such as the part of speech (POS), synonyms, hypernyms and hyponyms. An object is then created for each of these individuals and added to the ontology. The triplet extractor continuously searches each clause for nested and direct relationships using the existing ontology. The extracted O-A-V triplets are then passed through a semantic analyzer, which determines the true form of the various objects in each triplet based on the context in which they are used. These triplets and the updated individuals are added to the ontology, and a taxonomy is generated. At the end of these processes, a well-defined semantic network is available, which can then be used to enhance search engine web results and provide the user with a completely reformed search experience.
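As an illustration of the lexical lookup performed by the WordNet analyzer, the following sketch collects the POS, synonyms, hypernyms and hyponyms of a lemma. NLTK's WordNet interface is used here as an assumed toolkit; the framework itself does not prescribe one.

```python
# Minimal sketch of a WordNet lookup for one lemma (assumes NLTK with the
# 'wordnet' corpus installed; the framework does not prescribe a toolkit).
from nltk.corpus import wordnet as wn

def lemma_properties(lemma: str) -> dict:
    """Collect POS tags, synonyms, hypernyms and hyponyms for a lemma."""
    props = {"pos": set(), "synonyms": set(), "hypernyms": set(), "hyponyms": set()}
    for synset in wn.synsets(lemma):
        props["pos"].add(synset.pos())
        props["synonyms"].update(l.name() for l in synset.lemmas())
        props["hypernyms"].update(h.name() for h in synset.hypernyms())
        props["hyponyms"].update(h.name() for h in synset.hyponyms())
    return props

print(lemma_properties("dog"))  # e.g. hypernyms include 'canine.n.02'
```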
Algorithm design
For extracting nested relations, such as X’s Y’s Z, the triplet extractor continuously checks for relationships and creates empty individuals, which can later be updated based on their future occurrences. Individuals are then classified based on the context in which they are used; e.g., “Tommy” is taken to represent a dog because of the relationship “Sam’s dog Tommy”, not because of any convention that the name “Tommy” has always referred to a dog.
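A minimal sketch of how such a nested possessive chain might be decomposed is given below; the triplet representation, the function name and the placeholder naming for empty individuals are illustrative assumptions rather than the exact implementation.

```python
# Illustrative decomposition of a nested possessive chain such as
# "Sam's dog Tommy" into O-A-V triplets; an empty individual is created
# whenever the value of a relation is not yet known.
def extract_nested(chain: str):
    """Return (triplets, individuals) for a chain like "Sam's dog Tommy"."""
    triplets, individuals = [], set()
    parts = [p.strip() for p in chain.split("'s")]
    owner = parts[0]
    individuals.add(owner)
    for part in parts[1:]:
        tokens = part.split()
        attribute = tokens[0]                           # e.g. "dog"
        value = tokens[1] if len(tokens) > 1 else None  # e.g. "Tommy"
        if value is None:
            value = f"_unknown_{attribute}"   # empty individual, updated on a later occurrence
        triplets.append((owner, attribute, value))
        individuals.add(value)
        owner = value                         # nesting: X's Y's Z
    return triplets, individuals

print(extract_nested("Sam's dog Tommy"))
# ([('Sam', 'dog', 'Tommy')], {'Sam', 'Tommy'})
```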
To analyze direct relations, such as X is Y, the semantic analyzer determines the groups that the two individuals belong to, compares them, and updates the O-A-V triplet accordingly based on previous occurrences of both the object and its value, as shown in Fig. 4.
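The direct-relation check of Fig. 4 can be sketched as follows, assuming a simple dictionary that records the group inferred for each individual from earlier clauses; the dictionary, the group_of helper and the relation labels are illustrative assumptions.

```python
# Sketch of semantic analysis for a direct relation "X is Y" (cf. Fig. 4).
# `ontology` maps an individual to the group inferred from earlier clauses;
# both the dictionary and group_of() are illustrative assumptions.
def analyse_direct_relation(x: str, y: str, ontology: dict, group_of):
    x_group = ontology.get(x)      # group known from previous occurrences of X
    y_group = group_of(y)          # group that the value Y denotes (e.g. via WordNet)
    if x_group is None:
        ontology[x] = y_group      # first occurrence: classify X as a Y
        return (x, "is_a", y)
    if x_group == y_group:
        return (x, "is_a", y)      # consistent: plain class membership
    # conflict: Y expresses a characteristic of X rather than its class
    return (x, "has_characteristic", y)

ontology = {"Neo": "person"}
print(analyse_direct_relation("Neo", "bull", ontology, lambda w: "animal"))
# ('Neo', 'has_characteristic', 'bull')
```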
To develop a hierarchy among the various identified groups, hypernyms of all of the groups are acquired using WordNet (based on their usage), and common ancestors are determined for each entity going up the hierarchy. This process continues until the top-level entity (Thing) is reached. With all of the individuals classified into groups, along with their relationships and a hierarchy, a taxonomy is developed, as shown in Figs. 5, 6, 7 and 8.
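The ancestor search can be sketched with WordNet hypernym paths, again assuming NLTK; the lowest common hypernym of two group nouns is the point at which their branches meet on the way up to the top-level entity.

```python
# Sketch of the taxonomy step: find the lowest common hypernym of two groups
# by walking up their WordNet hypernym paths (assumes NLTK's WordNet corpus).
from nltk.corpus import wordnet as wn

def common_ancestor(group_a: str, group_b: str) -> str:
    """Return the name of the lowest common hypernym of two group nouns."""
    a = wn.synsets(group_a, pos=wn.NOUN)[0]
    b = wn.synsets(group_b, pos=wn.NOUN)[0]
    return a.lowest_common_hypernyms(b)[0].name()

print(common_ancestor("dog", "cow"))   # e.g. 'placental.n.01'
print(common_ancestor("dog", "car"))   # much higher up, towards 'entity.n.01' (Thing)
```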
Parsing the sentences given in Fig. 9 with the proposed algorithm generates the semantic network shown in Fig. 10. The semantic analysis of direct relationships is shown in Fig. 4.
The Web Ontology Language (OWL) representation of this semantic network is shown in Fig. 11. Entity recognition for unknown and known entities during semantic analysis is shown in Figs. 12 and 13, respectively.
When analyzing the clause “Neo is a bull”, the algorithm determines the group to which Neo belongs using its previous occurrences and compares it with the group to which bull belongs. After analyzing the sentence, the proposed algorithm detects a conflict and infers that bull represents a certain characteristic of Neo, not that Neo is actually a bull.
A lightweight ontology-based search engine design
The content in a web page is unstructured. A browser can recognize the type of content in a web page from the meta-data provided but has no means of understanding it. A sentence such as “Karen is a cow” is just another piece of text to render, but it might be expressing Karen’s behavior or simply stating that Karen is a cow. A browser has no means of drawing such interpretations by reading only the plain unstructured text available in a web page. An ontological representation of the web page is a possible solution to this dilemma. Ontologies can act as computational models and provide a certain degree of automated reasoning. They enable semantic analysis and processing of the content in the web page. Fig. 14 shows the results of the Google search engine for the keyword “Neo”.
The currently available search engines provide the best available web results based on various ranking algorithms but do not provide meaningful insight into the content of the web pages. The information displayed with each web link is not sufficient to help the user select the most suitable page. To obtain detailed information, users tend to go straight to Wikipedia without even checking the other results returned by the search engine. In a way, we are bound to particular websites by their reputation and neglect valuable information that might be available on other pages. The user should be made aware of the contents of the web pages before selecting a link. This approach will enable the user to make a more informed choice and streamline the web surfing experience. To fill these gaps, the proposed architecture of the ontology-based search engine is given in Fig. 15.
Representing information with each web link in the form of O-A-V triplets gives the user insight into the content of a web page. Because this information is extracted semantically using ontologies, it also allows the user to understand the type of content available on the web, as shown in Figs. 16, 17 and 18.
Proposed framework for ontology-based image retrieval
The arrangement of this framework is shown in Fig. 19. It consists of domain ontology development for the image contents and creation of an RDF for the image descriptions; subject-predicate-object extraction, based on [34], from the natural language queries given by the user; and auto-generation of SPARQL queries on the ontology to obtain ontology-based image retrieval results.
An ontology is a description of a conceptualization; it describes a domain in a formal way. Web image retrieval is commonly accomplished with the help of nearby textual information. Text-based image retrieval engines, such as Yahoo, Bing and Google, are in practical use. They use text features, such as file names, as indices for searching for images on the web; at the next level, they search the textual information surrounding the image in the web page. Content-based image retrieval works with low-level image features, such as color, texture and shape.
However, due to the limitations of current image processing algorithms, there still exists a “semantic gap”: image processing algorithms fail to capture image semantics in a way that maps onto human understanding of the images. Image retrieval search engines are still evolving; their low-level descriptors are far from semantic notions, and other types of systems rely solely on annotations. Therefore, there is a need for an intermediate approach to image analysis, namely building a domain ontology for image categories. Some systems may define a specific domain with the help of domain experts by identifying the vocabularies used to describe objects of interest. For experimental purposes, the image data set from the IAPR TC-12 Benchmark is chosen from ImageCLEF 2006, which contains detailed image descriptions. The image domain ontology is developed as in Fig. 20 for this data set, with all possible class concepts, using Protege [35]. The RDF output is shown in Fig. 21.
Once the ontology has been created successfully, it can be stored as an OWL file. The images are annotated with the descriptions provided with the data set. The RDF descriptions of all individual images are combined into a single RDF file, which is uploaded to a Jena Fuseki server. Each RDF statement is stored as a triple in the server space, so a considerable number of triples is generated. These triples return values when a proper SPARQL query is fired through the Jena engine.
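A minimal sketch of this step is shown below, assuming rdflib for merging the per-image RDF files and the Fuseki Graph Store endpoint for the upload; the directory name, dataset name and endpoint URL are illustrative assumptions.

```python
# Sketch: merge per-image RDF files into a single graph and upload it to a
# Fuseki dataset. rdflib/requests, the file names and the endpoint URL are
# illustrative assumptions, not the exact setup used here.
import glob
import requests
from rdflib import Graph

merged = Graph()
for rdf_file in glob.glob("image_rdf/*.rdf"):   # per-image RDF descriptions (assumed path)
    merged.parse(rdf_file)                       # rdflib guesses the serialization format

data = merged.serialize(format="xml")            # single RDF/XML document (rdflib 6+ returns str)
resp = requests.post(
    "http://localhost:3030/images/data",         # Fuseki Graph Store endpoint (assumed dataset name)
    data=data.encode("utf-8"),
    headers={"Content-Type": "application/rdf+xml"},
)
print(resp.status_code, f"{len(merged)} triples uploaded")
```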
The retrieval of images in this framework must also undergo a crucial process: evaluating the user query, which is given in natural language.
Natural language processing
The user query, given in the English language, is passed to the NLP processor, which performs operations similar to the O-A-V extraction in the web model. The first step is part-of-speech (POS) tagging, so the sentence is passed through a POS tagging function within the NLP processing unit. This unit returns a list of tagged words with their parts of speech as tuples. The subject of an English sentence acts as the object in the O-A-V triplet. To identify the subject, we need to find a noun phrase consisting of nouns and the adjectives that define the various properties of the noun. Similarly, the predicate of an English sentence acts as the attribute in the O-A-V triplet; to identify it, we need to find the verb phrase in the sentence. Every grammatically correct English sentence contains a subject and a predicate. For the purposes of this model, we extract only the adjectives, nouns and verbs from the tagged sentence, eliminating the stop words from the query. Once the desired parts of speech have been extracted, the tagged sentence is parsed to separate the SUBJECTs, PREDICATEs and OBJECTs. Regular expressions are used to group all consecutive nouns and adjectives into a noun phrase. The result is stored as a tree object, which is then traversed and parsed to separate the subject, predicate and object. The result of this separation is shown in Fig. 22.
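A sketch of the tagging and chunking step is given below, assuming NLTK as the toolkit; the chunk grammar, the phrase labels and the rule that the first noun phrase is taken as the subject are illustrative assumptions.

```python
# Sketch of POS tagging and regex chunking of a user query (NLTK assumed).
# Consecutive adjectives/nouns form a noun phrase (NP); verbs form the
# predicate (VP). The chunk grammar and labels are illustrative.
import nltk

GRAMMAR = r"""
  NP: {<JJ.*>*<NN.*>+}   # noun phrase: optional adjectives + nouns
  VP: {<VB.*>+}          # verb phrase: one or more verbs
"""
chunker = nltk.RegexpParser(GRAMMAR)

def split_query(query: str):
    tokens = nltk.word_tokenize(query)
    tagged = nltk.pos_tag(tokens)              # list of (word, POS) tuples
    tree = chunker.parse(tagged)
    noun_phrases, verbs = [], []
    for subtree in tree.subtrees():
        if subtree.label() == "NP":
            noun_phrases.append([w for w, _ in subtree.leaves()])
        elif subtree.label() == "VP":
            verbs.extend(w for w, _ in subtree.leaves())
    subject = noun_phrases[0] if noun_phrases else []   # first NP acts as the O-A-V object
    objects = noun_phrases[1:]                          # remaining NPs
    return subject, verbs, objects

print(split_query("A man looking at the mountain"))
# (['man'], ['looking'], [['mountain']])
```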
These three groups (subjects, predicates, objects) are then used to search for the appropriate images in the database. A number of operations and transformations are applied to the natural language query to extract keywords. Part-of-speech tagging is performed, followed by splitting the query into sentences and further into word tokens. Noun, adjective and verb tokens are lemmatized and stemmed to their appropriate roots.
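The normalisation of the extracted tokens can be sketched as follows; the WordNetLemmatizer and PorterStemmer from NLTK are assumed choices, since the framework does not prescribe particular lemmatizing or stemming components.

```python
# Sketch of keyword normalisation: drop stop words, keep nouns/adjectives/verbs,
# then lemmatize (and stem) each remaining token. NLTK components are assumed.
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

lemmatizer, stemmer = WordNetLemmatizer(), PorterStemmer()
STOP = set(stopwords.words("english"))
KEEP = {"N": "n", "J": "a", "V": "v"}    # map POS tag prefix -> WordNet POS

def normalise(tagged_tokens):
    """tagged_tokens: list of (word, POS) tuples from the POS tagger."""
    roots = []
    for word, pos in tagged_tokens:
        if word.lower() in STOP or pos[0] not in KEEP:
            continue                                   # stop word or unwanted POS
        lemma = lemmatizer.lemmatize(word.lower(), KEEP[pos[0]])
        roots.append((lemma, stemmer.stem(lemma)))
    return roots

print(normalise([("man", "NN"), ("is", "VBZ"), ("looking", "VBG"), ("mountains", "NNS")]))
# [('man', 'man'), ('look', 'look'), ('mountain', 'mountain')]
```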
Auto generation of the SPARQL query
Normally, in a SPARQL query, the FILTER operator is used to screen the desired output when querying the database. For example, if a user enters a query and n keywords have been picked, then the best possible retrieval results are the images whose descriptions contain all n query words. However, situations may arise in which not all keywords are present in any description; such a query will return no result. At the same time, there may exist subsets of the n keywords that are present in the descriptions of the images. It is safe to assume that an image whose description contains more of the query keywords is more likely to be a good retrieval result. Still, it is very difficult to determine which keywords to eliminate when trying the next query. Therefore, to tackle this problem, all combinations of the n keywords are queried. For n keywords, \(2^{n}-1\) subsets can be formed.
The combinations are considered for querying in decreasing order of the number of elements (keywords) in the set. The UNION operator is used to ensure that the results of all sub-queries are considered, and the DISTINCT operator eliminates duplicate results. The retrieved images are ranked in decreasing order of likeliness, similar to a search engine’s page ranking, with the top results having the highest chance of being the desired ones. First, a function builds the phrase dictionary containing the subject, predicate and object phrases. The function then generates queries for all of the words present in the dictionary, as shown in Algorithm 5.
An effective search query is one in which the maximum number of keywords match the descriptions of multiple images. The higher this intersection of keywords with a description, the higher the chance that the corresponding image is the most appropriate one. It is therefore logical to search for all keywords in the same description as the first query. The descriptions of images are searched using the FILTER operator, and filtering a description with all keywords of the search query is most likely to produce the best results. However, it is possible that the query keywords are not a complete subset of the description of any image; such a query returns no result even though some keywords match. The next step would be to remove certain keywords and re-query the database, which is where the problem arises: it is impossible to know in advance which keywords to eliminate to produce results. Therefore, the program creates all possible combinations of the keywords present in the phrases dictionary. The result is stored in a list that contains all possible combinations (shown in Fig. 23) of the phrase words for the query shown in Fig. 22.
If n keywords have been selected in the phrase dictionary, then a total of \(2^{n}-1\) combinations are stored in the list AC (all combinations), where every element represents one of the \(2^{n}-1\) subsets and consists of (word, POS) tuples. The combination() function returns a list of all subsets in increasing order of the number of keywords. Every subset contains tuples of words, where every tuple holds the keyword and its part of speech. For more effective results, this list is reversed before generating the query; this ensures that the program considers all combinations of keywords in decreasing order of the number of keywords while generating the query. Once the list of all combinations is generated and reversed, the elements of the list are considered one by one to generate the query. An element of the list is one subset out of the \(2^{n}-1\) subsets.
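A sketch of this combination step using itertools is shown below; for n (keyword, POS) tuples it produces the \(2^{n}-1\) non-empty subsets and reverses the list so that larger subsets are queried first. The function and variable names are illustrative.

```python
# Sketch of the combination step: all 2^n - 1 non-empty subsets of the
# (keyword, POS) tuples, ordered so that larger subsets come first.
from itertools import combinations

def all_combinations(phrase_words):
    """phrase_words: list of (keyword, POS) tuples from the phrase dictionary."""
    ac = []
    for size in range(1, len(phrase_words) + 1):   # increasing subset size
        ac.extend(combinations(phrase_words, size))
    ac.reverse()                                   # query larger subsets first
    return ac

keywords = [("man", "NN"), ("looking", "VBG"), ("mountain", "NN")]
for subset in all_combinations(keywords):
    print(subset)
# 2^3 - 1 = 7 subsets, starting with the full three-keyword set
```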
Each such subset represents a single SPARQL sub-query. All words inside the subset become FILTER operator variables to be searched for in the descriptions of the images. The complete query is generated as follows. The query is initialized with just the prefix values at the beginning of the program. Each time the program runs, it generates a query string containing the prefix statements and the ‘SELECT DISTINCT * WHERE’ statement. Because every element of the list is a sub-query, the function adds ‘select * where ?identifier s0:description ?value.’ to the existing query for each element. Every element in the list is a subset containing a different combination of the keywords; for each element, the program considers every tuple in the subset to filter the description. ‘FILTER (REGEX(STR(?value),“’ is then added to the query, followed by the keyword present in the tuple.
Before the keyword can be used to filter the description, it must be lemmatized. Lemmatization helps to cover all word forms, including different conjugations, infinitives, plurals, etc. Every filter expression is closed with ‘”, “i”))’. Before moving on to the next subset, every sub-query ends with ‘}}UNION’. If a subset of AC contains m keywords, then m filter options are added to the query. The UNION operator is concatenated before moving on to the next element in the list. This procedure generates \(2^{n}-1\) sub-queries joined by \(2^{n}-1\) UNION operators for n keywords.
However, for \(2^{n}-1\) sub-queries, only \(2^{n}-2\) UNION operators are required. Therefore, before returning the query, the function removes the last ‘UNION’ and adds a closing ‘}’. The UNION operator ensures that all subsets are considered while querying the database. Before the program exits, the SUBJECT, PREDICATE and OBJECT phrases are displayed, followed by the resultant query, as shown in Fig. 22. The auto-generated SPARQL query is shown in Fig. 24.
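Putting these steps together, the query builder can be sketched as follows. The FILTER/UNION structure and the s0:description property follow the fragments quoted above, while the prefix URI and the function name are illustrative assumptions.

```python
# Sketch of the auto-generated SPARQL query (cf. Fig. 24). The prefix URI is
# an assumption; the s0:description property and the FILTER/UNION structure
# follow the fragments quoted in the text.
PREFIXES = 'PREFIX s0: <http://example.org/image#>\n'   # assumed namespace

def build_query(all_combinations):
    query = PREFIXES + 'SELECT DISTINCT * WHERE {\n'
    for subset in all_combinations:                       # one sub-query per subset
        query += '{ SELECT * WHERE { ?identifier s0:description ?value.\n'
        for keyword, _pos in subset:                       # one FILTER per keyword
            query += f'FILTER (REGEX(STR(?value), "{keyword}", "i"))\n'
        query += '}} UNION\n'
    query = query.rsplit('UNION', 1)[0]                    # drop the trailing UNION
    return query + '}'                                     # close the outer WHERE block

print(build_query([(("man", "NN"), ("mountain", "NN")), (("man", "NN"),)]))
```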
Ontology-based image retrieval using auto-generated SPARQL query
The auto-generated query in Fig. 24 is fed to the Jena Fuseki server. The results retrieved by the server are shown in Fig. 25. As explained previously, the results at the top are likely to be more relevant than those at the bottom. However, these results are not yet optimized.
Keyword proximity score based optimization
While generating the SPARQL query, all possible combinations of the O-A-V keywords are included; the reasons for considering all possible combinations were explained in the previous section. The search results from the Jena Fuseki server are optimized in a two-step process. First, the results are placed in decreasing order of the number of matching keywords: a description that shares more keywords with the query is more likely to correspond to a better picture. However, since many of the descriptions in the data set are elaborate, it is possible that the keywords are spread out over the description. Consider the following query keywords and two of the results as an example:
- query_KeyWords = ['man', 'looking', 'mountain']
- <upload_base/1111.jpg> <A man standing on the roof and looking at the mountain>
- <upload_base/2222.jpg> <A man is looking at his children playing near the lake across the mountain>
Both image descriptions contain all three keywords of the query. However, in 2222.jpg, the context of the query is lost because there is considerable distance between the words, whereas in 1111.jpg the description is more meaningful and the keywords are closer together. Given a set of sentences containing an equal number of keywords, the word distance, or keyword proximity, can further optimize the search results. A higher keyword proximity score for an image suggests that its description contains phrases similar to the user’s query.
The keyword proximity score is calculated by taking the absolute differences of the positions of consecutively appearing keywords in the description of an image and then normalizing these total distances to the range 0–1, where zero represents low proximity and one represents high proximity. For descriptions with only one matching keyword, the score is set to the minimum of 0.001.
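A sketch of this score is given below. The text fixes the end points (0–1, with 0.001 for single-keyword matches) but not the exact normalisation, so the inverse-distance form used here is an assumption.

```python
# Sketch of the keyword proximity score. The inverse-distance normalisation is
# an assumption; the description above only states that scores are scaled to
# 0-1, with 0.001 reserved for descriptions matching a single keyword.
def proximity_score(description: str, keywords) -> float:
    words = description.lower().split()
    positions = sorted(i for i, w in enumerate(words) if w in keywords)
    if len(positions) < 2:
        return 0.001                              # only one (or zero) matching keyword
    total_gap = sum(b - a for a, b in zip(positions, positions[1:]))
    # the minimum possible gap sum is len(positions) - 1 (keywords adjacent),
    # so the ratio is 1.0 for adjacent keywords and tends to 0 as they spread out
    return (len(positions) - 1) / total_gap

kw = {"man", "looking", "mountain"}
print(proximity_score("a man standing on the roof and looking at the mountain", kw))                       # ~0.22
print(proximity_score("a man is looking at his children playing near the lake across the mountain", kw))   # ~0.17
```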
Since the retrieval results are first sorted by the number of keyword occurrences, a description with a higher number of keyword matches suggests greater similarity to the query. Hence, when sorting by the keyword proximity score, image descriptions with an equal number of keyword occurrences are grouped together and displayed in descending order of their keyword proximity scores, while maintaining the overall structure of decreasing numbers of keyword occurrences. The keyword proximity score of a description with n keyword occurrences cannot be compared with that of another description with m keyword occurrences where \(n \ne m\); the score is comparable only among image descriptions with an equal number of keyword occurrences. After optimization, the retrieval results in Fig. 25 are re-ranked as shown in Fig. 26. Every ranked result contains four items separated by a “|”: the image location, the image description, the keyword proximity score for the image and the number of query keywords present in the image description.
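This two-level re-ranking can be sketched as a single sort on a composite key: the number of matching keywords first, the keyword proximity score second, both descending. The result tuples below mirror the four items listed above; the file names and values are illustrative.

```python
# Sketch of the two-level re-ranking: results are grouped by the number of
# matching keywords and, within each group, ordered by proximity score.
# Each result tuple is (image_location, description, proximity_score, keyword_count).
def rerank(results):
    return sorted(results, key=lambda r: (r[3], r[2]), reverse=True)

results = [
    ("upload_base/2222.jpg", "A man is looking at his children ... across the mountain", 0.17, 3),
    ("upload_base/1111.jpg", "A man standing on the roof and looking at the mountain", 0.22, 3),
    ("upload_base/3333.jpg", "A man near a lake", 0.001, 1),
]
for row in rerank(results):
    print(" | ".join(str(item) for item in row))
```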