ONTOLOGY BASED SEMANTIC WEB SERVICE WITH NATURAL LANGUAGE PROCESSING FUNCTIONALITIES


The problem of natural language understanding is one of the first problems that researchers in AI tried to solve. There is a fundamental need for a system that can bring this ability to the web and so create a web that can understand and reason. This type of web is known as the Semantic Web, and it is in its nascent stages of development. The Semantic Web employs tools and software that aim to achieve the understanding needed in future web technologies. These tools include ontologies, natural language software and information retrieval algorithms.

We propose a system that addresses this problem of natural language understanding within a specific domain. The proposed system accepts input in the form of well-formed English sentences and produces output that is both pertinent and correct. The system performs part-of-speech (POS) tagging, which reduces the complexity of the input and makes it usable by the natural language parser, which in turn translates the natural language query into a query on a database schema. This database schema is built on knowledge-based logic obtained from a domain-specific ontology. Hence, data is stored free of logical errors, and information retrieval becomes easier when this is coupled with natural language processing. We use the domain “Demographics of India” for demonstration purposes.

The proposed model consists of four modules: the POS Tagger, the NLP Parser, the Ontology and Database, and the User Interface. The POS Tagger takes the input sentence and recognizes the part of speech of every word in it. This is vital to the project because, without understanding the role of each input word, it is difficult to build a query from the sentence. The NLP Parser takes the output of the POS Tagger and builds a query based on the original English sentence. The Ontology is the integral part of the project, as it houses the logic and the structure of the knowledge base, which is realized by the database. It is this database, modeled after the Ontology, that is queried to produce the final result. The User Interface is used to get the input and display the result.
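A minimal sketch of how the POS Tagger and NLP Parser stages might fit together, assuming a toy lexicon and a single question pattern. The project itself was written in Java; Python is used here only for brevity. The `state` table, its column names, the tag set and the `%s` placeholder style are all hypothetical and are not taken from the project.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag (not the project's tagger).
LEXICON = {
    "which": "WDT", "what": "WP", "is": "VBZ", "the": "DT", "of": "IN",
    "population": "NN", "capital": "NN", "area": "NN",
}

# Hypothetical mapping from ontology properties to columns of a `state` table.
PROPERTY_COLUMNS = {
    "population": "population",
    "capital": "capital",
    "area": "area_sq_km",
}


def pos_tag(sentence):
    """Tag each word; unknown capitalised words are treated as proper nouns."""
    tagged = []
    for token in re.findall(r"[A-Za-z]+", sentence):
        tag = LEXICON.get(token.lower())
        if tag is None:
            tag = "NNP" if token[0].isupper() else "NN"
        tagged.append((token, tag))
    return tagged


def build_query(tagged):
    """Translate 'What is the <property> of <State>?' into a parameterised query."""
    prop = next(
        (word.lower() for word, tag in tagged
         if tag == "NN" and word.lower() in PROPERTY_COLUMNS),
        None,
    )
    entity = " ".join(word for word, tag in tagged if tag == "NNP")
    if prop is None or not entity:
        raise ValueError("question not understood")
    sql = f"SELECT {PROPERTY_COLUMNS[prop]} FROM state WHERE name = %s"
    return sql, (entity,)


query, params = build_query(pos_tag("What is the population of Tamil Nadu?"))
```

The entity name is passed as a query parameter rather than pasted into the SQL string, so user input never becomes part of the query text.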

Existing System: Today’s web services are not organized enough to operate in an intelligent way. This is precisely why we need the Semantic Web: to overcome the reasoning deficiency of the present-day web. Since this project is a new concept in its nascent stages, we cannot compare it completely with any one existing system. But when we consider the web as a whole, we can say that the current web is largely keyword oriented and lacks the infrastructure and resources to provide solutions in a logical, artificially intelligent manner. For example, almost all search engines on the internet make use of keywords only and not the logic behind the query. In simple words, they match words found in the search string and rank results by popularity and previous visits by users, rather than trying to provide the correct or most logical answer to a question. Currently, only one search engine, “ask.com”, has accomplished this partially, by combining natural language processing with keyword-based techniques. But even it does not use an ontology to dynamically update itself and learn from queries.

Demerits of the Existing System:

The existing system cannot produce pertinent and accurate answers for the user. For a web service that answers questions, as in a search engine, the existing system fails to deliver because the narrow keyword-based approach, with ranking that depends on previous successful searches, does little to find answers to questions. E.g., the existing system will not return a good answer if the search query is “Which states does the Yamuna flow through?” because it searches for individual keywords like ‘states’, ‘Yamuna’, ‘flow’, etc. and returns the most popular (i.e., most visited) result.

Proposed System:

The architecture of the proposed system is shown in Fig 3.1.

Fig 3.1 Architecture of the Proposed System

This system accepts natural English as input and produces pertinent and correct results as output. It accomplishes this through a knowledge base built along the lines of a logically consistent ontology, which ensures that the created database remains logically sound and correct.

The natural language input is processed by a POS Tagger and an NLP Parser, which together ensure that the English sentence is correctly understood by the system and that a corresponding query is generated for the database modeled on the Demographics of India ontology. The POS Tagger and the NLP Parser are written in Java. The user interface, which accepts the input and displays the output after processing, is written in JSP. The Ontology is created with the Protégé tool, and the database uses MySQL 5.1. The system accepts simple queries and can be extended to support more complex ones.

Madhura Raju
This project was proposed in May, 2009


9/1/2009 A Report

On

The Enervation of IPv4

leading to The Adoption of IPv6

Submitted by Madhura

The Enervation of IPv4 leading to the Adoption of IPv6

The limitations of IPv4 have paved a clear way for the deployment of IPv6. IPv6 is slowly reaching the world and is widely expected to replace IPv4. A closer look at the subject gives a clear picture of why, how and when the transition is taking place. Before the main subject matter, the fundamentals are discussed.

Internet Protocol Address

All information exchanged between a sender and a receiver on the internet, irrespective of type and content, is structured in the form of “packets”. A packet is a formatted block of data, also known as a datagram. Each packet includes a header and a message (also called the payload). The header holds the source and destination addresses. The Internet Protocol (IP) is a standard that allows the devices on a network, each having a unique IP address, to communicate with one another. This unique address identifies every device on both the Local Area Network and the internet. The primary purpose of IP addresses is to route packets through the network to their respective destinations.

Format of the IP Address

An IPv4 address is a 32-bit address, written as four whole numbers separated by periods. Each number uses 8 bits and can thus range from 0 to 255. Every IP address has two sections: the NetID and the HostID. The NetID identifies the network, while the HostID refers to a particular host in that network. The NetID is the unique Internet Number that can be requested from the Network Information Center (NIC). The HostID, also known as the machine or local address, represents the specific machine that is linked to the network.

IP Address Management

The Internet Assigned Numbers Authority (IANA) is the key entity behind the management of IP addresses. IANA allocates blocks of addresses to the five Regional Internet Registries (RIRs), which in turn allocate them to Internet Service Providers (ISPs). The ISPs then apportion the IP addresses to individual nodes or networks. Thus the ISPs connect networks and individual machines to the Internet.
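The dotted-decimal format and the NetID/HostID split described above can be illustrated with Python's standard `ipaddress` module. This is a sketch only: the example address and the /24 prefix length (which decides where the NetID ends and the HostID begins) are assumptions chosen for illustration.

```python
import ipaddress

# Example address; 192.168.10.37 is an arbitrary private address (assumption).
addr = ipaddress.IPv4Address("192.168.10.37")

# Four 8-bit numbers, each in the range 0..255.
octets = [int(part) for part in str(addr).split(".")]

# The same address as a single 32-bit integer.
as_int = int(addr)

# With an assumed /24 prefix, the first 24 bits form the NetID
# and the remaining 8 bits form the HostID.
iface = ipaddress.IPv4Interface("192.168.10.37/24")
net_id = iface.network.network_address
host_id = int(iface.ip) & int(iface.network.hostmask)
```

Here `net_id` is the network part (192.168.10.0) and `host_id` is the host part (37) of the address.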

Domain Name System

The DNS is a naming system that removes the need to remember numeric IP addresses. It associates each address with an alphabetical domain name that is easy to memorize. A DNS server is used to resolve a domain name into its IP address. Though the task sounds simple, it is a demanding operation because the DNS database is one of the most heavily accessed databases on the internet.

IPv4 Exhaustion

The current version of the Internet Protocol is IPv4, the first version to be widely deployed. IPv4 employs 32-bit addressing and is therefore limited to 2^32 (4,294,967,296) unique addresses, or about 4.29 billion. Because every device on a network needs a unique IP address, this has led to a shortage of IP addresses. In addition, large blocks of addresses are allocated to various organizations. These blocks are not available for public allocation, which worsens the shortage. If these blocks were returned to the pool of addresses available for public allocation, the problem could be alleviated, but only to a very small extent, because the whole operation is complicated, expensive and time consuming.

Solutions to deal with the Exhaustion:

Temporary: These issues are temporarily mitigated by Network Address Translation (NAT), in which one public IP address can be shared among many hosts in a Local Area Network. The main function of NAT is IP masquerading, which conceals a whole private network behind one public IP address. This serves as a major mitigation of the exhaustion. Other temporary mitigations are the replacement of classful networks by Classless Inter-Domain Routing (CIDR) and the Virtual Private Network (VPN). The only permanent solution to this problem is migration to the new version, IPv6.
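To make the idea of IP masquerading concrete, here is a toy translation table in which several private hosts share one public address. It is an illustrative sketch, not how any particular router implements NAT; the class name, the starting port number and the example addresses are all assumptions.

```python
import itertools


class Nat:
    """Toy port-based address translation table (illustrative only)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next free public port
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite a private source address to the shared public address."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Map a reply arriving on a public port back to the private host."""
        return self._in[public_port]


nat = Nat("203.0.113.5")
first = nat.outbound("192.168.1.10", 5000)
second = nat.outbound("192.168.1.11", 5000)
```

Both private hosts appear to the outside world as 203.0.113.5; only the public port tells their traffic apart.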

As the usage of the Internet grew exponentially, the depletion of the IPv4 address space was foreseeable. In the “IPv4 Address Report”, Geoff Huston predicts that the RIR pool will be exhausted by March 2013. The main task of an RIR is to allocate and register Internet number resources within a particular region. There are five Regional Internet Registries:

1. American Registry for Internet Numbers (ARIN)
2. RIPE Network Coordination Centre (RIPE NCC), serving Europe
3. Asia-Pacific Network Information Centre (APNIC)
4. Latin American and Caribbean Internet Addresses Registry (LACNIC)
5. African Network Information Centre (AfriNIC)

The IANA delegates to the RIRs the responsibility of providing Internet addresses to customers and end users. Thus the RIRs allocate IP addresses for the various stations in a network.

Permanent: After a good amount of experimentation, a consensus was reached that the best way to handle this crisis is the adoption of IPv6, the next-generation Internet Protocol. IPv6 was recommended by the IPng Area Directors of the Internet Engineering Task Force at its Toronto meeting in 1994, after which it was approved by the Internet Engineering Steering Group and made a “Proposed Standard”. In 1998 it was made a “Draft Standard”. Once the IPv4 pool is exhausted, the allocation of new IPv4 addresses will end, and IPv6 will take over. The distinguishing feature of IPv6 is that it has a 128-bit address, unlike the 32-bit address of IPv4, and so provides 2^128 unique addresses. Once the threshold of IPv4 exhaustion is reached, organizations will start deploying IPv6.
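The difference between 32-bit and 128-bit addressing is easier to appreciate as plain arithmetic. The short calculation below uses only the two address widths stated in the text; the variable names are illustrative.

```python
IPV4_BITS = 32
IPV6_BITS = 128

# Each extra address bit doubles the number of unique addresses.
ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

# How many complete IPv4 address spaces fit into the IPv6 space.
ratio = ipv6_total // ipv4_total

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:,} addresses")
print(f"IPv6 is 2^{IPV6_BITS - IPV4_BITS} times larger")
```

The IPv6 space is 2^96 times the size of the IPv4 space, which is why address exhaustion is not a practical concern under IPv6.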

The American Registry for Internet Numbers (ARIN) has already circulated this information to the organizations that employ IPv4. ARIN has predicted that the shortage of Internet numbers will occur within two years. It advises firms to start planning for IPv6 adoption so that they can continue to acquire additional IP addresses.

The large address size is an obvious advantage of IPv6 over IPv4. The IPv6 header is designed to speed up the routing process. IPv6 also supports much larger payloads per packet than IPv4 through jumbograms, which improves performance over high-throughput networks. Other advantages of IPv6 are multicasting, better security (because IPsec is part of its core) and auto-configuration of hosts when they are connected to an IPv6 routed network.

The Transition:

The transition from IPv4 to IPv6 cannot be done overnight. Many complications have to be handled carefully. The changeover is a prolonged process, mainly because of the vast reach of the internet and the very large number of IPv4 users, and this is one reason for the delay in the transition. One important point is that IPv4 and IPv6 can coexist. Organizations therefore need only upgrade their existing networks rather than replace them. The migration has to be done node by node in the routed network, and it can use the auto-configuration procedures to avoid manual operations. IPv6 is designed in such a way that its addresses can be derived from IPv4 addresses. Another useful feature is that IPv6 nodes conform to the dual-stack approach, which means that they can support both IPv6 and IPv4 at the same time. Thus the migration involves interoperability of IPv4 and IPv6, gradual deployment of IPv6 routers and hosts, and ease of understanding for both network administrators and end users. To assist the conversion, a set of mechanisms called Simple Internet Transition (SIT) is implemented. First, SIT provides for the progressive upgrading of IPv4 hosts and routers to IPv6, one at a time. Second, it keeps addressing simple by letting IPv6 use existing IPv4 addresses.
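One concrete way an IPv6 address can carry an IPv4 address is the IPv4-mapped form `::ffff:a.b.c.d`, where the low 32 bits hold the IPv4 address. The sketch below uses Python's standard `ipaddress` module. It illustrates the general idea of deriving an IPv6 address from an IPv4 one and is not a description of the SIT mechanisms themselves; the helper name and example address are assumptions.

```python
import ipaddress


def to_ipv4_mapped(v4_text):
    """Embed an IPv4 address in the low 32 bits of an IPv6 address."""
    v4 = ipaddress.IPv4Address(v4_text)
    # 0xFFFF in bits 32..47 marks the address as IPv4-mapped.
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4))


mapped = to_ipv4_mapped("192.0.2.33")

# The original IPv4 address can be recovered from the IPv6 form.
recovered = mapped.ipv4_mapped
```

A dual-stack node can use this form to represent an IPv4 peer inside an IPv6 socket API, so application code deals with a single address type.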

Thus, once the migration is under way, manufacturers will integrate IPv6 into networks, routers and operating systems, and users will adapt to the change.

Bibliography

1. The letter to the CEOs from John Curran, Chairman, ARIN: https://www.arin.net/knowledge/about_resources/ceo_letter.pdf

2. A chapter on migration: http://www.cu.ipv6tf.org/literatura/chap12.pdf

3. IPv4 and IPv6 threat comparison and best practice evaluation: http://seanconvery.com/v6-v4-threats.pdf

4. IPv4 Address Report by Geoff Huston: http://www.potaroo.net/tools/ipv4/index.html

5. Internet Address Spacing by Organization for Economic Cooperation and Development: http://www.oecd.org/dataoecd/7/1/40605942.pdf

6. Notes on Internet Protocol: http://www.wisegeek.com/what-is-ip-orinternet-protocol.htm

7. The Choice: Exhaustion or Transition: http://www.6journal.org/archive/00000285/01/the_choice_ipv4_exhaustion_or_transition_to_ipv6_v4.4.pdf

8. American Registry for Internet Numbers: https://www.arin.net/resources/request/index.html


03 October 2008

Software Quality-V unit OOAD

SOFTWARE QUALITY


Software Quality Assurance

INTRODUCTION

To develop and deliver robust systems, we need a high level of confidence. For this:

- Each component will behave perfectly.
- Collective behavior is correct.

Verifying components in isolation is necessary but not sufficient.


History of how “debugging” came into existence

- Grace Murray Hopper, shortly after WWII
- September 9, 1947
- She was working at Harvard University on the Mark II relay calculator, a room-sized machine, when it experienced a problem.
- A “moth” was found trapped in one of the machine's relays.
- Log book entry: "First actual case of bug being found."


DEBUGGING AND TESTING

DEBUGGING:

- Is the process of eliminating syntactical bugs.

TESTING:

- Is the process of detecting and eliminating logical bugs.


Software Quality Assurance

What Testing Shows

- errors
- requirements conformance
- performance
- an indication of quality


Software Quality Assurance

Types of Errors

- Language errors result from incorrectly constructed code.
- Run-time errors occur when a statement attempts an impossible operation.
- Logic errors occur when the code does not perform the way you intended.
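The three kinds of error can be demonstrated with a small sketch. The `classify` helper and the `average` function are hypothetical examples written for this illustration. Note that a logic error cannot be detected automatically; it only shows up when the result is compared with what was intended.

```python
def classify(source):
    """Report which kind of error (if any) a snippet of Python source triggers."""
    try:
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "language error"      # incorrectly constructed code
    try:
        exec(code, {})
    except Exception:
        return "run-time error"      # a statement attempted an impossible operation
    return "no error detected"


def average(values):
    # Logic error: divides by a fixed 2 instead of len(values).
    # The code runs without complaint but gives the wrong answer.
    return sum(values) / 2


language_case = classify("x = (1 +")
runtime_case = classify("x = 1 / 0")
logic_case = average([1, 2, 3])   # intended result is 2.0
```

The first two cases are caught by the language or the run-time system, while the third returns 3.0 silently and is found only by testing against the expected value.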


QUALITY ASSURANCE TESTING

TYPES: ERROR BASED TESTING & SCENARIO BASED TESTING

Error-based testing

- Searches a class’s methods for clues of interest and then tests the clues.
- E.g.: payroll computation method of the Employee class: anEmployee.computePay(hours)
- This is called “Testing the boundary conditions”.
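A sketch of boundary-condition testing for the payroll example follows. The `Employee` class here is hypothetical: the 40-hour overtime threshold, the 1.5x overtime rate and the 168-hour weekly limit are assumptions made so that there are boundaries to test.

```python
class Employee:
    MAX_HOURS = 168        # hours in a week; assumed upper bound
    OVERTIME_AFTER = 40    # assumed overtime threshold

    def __init__(self, hourly_rate):
        self.hourly_rate = hourly_rate

    def compute_pay(self, hours):
        if hours < 0 or hours > self.MAX_HOURS:
            raise ValueError("hours out of range")
        overtime = max(0, hours - self.OVERTIME_AFTER)
        regular = hours - overtime
        return regular * self.hourly_rate + overtime * self.hourly_rate * 1.5


def run_boundary_tests():
    """Exercise compute_pay at, just inside and just outside each boundary."""
    employee = Employee(hourly_rate=10)
    results = {}
    for hours in (0, 40, 41, 168):       # valid values at the edges
        results[hours] = employee.compute_pay(hours)
    for hours in (-1, 169):              # values just outside the valid range
        try:
            employee.compute_pay(hours)
        except ValueError:
            results[hours] = "rejected"
    return results


results = run_boundary_tests()
```

Errors tend to cluster at the edges of valid ranges, so the test values sit on each boundary and one step to either side of it.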


Scenario-based testing

- Also called User-Based Testing
- Concentrates on what the user does, rather than what the product does.
- Capture use cases and user’s tasks and perform them as tests.
- More complex and realistic.
- Covers higher-visibility interaction bugs, though it will not find everything.


TESTING STRATEGIES

- There are many strategies, but most use a combination of the following:
  - Black box testing
  - White box testing
  - Top down testing
  - Bottom up testing
- Testing can’t prove the correctness of the system, but it can establish its “acceptability”.


Software Quality Assurance

Black Box Testing

- Concept: to represent a system whose inside workings are not available for inspection.
- In a black box, the test item is treated as BLACK, since its logic is unknown.
- Only the input and output are known, not the implementation part.

[Figure: INPUT -> ? (black box) -> OUTPUT]


BLACK BOX TESTING

[Figure: input test data is fed to the system, which produces output test results. Ie: inputs causing anomalous behaviour. Oe: outputs which reveal the presence of defects.]
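A minimal black box test harness might look like the sketch below. It knows only inputs and expected outputs and never inspects the implementation. The `buggy_abs` function is a hypothetical system under test with a deliberately planted defect.

```python
def run_black_box(system, cases):
    """Feed each input to `system` and collect the cases where the output is wrong."""
    failures = []
    for test_input, expected in cases:
        actual = system(test_input)
        if actual != expected:
            # test_input belongs to Ie; actual is an output in Oe.
            failures.append((test_input, expected, actual))
    return failures


def buggy_abs(x):
    """Hypothetical system under test: absolute value with a planted defect."""
    return x if x >= -1 else -x   # wrong for x == -1


cases = [(5, 5), (0, 0), (-1, 1), (-7, 7)]   # (input, expected output)
failures = run_black_box(buggy_abs, cases)
```

Only the pair (-1, 1) fails, which identifies -1 as an input causing anomalous behaviour without any knowledge of the code inside `buggy_abs`.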


White Box Testing

- Assumes that the logic is important and must be tested to guarantee proper functioning.
- Main use: error-based testing.
- One form of white box testing is path testing:
  - Statement testing coverage
  - Branch testing coverage


Software Quality Assurance

White Box Testing

- Software testing approach that uses the inner structural and logical properties of the program for verification and for deriving test data.
- Also called: Clear Box Testing, Glass Box Testing and Structural Testing

[Figure: INPUT -> white box -> OUTPUT]
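Branch coverage can be illustrated with a hand-instrumented function. In the sketch below, the `grade` function and its pass mark of 50 are hypothetical, and the `branches_taken` set is manual instrumentation added for the example. Dedicated coverage tools gather the same information automatically.

```python
def grade(score, branches_taken):
    """Function under test; records which branch ran in `branches_taken`."""
    if score >= 50:
        branches_taken.add("pass")
        return "pass"
    else:
        branches_taken.add("fail")
        return "fail"


TOTAL_BRANCHES = 2


def branch_coverage(test_inputs):
    """Fraction of the branches in `grade` exercised by the given inputs."""
    taken = set()
    for score in test_inputs:
        grade(score, taken)
    return len(taken) / TOTAL_BRANCHES


weak_suite = branch_coverage([70, 90])    # both inputs take the same branch
full_suite = branch_coverage([70, 30])    # one input per branch
```

The first suite has two test cases but covers only half the branches, which shows why test data is derived from the program's structure in white box testing.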


Software Quality Assurance

Top Down Testing

- Test the top layer or the controlling subsystem first.
- Then combine all the subsystems that are called by the tested subsystems and test the resulting collection of subsystems.
- Do this until all subsystems are incorporated into the test.
- Test stubs are used to simulate the components of lower layers that have not yet been integrated.
- No drivers are needed.


Software Quality Assurance

Top Down Testing

- Assumes that the main logic or object interaction of the application needs more testing than an individual object’s method.

[Figure: module hierarchy A to G. The top module is tested with stubs; stubs are replaced one at a time, "depth first"; as new modules are integrated, some subset of tests is re-run.]
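The role of a test stub in top down testing can be sketched as follows. All class names and the 10% tax rate are hypothetical. The point is only that the top module is tested first against a stub, and the same test is re-run once the real lower-layer component replaces it.

```python
class TaxServiceStub:
    """Stub standing in for a lower-layer component that is not integrated yet."""

    def tax_for(self, amount):
        return 0.0   # canned answer, no real logic


class RealTaxService:
    """The real lower-layer component, integrated later."""

    def tax_for(self, amount):
        return round(amount * 0.1, 2)   # 10% rate is an assumption


class Checkout:
    """Top-level controlling module, tested first."""

    def __init__(self, tax_service):
        self.tax_service = tax_service

    def total(self, amount):
        return amount + self.tax_service.tax_for(amount)


# Step 1: test the top module with the stub in place.
total_with_stub = Checkout(TaxServiceStub()).total(100.0)

# Step 2: replace the stub with the real component and re-run the test.
total_integrated = Checkout(RealTaxService()).total(100.0)
```

Because `Checkout` receives its collaborator through the constructor, swapping the stub for the real service needs no change to the module under test.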


BOTTOM UP TESTING

- Starts with the details of the system and proceeds to a higher level.
- More appropriate
- Test each object, combine, and test their interaction and the messages passed among the objects.
- Leads to integration testing, which leads to systems testing.


Software Quality Assurance

Bottom up testing

[Figure: module hierarchy A to G. Worker modules are grouped into builds and integrated; drivers are replaced one at a time, "depth first".]
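In bottom up testing the lowest-level workers are tested first, and a test driver plays the part of the higher-level caller that does not exist yet. The functions below are hypothetical and serve only as a small sketch of that order of work.

```python
def parse_amount(text):
    """Lowest-level worker module: convert text to a rounded number."""
    return round(float(text.strip()), 2)


def sum_amounts(texts):
    """Next level up: built on top of parse_amount."""
    return round(sum(parse_amount(text) for text in texts), 2)


def driver():
    """Test driver: calls the workers directly, lowest level first."""
    results = []
    # Unit level: the worker on its own.
    results.append(parse_amount(" 12.50 ") == 12.5)
    # Integration level: the module that combines the workers.
    results.append(sum_amounts(["1.10", "2.20"]) == 3.3)
    return results


outcome = driver()
```

Once a real higher-level module is available, it replaces the driver and the same checks move up one level.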


IMPACT OF OBJECT ORIENTATION ON TESTING

- Some types of errors could become less plausible.
- Some could become more plausible.
- Some new errors might appear.
- Impact of inheritance in testing
- Reusability of tests


TEST CASES

- To test a system:
  - Construct test input,
  - describe how the output will look,
  - perform tests,
  - compare with expected output.
- Myers' objectives of testing:
  - Testing is the process of executing a program with the intent to find errors.
  - Good test = high probability of detecting errors.
  - Successful test case = one that detects undiscovered errors.


TEST PLAN

- A test plan is developed to detect and identify potential problems before delivering the software to the users.
- It offers a road map for testing activities, whether usability, user satisfaction or quality assurance tests.
- Users might demand a test plan with the product.
- It should state the test objectives and how to meet them.


STEPS TO CREATE TEST PLAN

- Objectives of the test
  - Create the objectives and describe how to achieve them.
- Development of the test case
  - Develop input and output data and the test.
- Test analysis
  - Examine the test output and documentation. If there are errors, debug and repeat until an error-free state is reached.


Software Quality Assurance

Guidelines for Developing Test Plans

- Try to include as much detail as possible about the tests.
- Include a schedule and a list of required resources.
- Document every type of test.
- Track the changes to the code.
- Keep the test plan and the product in sync and up to date.
- Keep configuration information and complete routine updates.


Software Quality Assurance

Myers' Debugging Principles

BUG LOCATING PRINCIPLES

- Think.
- If you reach a bottleneck, sleep on it.
- If the bottleneck remains, describe the problem to someone else.
- Use debugging tools.
- Experimentation should be done as a last resort.


Debugging principles

- Where there is one bug, there is likely to be another.
- Fix the problem, not just the symptom of it.
- The probability of the solution being correct drops as the size of the program increases.
- Beware of the probability that an error correction will create a new error.


- Let's consider the test cases of an ATM system:
  - If the bank client inserts a card -> password request
  - Password incorrect -> error message, card ejects
  - Transaction completed -> show main menu
  - If the cash is low -> system notifies bank
  - Act of vandalism occurs -> sound alarm
  - Wrong PIN entered -> 3 chances provided -> still wrong -> card is deactivated
  - And so on.
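Two of the ATM test cases listed here can be written as a scenario-based test against a small model. The `Atm` class is a hypothetical stand-in invented for this sketch; it models only the card insertion and the three-attempt PIN rule, and the return strings are assumptions.

```python
class Atm:
    """Hypothetical ATM model covering card insertion and PIN checking only."""

    MAX_PIN_ATTEMPTS = 3

    def __init__(self, correct_pin):
        self._correct_pin = correct_pin
        self._failed_attempts = 0
        self.state = "idle"

    def insert_card(self):
        self.state = "awaiting_pin"
        return "password request"

    def enter_pin(self, pin):
        if self.state != "awaiting_pin":
            raise RuntimeError("no active card")
        if pin == self._correct_pin:
            self._failed_attempts = 0
            self.state = "main_menu"
            return "main menu"
        self._failed_attempts += 1
        if self._failed_attempts >= self.MAX_PIN_ATTEMPTS:
            self.state = "card_deactivated"
            return "card deactivated"
        return "error message"


def wrong_pin_scenario():
    """Scenario: the client inserts a card and enters a wrong PIN three times."""
    atm = Atm(correct_pin="1234")
    steps = [atm.insert_card()]
    for _ in range(3):
        steps.append(atm.enter_pin("0000"))
    return steps, atm.state


steps, final_state = wrong_pin_scenario()
```

The test follows the sequence of actions a user would perform, which is what distinguishes a scenario-based test from a test of a single method.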


- This is an iterative process.
- At every iteration a new issue is exposed.
- The positive aspect is that new questions arise that could refine the system close to perfection.

THANK YOU

Madhura Raju

11/11/2010 A Report

On

SOFTWARE USABILITY A NECESSITY NOT AN OPTION

Submitted by Madhura


SOFTWARE USABILITY A NECESSITY NOT AN OPTION

Usability in general refers to how well something can be used to provide good service. Applied to software, it is a very important aspect of computer science. “Software Usability” describes how easy to use a software product or a user interface is. It relates directly to customer satisfaction, without which the software product serves no purpose. Usability is not restricted to software products; it also pertains to websites and to software design in general.

Need for Software Usability:

To keep customers happy and comfortable, to increase employees’ productivity and to improve a company’s way of operating, software usability should be of a very high standard. Otherwise, customers will not use or recommend the software because of the difficulties they encounter, employees will be less efficient because the content on the intranet is not easily accessible, and the company can make a loss because its websites operate ineffectively.

Assessing Usability:

The first question to ask when assessing the usability of software is: how easy is it to use? Once this question is answered, various other attributes should be checked. The software should be easily understood and should be able to guide the user through each task. The tasks that the user requests should be completed quickly, and the software or website should be quick to learn. If users take too long, the product has to be restructured into a simpler application. The target users should be known well beforehand, and the product should be designed accordingly. Rapid prototyping and feedback mechanisms can be used to ascertain the ease of use of the product.

The user’s handling of the product or website is observed closely. The user’s level of satisfaction, the frequency of errors made while working with the product, and the user’s efficiency and consistency in carrying out operations without forgetting them are the factors that indicate the usability of the software. The assessment can be carried out either by third-party consultants or by internal groups, such as focus groups. There are various consultancy companies working on usability. The prototype should be carefully tested and approved by a sample of customers. If the feedback about usability is bad, it should be handled by an expert group. Focus groups are good, but testing directly with users is the best way to assess usability. The number of steps users take to accomplish their tasks without encountering errors, and their use of online help when they come across a problem, are observed closely.

Thus, software usability is a very important requisite for customer satisfaction and increased user efficiency.

Bibliography

1. http://www.usabilityfirst.com/. Last accessed 08 September 2009.

2. http://www.paciellogroup.com/resources/whitepapers/WPAssessingUsability.html. Last accessed 09 September 2009.
