This report defines and presents the Semantic Web from a non-technical business-orientated perspective. Its purpose is to introduce the reader to the objectives and components of the Semantic Web, discuss why it is arguably an improvement over the current Web, and highlight how it might be used by organisations using the Internet as a part of their business processes.
The current Web is based around HTML pages delivered to a browser and subsequently displayed for human consumption rather than for meaningful manipulation by computer applications. Holzschlag (2002) states the original goal of HTML was to provide document markup for files sent via the Internet and, at heart, it is simply a document markup language focusing on the structure of a document. The current HTML-based World Wide Web was originally conceived and created by Tim Berners-Lee at CERN, the European organisation for physics research, to allow scientists who regularly exchanged papers to reference other papers. The Semantic Web represents his vision for the next-generation Web (Updegrove, 2005).
To place the concepts of the Semantic Web into context, the report presents a worked example based on a haulage company transporting goods by lorry between the UK and Europe, using a scenario based loosely on the example presented by Holloway et al (2002) in their report into delivering Web Services.
The Semantic Web and its Components
Introduction to the Semantic Web
The Semantic Web has been described as a vision of a next-generation Web that lets publishers create webpages containing notations designed to provide "meaning" to the data within the page content (Ogbuji, 2002).
Dr Dieter Fensel (Daconta, 2003) argues that the enormous amount of data in the current web is becoming increasingly difficult to search and process because it is primarily written in natural language, whereas the Semantic Web will explicitly represent the semantics of the underlying data and webpages, enabling a knowledge-based Web providing a qualitatively new level of service. He goes on to argue that this will improve the capacity of automated services to assist humans in achieving their goals through greater “understanding” of Web content, thus providing more accurate searching, filtering and categorisation of information, ultimately leading to an extremely knowledgeable system. In the same book, the authors claim that the key to understanding the Semantic Web is:
‘the path to machine-processable data is to make the data smarter … the technologies … are the foundations of a systematic approach to creating “smart data.”’
From this we can see that the Semantic Web is focused on the data itself, rather than the computer applications which create, manipulate and consume that data, or indeed the programming languages used to create those applications or the platforms on which they run. Passin (2004) describes it as a huge interoperable virtual database, where the technologies of the Semantic Web unify the description and retrieval of data.
Components of the Semantic Web
As part of his keynote speech at XML 2000, Tim Berners-Lee (2000) presented a slide containing the graphic shown in Figure 1, illustrating the layered component architecture of the Semantic Web. This section briefly describes these foundational components.

Figure 1 - The Semantic Web Architecture, © T. Berners-Lee
XML
XML (eXtensible Markup Language) arose from the need for a portable data format and provides a framework capable of representing almost any kind of data (Bond, 2004). Like HTML, XML is a markup language using the same general tag notation. However, unlike HTML, XML allows authors to define their own tags and is strict about their usage: an XML document must be well formed, i.e. comply with a defined set of rules (Ashbacher, 2000).
HTML has a well-defined syntax where both the tag set and tag semantics are fixed, whereas in XML they are not. XML is a metalanguage used to define and describe other markup languages; it provides a facility to define new tags along with the structural relationships between them. Since tags are not predefined, there are no predefined semantics, so the meaning of the data contained within a pair of tags is defined by the author (Walsh, 1998). Because XML can encode data without explicitly declaring a language, and because the rules of valid XML are strictly defined, an XML document can be parsed and the data extracted and consumed by applications (Radcliff, 2001). In other words, an application does not require any prior language definition in order to extract the data from an XML document.
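As a minimal sketch of this point, the following Python fragment parses a small XML document and extracts its data using only the standard library parser. The consignment record, its tag names and its values are hypothetical and chosen purely for illustration; they do not come from any of the sources cited in this report.

```python
import xml.etree.ElementTree as ET

# Hypothetical consignment record using author-defined tags.
DOCUMENT = """
<consignment id="C-1001">
  <origin>Dover</origin>
  <destination>Milan</destination>
  <weight unit="kg">1200</weight>
</consignment>
""".strip()


def read_consignment(xml_text):
    """Parse a well-formed XML string and return its data as a dict."""
    root = ET.fromstring(xml_text)
    weight = root.find("weight")
    return {
        "id": root.get("id"),
        "origin": root.findtext("origin"),
        "destination": root.findtext("destination"),
        "weight": int(weight.text),
        "unit": weight.get("unit"),
    }


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(read_consignment(DOCUMENT))
    # The <b> element is never closed, so this text is not well formed.
    print(is_well_formed("<a><b></a>"))
```

The parser needs no prior definition of the consignment tags in order to read the document, although the application still has to know which tag names it is interested in.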
XHTML is a new version of HTML and uses XML to reformulate HTML in an attempt to ensure documents are well-formed and render correctly in a browser (Powell 2003). However, XHTML remains focused on the document’s display for humans rather than making its contents meaningful to computers. In contrast, an XSL stylesheet and stylesheet processor can take a well-formed XML document and transform it into an (X)HTML (or other format) document suitable for people to read (Berglund, 2006). The key point is that the transformation does not alter the data contained in the XML document in any way so it remains a meaningful data source to a computer application.
URIs
The Uniform Resource Identifier (URI) forms the basic global addressing scheme for the entire Web, and provides a simple means for identifying a specific resource, such as a file, on the Web (Berners-Lee, 2005). For example:
http://www.ietf.org/rfc/rfc2396.txt
This URI identifies a specific individual text file on the Web. In the context of the Semantic Web, URIs are a component of Triples, discussed below.
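For illustration, the example URI above can be broken into its main components with Python's standard library. This is only a sketch of the generic structure of a URI; the function name describe_uri is an assumption made for this example.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into its scheme, authority (host) and path components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
    }


if __name__ == "__main__":
    print(describe_uri("http://www.ietf.org/rfc/rfc2396.txt"))
```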
Resource Description Framework
Manola and Miller (2004) describe the Resource Description Framework (RDF) as a language for representing information about resources on the Web. Often it is used to provide metadata, which is commonly defined as “data about data”, in other words, to give information meaning. A resource might be something tangible which can be downloaded such as a document or file, or an intangible entity such as a person or product which can be uniquely identified even if it cannot be retrieved directly from the Web. RDF is intended for situations where this information needs to be processed by machines, rather than simply displayed to humans, and uses URIs to identify resources.
RDF uses an XML syntax to describe a resource using statements made up of three parts: the subject, the predicate and the object. Each such statement is known as a Triple. Berners-Lee (2001) describes how RDF triples form webs of information about related things and how, since RDF uses URIs to encode information, the URIs ensure the information is tied to a unique definition.
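The structure of a Triple can be sketched in a few lines of Python. This is not RDF itself, nor any RDF library; it is a plain-Python illustration of the subject, predicate and object model, and all of the example.org URIs and property names are hypothetical.

```python
from collections import namedtuple

# A statement with the three parts described above.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical namespace, used for illustration only.
EX = "http://example.org/haulage/"

TRIPLES = [
    Triple(EX + "lorry/42", EX + "terms/driver", EX + "staff/jsmith"),
    Triple(EX + "lorry/42", EX + "terms/destination", "Milan"),
    Triple(EX + "staff/jsmith", EX + "terms/name", "J. Smith"),
]


def objects_of(subject, predicate, triples):
    """Return the objects of every triple matching the subject and predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


def driver_names(lorry, triples):
    """Follow two linked triples: lorry -> driver -> name."""
    names = []
    for driver in objects_of(lorry, EX + "terms/driver", triples):
        names.extend(objects_of(driver, EX + "terms/name", triples))
    return names


if __name__ == "__main__":
    print(driver_names(EX + "lorry/42", TRIPLES))
```

Because the driver is identified by a URI rather than a free-text name, the object of one triple can be the subject of another, which is how triples link into a web of related information.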
Web Ontology Language (OWL)
OWL, or Web Ontology Language, is another language used, like RDF, for Semantic Web documents intended to be processed by machines, rather than simply read by humans. It is used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms - an ontology (in Semantic Web terms) is the representation of these terms and their interrelationships. OWL builds upon RDF and has greater facilities for expressing meaning and semantics, and thus goes further in its ability to represent machine interpretable content (McGuinness, 2004).
Clark (2003) describes OWL’s purpose within the context of the Semantic Web as a language to create a formal specification for a knowledge domain, which can then be fed into a computer system in order to reason about the domain and its knowledge. He goes on to say that, unlike humans, computers can only reason over a knowledge domain when it has been formally specified in advance.
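The kind of reasoning a formal specification makes possible can be hinted at with a small sketch. The fragment below is not OWL and does not use an OWL reasoner; it assumes a hypothetical, hand-written class hierarchy and shows one simple inference (membership of a class implies membership of its superclasses) in plain Python.

```python
# Hypothetical class hierarchy: each class maps to its direct superclass.
SUBCLASS_OF = {
    "ArticulatedLorry": "Lorry",
    "Lorry": "Vehicle",
    "Ferry": "Vessel",
}

# Hypothetical individuals and the class each is declared to belong to.
INSTANCE_OF = {
    "lorry42": "ArticulatedLorry",
}


def superclasses(cls):
    """Return every class reachable by following subclass links upwards."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(individual, cls):
    """Infer class membership from the declared class and its superclasses."""
    declared = INSTANCE_OF.get(individual)
    if declared is None:
        return False
    return declared == cls or cls in superclasses(declared)


if __name__ == "__main__":
    # Never stated directly, but inferred from the hierarchy.
    print(is_a("lorry42", "Vehicle"))
```

Nothing in the data states that lorry42 is a Vehicle; the program can conclude it only because the relationships between the terms were specified in advance.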
Agents
Hendler (2001) describes a vision of intelligent Semantic Web Agents that find and present a selection of possible choices for ways in which to meet their user’s needs, rather than doing everything for their user. These Agent programs can run on a variety of devices and are able to reason over, and make decisions based on, the semantics of the information described by technologies such as RDF and OWL. The hypothetical case scenario presented later in this report illustrates Agents at work.
Digital Certificates
A digital certificate is an electronic document which provides recognised proof of identity for an individual, a server, a company or some other entity. Each certificate is issued by a Certificate Authority (CA) such as Thawte (http://www.thawte.com/) or VeriSign (http://www.verisign.com/), and has the following characteristics (Hendricks, 2002):
- Binds the public key belonging to the entity identified by the certificate, so only the certified public key will work with the entity’s corresponding private key.
- Includes the name of the CA that issued the certificate, a serial number, expiry date, and the name of the entity the certificate identifies.
- Contains the digital signature of the issuing CA. This permits the certificate to act as a “letter of introduction” for the entity. The CA is likely to be recognised and trusted by end users who do not know the entity identified by the certificate.
Digital Certificates play an essential role in building a “web of trust” through digital metadata. Since the Web provides the capability for anyone to provide information about anything, the information provided by different sources can be in conflict, so it is essential to know who is providing the information and whether it is to be trusted. However, according to Zambonini (2005), that trust must also be in the context of the information being provided; for example, X trusts Y to provide financial information but not necessarily information about the weather.
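The idea of trust being scoped to a context can be sketched as follows. This is a hypothetical illustration only: the parties X, Y and Z, the topics and the claims are invented, and a real system would establish the identity of each source through certificates rather than a hard-coded table.

```python
# Hypothetical trust table: (truster, source) -> topics the truster accepts.
TRUST = {
    ("X", "Y"): {"finance"},
}

# Hypothetical statements received from different sources.
STATEMENTS = [
    {"source": "Y", "topic": "finance", "claim": "Invoice 17 has been paid"},
    {"source": "Y", "topic": "weather", "claim": "Fog expected at Dover"},
    {"source": "Z", "topic": "finance", "claim": "Invoice 17 is overdue"},
]


def trusts(truster, source, topic):
    """Return True if the truster accepts this source for this topic."""
    return topic in TRUST.get((truster, source), set())


def accepted_claims(truster, statements):
    """Keep only the claims made by a source trusted for that topic."""
    return [s["claim"] for s in statements
            if trusts(truster, s["source"], s["topic"])]


if __name__ == "__main__":
    print(accepted_claims("X", STATEMENTS))
```

Here the conflicting financial claim from Z is discarded because X has no trust relationship with Z, and Y's weather claim is discarded because X trusts Y only for financial information.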
Why the Semantic Web is an Improvement on the Current Web
In their 2001 Scientific American article, Berners-Lee, Hendler and Lassila describe the Semantic Web as “…an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation”. As discussed earlier, the current Web mostly provides documents in natural language for people to read, rather than as meaningful data sources which enable computer systems to make decisions and subsequently carry out tasks.
Consider current search engines such as Google (http://www.google.com). If a person wanted to search the web for information on how the Semantic Web is an improvement over the current Web they could visit Google and enter the following search criteria:
“semantic web” improvement “current web”
This query is interpreted by the search engine as ‘return all web pages containing the words “semantic web” AND “improvement” AND “current web” in the text of the page’, and as of March 2006 returned 13,500 results. The key point is that the search engine just lists all pages containing all the specified words and/or phrases. It has no understanding of the actual information for which the user is searching, leaving them to plough through the links and read the pages to see if they are of any use. The search engine computers are not really working in conjunction with the person searching for information; they are simply scanning their web page indices for basic word matches, leaving the person to decide if the pages contain the information they need. This is an illustration of Dr Fensel’s earlier argument that the amount of data on the current Web is making it increasingly difficult to search and process meaningfully.
The previous section, which presented the components of the Semantic Web, discussed the technologies that will make the information in web pages and other resources meaningful to computer systems. This is a key factor as to why the Semantic Web is an improvement over the current one. Consider again the above search engine query. It can easily be re-created as a Triple so that a Semantic Web search engine (or other agent) would understand precisely the information required, namely web pages (or other resources) which contain information pertaining to how the Semantic Web (the subject) improves (the predicate) the current Web (the object). Using the Semantic Web, the results of this search would contain only the information required.
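The difference between the two kinds of search can be sketched with a toy example. Both pages, their text and the statements attributed to them are hypothetical, and the matching functions are deliberately simplistic; the sketch shows only why a keyword match cannot tell the two pages apart while a Triple match can.

```python
# Two hypothetical pages that contain the same phrases but say opposite things.
PAGES = {
    "page-a": "The semantic web is an improvement over the current web.",
    "page-b": "No improvement: the current web already beats the semantic web.",
}

# Hypothetical (subject, predicate, object) statements published by each page.
STATEMENTS = [
    ("page-a", ("SemanticWeb", "improves", "CurrentWeb")),
    ("page-b", ("CurrentWeb", "improves", "SemanticWeb")),
]


def keyword_search(terms):
    """Return pages whose text contains every term, ignoring case."""
    return sorted(name for name, text in PAGES.items()
                  if all(term.lower() in text.lower() for term in terms))


def triple_search(pattern):
    """Return pages stating a triple that matches the pattern.

    None in the pattern acts as a wildcard for that position.
    """
    return sorted(page for page, triple in STATEMENTS
                  if all(want is None or want == got
                         for want, got in zip(pattern, triple)))


if __name__ == "__main__":
    print(keyword_search(["semantic web", "improvement", "current web"]))
    print(triple_search(("SemanticWeb", "improves", "CurrentWeb")))
```

The keyword search returns both pages because both contain all three phrases, whereas the Triple search returns only the page that actually states that the Semantic Web improves the current Web.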
To a search engine or other computer application, the documents available on the current Web are little more than unstructured raw data. The Semantic Web converts this raw data into information a computer system can process. Daconta et al refer to the maxim that “knowledge is power”, and that organisations with the best information gain a competitive advantage. They argue that knowledge is the conversion of raw data into useful and meaningful information for decision making. This is the benefit of the Semantic Web over the current Web – where the current Web provides us with raw data the Semantic Web provides us with information and therefore knowledge.
To reiterate Dr Fensel’s earlier argument, by explicitly representing the semantics of the underlying data, the Semantic Web will be knowledge-based, providing a new level of service, compared to the current Web, improving the capacity of automated services to assist people in achieving their goals.
Impact on Internet Business
The Semantic Web has much to offer consumers, for example the use of a Semantic Search Engine such as Swoogle (http://swoogle.umbc.edu/) to improve search results. However, it has a great deal more to offer companies using the Internet as part of their business processes. Daconta et al argue that the Semantic Web can benefit businesses in the following areas:
- Decision Support
- Business Development
- Information Sharing and Knowledge Discovery
- Administration and Automation
Companies using the Internet for Supply Chain Management can save huge amounts of resources. One way of accomplishing this is through Enterprise Application Integration (EAI), sharing information between applications both internally and with external trading partners. XML currently features heavily in EAI, for example through projects such as ebXML (Electronic Business using eXtensible Markup Language - http://www.ebxml.org/) and products such as Microsoft’s BizTalk Server (http://www.microsoft.com/biztalk/default.mspx). RDF on the other hand builds on the capabilities of XML in order to facilitate easier and more meaningful integration between systems.
In their paper titled "Integrating Applications on the Semantic Web", Hendler et al (Hendler, 2002) describe how there is often an overlap between a company’s stock control, sales order processing, CRM and accounting systems which, if integrated using shared information, would save a lot of re-keying and associated errors (not to mention time). They describe how "glue code" is developed to extract the data from one system, reformat it, and load it into another system. Companies suffer because these systems use differing internal data formats and are not seamlessly integrated. However, if the data from each of these disparate systems is made available in RDF, it can be seamlessly joined on those terms which are defined to correspond to the same URIs. This is possible because the RDF data is made up of triples where each triple expresses the value of one unique property of something, so the data from the separate systems will be referring to the same entity, even if they use different names for it (such as different account numbers or stock codes). This allows companies to use the “best of breed” for each of their disparate internal systems and yet integrate them seamlessly. Also, so long as their trading partners make their data available in RDF, the Semantic Web allows them to seamlessly integrate their respective systems across the Internet irrespective of the backend systems or platforms involved.
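Joining data on shared URIs can be sketched as follows. The two data sets stand in for a stock control system and an accounting system; the product URI, property names and values are all hypothetical, and the triples are plain Python tuples rather than real RDF.

```python
# Hypothetical URI that both systems use to identify the same product.
WIDGET = "http://example.org/product/widget"

# Triples exported by a hypothetical stock control system.
STOCK_SYSTEM = [
    (WIDGET, "stockCode", "WID-001"),
    (WIDGET, "unitsInStock", 40),
]

# Triples exported by a hypothetical accounting system, which knows the
# same product under a different internal code.
ACCOUNTS_SYSTEM = [
    (WIDGET, "ledgerItem", "4471"),
    (WIDGET, "unitPricePence", 250),
]


def merge(*sources):
    """Combine triples from several systems, grouped by subject URI."""
    merged = {}
    for source in sources:
        for subject, predicate, obj in source:
            merged.setdefault(subject, {})[predicate] = obj
    return merged


if __name__ == "__main__":
    combined = merge(STOCK_SYSTEM, ACCOUNTS_SYSTEM)
    print(combined[WIDGET])
```

No glue code is needed to translate between the stock code and the ledger item number, because both systems attach their properties to the same URI and the merge simply groups statements by subject.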
As mentioned earlier in the introduction to this report, in order to describe the Semantic Web “in action” the following paragraphs present a hypothetical example of a haulage company transporting goods from the UK to Milan.
An Agent at the office constantly monitors the position of each of its lorries via GPS, and monitors traffic sites and news channels for any alerts which may affect the planned routes and timings. The Agent receives information from a trusted source that there has been a strike by dockers at Dunkirk, blocking the port. It checks the position and route of the lorry en route to Milan, calculates that this will delay its journey, and sends a message to the Agent in the lorry which then informs the driver of the potential problem. The driver’s Agent checks the driver’s tachograph for the number of legal driving hours remaining and relays this back to the Agent in the office.
The office Agent then proceeds to check for alternative routes and calculates that it can re-route the lorry to Folkestone and through the Eurotunnel to Calais. It cancels the booking for the Dover/Dunkirk ferry and makes a new booking for the Eurotunnel instead. Once the Agent at the Eurotunnel has confirmed the booking the change of route is sent to the driver’s Agent which updates the route planner in the lorry and informs the driver. Finally, the Agent back at the office re-calculates the lorry’s arrival time at its Milan destination and sends a message to the Milan Agent telling it to re-schedule the lorry’s booking-in time.
The actions of the Agents in this example would have been completed in a matter of seconds with no human intervention. Their ‘understanding’ of the meaning of the information they received allowed them to make decisions based upon it, and take the subsequent actions. Compare this with a scenario where a person hears about the strike on the news, checks whether it affects a lorry, rings the driver to check his location and hours, checks alternative routes, cancels the ferry, books the Eurotunnel, rings the driver back with the new information, then e-mails the warehouse in Milan. By utilising the technologies of the Semantic Web, the companies involved would have saved a lot of time and resources, eliminated the possibility of human error and obviated any potential data transformations (and thus misinterpretations) between their separate systems. Each system would have understood the precise meaning of the information to which the other was referring, and the web of trust between them would have enabled rapid and secure business transactions.
Hopefully this simple example illustrates one of the potential benefits the Semantic Web has to offer companies when implemented both internally and externally over the Internet with their trading partners.
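The office Agent's re-routing decision in the scenario above might be sketched as follows. This is a deliberately simplified illustration under stated assumptions: the Route class, the driving-hour figures and the selection rule (the shortest route that avoids blocked ports and fits within the driver's remaining legal hours) are all hypothetical and are not taken from the scenario's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    ports: tuple          # ports the route passes through
    driving_hours: float  # hypothetical driving time for the route


# Hypothetical candidate routes with invented driving times.
ROUTES = [
    Route("Dover/Dunkirk ferry", ("Dover", "Dunkirk"), 3.0),
    Route("Eurotunnel", ("Folkestone", "Calais"), 3.5),
]


def choose_route(routes, blocked_ports, hours_remaining):
    """Pick the shortest usable route, or None if there is no usable route.

    A route is usable if it avoids every blocked port and fits within the
    driver's remaining legal driving hours.
    """
    blocked = set(blocked_ports)
    usable = [r for r in routes
              if not blocked.intersection(r.ports)
              and r.driving_hours <= hours_remaining]
    if not usable:
        return None
    return min(usable, key=lambda r: r.driving_hours)


if __name__ == "__main__":
    # The strike blocks Dunkirk; the tachograph reports 4 hours remaining.
    chosen = choose_route(ROUTES, {"Dunkirk"}, 4.0)
    print(chosen.name if chosen else "no usable route")
```

In the scenario, the inputs to such a rule (the blocked port and the hours remaining) would arrive as trusted, machine-interpretable statements from other Agents rather than being typed in by a person.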
Conclusion
This report has presented a high level discussion of the Semantic Web, including what it is, the components and technologies behind it, how it improves on the current Web, and the benefits it can bring to businesses operating via the Internet. The Semantic Web focuses on the data itself, rather than the technologies that create and consume it. By making the data “smarter”, it can become meaningful to computer systems, within a precisely defined context, so that they can use it to make computerised decisions and carry out tasks to help people achieve their goals. It lessens the need for humans to carry out mundane tasks and frees up resources, whilst at the same time improving accuracy and productivity.
Whilst the Semantic Web remains a vision rather than an established mainstream platform, work continues in order to bring the vision to fruition. It is the recommendation of this report that forward-thinking organisations should monitor the development of the Semantic Web technologies and emerging standards, and keep them in mind during the design and development of their systems. As the Semantic Web emerges into the mainstream, those organisations will be better placed to reap the benefits the Semantic Web has to offer.
David Jones
05/07/2006
References
Ashbacher, C., (2000), Teach Yourself XML in 24 Hours, Indianapolis, SAMS
Berglund, A., (2006), Extensible Stylesheet Language (XSL) Version 1.1 [online], Available from: http://www.w3.org/TR/xsl11/ (Accessed 14/03/2006)
Berners-Lee, T., (2000), Architecture Slide [online], Available from: http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html (Accessed 14/03/2006)
Berners-Lee, T., Fielding, R.T., Masinter, L., (2005), Uniform Resource Identifier (URI): Generic Syntax [online], Available from: http://www.gbiv.com/protocols/uri/rfc/rfc3986.html#intro, (Accessed 10/03/2006)
Berners-Lee, T., Hendler, J., Lassila, O., (2001), The Semantic Web [online], Available from: http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2 (Accessed 15/03/2006)
Bond, M., Law, D., Longshaw, A., Haywood, D., Roxburgh, P., (2004), Teach Yourself J2EE in 21 Days, USA, SAMS
Clark, K.G., (2003), The Semantic Web is Closer Than You Think [online], Available from: http://www.xml.com/pub/a/2003/08/20/deviant.html (Accessed 14/03/2006)
Daconta, M.C., Obrst, L.J., Smith, K.T., (2003), The Semantic Web: A Guide to the Future of XML, Web Services and Knowledge Management, Indianapolis, Wiley Publishing, Inc.
Google Help : Advanced Search [online], http://www.google.co.uk/help/refinesearch.html (Accessed 17/03/2006)
Hendler, J., (2001), Agents and the Semantic Web [online], Available from: http://www.cs.umd.edu/~hendler/AgentWeb.html (Accessed 14/03/2006)
Hendler, J., Berners-Lee, T., Miller, E., (2002), Integrating Applications on the Semantic Web [online], Available from: http://www.w3.org/2002/07/swint (Accessed 16/03/2006)
Hendricks, M., Galbraith, B., Irani, R., Milbery, J., Modi, T., Tost, A., Toussaint, A., Basha, S.J., Cable, S., (2002), Professional Java Web Services, Birmingham, UK, Wrox Press Ltd.
Holloway, S., Din, Z., Adams, A., Strange, A., Carroll, J., Harris, R., (2002), Delivering Web Services – The Sun™ ONE Vision [online], Hertfordshire, Computacenter, Available from: http://www.computacenter.co.uk/strategy-guide/sunoneguide/default.asp (Accessed 09/03/2006)
Holzschlag, M.E., (2002), Using HTML and XHTML, Indianapolis, Que
Manola, F., Miller, E., (2004), RDF Primer [online], Available from: http://www.w3.org/TR/REC-rdf-syntax/ (Accessed 13/03/2006)
McGuinness, D.L., van Harmelen, F., (2004), OWL Web Ontology Language Overview [online], Available from: http://www.w3.org/TR/owl-features/ (Accessed 14/03/2006)
Ogbuji, U., (2002), The Languages of the Semantic Web [online], Available from: http://www.newarchitectmag.com/documents/s=2453/new1020218556549/index.html (Accessed 15/03/2006)
Passin, T.B., (2004), Explorer’s Guide to the Semantic Web, Greenwich, USA, Manning Publications Co.
Powell, T.A., (2003), HTML & XHTML: The Complete Reference, Fourth Edition, California, McGraw-Hill/Osborne
Radcliff, C., (2001), Perl for the Web, Indianapolis, New Riders
Updegrove, A., (2005), The Semantic Web: An Interview with Tim Berners-Lee [online], Available from: http://www.consortiuminfo.org/bulletins/semanticweb.php (Accessed 09/03/2006)
Walsh, N., (1998), A Technical Introduction to XML [online], Available from: http://www.xml.com/pub/a/98/10/guide0.html (Accessed 10/03/2006)
Zambonini, D., (2005), Context in the Web of Trust [online], Available from: http://oreillynet.com/pub/wlg/7776 (Accessed 14/03/2006)