Intelligent Agents and Chatterbots

It seems that Web 1.0 came and went without ever being called such. And Web 2.0, having barely been given a palpable definition and a ‘hi there’ wave by the consumer, is merely laying the foundation for Web 3.0. And what is Web 3.0? The Semantic Web.

Each of these (Web 1.0, Web 2.0, Web 3.0 or the Semantic Web) can be defined or characterized by the organization, processing and presentation of information. In the case of Web 1.0, it was HTML, and the breaking point was web browsers for the masses (Netscape) and IP connections for the masses (AOL and Windows 95). Portals were the big players (Yahoo, Google, Excite, MSN); they needed to spider over the web, using various schemes to catalog and sort through the information at their disposal so it could be consumed by the masses. Transactions for real goods (eBay, Amazon) and digital goods (iTunes) were happening in real time, and a new digital age was born.

Web 2.0 is characterized by collaboration, and by content being provided and consumed in equal measure by average internet citizens. Blogs were the beginning (WordPress, etc.), then video blogs (YouTube). Fatter pipes through DSL and cable modems made the publication of rich media possible for the average consumer. In business, collaboration on engineering projects extended to remote collaboration on marketing and sales presentations. Mashups, Rich Internet Applications, Ajax, Desktop Widgets, and all the rest of the big happy web family gave their contributions to making the Web what it is today. And the Web 2.0 wave is not over, not for a few years by my estimation (it is 2007 now; it will wane in 2011 or so).

Each of these iterations overcame barriers, technological and otherwise, and then made leaps in progress, followed by a lot of hard work by the content producers and web programmers and database engineers who make it all go.  But each of these iterations also faced a similar problem, of bridging the gap between the primary and secondary domains, that is, the semantic gap between systems, and the semantic gap between the physical and the digital.  These are areas which are directly addressed by the Semantic Web.

Some of the solutions we have already seen for the systems gap include object-relational mapping tools, like Hibernate, ActiveRecord, and others, which valiantly apply their solutions to the gap between relational databases (SQL) and the object-oriented languages which use the data (Java, C++). And these are solutions being applied internally by companies, to help their own systems talk to each other.
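To make the systems gap concrete, here is a minimal sketch of the mapping idea in Python, using only the standard library. It is not how Hibernate or ActiveRecord work internally; the `products` table, the `Product` class, and the `fetch_products` helper are all made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    """Object-side view of one row in the (hypothetical) products table."""
    id: int
    name: str
    price: float


def fetch_products(conn):
    """Map each relational row onto a Product object."""
    rows = conn.execute("SELECT id, name, price FROM products ORDER BY id")
    return [Product(*row) for row in rows]


# An in-memory database stands in for a company's real SQL system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("widget", 9.99), ("gadget", 24.5)],
)

products = fetch_products(conn)
for product in products:
    print(product.name, product.price)
```

The code on the object side never sees SQL rows, only `Product` objects. A real mapping tool also handles writes, relationships between tables, and caching, which this sketch leaves out.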

For textual data, such as is found in HTML, a more significant problem exists. How is information represented, and how should it be represented, so that it can be understood and related? Even with something as simple as product catalogs, there is no easy standard by which a system, or a systematic human, could draw down product information consistently. And product information is a data set where it is fairly easy to determine the truth or accuracy of the data (depending on the product). Research on most search engines is like working with a very dumbed-down SQL, which amounts to hit or miss on even finding the data. Historical, medical, geological, or biographical information is much more difficult to find and parse. And then real-time information, such as would be useful for collaboration (dates and times, places, and so on), presents its own problems as well.

For the representation of data, a simple proposal has been the adoption of Microformats, or standard methods for representing common data elements (such as dates, times, places) and relationships between data elements. A more complex but more complete solution is being drawn up by the standards organization: an ontological language which describes not only the standard representation of data elements and their relationships, but also a query language to access that information. The Protégé project is doing remarkable work in this very area. An older project, SHOE, attempted to use HTML as the basic information source for its ontology, but its members have moved on to RDF and other semantic web “standards” for their work.
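The idea of data elements, their relationships, and a query language over them can be sketched in a few lines of Python. This is a toy, not RDF or any real query language: the `ex:` names, the sample facts, and the `match` function are invented here for illustration.

```python
from typing import List, Optional, Tuple

# Each fact is a (subject, predicate, object) statement.
Triple = Tuple[str, str, str]

# Made-up sample facts; the "ex:" prefix is only a naming convention here.
TRIPLES: List[Triple] = [
    ("ex:conference", "ex:startsOn", "2007-10-05"),
    ("ex:conference", "ex:locatedIn", "ex:Boston"),
    ("ex:workshop", "ex:locatedIn", "ex:Boston"),
    ("ex:Boston", "ex:isA", "ex:City"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Which things are located in Boston?"
events_in_boston = [
    s for s, _, _ in match(TRIPLES, predicate="ex:locatedIn", obj="ex:Boston")
]
print(events_in_boston)
```

Because the date, the place, and the relationship between them are all stated explicitly, the question can be answered by pattern matching rather than by guessing at keywords in a page of text.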

The data analysis must be performed algorithmically, just as data is gathered and analyzed today by the spiders that wander the web. However, the spiders that gather information for these ontological queries will have to be more sophisticated than today’s spiders. They will necessarily make the jump from dumb robots to intelligent agents, which have to make decisions based on what their goals are, the constraints placed upon them, and their determination of the true from the false.
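As a rough sketch of that jump, the toy agent below has a goal, a constraint, and a crude way of deciding what to accept as true. Everything in it is hypothetical (the `Agent` class, the source names, the trust rule); a real agent platform would be far richer.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    goal: str                  # the property the agent is trying to learn
    trusted_sources: set       # constraint: only these sources are believed
    beliefs: dict = field(default_factory=dict)  # what the agent holds true

    def observe(self, source: str, prop: str, value) -> bool:
        """Accept a claim only if it comes from a trusted source."""
        if source not in self.trusted_sources:
            return False       # rejected: the agent does not believe it
        self.beliefs[prop] = value
        return True

    def goal_met(self) -> bool:
        """The goal is met once the agent holds a belief about it."""
        return self.goal in self.beliefs


agent = Agent(goal="price", trusted_sources={"vendor.example"})
rejected = agent.observe("spam.example", "price", 1.00)
accepted = agent.observe("vendor.example", "price", 9.99)
print(agent.beliefs, agent.goal_met())
```

A dumb robot would have stored both prices. The agent keeps only the claim it has reason to believe, and it knows when it can stop looking.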

Multi-agent platforms are described, and an API presented, for the Protégé-OWL framework. In some agent frameworks, agent databases are called belief bases, as they are representations of what the agent believes to be true. The framework describes an API for communicating with reasoners, whose language is called DIG, a description logic language. And who gets to translate the solutions to the questions for the average guy like me? Yet another agent, one who is conversant in natural language as well as the languages used by these agents, agent platforms and/or reasoners.

And the unintended consequence, for big business, will be the rise of the personal agent, as distrust in centralized data sources, heavily weighted against personal and private concerns, drives a demand for specialized (or biased) consumer agents, trusted agents, and defense agents.

