Search Result Visualisation with xFIND
Keith Andrews, Christian Gütl, Josef Moser, Vedran Sabol, Wilfried Lackner
Institute for Information Processing and Computer Supported New Media (IICM),
Graz University of Technology
A-8010 Graz, Austria
The xFIND gatherer-broker architecture provides a wealth of metadata, which can be used to provide sophisticated search functionality. Local or remote documents are indexed, and summaries and metadata are stored on an xFIND broker (server). An xFIND client can search a particular broker and access rich metadata for search result presentation, without having to fetch the original documents themselves. Search result sets are presented not only as a traditional ranked list, but also in an interactive scatterplot (Search Result Explorer) and using dynamic thematic clustering (VisIslands).
Introduction and Status of Research
The amount of information readily accessible to individuals at their workplace and at home is rapidly increasing. There are now more than one billion unique, indexable web pages (Inktomi and NEC Research Institute, 2000). Monolithic, centralised search engines are increasingly unable to cope with the exponential growth of the web. Since a search engine revisits individual pages perhaps only once or twice a year, the centralised index is inherently out of date. More general search queries often return many hundreds or thousands of matching documents. Hence the motivation both for a scalable resource discovery framework and for visualisation tools to help end users explore search result sets.
WAIS (Kahle et al., 1992) was a client-server indexing and retrieval system, using a protocol derived from Z39.50 (ANSI/NISO, 1995). Harvest (Bowman et al., 1995) is a distributed, gatherer-broker information discovery and access system for the web: an integrated set of tools, written largely in C, for gathering information from diverse repositories, building topic-specific content indices, and searching the indices.
Envision (Nowell et al., 1996, 1997) plots the results of bibliographic searches in a two-dimensional scatterplot. The mapping of particular attributes to visual representations such as the x-axis, y-axis, icon size, and icon shape is controlled by drop-down menus. Bead (Chalmers, 1993, 1996a) uses a force-directed placement technique to lay out relationships between documents in a corpus as a landscape. More similar documents lie closer together in the landscape. Searches can be made and the results highlighted.
ThemeScape (part of the SPIRE text visualisation system; Wise, 1999) examines a corpus of text documents and extracts a set of discriminating terms (words), usually nouns, which characterise topics in the corpus. A combination of stoplists, synonym substitution, and statistical analysis is used to select 200 to 300 "good" discriminating terms. A landscape of topical terms is generated and documents contribute to the height field at each point, based on their contribution to each topic. Self-organising maps (SOMs) (Kohonen, 2000; WEBSOM, 2000) use neural networks to organise a set of text documents. The neural network is trained initially on a sample set of documents and thereafter can assign further documents to their corresponding location on a regular (hexagonal) grid in unsupervised mode.
Figure: The distributed architecture of the xFIND system.
The xFIND System
xFIND, the extended Framework for Information Discovery (Gütl, 2000), is a framework for distributed information discovery and knowledge management. For scalability, xFIND uses a gatherer-indexer-broker architecture, similar to that pioneered by the Harvest system.
As shown in the architecture figure, xFIND consists of gatherers, indexers, and brokers. For reasons of portability and platform independence, xFIND is implemented entirely in Java.
In addition to traditional metadata like title, keywords, and description, xFIND also gathers and indexes
document headings, and information about embedded links and images. Metadata regarding aspects of information quality, such as authority, diction, and target audience, can be added manually or semi-manually.
The xFIND Gatherer
The gatherer collects information about documents and resources, both local and remote, and pre-processes this information. The gathering process is adaptive. At configurable intervals, entire servers, particular sub-sites, or individual documents can be gathered. To reduce network load, a local gatherer can be configured to gather information from a locally mounted file system. The gatherer pre-processes HTML and plain text documents (soon also PDF) and generates a pre-defined set of metadata (a document descriptor or summary) for each document. Thumbnails of embedded images are also generated and stored in the document descriptor. Only the document descriptors are passed on to one or more xFIND indexers.
The xFIND Indexer
The document descriptors harvested by a gatherer can be fetched by one or more authorised indexers. The indexer indexes a set of document descriptors and renders them searchable. An indexer may specialise, for example, in a particular topic or geographical location. Statistical information such as term frequency and discriminating terms (discriminators) is generated. Furthermore, trusted external systems (rating systems, ACF systems, etc.) are allowed to contribute additional metadata fields. The xFIND architecture also provides for the contents of a particular index to be replicated in whole or in part, in order to minimise network loads.
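As an illustration of how discriminating terms might be derived from term statistics, the following minimal sketch ranks the terms of each document by a TF-IDF score. The choice of TF-IDF, the function name, and the toy documents are assumptions made for this example; the statistical method actually used by the xFIND indexer is not specified here.

```python
import math
from collections import Counter
from typing import Dict, List


def discriminators(docs: Dict[str, List[str]], top_n: int = 3) -> Dict[str, List[str]]:
    """Rank each document's terms by TF-IDF and keep the top_n (assumed scoring)."""
    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    n_docs = len(docs)
    result: Dict[str, List[str]] = {}
    for doc_id, terms in docs.items():
        term_freq = Counter(terms)
        scores = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[doc_id] = ranked[:top_n]
    return result


# Hypothetical, already tokenised documents.
sample_docs = {
    "d1": ["agents", "search", "agents", "web"],
    "d2": ["web", "search", "index"],
    "d3": ["web", "clustering", "islands", "clustering"],
}
top_terms = discriminators(sample_docs, top_n=2)
```

A term such as "web", which occurs in every sample document, receives a score of zero and is therefore never selected as a discriminator.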
The xFIND Broker
An xFIND broker is the starting point for user interactions. A broker can distribute its search queries to a particular set of indexers. The broker is also able to expand queries using a thesaurus. The results of a distributed search are collated and compiled into a uniform search result set. Brokers can be individually tailored for a division, a department, a group of employees, or even a single user, as well as supporting particular topics.
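The distribution and collation steps can be sketched as follows. The indexer interface (a callable returning URL and relevance pairs), the toy indices, and the merge rule (a document reported by several indexers keeps its highest score) are assumptions for illustration only and do not describe the actual xFIND protocol.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical indexer interface: maps a query string to (url, relevance) pairs.
Indexer = Callable[[str], Iterable[Tuple[str, float]]]


def broker_search(query: str, indexers: List[Indexer]) -> List[Tuple[str, float]]:
    """Send the query to every indexer and collate one ranked result list."""
    best: Dict[str, float] = {}
    for indexer in indexers:
        for url, relevance in indexer(query):
            # Assumed merge rule: keep the highest relevance seen per document.
            if relevance > best.get(url, float("-inf")):
                best[url] = relevance
    # Rank by descending relevance; break ties by URL for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


def make_indexer(index: Dict[str, Dict[str, float]]) -> Indexer:
    """Build a toy in-memory indexer from {term: {url: relevance}}."""
    return lambda query: index.get(query, {}).items()


topic_indexer = make_indexer(
    {"agents": {"http://a.example/1": 0.9, "http://b.example/2": 0.4}})
regional_indexer = make_indexer(
    {"agents": {"http://b.example/2": 0.7, "http://c.example/3": 0.5}})
results = broker_search("agents", [topic_indexer, regional_indexer])
```

In this sketch the broker itself holds no index; it only merges what the indexers return, which is what allows a broker to be tailored simply by choosing its set of indexers.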
The xFIND broker provides standard search functionality (simple, extended, and expert search) through
HTML forms. Search queries can combine full-text search with descriptive and evaluative metadata. The standard result set is a linear list ranked by relevance.
The enhanced metadata set can be divided into two main parts: extracted document information and quality metadata. Automatically extracted metadata are generated by the gatherer, which identifies the URL, MIME type, file size, and creation or modification time of each document. Further information extraction depends on the type of the object and on the filter available for processing it. At the time of writing, plain text and HTML filters are available for textual information, and an image filter (supporting the GIF and JPEG file formats) for multimedia documents. Support for PDF and for audio and video formats is planned. For HTML documents, common meta-attributes such as title, keywords, description, and language are parsed and processed by the gatherer. The full-text content is retained for indexing. Headings, links, and (thumbnails of) embedded images are extracted and retained as metadata.
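The kind of extraction performed by an HTML filter can be sketched with the HTML parser from the Python standard library. The class name, the layout of the descriptor dictionary, and the sample page are assumptions made for this example; xFIND itself is implemented in Java and its descriptor format is not reproduced here.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class DescriptorParser(HTMLParser):
    """Collect title, meta attributes, headings, links, and image sources."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.descriptor: Dict[str, Any] = {
            "title": "", "meta": {}, "headings": [], "links": [], "images": []}
        self._current: Optional[str] = None  # tag whose text is being captured
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []
        elif tag == "meta" and attributes.get("name"):
            name = attributes["name"].lower()
            self.descriptor["meta"][name] = attributes.get("content") or ""
        elif tag == "a" and attributes.get("href"):
            self.descriptor["links"].append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.descriptor["images"].append(attributes["src"])

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Normalise whitespace in the captured text.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.descriptor["title"] = text
            else:
                self.descriptor["headings"].append(text)
            self._current = None


# Hypothetical sample page.
sample_page = """<html><head><title>Coordination as Distributed Search</title>
<meta name="keywords" content="agents, search">
<meta name="description" content="A sample page."></head>
<body><h1>Agents</h1><a href="http://example.org/next.html">next</a>
<img src="figure1.gif"></body></html>"""
parser = DescriptorParser()
parser.feed(sample_page)
descriptor = parser.descriptor
```

Thumbnail generation is omitted; the sketch only records the image sources from which thumbnails could later be produced.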
The gatherer also creates an electronic fingerprint of each information object. When information is replicated, this fingerprint suffices to determine its trustworthiness and allows the origin of every piece of information to be detected.
The xFIND system supports the integration of external metadata, for example for non-textual objects.
Since authors rarely enrich individual documents with additional metadata, xFIND supports the definition of metadata for an entire document structure, a directory, or a particular document by inserting additional metadata files. More specific metadata overrides more general metadata.
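The override rule can be illustrated with a short sketch that merges metadata from the most general level down to the individual document. The path-prefix keys, field names, and values are hypothetical; the format of the actual xFIND metadata files is not shown here.

```python
from typing import Dict, List

Metadata = Dict[str, str]


def resolve_metadata(path: str, metadata_files: Dict[str, Metadata]) -> Metadata:
    """Merge metadata from the site root down to the document itself.

    metadata_files maps a path prefix ("" for the whole site, a directory,
    or a full document path) to the metadata defined at that level.
    """
    parts = [part for part in path.split("/") if part]
    # Prefixes from most general to most specific: "", "docs", "docs/km", ...
    prefixes: List[str] = [""] + [
        "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    resolved: Metadata = {}
    for prefix in prefixes:
        # Later (more specific) levels overwrite earlier (more general) ones.
        resolved.update(metadata_files.get(prefix, {}))
    return resolved


# Hypothetical metadata defined at three levels.
levels = {
    "": {"authority": "university", "audience": "general"},
    "docs/km": {"audience": "researchers"},
    "docs/km/agents.html": {"diction": "formal"},
}
agents_meta = resolve_metadata("docs/km/agents.html", levels)
other_meta = resolve_metadata("docs/intro.html", levels)
```

Here the directory-level target audience replaces the site-wide default for documents below docs/km, while documents elsewhere keep the site-wide values.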
Search Result Visualisation
The richness of metadata provided by xFIND can greatly aid users during the search process.
For the visualisation examples presented here, an xFIND broker specialising in the topic of Knowledge
Management will be used. At the time of writing, the broker has access to an index of 44,878 documents, gathered from the sites shown in the accompanying table, as well as from a number of other sites gathered less frequently. The query used in each case is the single word "agents", for which there are 314 matching documents.
Table: These sites are indexed for the broker on Knowledge Management, as well as a number of other sites at less frequent intervals.
Ranked List Search Results
The default presentation of search results by an xFIND broker takes the form of a traditional ranked list.
The ranked list figure shows the first two matching documents. The most relevant document to the query "agents" is entitled "Coordination as Distributed Search". Note that the context of the query term "agents" is shown for each document, the most important terms (discriminators) contained in each document are listed, and thumbnail images are shown for documents containing embedded images. The thumbnail figure shows 13 thumbnails for the ninth document in the result list.
Interactive Scatterplots with the Search Result Explorer
The Search Result Explorer uses a scatterplot (starfield display) to allow interactive exploration of the search result set based on the rich metadata associated with each object, in a manner similar to Envision (Nowell et al., 1996). Documents are plotted according to two of their metadata attributes (corresponding to the x and y axes). Further metadata attributes can be mapped to icon size and icon colour, allowing four dimensions of metadata to be visualised and explored simultaneously. If too many documents would be plotted close together, a group icon is used to represent that subset of documents. The size and colour of a group icon are determined (under user control) by the maximum, minimum, median, or average value of the group's members. Since it is possible to zoom in on specific areas of the display, an overview window is provided in the lower left corner to help maintain context and orientation.
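One way to form group icons is to bin documents into screen cells and collapse crowded cells, as in the sketch below. The cell size, the crowding limit, and the field names are assumptions for illustration; the grouping criterion actually used by the Search Result Explorer is not specified here.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Callable, Dict, List

# The four user-selectable aggregates named in the text.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "maximum": max, "minimum": min, "median": median, "average": mean}


def group_icons(docs: List[dict], cell: float, limit: int,
                aggregate: str = "maximum") -> List[dict]:
    """Bin documents into square screen cells; collapse crowded cells.

    Each document is a dict with screen coordinates "x" and "y" and a
    "relevance" value (hypothetical field names) that drives icon size.
    """
    cells = defaultdict(list)
    for doc in docs:
        key = (int(doc["x"] // cell), int(doc["y"] // cell))
        cells[key].append(doc)
    icons: List[dict] = []
    for key, members in sorted(cells.items()):
        if len(members) > limit:
            # Too crowded: one group icon sized by the chosen aggregate.
            value = AGGREGATES[aggregate]([m["relevance"] for m in members])
            icons.append({"kind": "group", "cell": key,
                          "count": len(members), "size": value})
        else:
            icons.extend({"kind": "document", "cell": key,
                          "count": 1, "size": m["relevance"]} for m in members)
    return icons


# Hypothetical plotted documents (screen coordinates in pixels).
plotted = [
    {"x": 12, "y": 14, "relevance": 0.9},
    {"x": 15, "y": 11, "relevance": 0.5},
    {"x": 18, "y": 19, "relevance": 0.1},
    {"x": 95, "y": 40, "relevance": 0.3},
]
icons = group_icons(plotted, cell=20, limit=2, aggregate="median")
```

With a 20-pixel cell and a limit of two documents per cell, the three documents near the origin collapse into a single group icon, while the isolated document keeps its own icon.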
The Search Result Explorer figure shows the first 210 (a user-configurable limit) of the 314 matching documents, plotted by relevance on the y axis and document size on the x axis. The most relevant document is shown at the top of the plot. At the moment, the colour of each document icon is determined by the document's age, from yellow for older documents to white for recent documents. Relevance is also mapped to icon size, providing a redundant encoding: more relevant documents are both larger and towards the top of the plot.
In a further figure, this document has been selected and its metadata displayed. It is the same document, dated 10th October 1998 and entitled "Coordination as Distributed Search".
A further figure illustrates the interactive nature of the plot: most of the rich metadata attributes provided by xFIND can be mapped to either axis or to icon size or colour. The alternative view figure shows the result of such a change. The y axis now corresponds to the modification date of the document, and document relevance is mapped to both icon size and icon colour (more relevant documents are orange, less relevant ones are white). It can be seen at a glance that the most relevant documents are about a year old and reasonably small.
Dynamic Clustering with VisIslands
The VisIslands interface supports dynamic thematic clustering of search result sets, in a manner similar to SPIRE's ThemeScape (Wise, 1999) and its commercial successor from Cartia (Cartia, 2000).
Figure: The standard ranked list returned by an xFIND broker. Note the context of the query term "agents", the most important terms (discriminators) contained in each document, and the thumbnail images.
Figure: Image thumbnails of embedded images are generated and associated as metadata with an HTML document object.
Figure: The Search Result Explorer plots search results along two axes. Here, document relevance is mapped to the y axis and document size to the x axis. More relevant documents also have larger icons. Older documents are yellowish.
Figure: Search Result Explorer: The top matching document has been selected and its metadata displayed.
Figure: Search Result Explorer: Most of the rich metadata attributes provided by xFIND can be mapped to either axis, or to icon size or icon colour.
Figure: Search Result Explorer: An alternative view. Modification date has been mapped to the y axis. More relevant documents are now both larger and more orange.
Figure: VisIslands: Pre-clustering using hierarchical agglomerative clustering has identified the 22 clusters shown in the left-hand pane. Cluster 17 has been selected.
The search result set is first pre-clustered using hierarchical agglomerative clustering (or optionally k-means) (Jain et al., 1999). The cluster centroids are then distributed randomly in the viewing rectangle. The documents belonging to each cluster, as determined by the initial pre-clustering, are then placed in a ring around each centroid. This arrangement is fine-tuned using a linear iteration time force-directed placement algorithm derived from Chalmers (1996b). Documents similar to one another are attracted towards each other. After a certain cut-off point, the arrangement has stabilised, and each document contributes its weight to the height field of the grid cells within which it lies. Dense areas of many documents have corresponding peaks. The overall result is like a contour map of islands. A more three-dimensional visualisation of the islands would also be possible, but has not yet been implemented.
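The initial placement and the height field can be sketched as follows. This is a simplified illustration under stated assumptions: the pre-clustering is taken as given, every document contributes a weight of 1 to exactly one grid cell, the ring radius is fixed, and the force-directed fine-tuning step is omitted entirely. All function and parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def initial_layout(clusters: Dict[int, List[str]], width: float, height: float,
                   radius: float, seed: int = 0) -> Dict[str, Point]:
    """Place each cluster's documents on a ring around a random centroid."""
    rng = random.Random(seed)
    positions: Dict[str, Point] = {}
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        # Choose the centroid so that the whole ring fits in the rectangle.
        cx = rng.uniform(radius, width - radius)
        cy = rng.uniform(radius, height - radius)
        for i, doc in enumerate(members):
            angle = 2 * math.pi * i / len(members)
            positions[doc] = (cx + radius * math.cos(angle),
                              cy + radius * math.sin(angle))
    return positions


def height_field(positions: Dict[str, Point], width: float, height: float,
                 columns: int, rows: int) -> List[List[int]]:
    """Count documents per grid cell (assumed weight of 1 per document)."""
    grid = [[0] * columns for _ in range(rows)]
    for x, y in positions.values():
        # Clamp indices so points on the border fall into the outermost cells.
        col = max(0, min(int(x / width * columns), columns - 1))
        row = max(0, min(int(y / height * rows), rows - 1))
        grid[row][col] += 1
    return grid


# Hypothetical pre-clustering result: cluster id -> document ids.
pre_clusters = {1: ["a", "b", "c"], 2: ["d", "e"]}
layout = initial_layout(pre_clusters, width=100, height=100, radius=10)
field = height_field(layout, width=100, height=100, columns=10, rows=10)
```

Cells containing many documents receive large counts, which a contour renderer could then draw as the peaks of an island.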
The VisIslands figure shows the islands visualisation for the first 200 documents matching our example query. Pre-clustering has identified 22 clusters. Cluster 17 concerns "mind, human, language" and has been selected. Note the corresponding visual cluster of red documents in the islands display. A further figure shows the metadata associated with Cluster 17.
Another figure focuses on Cluster 22, containing 108 documents. Note that fine-tuning with force-directed placement has attracted one document, which pre-clustering assigned to Cluster 22, over towards the "webmaster" and "architectures" clusters. Zooming in on Cluster 22, the final figure shows that many of the documents assigned to Cluster 22 by pre-clustering should perhaps have been assigned to the cluster called "erläuterungen".
The xFIND project is ongoing. Current work includes development of a utility to automatically detect the language of a document, so that stop list and stemming algorithms can be applied automatically. Also, a filter for indexing PDF documents is planned.
On the visualisation side, the Search Result Explorer and VisIslands are currently independent xFIND clients. A Master's thesis is just commencing to build an integrated search client incorporating query formulation and history, ranked result lists, and a framework for plugging in one or more synchronised visualisations, embracing both the Search Result Explorer and VisIslands.
Figure: VisIslands: The metadata associated with Cluster 17 is displayed. Its most frequent terms include "mind", "human", and "language".
Figure: VisIslands: Cluster 22 deals with a variety of topics including "proceedings" and "conference".
Figure: VisIslands: After zooming in on Cluster 22, the group of documents on the right-hand peak has been manually selected.
The xFIND infrastructure builds on the original work of Harvest and provides a rich, highly configurable knowledge management framework. The Search Result Explorer and VisIslands visualisations demonstrate the potential of information visualisation techniques applied to the exploration of search result sets. It is intended to accompany this paper with a live demonstration of the system.
We would like to acknowledge the support of the IICM, Graz University of Technology, partial funding from the Austrian Ministry of Science, and the contributions of former colleagues and students: Jürgen Heber, Axel Jurak, Bernhard Knögler, Herbert Legenstein, Susanne Mayr, and Erwin Weitlaner.
ANSI/NISO (1995). Z39.50-1995. Library of Congress.
Bowman, C. M., Danzig, P. B., Hardy, D. R., Manber, U., Schwartz, M. F. and Wessels, D. P. (1995). Harvest: A Scalable, Customizable Discovery and Access System. Technical Report CU-CS-732-94, University of Colorado.
Cartia (2000). Mapping the Information Landscape.
Chalmers, M. (1993). Using a Landscape Metaphor to Represent a Corpus of Documents. In Spatial Information Theory, Proc. COSIT'93, pages 377–390, Boston, Massachusetts. Springer LNCS 716.
Chalmers, M. (1996a). Adding Imageability Features to Information Displays. In Proc. UIST'96, Seattle, Washington. ACM.
Chalmers, M. (1996b). A Linear Iteration Time Layout Algorithm for Visualising High-Dimensional Data. In Proc. Visualization'96, pages 127–132, San Francisco, California. IEEE Computer Society.
Gütl, C. (2000). xFIND: Extended Framework for Information Discovery. IICM, Graz University of Technology.
Gütl, C., Andrews, K. and Maurer, H. (1998). Future Information Harvesting and Processing on the Web. In Proc. European Telematics: Advancing the Information Society, Barcelona, Spain.
Inktomi and NEC Research Institute (2000). Web Surpasses One Billion Documents. Press Release.
Jain, A. K., Murty, M. N. and Flynn, P. J. (1999). Data Clustering: A Review. ACM Computing Surveys, 31(3):264–323.
Kahle, B., Morris, H., Davis, F., Tiene, K., Hart, C. and Palmer, R. (1992). Wide Area Information Servers: An Executive Information System for Unstructured Files. Electronic Networking: Research, Applications and Policy, 2(1):59–68.
Kohonen, T. (2000). Self-Organizing Maps. Springer, third edition. ISBN 3540679219.
Nowell, L. T., France, R. K. and Hix, D. (1997). Exploring Search Results with Envision. In CHI'97 Demonstration (Extended Abstracts), pages 14–15, Atlanta, Georgia. ACM.
Nowell, L. T., France, R. K., Hix, D., Heath, L. S. and Fox, E. A. (1996). Visualizing Search Results: Some Alternatives to Query-Document Similarity. In Proc. SIGIR'96, pages 67–75, Zurich, Switzerland. ACM.
WEBSOM (2000). WEBSOM - Self-Organizing Maps for Internet Exploration. Helsinki University of Technology.
Wise, J. A. (1999). The Ecological Approach to Text Visualization. Journal of the American Society for Information Science.