Muddiest Point - How common is collaborative filtering in DLs?
How common is it for DL users to use Internet search engines to find digital content?
Can you talk more about web crawling technology? Specifically, why is so much of the deep web academic in nature?
Reading question - How can DLs be structured to be accessed easily by Internet search engines?
Federated Searching:
- Average users seeking information lack sophisticated search techniques: they don't want to search, "they want to find."
- The success of Google demonstrates what type of searching the average information seeker wants to use.
- The universe of available content is no longer limited to what is stored within library walls, and the type of content users are looking for is less commonly cataloged than it was in the past.
- "We shouldn't force users to predetermine the information source as a precondition to asking their question."
- Google proves that the best way to access information is often the simplest; more complex ways of accessing information block users from materials stored within that system.
-"Not all federated search engines can search all databases, although most can search Z39.50 and free databases. But many vendors that claim to offer federated search engines cannot currently search all licensed databases for both walk-up and remote users."
-"A federated search engine searches databases that update and change an average of 2 to 3 times per year. This means that a system accessing 100 databases is subject to between 200 and 300 updates per year—almost one per day! Subscribing to a federated searching service instead of installing software eliminates the need for libraries to update translators almost daily so they can avoid disruptions in service."
Z39.50 ("Information Retrieval (Z39.50): Application Service Definition and Protocol Specification," ANSI/NISO Z39.50-1995) is a protocol that specifies data structures and interchange rules allowing a client machine (called an "origin" in the standard) to search databases on a server machine (called a "target" in the standard) and retrieve records identified as a result of such a search.
-"Z39.50 is one of the few examples we have to date of a protocol that actually goes beyond codifying mechanism and moves into the area of standardizing shared semantic knowledge. The extent to which this should be a goal of the protocol has been an ongoing source of controversy and tension within the developer community, and differing views on this issue can be seen both
in the standard itself and the way that it is used in practice."
-Recent versions of the standard are highly extensible, and the consensus process of standards development has made it hospitable to an ever-growing set of new communities and requirements.
-The OSI, or Open System Interconnection, model defines a networking framework for implementing protocols in seven layers. Control is passed from one layer to the next, starting at the application layer in one station, proceeding to the bottom layer, over the channel to the next station and back up the hierarchy.
-The protocol defines interactions between two machines only
-The basic architectural model that Z39.50 uses is as follows: A server houses one or more databases containing records. Associated with each database are a set of access points (indices) that can be used for searching. This is a much more abstract view of a database than one finds with SQL, for example. Relatively arbitrary server-specific decisions about how to segment logical data into relations and how to name the columns in the relations are hidden; one deals only with logical entities based on the kind of information that is stored in the database, not the details of specific database implementations
-A search produces a set of records, called a "result set", that are maintained on the server; the result of a search is a report of the number of records comprising the result set. The standard is silent as to whether the result set is materialized or maintained as a set of record pointers, and as to how the result set may interact with database updates that may be taking place at the server. Result sets can be combined or further restricted by subsequent searches
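The abstract model described in the last two bullets (access points rather than columns, named result sets maintained on the server, counts reported back, refinement by later searches) can be made concrete with a toy in-memory "target". This sketches only the data model, not the wire protocol; the class, record fields, and result-set names are all invented for illustration.

```python
# Toy model of the Z39.50 view described above: a "target" exposes records
# through named access points (indices), a search stores a named result set
# server-side and reports only its size, and later searches can restrict it.

class Target:
    def __init__(self):
        self.records = {}      # record id -> {access_point: value}
        self.result_sets = {}  # result-set name -> set of record ids (pointers)

    def add_record(self, rid, fields):
        self.records[rid] = fields

    def search(self, set_name, access_point, term):
        """Search one access point; keep pointers, report only the count."""
        hits = {rid for rid, f in self.records.items()
                if term.lower() in f.get(access_point, "").lower()}
        self.result_sets[set_name] = hits
        return len(hits)

    def restrict(self, set_name, base_set, access_point, term):
        """Refine a previous result set with a further search."""
        hits = {rid for rid in self.result_sets[base_set]
                if term.lower() in self.records[rid].get(access_point, "").lower()}
        self.result_sets[set_name] = hits
        return len(hits)

    def present(self, set_name):
        """Retrieve the records a result set points to."""
        return [self.records[rid] for rid in sorted(self.result_sets[set_name])]

t = Target()
t.add_record(1, {"title": "Digital Libraries", "author": "Smith"})
t.add_record(2, {"title": "Library Protocols", "author": "Jones"})
print(t.search("rs1", "title", "librar"))           # 2
print(t.restrict("rs2", "rs1", "author", "jones"))  # 1
```

Note how the client never sees table or column names, only access points and result-set names, which mirrors the standard's deliberate hiding of server-specific database layout.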
Search Engine Technology:
-How should libraries see the future of their information discovery services? Instead of a highly fragmented landscape that forces users to visit multiple, distributed servers, libraries will provide a search index that forms a virtual resource of unprecedented comprehensiveness, covering any type and format of academically relevant content.
-Provide metadata-based subject gateways to distributed content. Based on the OAI initiative, libraries and library service organisations are pursuing the idea of "OAI Registries" as central points of access to worldwide distributed OAI repositories.
-First of all, this is an acknowledgement that, particularly at universities, libraries deal with a range of users with often different usage behaviours.
-Most systems focus solely on the search of metadata (bibliographic fields, keywords, abstracts). The cross-search of full text has only recently been introduced and is often restricted to a very limited range of data formats (primarily "html" and "txt").
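The metadata-based gateways mentioned above typically harvest Dublin Core records over OAI-PMH. As a small sketch of what a harvester does with a `ListRecords` response, the following parses the titles out of a hand-made sample XML fragment (not from any real repository) using only the standard library; a real harvester would fetch this XML over HTTP and handle resumption tokens.

```python
# Parse dc:title values out of a sample OAI-PMH ListRecords response.
# The XML below is an invented fragment for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Thesis on Federated Search</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def harvest_titles(xml_text):
    """Extract all dc:title values from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//dc:title", NS)]

print(harvest_titles(SAMPLE))  # ['Sample Thesis on Federated Search']
```

Harvesting metadata like this into a central index is what lets an "OAI Registry" answer queries without contacting every repository at search time, in contrast to the live cross-searching of federated search.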
How common is collaborative filtering in DLs?
Collaborative filtering (CF) is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. (cited from http://en.wikipedia.org/wiki/Collaborative_filtering)
In my opinion, collaborative filtering works like information trapping, an RSS reader for the digital library: it can push topical information to a person who has viewed a topic or product of interest. In digital libraries this technique is not yet mature, but we can see this kind of filtering in commercial databases such as Web of Science and Scopus, where users can receive the latest activity in their fields of interest.
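The simplest form of the "users who viewed X also viewed Y" feature mentioned above is item-to-item co-occurrence counting. Here is a minimal sketch with invented view data; a real DL would mine circulation or click logs and use a proper similarity measure (e.g. cosine) rather than raw counts.

```python
# Minimal item-to-item collaborative filtering by co-view counts.
# VIEWS maps a (hypothetical) user to the set of items they viewed.
from collections import Counter

VIEWS = {
    "u1": {"paperA", "paperB", "paperC"},
    "u2": {"paperA", "paperC"},
    "u3": {"paperB", "paperD"},
}

def also_viewed(item, n=2):
    """Rank other items by how many users co-viewed them with `item`."""
    co = Counter()
    for items in VIEWS.values():
        if item in items:
            co.update(items - {item})
    return [i for i, _ in co.most_common(n)]

print(also_viewed("paperA"))  # ['paperC', 'paperB']
```

Even this tiny version shows why CF needs a large user base to work well: with only three users, a single shared view dominates the ranking, which is part of why the technique is more visible in big commercial databases than in individual DLs.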