Muddiest Point - Why have digital materials in a library that you don't own the copyright to? I would assume these are born-digital materials, but if not, why incur the expense of digitizing materials when access to them is going to be restricted?
Reading notes - For several reasons, such as copyright and patient privilege (medical records), user access to digital libraries has to be restricted. Different users can only be permitted access to certain records contained in a digital library. There are several ways to accomplish this task: one is password and ID verification; another is through the user's IP address. But it is important to keep in mind that access management will have adverse effects on the user interface.
The central concept of the general model framework is that access is controlled through the creation of policies. These policies assign each user a set of digital materials they have permission to access while denying access to the rest.
Materials can also be assigned a level of risk and categorized that way, so the level of access can be determined through that risk assessment. Users and their roles must also be analyzed when implementing access management.
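A minimal sketch of the policy idea described above, in Python. The role names, risk levels, and policy table are all invented for illustration; the reading doesn't prescribe any particular scheme.

```python
# Illustrative sketch of policy-based access control for a digital library.
# Roles, risk levels, and the policy table are invented examples, not from
# the reading: each role is granted a maximum risk level it may access.

POLICIES = {
    "public": 0,
    "student": 1,
    "staff": 2,
    "archivist": 3,
}

def can_access(role: str, item_risk: int) -> bool:
    """A user may view an item only if their role's clearance covers its risk."""
    return POLICIES.get(role, -1) >= item_risk

# Medical records might be rated risk 3: only archivists would see them.
print(can_access("student", 3))    # False
print(can_access("archivist", 3))  # True
```

The point of the sketch is that the policy table, not the materials themselves, decides who sees what: changing access for a whole user class is one edit to the table.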
Friday, December 4, 2009
Thursday, November 19, 2009
Reading notes
no muddiest point.
The concept of creating separate digital libraries makes, from an institutional standpoint, a good deal of sense. But while a good deal of attention has been focused on creating interoperability between these distinct digital libraries, some feel that, despite these efforts, the approach is not in users' best interest. That has led people to consider the World Wide Web as a model for the future of digital libraries. Google Books represents this model to a degree: their goal seems to be to become the digital library for digital books. Users have embraced and become accustomed to this structure because of their web experiences. They do not care who has created or digitized the material/information they are looking for, only that the information is easily retrieved. So the question is: do we move toward creating the digital library, as opposed to organizational digital libraries?
Thursday, November 5, 2009
Reading notes digital preservation
Muddiest Point: none this week
Main requirements of OAIS
1. provide long-term persistence of digital information
2. ensure access to that information
3. negotiate for and accept appropriate information from information producers.
4. determine the scope of the archive's user community
5. ensure that the preserved information can be understood by users without the assistance of the information producer.
6. make the preserved information available to the user community.
An OAIS must retain sufficient intellectual property rights, along with custody of the information, in order to guarantee preservation of those materials. The model has three distinct parts: the producers of the information, the managers of the information, and the consumers of the information. Managers provide strategic planning, define the scope of the collections, and ensure preservation of materials. Producers submit the information to be preserved, along with associated metadata, for ingest. Consumers are the users of the information.
The OAIS functional model is a collection of 6 high-level services, or functional components, that taken together fulfill the OAIS's dual function of providing access and preservation. Those components are:
1. ingest - the set of processes responsible for accepting information submitted by producers.
2. archival storage - part of the system that handles long-term storage and maintenance of ingested information.
3. data management - maintains a database of descriptive metadata identifying and describing the archived information in support of the OAIS's finding aids.
4. Preservation planning - responsible for mapping out the OAIS's preservation strategy.
5. Access - manages the processes and services by which consumers locate, request, and receive delivery of items residing in the OAIS.
6. Administration - day-to-day management and coordination of the previous five elements.
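To picture how a few of these components hand work to one another, here is a toy Python model of the producer → ingest → storage/data management → access → consumer flow. Only the component names come from OAIS; the class, methods, and data structures are my own invention for illustration.

```python
# Toy model of an OAIS-style archive: ingest accepts a submission from a
# producer, archival storage and data management keep content and descriptive
# metadata, and access lets a consumer locate and retrieve items.
# All names besides the OAIS component names are invented.

class Archive:
    def __init__(self):
        self.storage = {}   # archival storage: object id -> content
        self.catalog = {}   # data management: object id -> descriptive metadata

    def ingest(self, obj_id, content, metadata):
        """Ingest: accept producer submissions, then store and describe them."""
        self.storage[obj_id] = content
        self.catalog[obj_id] = metadata

    def access(self, query):
        """Access: find items via the catalog, deliver them from storage."""
        hits = [i for i, md in self.catalog.items() if query in md]
        return [self.storage[i] for i in hits]

oais = Archive()
oais.ingest("doc1", b"scanned page", "letter 1923 correspondence")
print(oais.access("1923"))  # [b'scanned page']
```

The separation mirrors the model's key point: consumers search descriptive metadata (data management), never the stored bitstreams directly.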
There are many research challenges associated with long-term preservation in the digital realm. Traditional preservation practices seem to be insufficient for digital materials. Reasons a paradigm shift in digital preservation is needed include:
1. Traditional digital preservation tools can no longer keep pace with the complexity of dynamic, multimedia digital objects.
2. If long-term preservation is going to span decades, the threat of interrupted management of digital objects is critical.
3. There are no formal models for dealing with the economic, social, and technical aspects of preserving digital materials over time.
4. New tools and technologies are needed to streamline many of the processes associated with digital preservation and that support human decision-making.
5. Infrastructure needs to be created so that digital preservation becomes sustainable and effective.
Sunday, October 25, 2009
Reading Notes Retrieval:
Muddiest Point - How common is collaborative filtering in DLs?
How common is it for DL users to use Internet search engines to find digital content?
Can you talk more about web crawling technology? Specifically why is so much of the deep web academic in nature?
Reading question - How can DLs be structured to be accessed easily by Internet search engines?
Federated Searching:
- Average users seeking information lack sophisticated search techniques. They don't want to search, "they want to find."
-The success of Google demonstrates what type of searching the average information seeker wants to use.
-The universe of available content is no longer limited to what is stored within library walls, and the type of content users are looking for is less commonly cataloged than it was in the past.
-"We shouldn't force users to predetermine the information source as a precondition to asking their question"
-Google proves that the best way to access information is often the simplest. More complex ways of accessing information block users from materials stored within that system.
-"Not all federated search engines can search all databases, although most can search Z39.50 and free databases. But many vendors that claim to offer federated search engines cannot currently search all licensed databases for both walk-up and remote users."
-"A federated search engine searches databases that update and change an average of 2 to 3 times per year. This means that a system accessing 100 databases is subject to between 200 and 300 updates per year—almost one per day! Subscribing to a federated searching service instead of installing software eliminates the need for libraries to update translators almost daily so they can avoid disruptions in service."
Z39.50 - "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification, ANSI/NISO Z39.50-1995" -- is a protocol which specifies data structures and interchange rules that allow a client machine (called an "origin" in the standard) to search databases on a server machine (called a "target" in the standard) and retrieve records that are identified as a result of such a search.
-"Z39.50 is one of the few examples we have to date of a protocol that actually goes beyond codifying mechanism and moves into the area of standardizing shared semantic knowledge. The extent to which this should be a goal of the protocol has been an ongoing source of controversy and tension within the developer community, and differing views on this issue can be seen both in the standard itself and the way that it is used in practice."
-Recent versions of the standard are highly extensible, and the consensus process of standards development has made it hospitable to an ever-growing set of new communities and requirements.
-The OSI, or Open System Interconnection, model defines a networking framework for implementing protocols in seven layers. Control is passed from one layer to the next, starting at the application layer in one station, proceeding to the bottom layer, over the channel to the next station and back up the hierarchy.
-The protocol defines interactions between two machines only
-The basic architectural model that Z39.50 uses is as follows: A server houses one or more databases containing records. Associated with each database are a set of access points (indices) that can be used for searching. This is a much more abstract view of a database than one finds with SQL, for example. Relatively arbitrary server-specific decisions about how to segment logical data into relations and how to name the columns in the relations are hidden; one deals only with logical entities based on the kind of information that is stored in the database, not the details of specific database implementations.
-A search produces a set of records, called a "result set", that are maintained on the server; the result of a search is a report of the number of records comprising the result set. The standard is silent as to whether the result set is materialized or maintained as a set of record pointers, and as to how the result set may interact with database updates that may be taking place at the server. Result sets can be combined or further restricted by subsequent searches.
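The search/retrieve split described above can be sketched in a few lines of Python: a search builds a named result set on the server and reports only its size, and records are fetched afterwards from that set. This is a toy model of the protocol's semantics only; it speaks no actual Z39.50, and all the class and method names are invented.

```python
# Toy illustration of Z39.50-style result-set semantics: search() creates a
# named result set on the target (server) and returns only the hit count;
# present() later retrieves record slices from that set. Names are invented.

class Target:  # "target" = server, in Z39.50 terms
    def __init__(self, records):
        self.records = records
        self.result_sets = {}

    def search(self, set_name, predicate):
        """Build a result set server-side; report only how many records hit."""
        self.result_sets[set_name] = [r for r in self.records if predicate(r)]
        return len(self.result_sets[set_name])

    def present(self, set_name, start, count):
        """Retrieve a slice of a previously created result set (1-based start)."""
        return self.result_sets[set_name][start - 1 : start - 1 + count]

t = Target([{"title": "Moby Dick"}, {"title": "Dublin Core Guide"}])
n = t.search("rs1", lambda r: "Dublin" in r["title"])
print(n)                       # 1
print(t.present("rs1", 1, n))  # [{'title': 'Dublin Core Guide'}]
```

Keeping the result set on the server is what lets later operations combine or further restrict it without re-running the search.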
Search Engine Technology:
-How should libraries see the future of their information discovery services? Instead of a highly fragmented landscape that forces users to visit multiple, distributed servers, libraries will provide a search index, which forms a virtual resource of unprecedented comprehensiveness for any type and format of academically relevant content.
-Provide metadata-based subject gateways to distributed content. Based on the OAI initiative, libraries and library service organisations are following the idea of "OAI Registries" as central points of access to worldwide distributed OAI repositories.
-First of all, this is an acknowledgement that, particularly at universities, libraries deal with a range of users with often different usage behaviours.
-Most systems focus solely on the search of metadata (bibliographic fields, keywords, abstracts). The cross-search of full text has only recently been introduced and is often restricted to a very limited range of data formats (primarily "html" and "txt").
Saturday, October 10, 2009
XML reading notes
Reading Question - How prevalent have XML Schemas become, and will they replace traditional DTDs?
muddiest point - As we contemplate moving toward a standard metadata schema for all groups, will that require something like Dublin Core to continue adding elements, and will that make it far less simple if it were to grow to include groups such as archives? Does interoperability matter as much with EAD, since it is the accepted standard within the archival community and so all sharing of metadata will be done in EAD?
Reading Notes:
XML is designed to make it easier to interchange structured documents over the Internet. It defines how structured URLs can be used to identify components of XML data streams.
XML elements ensure that document creators put information in its appropriate place, as defined by the Document Type Definition (DTD).
Allows users to:
-bring multiple files together to form compound documents.
-identify where illustrations are to be incorporated into text files and the format used to encode each illustration.
-provide processing control information to supporting programs such as doc. validators and browsers.
-add editorial comments to a file.
Core XML technologies:
-XML 1.0 defines strict rules for Unicode text format as well as the DTD validation language
-XML is a simplification of SGML and includes adjustments that make it better suited to the web environment.
XML Catalogs - defines a format for instructions on how an XML processor resolves XML entity identifiers into actual documents
URIs - uniform resource identifiers; an extension of URLs
XML Namespaces - provides a mechanism for universal naming of elements and attributes in XML
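To see what universal naming actually buys, here is how Python's standard-library parser treats a namespaced element: the namespace URI is folded into the tag name ("Clark notation"), so two elements named `title` from different vocabularies can never collide. The sample document is invented; the `dc` namespace URI is the real Dublin Core one.

```python
# How a parser sees namespaced element names: the namespace URI becomes part
# of the tag, written as {uri}localname. The sample record is invented.
import xml.etree.ElementTree as ET

doc = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Reading Notes</dc:title>
</record>"""

root = ET.fromstring(doc)
title = root.find("{http://purl.org/dc/elements/1.1/}title")
print(title.tag)   # {http://purl.org/dc/elements/1.1/}title
print(title.text)  # Reading Notes
```

The prefix `dc:` is just local shorthand; only the URI matters, which is what makes the naming universal across documents.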
XML Schema:
-defines elements that can be in a doc.
-what attributes can be in a doc.
-which elements are child elements
-the order of child elements
-the # of child elements
-whether an element is empty or can contain text
-defines data types for elements and attributes
-defines default and fixed values for elements and attributes
XML Schemas are the successor to DTDs because:
-They are extensible to future additions
-richer and more powerful than DTDs
-Schemas are written in XML
-support data types
-support namespaces
Schemas support data types:
-easier to describe allowable document content, correctness of data, data from a database, and restrictions on data. Also easier to define data formats and convert data between different formats.
-even well-formed XML documents can still contain errors, but most of these will be caught by an XML Schema validator.
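The well-formed vs. valid distinction can be shown with the standard library alone: Python's built-in parser catches well-formedness errors (bad nesting, stray markup), but it does no schema validation at all; for that you would need a schema-aware validator (e.g. lxml's `XMLSchema`, not shown here, and an assumption on my part about tooling).

```python
# Well-formedness check only: the stdlib parser rejects structurally broken
# XML but accepts any well-formed document, even one a schema would reject.
import xml.etree.ElementTree as ET

def is_well_formed(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>ok</b></a>"))   # True
print(is_well_formed("<a><b>bad</a></b>"))  # False: tags cross
# "<a>anything</a>" is well-formed, yet a schema could still reject it
# entirely, e.g. if <a> isn't a declared element or its content is wrong.
```

This is why the notes above say well-formed documents "can still contain errors": well-formedness is about syntax, validity is about conforming to a schema.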
Simple Element - an XML element that can contain only text, with no other elements or attributes allowed.
attribute types - string, decimal, integer, boolean, date, time
restrictions - used to define acceptable values for XML elements or attributes.
Complex Elements:
-empty elements
-elements that contain only other elements
-elements that contain only text
-elements that contain both elements and text
Indicators:
Order indicators - used to define the order of elements:
-all - child elements can appear in any order
-choice - specifies that one child element or another can occur
-sequence - child elements must occur in a specific order
Occurrence Indicators:
maxOccurs - maximum number of times an element can occur
minOccurs - minimum number of times an element can occur
Group Indicators - related elements are defined with a group declaration:
-group name
-attributeGroup name
Any Element - allows an XML doc. to contain elements not declared in the schema
String data type - used for values that contain character strings. Can contain characters, line feeds, carriage returns, and tab characters.
Misc. Data Types - boolean, base64Binary, etc.
Friday, September 25, 2009
Metadata in digital libraries
Reading question - how interoperable are the different metadata schemas?
-The term metadata is less commonly used among creators and consumers of networked digital content. Using web page tags, folksonomies, and social bookmarks are growing practices.
Metadata should reflect three things:
1. Content - what the object contains or what is intrinsic to an information object
2. Context - indicates the who, what, why, where, and how aspects associated with an object's creation and is extrinsic to an information object
3. Structure - relates to the formal set of associations within or among individual information objects and can be both intrinsic and extrinsic
Data Structure Standards - categories or containers of data that make up a record or information object. (MARC, EAD)
Data Value Standards - terms, names, and other values that are used to populate data structure standards or metadata elements. (Library of Congress Subject Headings)
Data Content Standards - guidelines for the format and syntax of the data values that are used to populate metadata elements. (DACS)
Data Format/Technical Interchange Standards - this type of standard is often a manifestation of a particular data structure standard, encoded or marked up for machine processing. (XML)
- information communities are aware that the more highly structured an information object is, the more that structure can be exploited for searching, manipulation, and interrelating with other information objects. This can only occur with strict adherence to metadata standards.
-certifies the authenticity and degree of completeness of the content.
- est. and documents the context of the content
- identifies and exploits the structural relationships that exist within and between information objects.
- provides a range of intellectual access points for an increasingly diverse group of users.
- provides some of the info that an info professional would have provided in a reference scenario
Different types of metadata
Administrative - used in managing and administering collections and information resources
Descriptive - used to identify and describe collections and related information resources
Preservation - preservation management of collections and information resources. Documentation of physical condition of resources.
Technical - how a system functions
use - level and type of use of collections and information resources
Attributes and characteristics of metadata
source of metadata - internal metadata is generated by the creating agent when the item is digitized or born digital; external metadata is created by someone who is not the creator.
method of creation - automatically generated by the computer or manually by humans
nature of metadata - non-expert vs. expert creation
status - static metadata never changes; dynamic metadata changes with use, manipulation, or preservation
Semantics - controlled metadata vs. uncontrolled metadata
- metadata creation has become a complex combination of manual and automatic processes
Primary functions of metadata
-creation, multiversioning, reuse, and recontextualization of information objects.
-organization and description
-validation - users scrutinize metadata to assure authenticity and authoritativeness
-searching and retrieval
-utilization and preservation - metadata related to user annotations, rights tracking and version control
-disposition - accessioning and deaccessioning
Bibliographic entities
documents, works, editions, authors, titles and subjects
MARC
-governed by AACR2R
-stored as a collection of tagged fields in a fairly complex format; also used to represent authority records, which are standardized forms that are part of a controlled vocabulary.
Dublin Core
-designed for nonspecific use
-simple and flexible; has only 15 elements, compared to hundreds in MARC
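To make the simplicity concrete, here is a sketch that builds a tiny Dublin Core record as XML with Python's standard library. The 15-element set and the `dc` namespace URI are real; the record's values are invented.

```python
# Sketch: serialize a minimal Dublin Core record as XML using a few of the
# 15 DC elements. The namespace URI is the real DC one; values are invented.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # use the conventional "dc" prefix on output

record = ET.Element("record")
for element, value in [("title", "Reading Notes"),
                       ("creator", "A. Student"),
                       ("date", "2009-09-25")]:
    child = ET.SubElement(record, f"{{{DC}}}{element}")
    child.text = value

print(ET.tostring(record, encoding="unicode"))
```

Because every element is optional and repeatable in Dublin Core, a record this small is still perfectly conformant, which is exactly the flexibility the notes describe.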
BibTeX - used with LaTeX documents (common for mathematical notation); manages bibliographic data and references within docs. Similar to EndNote?
Refer - similar to BibTeX
Thursday, September 24, 2009