Thursday, September 17, 2009

Reading notes week 3

Reading question: How prevalent is the use of identifiers in place of URLs for digital objects in DLs? How prevalent is it outside DLs, on a site similar to Flickr?

Identifiers and Their Role in Networked Information Applications

-Bibliographic utility identifier numbers such as the OCLC and RLIN numbers are used in duplicate detection and consolidation in the construction of online union catalog databases.

-"The assignment of identifiers to works is a very powerful act; it states that, within a given intellectual framework, two instances of a work that have been assigned the same identifier are the same, while two instances of a work with different identifiers are distinct."

- URLs serve as the key links between physical artifacts and content on the Web, as well as providing linkage between objects within the Web.

- URLs are not really names, merely instructions on how to access an object. URLs were never intended to be long lasting names for content; they were designed to be flexible, easily implemented and easily extensible ways to make reference to materials on the Net.

URN - Uniform Resource Name. The syntax of a URN for a digital object consists of a naming authority identifier and an object identifier, which that naming authority assigns to the object in question. The specific content of the identifier may have structure and significance to users familiar with the practices of a given naming authority, but it has no predefined meaning within the overall URN framework.
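A minimal sketch of that two-part syntax, assuming the colon-delimited form "urn:<naming authority>:<object identifier>" that was standardized for URNs. The function and field names are my own, not from the reading. The object identifier is passed through untouched, since only the naming authority knows what it means.

```python
from typing import NamedTuple


class ParsedURN(NamedTuple):
    naming_authority: str  # who assigned the name
    object_id: str         # opaque outside that authority's own practices


def parse_urn(urn: str) -> ParsedURN:
    """Split 'urn:<authority>:<object id>' into its two parts."""
    scheme, sep1, rest = urn.partition(":")
    authority, sep2, object_id = rest.partition(":")
    if scheme.lower() != "urn" or not sep1 or not sep2:
        raise ValueError(f"not a URN: {urn!r}")
    if not authority or not object_id:
        raise ValueError(f"URN is missing a part: {urn!r}")
    # The authority name is case-insensitive; the object id is not interpreted.
    return ParsedURN(authority.lower(), object_id)


print(parse_urn("urn:isbn:0451450523"))
```

Here the ISBN agency plays the role of the naming authority, and the digits after it only mean something under ISBN rules.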

- Could you talk in class about the function of "resolvers" within the URN framework?

-browsers do not understand URNs

-A PURL server creates a database entry linking the object's hostname and filename to the identifier that will appear in the PURL. When the PURL server is contacted because someone is evaluating a PURL, it looks up the identifier in its database, finds out where the object in question currently resides, and uses the redirect feature of the HTTP protocol to connect the requester to the host housing the object.
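The lookup-then-redirect behaviour can be sketched without a real web server. Everything here (the table contents, the example hosts, the function names) is hypothetical and only illustrates the idea that the PURL stays fixed while the database entry behind it changes.

```python
from typing import Dict, Tuple

# Hypothetical PURL database: identifier -> where the object lives right now.
purl_table: Dict[str, str] = {
    "/net/example/report": "http://host-a.example.org/files/report.html",
}


def resolve_purl(purl_path: str) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, headers) roughly as a PURL server might."""
    target = purl_table.get(purl_path)
    if target is None:
        return 404, {}
    # HTTP redirect: the requester is sent on to the host housing the object.
    return 302, {"Location": target}


def move_object(purl_path: str, new_url: str) -> None:
    """The object moved; only the database entry changes, never the PURL."""
    purl_table[purl_path] = new_url


print(resolve_purl("/net/example/report"))
move_object("/net/example/report", "http://host-b.example.org/archive/report.html")
print(resolve_purl("/net/example/report"))
```

The same PURL resolves before and after the move, which is the whole point of putting the indirection on the server.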

-SICI - Serial Item and Contribution Identifier. Can be used to identify a specific issue of a serial, or a specific contribution within an issue (such as an article or table of contents).

-BICI - Book Item and Contribution Identifier. Can be used to identify specific volumes within a multivolume work, or components such as chapters within a book.

Digital Object Identifier (DOI) - provides a mechanism for implementing naming systems for arbitrary digital objects, one that fits roughly within the URN framework.

-DOI can provide a method for collecting revenue for access to material that is described by a DOI, if the organization that owns the rights so chooses. DOIs in and of themselves are only identifiers and do not imply that any sort of copyright enforcement mechanisms will be bundled with the objects that they describe; the presence or absence of such copyright enforcement technologies is an entirely separate issue.

Digital Object Identifier System (Paskin)

An identifier can be:

- a string, typically a number or name denoting a specific entity. Think ISBN

- A specification, which prescribes how such strings are constructed.

- a scheme, which implements the specification. Typically such schemes provide a managed registry of the identifiers within their control, in order to offer a related service.
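To make the "string vs. specification" distinction concrete with the ISBN example: the string is the number itself, and the specification includes the rule for its final check digit. A small sketch of the ISBN-13 rule (digits weighted alternately 1 and 3, total divisible by 10); the function name is my own.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 string against the check-digit rule of its specification."""
    compact = isbn.replace("-", "").replace(" ", "")
    if len(compact) != 13 or not (compact.isascii() and compact.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all thirteen digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(compact))
    return total % 10 == 0


print(isbn13_is_valid("978-0-306-40615-7"))
print(isbn13_is_valid("978-0-306-40615-8"))
```

The scheme (the third sense) is what this code cannot show: the agencies and registry that actually hand the numbers out.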

Uniqueness - is the requirement that one string denotes one and only one entity (the "referent").

Resolution - is the process in which an identifier is the input to a service to receive in return a specific output of one or more pieces of current information related to the identified entity.

Persistence - is the requirement that once assigned an identifier denotes the same referent indefinitely.
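A toy registry helps me keep the three terms apart. This is only a sketch under my own assumptions (class and method names are invented): uniqueness is enforced by refusing to reuse a string, persistence by offering no way to change what a string denotes, and resolution by returning whatever current information is on file.

```python
class IdentifierRegistry:
    def __init__(self) -> None:
        self._referent = {}  # identifier -> entity it denotes (never changes)
        self._current = {}   # identifier -> current information (may change)

    def register(self, identifier: str, referent: str, location: str) -> None:
        # Uniqueness: one string denotes one and only one entity.
        if identifier in self._referent:
            raise ValueError(f"{identifier!r} is already assigned")
        self._referent[identifier] = referent
        self._current[identifier] = location

    def update_location(self, identifier: str, location: str) -> None:
        # Persistence: the referent is untouched; only current data is updated.
        if identifier not in self._referent:
            raise KeyError(identifier)
        self._current[identifier] = location

    def referent(self, identifier: str) -> str:
        return self._referent[identifier]

    def resolve(self, identifier: str) -> str:
        # Resolution: identifier in, current information out.
        return self._current[identifier]


registry = IdentifierRegistry()
registry.register("id-001", "Week 3 reading notes", "http://old.example.org/notes")
registry.update_location("id-001", "http://new.example.org/notes")
print(registry.referent("id-001"), "->", registry.resolve("id-001"))
```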

URLs do not refer to the identity of an entity but its location on a network.

The DOI system is a managed system for persistent identification of content on digital networks, using a federation of registries that follow a common specification. Information such as where to find an object may change over time (URL?), but its DOI will not change. The system brings together four components: a syntax specification, defining the construction of a string; a resolution component, providing the mechanism to resolve the DOI name to data specified by the registrant; a metadata component, defining an extensible model for associating descriptive and other elements of data with the DOI name; and a social infrastructure, defining the full implementation through policies and shared technical infrastructure in a federation of registration agencies.
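A rough sketch of the syntax and resolution components. It assumes the usual DOI shape of a prefix beginning with "10." and a suffix, separated by the first slash, and it assumes the public proxy at doi.org as the resolver. The function names are mine, and no network request is made; the code only builds the address a resolver would be asked for.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build the URL that asks the resolver where the object currently is."""
    split_doi(doi)  # validate first
    return resolver + quote(doi)


print(split_doi("10.1000/182"))
print(resolver_url("10.1000/182"))
```

The DOI name stays the same even if the resolver redirects to a different location next year, which answers my "(URL?)" note above.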

Arms Chapter 9

Methods for storing textual materials must represent two different aspects of a document: its structure and its appearance. The structure describes the division of a text into elements such as characters, words, paragraphs and headings. It identifies parts of the documents that are emphasized, material placed in tables or footnotes, and everything that relates one part to another. The structure of text stored in computers is often represented by a mark-up specification. In recent years, SGML (Standard Generalized Markup Language) has become widely accepted as a generalized system for structural mark-up.

The appearance is how the document looks when displayed on a screen or printed on paper. The appearance is closely related to the choice of format: the size of font, margins and line spacing, how headings are represented, the location of figures, and the display of mathematics or other specialized notation. In a printed book, decisions about the appearance extend to the choice of paper and the type of binding. Page-description languages are used to store and render documents in a way that precisely describes their appearance. This chapter looks at three rather different approaches to page description: TeX, PostScript, and PDF.

style sheet - describes how each structural element is to appear, with comprehensive rules for every situation that can arise.

- Mark-up languages can represent almost all structures, but the variety of structural elements that can be part of a document is huge, and the details of appearance that authors and designers could choose are equally varied
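The split between structure and appearance can be sketched in a few lines. The document below carries only structural labels, and two made-up style sheets give the same structure two different appearances. The element names and rendering rules are invented for illustration, and real style sheets are far more comprehensive.

```python
# Structure only: (element type, text) pairs with no appearance information.
document = [
    ("heading", "Identifiers"),
    ("paragraph", "URLs are locations, not names."),
    ("emphasis", "DOIs persist."),
]

# Hypothetical style sheets: one rendering rule per structural element.
plain_text_style = {
    "heading": lambda t: t.upper() + "\n" + "=" * len(t),
    "paragraph": lambda t: t,
    "emphasis": lambda t: "*" + t + "*",
}
html_style = {
    "heading": lambda t: "<h1>" + t + "</h1>",
    "paragraph": lambda t: "<p>" + t + "</p>",
    "emphasis": lambda t: "<p><em>" + t + "</em></p>",
}


def render(doc, style):
    """Apply the style sheet's rule for each element, in document order."""
    return "\n".join(style[kind](text) for kind, text in doc)


print(render(document, plain_text_style))
print(render(document, html_style))
```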

OCR - Optical character recognition is the technique of converting scanned images of characters to their equivalent characters. The basic technique is for a computer program to separate out the individual characters, and then to compare each character to mathematical templates
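A toy version of the compare-to-templates step, assuming the characters have already been separated into tiny 3x3 black-and-white grids. The templates and the distance measure (count of differing pixels) are my own simplification; real OCR templates are much richer.

```python
# Hypothetical 3x3 templates: "1" is ink, "0" is blank paper.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def distance(a, b) -> int:
    """Number of pixels at which two same-sized glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph) -> str:
    """Pick the character whose template is closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


noisy_t = ("111", "010", "011")  # a "T" with one stray pixel of scanning noise
print(recognize(noisy_t))
```

The stray pixel does not matter because the glyph is still closer to the "T" template than to any other.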

- Computers store a character, such as "A" or "5", as a sequence of bits, in which each distinct character is encoded as a different sequence

- Since it is impossible to represent all languages using the 256 possibilities represented by an eight-bit byte, there have been several attempts to represent a greater range of character sets using a larger number of bits. Recently, one of these approaches has emerged as the standard that most computer manufacturers and software houses are supporting. It is called Unicode.
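To see the bit sequences for myself: a short sketch that prints the bits behind a few characters. UTF-8 is used here as one of several ways of encoding Unicode (my choice, not something the chapter specifies), and Latin-1 stands in for an eight-bit character set.

```python
def bit_pattern(ch: str, encoding: str = "utf-8") -> str:
    """Return the bytes that encode `ch`, written as groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in ch.encode(encoding))


print(bit_pattern("A", "ascii"))  # one byte
print(bit_pattern("5", "ascii"))  # a different byte
print(bit_pattern("\u4e2d"))      # a Chinese character takes several bytes in UTF-8

try:
    "\u4e2d".encode("latin-1")    # an eight-bit character set has no slot for it
except UnicodeEncodeError as err:
    print("cannot encode:", err.reason)
```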

- SGML is a system to define mark-up specifications. An individual specification defined within the SGML framework is called a document type definition.

- SGML is firmly established as a flexible approach for recording and storing high-quality texts. Its flexibility permits creators of textual materials to generate DTDs that are tailored to their particular needs.

- HTML is considered a simplified DTD

- XML is designed to bridge the gap between HTML and the full power of SGML

- Every time a new feature is added to HTML it becomes less elegant, harder to use, and less of a standard shared by all browsers. SGML is the opposite. It is so flexible that almost any text description is possible, but the flexibility comes at the cost of complexity. Even after many years, only a few specialists are really comfortable with SGML and general-purpose software is still scarce.

- Since XML is a subset of SGML, every document is based on a DTD, but the DTD does not have to be specified explicitly. If the file contains previously undefined pairs of tags, which delimit some section of a document, the parser automatically adds them to the DTD.
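A quick check of the "DTD does not have to be specified explicitly" point, using Python's standard-library ElementTree parser. The tag names are made up and no DTD is declared anywhere. Note that this parser is non-validating: the sketch only shows that well-formed, previously undefined tags are accepted, not that a DTD is being built behind the scenes.

```python
import xml.etree.ElementTree as ET

# Invented tags, no DOCTYPE declaration, no DTD file.
text = "<notes><reading week='3'><title>Identifiers</title></reading></notes>"

root = ET.fromstring(text)
tags = [element.tag for element in root.iter()]  # document order, root first

print(tags)
print(root.find("reading").get("week"))
print(root.find("reading/title").text)
```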


1 comment:

  1. How prevalent are identifiers used in place of URL's for digital objects in DLs?

    I think it is now common for documents in digital libraries and databases to have identifiers. For example, when you use ULS remote access to an EBSCO database and run a search, the articles you get always have a permanent URL, and this URL is the article's identifier. The URL at the top of the browser is dynamic, so the permanent URL is important in digital libraries, since it tells you the real place of the article.
