Bibliogram

A bibliogram is a graphical representation of the frequency of certain target words, usually noun phrases, in a given text. The term was introduced in 2005 by Howard D. White to name a linguistic object that had long been studied, but not previously named, in informetrics, scientometrics and bibliometrics. The ranked noun phrases may be authors, journals, subject headings, or other indexing terms. The "stretches of text" from which they are drawn may be a book, a set of related articles, a subject bibliography, a set of Web pages, and so on. Bibliograms are always generated from writings, usually from scholarly or scientific literature.

Definition

A bibliogram is a verbal construct made when noun phrases from extended stretches of text are ranked high to low by their frequency of co-occurrence with one or more user-supplied seed terms. Each bibliogram has three components:

  • A seed term that sets a context.
  • Words that co-occur with the seed across some set of records.
  • Counts (frequencies) by which co-occurring words can be ordered high to low.
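
The three components can be illustrated with a minimal Python sketch. It assumes each record is simply a set of index terms; the function name `bibliogram`, the placeholder author names, and the sample records are hypothetical and are not taken from White's paper or from any real database.

```python
from collections import Counter

def bibliogram(records, seed):
    """Rank terms by the number of records they share with the seed term."""
    counts = Counter()
    for terms in records:
        unique_terms = set(terms)
        if seed in unique_terms:
            # Count each co-occurring term once per record.
            counts.update(unique_terms - {seed})
    # Order by count (high to low); break ties alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical records: the index terms attached to four articles.
records = [
    {"Author A", "Creativity", "Creativity Tests"},
    {"Author A", "Creativity", "Problem Solving"},
    {"Author A", "Creativity", "Creativity Tests", "Time"},
    {"Author B", "Creativity", "Anxiety"},
]

for term, count in bibliogram(records, "Author A"):
    print(f"{count}   {term}")
```

Here the seed is "Author A", the co-occurring words are the other index terms on that author's records, and the counts give the ordering. The record indexed only under "Author B" contributes nothing because it does not contain the seed.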

As a family of term-frequency distributions, the bibliogram has frequently been written about under descriptions such as:

  • positive skew distribution
  • empirical hyperbolic
  • scale-free (see also Scale-free network)
  • power law
  • size frequency distribution
  • reverse-J

It is sometimes called a "core and scatter" distribution. The "core" consists of relatively few top-ranked terms that account for a disproportionately large share of co-occurrences overall.

The "scatter" consists of relatively many lower-ranked terms that account for the remaining share of co-occurrences. Usually the top-ranked terms are not tied in frequency, but identical frequencies and tied ranks become more common as the frequencies get smaller. At the bottom of the distribution, a long tail of terms is tied in rank because each co-occurs with the seed term only once.
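
The core-and-scatter shape can be made concrete with a short Python sketch. The counts below are invented purely for illustration, and the helper names `core_share` and `tie_groups` are hypothetical; where the core ends and the scatter begins is a judgment call, so the cutoff `k` is left as a parameter.

```python
from itertools import groupby

# Hypothetical bibliogram: (term, count) pairs already sorted high to low.
ranked = [("t1", 12), ("t2", 7), ("t3", 3), ("t4", 2), ("t5", 2),
          ("t6", 1), ("t7", 1), ("t8", 1), ("t9", 1)]

def core_share(ranked, k):
    """Fraction of all co-occurrences accounted for by the top k terms."""
    total = sum(count for _, count in ranked)
    return sum(count for _, count in ranked[:k]) / total

def tie_groups(ranked):
    """Map each frequency to the number of terms that share it.

    Relies on the input being sorted by count, so equal counts are adjacent.
    """
    return {count: len(list(group))
            for count, group in groupby(ranked, key=lambda item: item[1])}

print(f"Top 2 of {len(ranked)} terms hold {core_share(ranked, 2):.0%} of co-occurrences")
print(tie_groups(ranked))
```

In this made-up example the two top-ranked terms are untied and hold most of the co-occurrences, while the four terms at the bottom all share a frequency of one, which is the long tail described above.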

In most cases bibliograms can be described by power laws such as Zipf's law and Bradford's law. In this regard, they have long been studied by mathematicians and statisticians in information science. However, these treatments typically ignore the qualitative meanings of the ranked terms themselves, which are often of interest in their own right. For example, the following bibliogram was made with an author's name as seed and shows the descriptors that co-occur with her name in the ERIC database. The descriptors are ranked by how many of her articles they were used to index:

6   Creativity
4   Creativity Tests
3   Divergent Thinking
2   Elementary School Mathematics
2   Instruction
2   Mathematics Education
2   Problem Solving
2   Research
2   Time
1   Acceleration
1   Anxiety
1   Beginning Teachers
1   Behavioral Objectives
1   Child Development
1   Classroom Techniques
1   Cognitive Development
    etc.

This author is a researcher in education, and the terms profile her intellectual interests over the years. In general, bibliograms can be used to:

  • suggest additional terms for search strategies
  • characterize the work of scholars, scientists, or institutions
  • show who an author cites over time
  • show who cites an author over time
  • show the other authors with whom an author is co-cited over time
  • show the subjects associated with a journal or an author
  • show the authors, organizations, or journals associated with a subject
  • show library classification codes associated with subject headings and vice versa
  • show the popularity of items in the collections of libraries
  • model the structure of literatures with title terms, descriptors, author names, journal names

Bibliograms can be created with the RANK command on Dialog (other vendors have similar commands), ranking options within WorldCat, HistCite, Google Scholar, and inexpensive content analysis software.

White suggests that bibliograms have a parallel construct in what he calls associograms. These are the rank-ordered lists of word association norms studied in psycholinguistics. They are similar to bibliograms in statistical structure but are not generated from writings. Rather, they are generated by presenting panels of people with a stimulus term (which functions like a seed term) and tabulating the words they associate with the seed by frequency of co-occurrence. They are currently of interest to information scientists as a nonstandard way of creating thesauri for document retrieval.

Examples

Other examples of bibliograms are the ordered set of an author's co-authors, or the list of authors published in a specific journal together with their number of articles. A popular example is the list of additional titles to consider for purchase that you get when you search for an item on Amazon. These suggested titles are the top terms in the "core" of a bibliogram formed with your search term as seed. The frequencies are counts of the times they have been co-purchased with the seed.

Examples of associograms may be found in the Edinburgh Associative Thesaurus.

Other methods

Similar but distinct methods are used in data clustering and data mining. Google Sets also creates a list of terms associated with a given set of terms.

References

  • Howard D. White (2005): On Extending Informetrics: An Opinion Paper. In: Proceedings of the 10th International Congress of the International Society for Scientometrics and Informetrics. Stockholm, pp. 442-449.