Center for Cooperative
Research Database Project (Draft)
Summary of the Database Application
Applications and Additional Features
The Center for Grassroots Oversight seeks to encourage grassroots participation and collaboration in documenting
history using an open-content model. The website we are developing allows people to input information into a database on past and current events, as
well as on the entities associated with those events. The data is published on the website in the format of
dynamic timelines and entity profiles, and is exportable into XML so it can be shared with others for non-commercial purposes.
Conceptual Summary of the Database Application
The History Commons database project has two main components—data presentation/manipulation and
collaborative data submission.
Data Presentation and Manipulation
The website provides
visitors with information about specific historical events, the entities involved in those events, and the relationships that exist
between and among those events and entities. This information is presented in the format of timelines and entity profiles.
Each timeline entry includes a summary of the event, the date the event took place, a
title, and links to its primary and/or secondary sources.
Entities are people, organizations, companies, governments,
projects, and so on.
Relationships between entities:
Entities can be linked to other entities by
a number of different relationships. For example, person x may be the brother of person y, the author of book a, the
employee of corporation c, and the founder of organization d.
Events that relate to one another in some way may be linked together by cross-reference links. For example, if the
summary of an event that took place on November 5, 2003 refers to another event that took place on September 28, 2002, in the
text of the entry there would be a link from the former to the latter that will appear as “(see September 28, 2002).”
Relationships between events and entities:
An entity may be linked to an event via one of three relationship types—as an observer,
as an active participant, or as a passive participant. The relationship type provides the application with a basis for
assessing the significance of a relationship link. For example, it can treat the relationship between an event and a person who
participated in the event differently from the relationship between an event and a person who was merely mentioned in the event, or who
commented on the event but did not participate in it.
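The three relationship types could be modeled along the following lines. This is only a minimal sketch: the class names, field names, and numeric weights are hypothetical and are not taken from the actual application.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The three event-entity relationship types named in the text."""
    OBSERVER = "observer"
    ACTIVE_PARTICIPANT = "active participant"
    PASSIVE_PARTICIPANT = "passive participant"


# Hypothetical weights: how much each role counts toward significance.
ROLE_WEIGHT = {
    Role.ACTIVE_PARTICIPANT: 1.0,
    Role.PASSIVE_PARTICIPANT: 0.5,
    Role.OBSERVER: 0.25,
}


@dataclass(frozen=True)
class EventEntityLink:
    event_id: int
    entity_id: int
    role: Role


def link_significance(link: EventEntityLink) -> float:
    """Return the assumed weight of a link based on its role."""
    return ROLE_WEIGHT[link.role]


links = [
    EventEntityLink(event_id=1, entity_id=11, role=Role.OBSERVER),
    EventEntityLink(event_id=1, entity_id=10, role=Role.ACTIVE_PARTICIPANT),
]
# Links for event 1, ordered from most to least significant.
ranked = sorted(links, key=link_significance, reverse=True)
```

Storing the role on the link rather than on the entity lets the same person be an active participant in one event and an observer of another.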
At the end of each
timeline entry, there is a list of all entities associated with that event. Each of the entities listed at the bottom of the event is
linked to an entity profile page which includes a description of the entity's relationships with other entities, a list of all quotes
associated with the entity, as well as a timeline of all events in which the entity participated.
Significance of relationship mapping:
Relationship mapping is powerful because
it makes the data more meaningful. By mapping the relationships that exist between entities and events, the website can depict both the
historical development of an issue as well as the contemporaneous interrelationships associated with it.
Relationship mapping also allows the application to behave more intelligently. For
example, the application can approximate how closely an entity or event is related to another entity or event by examining the number
and type of relationships that are linked between them. Using this information, the application can provide the user with contextual
information related to a particular entity or event. For example, one feature that will eventually be added is the capability to create
scalable “context” timelines. Each timeline entry would have a “context” link, which when clicked would instruct
the application to generate a chronology of all events that are related to a specific event. The user would have the option of
specifying how narrow or broad the context timeline would be. A narrow context timeline, for example, would only include those
entries that the application determines are very closely related to the event in question whereas a very broad context timeline would
include entries that are both closely and distantly related to the event. The relationship proximity between one event and another event
would be a value computed by an algorithm based on factors such as: the presence of direct relationships between the two events
(cross-reference links), the presence of an indirect relationship (when the entries are linked to each other through other entries by
way of cross reference links) between the two events and how close that relationship is (determined by the number of degrees of
separation), how many active participants they have in common, and how many keywords the two entries share.
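One way such a proximity value might be computed is sketched below. The weights, the 1/distance form, and all function and parameter names are illustrative assumptions; the document does not specify the actual formula.

```python
from collections import deque


def degrees_of_separation(links, start, goal):
    """Breadth-first search over cross-reference links.

    `links` maps an event id to the set of event ids it references.
    Links are treated as undirected. Returns None if unreachable.
    """
    neighbors = {}
    for src, targets in links.items():
        for dst in targets:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def proximity(a, b, links, participants, keywords,
              w_link=1.0, w_people=0.5, w_words=0.1):
    """Combine the factors listed in the text into one score.

    A direct cross-reference (distance 1) scores w_link; each further
    degree of separation divides that contribution by the distance.
    """
    dist = degrees_of_separation(links, a, b)
    link_score = w_link / dist if dist else 0.0
    shared_people = len(participants.get(a, set()) & participants.get(b, set()))
    shared_words = len(keywords.get(a, set()) & keywords.get(b, set()))
    return link_score + w_people * shared_people + w_words * shared_words


links = {1: {2}, 2: {3}}
participants = {1: {"x", "y"}, 2: {"x"}, 3: {"y"}}
keywords = {1: {"oil", "treaty"}, 2: {"treaty"}, 3: set()}
near = proximity(1, 2, links, participants, keywords)
far = proximity(1, 3, links, participants, keywords)
```

A narrow context timeline would then keep only entries whose score exceeds a high threshold, while a broad one would use a lower threshold.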
There are two basic
timeline types: dynamically generated timelines and user-created timelines. Both timelines draw from the same collection of events in
the database. They are different only in how the entries are selected for inclusion. In the former case, it is the software that selects
the events, whereas in the latter it is human judgment.
If a visitor would like to view the context of a certain timeline entry, he or she may click on a link that generates a “context
timeline.” Currently, context timelines contain all entries that are either linked to or linked from a particular entry. In
future versions of the application, the context timeline will be based on computed proximity values (see above).
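The current behavior, in which a context timeline contains every entry linked to or from a given entry, can be sketched as follows. The data layout (dictionaries keyed by entry id) is an assumption made for illustration, and the sketch leaves the starting entry itself out of the result.

```python
from datetime import date


def context_timeline(entry_id, links, dates):
    """Return ids of entries linked to or from `entry_id`, in date order.

    `links` maps an entry id to the set of entry ids it links to.
    `dates` maps an entry id to the date of its event.
    """
    linked_to = set(links.get(entry_id, ()))
    linked_from = {src for src, targets in links.items() if entry_id in targets}
    related = (linked_to | linked_from) - {entry_id}
    return sorted(related, key=lambda eid: dates[eid])


links = {1: {2}, 3: {1}, 4: {2}}
dates = {
    1: date(2003, 11, 5),
    2: date(2002, 9, 28),
    3: date(2004, 1, 15),
    4: date(2001, 6, 1),
}
timeline = context_timeline(1, links, dates)
```

Entry 4 is excluded because it is connected to entry 1 only indirectly, through entry 2.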
Query-generated timelines: When a visitor to the website conducts a search of our
database, the website generates a timeline consisting of all entries in the database that contain the search terms.
Entity timelines: An entity timeline consists of all events in which a specified entity participated.
A user-created timeline is a chronology that has been compiled manually by a user or
group of users. Because it is the result of calculated human judgment (a far better tool for discerning the meanings of human events),
such a timeline tends to consist of entries that are more on-topic than those in a timeline that has been generated
dynamically by a server. On the other hand, user-created timelines also tend to be more biased since the selection process can easily be
influenced by the user or users’ assumptions and opinions about the topic. For example, a user may decide not to include certain
events if those events challenge his or her assumptions or theories about a certain issue.
Other differences between user-created timelines and dynamically-generated timelines
User-created timelines are associated with a project that is managed by one or more users who have been approved to serve in that position.
The project manager of a user-created timeline project has the option of
creating one or more “category-sets,” each of which may contain an unlimited number of categories. An entry may be included
in any number of categories, thus providing a way to slice and dice the entries according to a variety of different criteria.
(still in development)
Users who want to
begin an investigation of a specific issue or document a certain chain of events will send us a request to create a project for them.
If approved, we will set up a “project” for the user, and the user will be made “project manager” for that project.
The user will then be able to create category-sets and categories for the timeline, compose timeline entries, and add entities. During the
initial compilation of this timeline, the project will be assigned the status “work-in-progress” and will not be viewable to
other users. When the user is ready to go live with his or her project, each entry in the timeline will be reviewed to
ensure that every statement is backed by the sources cited. The timeline will then be proofread before being published live on the website.
Registered users can add new entries or edit existing ones using a wiki-style
user interface. After a new entry or proposed modification is submitted, the submission will appear as “pending” until
it has gone through the review process.
Every submission will go through a review process before being published on the website. The review process consists of four release stages:
- draft: This is the initial state. When a user creates an entry, the user
is permitted to save the entry as a draft. The user may therefore continue working on
the entry later. While in draft state, no other users can see or edit the entry, unless the user checks the option to share the draft
entry with other users. When the entry is ready for submission, the user clicks "submit" and
the entry is released to the next state, "pending-verification."
- pending-verification: After an entry has been submitted, it is placed in a queue to be verified by a content editor.
- pending-copyedit: After an entry has been verified by a content editor, it is placed in a queue to be copyedited.
- pending-inclusion: Once copyedited, the entry is permanently added to the database. However, before it is included in the timeline it was submitted to, it must be accepted by the project manager of that timeline.
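The four release stages form a simple linear state machine, which could be sketched as below. The stage names come from the list above; the class and function names are hypothetical, and rejection or appeal paths are not modeled.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    PENDING_VERIFICATION = "pending-verification"
    PENDING_COPYEDIT = "pending-copyedit"
    PENDING_INCLUSION = "pending-inclusion"


# Each stage mapped to the single stage that follows it.
NEXT_STAGE = {
    # The user clicks "submit".
    Stage.DRAFT: Stage.PENDING_VERIFICATION,
    # A content editor verifies the entry.
    Stage.PENDING_VERIFICATION: Stage.PENDING_COPYEDIT,
    # A copyeditor finishes the entry.
    Stage.PENDING_COPYEDIT: Stage.PENDING_INCLUSION,
}


def advance(stage: Stage) -> Stage:
    """Release an entry to the next stage, or raise at the last one."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value} is the final review stage")
    return NEXT_STAGE[stage]
```

For example, `advance(Stage.DRAFT)` yields `Stage.PENDING_VERIFICATION`, and calling `advance` on `Stage.PENDING_INCLUSION` raises an error because acceptance by the project manager is a separate decision.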
There are two
types of editors: content editors and copyeditors. Content editors check submissions for accuracy, sound logic, and proper
writing format. Copyeditors review submitted material that has already been approved by a content editor to ensure that it conforms
to the History Commons style manual.
All edits to entries are logged, and users can view and compare
previous versions of modified entries. Content editors can revert an entry to a previous version if necessary.
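A minimal sketch of edit logging, comparison, and reversion follows. It assumes that each entry simply keeps a list of its earlier texts; the class and method names are hypothetical and the real storage scheme is not described in this document.

```python
from dataclasses import dataclass, field
import difflib


@dataclass
class Entry:
    text: str
    # Earlier versions of the text, oldest first.
    history: list = field(default_factory=list)

    def edit(self, new_text: str) -> None:
        """Log the current text, then replace it."""
        self.history.append(self.text)
        self.text = new_text

    def revert(self, version: int) -> None:
        """Restore history[version]; the revert is itself logged as an edit."""
        self.edit(self.history[version])

    def compare(self, version: int) -> list:
        """Return unified-diff lines between an earlier version and the current text."""
        return list(difflib.unified_diff(
            self.history[version].splitlines(),
            self.text.splitlines(),
            lineterm="",
        ))


entry = Entry("First wording.")
entry.edit("Second wording.")
diff_lines = entry.compare(0)
entry.revert(0)
```

Because a revert is recorded as one more edit, no version is ever lost from the log.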
Acceptance into a project
Once an entry has been verified, it goes to the project manager who must decide whether or not to include the entry in the project’s
timeline. If the project manager decides not to accept the entry, the entry is automatically posted to a special category, titled
“Verified entries not accepted for inclusion in this project.” If a project manager decides not to include an entry because the entry includes “inconvenient”
facts that undermine his or her assumptions about the issue, the presence of that entry in the list of “Verified entries not
accepted for inclusion in this project” will alert readers to the bias.
Project managers are not permitted to assume the role of content-editor. Any edits or
new entries submitted by a project manager will have to go through the same review process as entries submitted by other contributors.
If a project manager believes that a submitted verified entry contains factually inaccurate information, he or she may appeal it one
time. (This feature is not yet implemented.)
Editor Management (Not yet implemented)
The network of content-editors will consist of contributors who have either
demonstrated solid writing and journalistic skills or whose professional or academic experience pre-qualifies them for the work.
Content-editors will select an area or areas of expertise and specify how many entries they are willing to review each day, week,
or month. The website will then distribute the editing work accordingly. Editors will be alerted via email whenever they are
assigned a new pending entry. The email will contain the text of the entry and a link to the editor’s account page on the website,
where the editor can edit the entry online.
The management of
copyeditors would be similar except that they would not need to specify an area of expertise.
Contributor Management (Not yet implemented)
To avoid backlogging of pending entries, each contributor will be limited in the number of
entries he or she is permitted to submit. The limit will be determined by two factors: the contributor’s “skill
ranking” and the ratio between “active editors” and “active contributors.” When an editor approves (or
rejects) an entry, the editor would specify how well the entry was written (e.g., perfect, no edits needed; good, only minor edits;
fair, minor edits, some inaccuracies; poor, lots of work was required, poorly written, didn't summarize event well, etc.). People
who consistently get high marks would receive a higher rating and would be permitted to submit more entries. People who submit poorly
written entries would have lower marks and would be restricted in the number of entries they can write until they achieve a higher skill
ranking. Skill rankings would not be viewable to the public. The other factor, the "active editor" to "active
contributor" ratio, would maintain the system’s efficiency by increasing or decreasing the submission limits for all
contributors when there is an imbalance between the editors and contributors. So, for example, if a large number of editors were to
suddenly enroll in the project, thereby increasing the "active editor" to "active contributor" ratio, the system
would automatically increase the submission limits for all contributors.
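The interaction of the two factors might look like the sketch below. The formula, the 0.0 to 1.0 scale for skill ranking, the base limit of 10, and the floor of one entry are all assumptions made for illustration; the document does not define how the limit is actually calculated.

```python
def submission_limit(skill_ranking, active_editors, active_contributors,
                     base_limit=10):
    """Return how many entries a contributor may submit per period.

    skill_ranking: a value from 0.0 (poor) to 1.0 (perfect), assumed here
    to summarize the editors' marks on the contributor's past entries.
    """
    if active_contributors <= 0:
        raise ValueError("active_contributors must be positive")
    ratio = active_editors / active_contributors
    limit = base_limit * skill_ranking * ratio
    # Assumption: every contributor may always submit at least one entry.
    return max(1, round(limit))


# Doubling the number of editors doubles the limit for the same contributor.
before = submission_limit(0.8, active_editors=10, active_contributors=20)
after = submission_limit(0.8, active_editors=20, active_contributors=20)
```

With these assumed inputs, the same contributor's limit rises when more editors enroll, which matches the behavior described above.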
All the data in the History Commons database
is exportable into XML so it can be used by other individuals and groups for non-commercial purposes. As such, the historical data
collected by contributors and stored in the History Commons database will serve as a history data commons.
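An export of a single timeline entry could be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names, the dictionary layout, and the sample values are hypothetical, since the actual export schema is not given here.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(entry: dict) -> str:
    """Serialize one timeline entry to an XML string."""
    root = ET.Element("entry")
    ET.SubElement(root, "title").text = entry["title"]
    ET.SubElement(root, "date").text = entry["date"]
    ET.SubElement(root, "summary").text = entry["summary"]
    sources = ET.SubElement(root, "sources")
    for url in entry["sources"]:
        ET.SubElement(sources, "source", href=url)
    return ET.tostring(root, encoding="unicode")


# Placeholder data, not a real database record.
sample = {
    "title": "Example event",
    "date": "2003-11-05",
    "summary": "A placeholder summary.",
    "sources": ["https://example.org/report"],
}
xml_text = entry_to_xml(sample)
```

The fields mirror what the text says each timeline entry includes: a title, a date, a summary, and links to sources.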
Possible applications and additional features
Timeline reader plug-in
We could offer data feeds, perhaps using Google's new gdata, that would allow other websites to display
data exported from the History Commons database. For example, a blogger who is commenting on an issue may want to refer her readers
to a chronology of events in order to provide context or support for her comments. Similarly, an organization advocating on behalf of a
certain cause may want to include a timeline on its website consisting of data from the History Commons database.
Regular contributors could be given their own blog
accounts that they could use to discuss their research. The blog would be tied into the main application and would have a feature
allowing bloggers to include timeline entries into their posts. Each blog could have its own domain name and design.
Customizable report builder
A feature could be added to the website that would
allow visitors to generate custom reports that they can download and use for non-commercial purposes. The user would select timeline
entries from the database and then specify how the data would be formatted and what citation style to use. The website would then
generate a temporary file (such as a PDF) that the user would download.
Customizable email updates
We could add the capability for registered users to customize the content of their email updates. Users would specify
how often they want to receive emails and under what circumstances. For example, one could elect to receive emails whenever changes are
made to certain projects or timelines.
Windows/Mac-based software for local editing and
A program could be developed for Windows and Mac
users so contributors and editors could work offline.
We could develop a reputation/trust system that would offer promotions (e.g., from “contributor” to “editor”) to users who have completed a certain number of actions (e.g., submissions, edits, etc.) and who have high quality ratings.
We could develop a voting system that would allow all active users to take part in decisions relating to the website. The weight of each user's vote would depend on the user's reputation.
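Reputation-weighted voting could be tallied roughly as follows. This is a sketch under the assumption that reputation is a single non-negative number per user; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def tally(votes, reputation):
    """Sum reputation-weighted votes per option.

    votes: maps a user id to the option that user chose.
    reputation: maps a user id to a non-negative weight.
    """
    totals = defaultdict(float)
    for user, option in votes.items():
        totals[option] += reputation.get(user, 0.0)
    return dict(totals)


votes = {"ann": "yes", "bob": "no", "cy": "no"}
reputation = {"ann": 5.0, "bob": 2.0, "cy": 1.0}
result = tally(votes, reputation)
winner = max(result, key=result.get)
```

In this sample, "yes" wins with one vote against two because the single voter's reputation outweighs the other two combined, which illustrates the trade-off such a system would have to weigh.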
Facilitate process of documenting and analyzing current events
During the first few days immediately after a major event, the public’s understanding of the event is typically far from coherent or complete. This may be a result of several different factors. In some cases, different news reports describing the incident may focus on different aspects of the event and may contain contradictory information. If the event is controversial in any way, its reporting may be fragmented by conflicting accounts and interpretations that are fueled by competing interests. Another problem that makes it difficult for the public to quickly form adequate understandings of major events is that the press rarely puts events into their proper contexts.
It is usually only with time that the record of a particular event becomes more coherent and complete. Several things may contribute to this: (1) With time, more facts about
the event may be uncovered; (2) Facts that may have been suppressed or downplayed by the media or government at the time of the event are
eventually thoroughly investigated and receive more attention; (3) Obscure temporal and spatial relationships of the event may become
more apparent; and (4) Contradictions that may have been present in the original reporting are more thoroughly examined. The superior data available at
this stage in the process therefore allows researchers to piece together the varying aspects of
the event into a much more coherent whole. In some cases, the public’s understanding of an incident will change dramatically
at this stage in the event’s documentation leading to a corresponding change in public opinion about the event and raising
the possibility that had the public known earlier what it learned later, its reaction to the event might have been dramatically different.
The large span of time that is often needed for
the public to develop a relatively coherent understanding of an event works against the public interest. It impairs the public’s ability
to monitor the activities of powerful interests as well as its ability to competently assert its interests through democratic
processes. Fortunately, recent advances in information technology appear to have significantly shortened the time needed for the public
to acquire a relatively accurate understanding of an event. In addition to providing its users with instant access to thousands of news sources,
the Internet provides a space where people can meet to discuss and analyze events in a collaborative setting. This aspect of the
Internet has greatly facilitated the processing of information in the public sphere.
The History Commons database project seeks to harness the efficiency of online collaboration and database
technology in an effort to reduce the amount of time needed for the public to form coherent accounts of current events and issues. The
database project has four important qualities that make this possible: cumulative content; extensive citation; the capability to
immediately put current events into context; and an open-content, peer-reviewed contribution system.
Unlike a newspaper or magazine article, the content of the History Commons website is cumulative. When new
information is published about an event, this information is incorporated into the existing entry on that event. The timelines
themselves are also cumulative because their compositions change as new entries are added to the database.
Another important quality of the project is that the website has the capability to immediately put events into
context by generating “context” timelines, which can draw attention to relationships that are not readily apparent.
The contribution system makes the task of documenting events a collaborative process.
This increases efficiency because there is less duplication of work since contributors are building upon each other’s work
in real time. Furthermore, since anyone is permitted to contribute material to the website, the selection of events and
issues to be researched and documented is not influenced by narrow interests, as it often is in the mainstream media. This
results in fewer information gaps and a more complete, coherent accounting of events.
Strengthen public role in recording history
A second objective of the History Commons
database project is to strengthen the public’s role in documenting events so that people at the grassroots level can exert
greater influence over the content of the published historical record. The recent advances in information technology have changed
the nature of information production and distribution in two very important ways that are fundamental to achieving this goal.
Firstly, the Internet has considerably lowered the costs of information distribution. Before the Internet, large capital
investments were needed to obtain the equipment, licenses, and staff necessary to disseminate information to large audiences. This is no
longer the case. For about $15 a month, a single person can potentially reach millions of people.
Secondly, Internet technology has created an environment where public collaboration in the production of
information can take place at a level of efficiency comparable—if not superior—to that of the capital-intensive efforts of
hierarchically structured private enterprises. Collaboration in a networked “open-content” environment can greatly improve the
efficiency and quality of information production in the public sphere as it allows contributors to build upon and improve the work of
others in real time. This collaborative “open-content” model is politically and economically significant because it enables
grassroots efforts to compete on a nearly equal footing with those of private industry. This phenomenon was noted in the book Information Rules, by Carl Shapiro and Hal R. Varian, who wrote: “The old industrial economy was driven by economies of scale; the new information economy is driven by the economics of networks.”
The combination of lowered costs and the public’s capability to compete with private industry has effectively
decentralized the processes of information production and distribution. This is historically significant because it represents
a restructuring of the relationship between the producers of information and the consumers of information. In the conventional
mode of information production and distribution, the producers and consumers belong to different segments of society with distinctly
different interests. Private industry, whose interests are tied to making profit and accumulating capital, produces information to be
consumed by the public, whose interest is to receive accurate information and which is generally unconcerned with the profitability of
information distribution. This conflict of interest is greatly diminished when the process of producing and distributing information
takes place within civil society proper.
By providing the public with a website that
literally allows them to participate in the writing of history, History Commons hopes to further this trend of decentralizing
information production and distribution, and contribute to the autonomization of civil society vis-à-vis the state and large
Provide a venue for people at the grassroots
level to monitor the activities of government and private businesses.
It is commonly asserted that the role of
monitoring the activities of governments and private businesses is the responsibility of the press and the government itself. However,
this view ignores the fact that the interests of these groups often coincide with those of the very groups they are supposedly keeping in check.
Because of this problem, the role of oversight, in many cases, has been taken up by groups that have emerged from within civil society,
ranging from small grassroots organizations to large government-licensed non-profit corporations. These groups are
generally regarded as much more effective advocates of public interest. The History Commons database project, by providing a
venue for collaborative research and documentation, will help expand involvement in government and private industry oversight to
individual members of the public.