From documents to digitization
To design a research project using primary sources from the Web, you'll need to know what's out there and how to find it. This article explains what's available, why, and where.
The World Wide Web has been a boon for students doing research in almost any field, and social studies is no exception. Until recently, students had to live near a major university or state library to have access to most primary sources, including newspapers, photographs, diaries and letters, and audio recordings. Now, more and more primary source materials are available on the Web. But the boom in access doesn’t mean that everything you want will be available, let alone easy to find or free of charge. How can you tell what kinds of source materials are available on the Web, and how can you find and use what’s out there? These questions aren’t always easy to answer. But until you know what’s available, you can’t design a good research project or help your students design one. It’s all too easy to become attached to a particular research question or hypothesis, only to find that the sources don’t back you up or, worse, that there are no sources to look at! For students facing time constraints and limited attention spans, it’s vital to start by finding out what kinds of source materials are available to them.
There are two kinds of sources you and your students are likely to be looking for online. The first consists of privately owned, for-profit sources such as newspapers and magazines. The second consists of documents and other materials, such as audio recordings or photographs, held by libraries and archives. Resources of the first type were created to make money and are still owned by their creators, who have little reason to make old materials freely available to the public, especially when they may still be able to profit from them. Libraries and archives, on the other hand, exist to preserve source materials and make them available to the public for research. As a result, the two kinds of materials follow different paths to digitization. We’ll look at each type separately.
Newspapers and magazines
Suppose a student, researching a paper on gangsters, comes to you looking for articles from the Chicago Tribune from the Prohibition era, and wants to find them in the Tribune’s online archive. (A colleague who used to be a high school media specialist swears this happened to her.) It probably won’t surprise you to learn that material that old is not available online, although it might surprise a middle or high school student who can’t clearly recall a time B.E. (before email). But how recent does material have to be before it becomes available? Could you find newspaper articles online about the Iran-Contra scandal, for example? What about the 1996 Presidential election?
The answer is "it depends." It depends on how old the articles are and on which newspaper published them. Virtually every newspaper with an online edition gives you access to yesterday’s news. As a general rule, very recent newspaper content (from the past week or month) is available online for free; anything older is available for a fee. Most newspapers will let you pay by the article for downloads. For example, the Chicago Tribune will let you have anything published since 1985, but you have to pay $3.95 an article. The Washington Post’s online archives date back to 1977, and you can search them for free, but you have to pay to retrieve the text of any article more than two weeks old. The Philadelphia Inquirer offers content online back to 1978, again on a pay-as-you-go basis. Locally, Raleigh’s News and Observer makes the past seven days’ news available free of charge on the Web; other articles since 1990 are available only to subscribers. The Charlotte Observer offers archived articles for $1.95 each. A rare exception is The New York Times, which makes all of its content since 1996 available free of charge online; you must register for access, but there is no fee for either registration or downloads.
Why the fees? It’s easy to get irritated with newspapers when they charge for access to old news. The content has already been produced, and they’ve (presumably) already made their money back from advertising and subscription fees. But building and maintaining a database of articles costs money. A single issue of a major metropolitan newspaper may contain hundreds of articles. Imagine archiving 100 articles a day, 365 days a year, for 10 years: that’s 365,000 documents, more than a third of a million, to keep track of. Web designers and developers don’t come cheap, and somebody has to post all that content to the Web in the first place. A newspaper’s management may also believe, correctly or not, that making content available for free via the Web discourages readers from purchasing subscriptions. There is little evidence that a significant number of newspaper readers prefer online content, even if it’s free, and The New York Times doesn’t seem to be suffering; the main value of newspaper archives is for research. Nevertheless, newspapers are in business to make money, and they have always charged for content. The real surprise should be that any newspaper content is available free of charge.
Even if newspaper archives are available free, or even if you’re willing to pay for access, you won’t be able to search the entire run of a major newspaper. No newspaper that I’m aware of maintains an online archive with content produced before the mid-1970s, and in most cases you’ll be lucky to find material from before 1990. The deciding factor is when a particular newspaper switched to an electronic publishing system. Newspapers, like all printed material, used to be typeset mechanically, and the only permanent record of their content is the printed newspaper itself. For the most part, newspapers have been able to maintain digital archives only since reporters traded their typewriters for desktop and laptop computers and production departments switched to electronic typesetting. A few larger newspapers have been doing it for longer, but for most, it simply wasn’t an option until the 1990s.
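The archive estimate in the paragraph above is simple multiplication. As a quick check, the short sketch below uses only the text's own illustrative figures (100 articles a day for 10 years) and, as a simplifying assumption, ignores leap days; the function name is hypothetical.

```python
# Back-of-the-envelope size of a newspaper archive, using the
# article's illustrative figures. Leap days are ignored (assumption).
def archive_size(articles_per_day: int, years: int, days_per_year: int = 365) -> int:
    """Return the total number of articles archived over the period."""
    return articles_per_day * days_per_year * years


total = archive_size(articles_per_day=100, years=10)
print(f"{total:,} articles")  # 365,000 articles
```

That comes to a little more than a third of a million documents, each of which has to be stored, indexed, and served to readers.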
So what can you do to find old newspaper articles? Microfilm is still available for free through your local library. You may have to get it through interlibrary loan, and of course it isn’t nearly as convenient as a digital archive: it can’t be full-text searched, and you’ll have to know which reels — i.e., what range of dates — you want to order. You have to go to the library to use it, which for anyone used to the convenience of the Web is the worst part of all. Most newspapers, even local ones, maintain a subject index that will help you track down content, but those indexes are usually available only at the newspaper offices. (The New York Times is again an exception; printed subject indexes are available for each publication year.) Still, microfilm is relatively easy to use, and doing so won’t cost you anything except time. It’s also generally available for the entire run of a major newspaper, even back to the nineteenth century. The North Carolina Collection at UNC-Chapel Hill, for example, has microfilm copies of North Carolina newspapers going back two hundred years.
If you’re looking for news from the past twenty years and feel that $1.95 an article is worth the time you’ll save, try the online NewsLibrary. This service makes available the content of more than sixty newspapers from nearly every state in the nation, totaling some 10 million articles. You can search any subset of those newspapers or specify a region, and you can choose to search any range of dates available. This won’t be an option for most students, and it isn’t something you should necessarily recommend, but it may be a good way to find teaching materials.
Libraries and archives
Unlike newspapers, libraries and archives don’t charge for access to information. That doesn’t mean, however, that anything you want — or even everything they have — will be available on the Web.
The limiting factor is money. Web databases are expensive to build and maintain, but digitizing primary source materials in the first place is even more expensive. Photographs are actually relatively easy to digitize; you can scan a photograph on a $100 desktop scanner and in five minutes have an image that is more than adequate for Web publication. Documents are far more difficult. You can scan a document and store it as an image file, but a fairly high resolution is required to make the text readable, particularly if the document was handwritten. High resolution means long download times for users, and, of course, a picture of a document can’t be indexed for full-text searching. For typed documents and previously published materials, optical character recognition (OCR) software can translate a scanned document into plain text that can be edited, searched, and formatted just like an original word-processed document. Unfortunately, OCR is reliable only for typed or printed material; handwriting varies too much to be easily read by a computer. And even 99% accuracy, which sounds great at first, means that roughly one out of every hundred characters will be misread; in a single long document, those errors can take hours to find and correct. Handwritten documents have to be transcribed by hand, a long and tedious process.
Audio files, which seem almost made for the Internet, can be even more expensive than documents to digitize. We’re used to listening to music on CDs and as MP3 files, but audio recordings in archives started out on records or tapes. That’s true even of recent recordings such as oral history interviews; even in 2001, portable digital audio recorders are expensive and don’t offer the quality of analog cassette tape. Digitizing an audio recording means making a digital recording of the original, then cleaning it up with an application such as Sound Forge, which is to audio what Photoshop is to images. Digitizing a 90-minute recording may take several hours of highly skilled labor. The costs, needless to say, add up quickly. Oral history interviews, for example, could be a tremendous resource for teachers and students, but the universities, schools, and community groups that produce and archive them can rarely afford to digitize them. Then, too, there is always the danger that the format a library chose for digital audio might soon go the way of the LP, and all that work would have to be redone.
Libraries looking to post materials on the Web may face another problem: copyright restrictions. Just because a diary, recording, or photograph is available in a library doesn’t mean that the library owns the rights to the material. That’s obvious with, say, a jazz record from the 1930s, but it can be true even of a set of personal papers. Posting material on the Web is, legally, a form of publication, and that requires permission from the rights holder, which can be tricky and costly to obtain. You might think that there’s no money to be made on an obscure, half-century-old record or photograph, and you might be right, but that won’t always stop rights holders from trying to make some. And in the case of personal papers and interviews, the authors or subjects may simply not be comfortable with having their private materials available to the world. They may feel that availability on the Web is a very different and more public thing than availability in a library.
To pay the high costs of digitization, libraries, archives, and universities have to get grants from government or private funding agencies. Often, that means that politically valuable materials are digitized first. Sometimes that’s a good thing; certainly, the Declaration of Independence is worth putting on the Web in various forms. But it also means that, in general, there is little rhyme or reason to what’s available on the Web. A powerful Congressman’s pet project may get funding from the National Endowment for the Humanities or the Library of Congress while a collection with great educational value waits in line. The American Memory collections of the Library of Congress are a wonderful resource, but you’ll quickly notice as you browse them that they don’t follow a consistent pattern. In order for a collection of documents or photographs to be digitized, a dedicated individual or organization has to decide that the collection merits the time and money involved. A digital collection may be the product of a university program, a public library, an undergraduate college course, or a high school project. Local programs are likely to focus on collections of local interest, though they may be of use to a broader audience as well. Larger organizations may have specific political, social, or cultural agendas that shape what they are willing to fund: a collection relating to religion, for example, or to African-American culture.
So where should you start? Government sites are always a good bet for sources relating to the United States. LEARN NC’s collection includes a number of sites providing primary sources; try searching for the type of resource you want (photograph, audio, etc.) as well as the topic. A search engine such as Google will find you many more resources, although it may be difficult to weed out sites that are not educational. Try searching for "primary sources" and the topic or time period you’re interested in ("medieval," "antebellum," and so on). If you want to avoid resources that are available only for a fee, try excluding .com sites from your searches using a tool like Google’s advanced search. If you’re interested in local resources, contact a local university library and ask what they have available, or browse their online catalog. It also never hurts to ask if they have plans to digitize their collections; if they know that people are interested in using their materials online, they may be more likely to find or apply for money to digitize them. Another strategy is to design research projects for your students based on what you know to be available. If all of your students will have ample access to the Web for research, you might simply send them to American Memory and ask them to think of questions about one of the collections, or choose one of the collections yourself for a more directed activity.
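To see why even 99% OCR accuracy can mean hours of cleanup, the sketch below estimates how many characters will be misread in a document. The only figure taken from the text is the 99% accuracy rate; the page count and characters-per-page values are hypothetical assumptions chosen for illustration, and the sketch assumes errors occur at a flat rate across the whole document.

```python
def expected_misreads(num_characters: int, accuracy_percent: float) -> int:
    """Estimate how many characters OCR will misread at a given accuracy,
    assuming errors are spread evenly across the text."""
    error_rate = (100 - accuracy_percent) / 100
    return round(num_characters * error_rate)


# Hypothetical document: 20 typed pages at roughly 2,000 characters per page.
pages = 20
chars_per_page = 2_000
errors = expected_misreads(pages * chars_per_page, accuracy_percent=99)
print(f"About {errors} characters to find and correct")
```

Under these assumptions, a 20-page document would still contain about 400 misread characters, each of which someone has to spot and fix by comparing the text against the original scan.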
There have been a few attempts to catalog the primary source materials available on the Web. The University of Idaho Special Collections Library provides links to more than 4,600 Web sites around the world describing holdings of manuscripts, archives, rare books, historical photographs, and other primary sources. Not all of the libraries, universities, and archives listed have materials online, however, and there is no way to search the contents of all of the sites at once. In addition, most of the sources listed will be of little use to K-12 students. The listing is, however, organized by location, and it might be a good place to start looking for North Carolina resources available both online and in person.
Final thoughts
In an ideal world, students could ask any question they wanted and find a way to answer it. The real world of research, whether in physics or psychology or history, places restrictions on the tools available to us. Sometimes, the source materials necessary to answer a particular question about the past simply aren’t available — because they were never created, because they weren’t preserved, or because they’re too far away or too difficult to get to. When you’re working with middle- and high-school students, it’s vital to know what kinds of sources will be available to them before they invest too much in a project and waste time or get frustrated because they can’t finish it. Like all historians, students will have to tailor their inquiry to the sources available to them. But remember that focusing on available sources can be a good thing, not just a restriction. Sometimes, our best ideas come to us when we stumble across a source we never knew existed! With care and forethought, you can use the Web to help students design research projects that will interest them and teach them more about history than they could ever learn from a book.