These guidelines offer advice for those wishing to contribute documents to the Open Modernisms project. For a file to be easily included in an Open Modernisms anthology, a “master” version of the file must exist in our slightly specialized markdown format. Markdown is a simple text markup format that allows one to easily indicate simple divisions and text properties (italics, etc) in plain text. We then use pandoc, along with a set of customized scripts and templates, to produce a range of output formats (HTML and PDF, but conceivably plain text, ePub, TEI, and more), all from a single source.
The heart of these guidelines is a set of conventions for representing a text. These include both “metadata” that describes the text (its author, its title, its source, and so on) and conventions for representing properties of the text itself (italics, poetic lines, etc). We pay particular attention to the sort of challenges one is likely to encounter in the essays and works at the heart of the Open Modernisms project: challenges of accurately representing poetry, epigraphs, and similar textual features that markdown alone may not easily represent.1 In addition to the guidelines presented here, our markdown is a superset of pandoc’s markdown, so pandoc’s documentation may be valuable as well.
All metadata for an Open Modernisms text (information related to a text that is not the body of the text itself: its title, its original publication information, who encoded it, its date of publication) is contained at the beginning of the encoded document. This information is encoded as a “YAML block” at the very beginning of the file. This YAML metadata block is marked at its beginning and end by three hyphens (---) and should include all the metadata for the document: both bibliographical information about the source text (author, title, date, and so on) and information about the markdown file (who produced it, from what sources). It may also include an editorial note explaining the text’s context, importance, or relevance.
Here, for example, is a metadata block for T. S. Eliot’s “Ulysses, Order, and Myth,” originally published in The Dial in November of 1923.
---
title: '*Ulysses*, Order, and Myth'
author:
  family: Eliot
  given: T. S.
citation:
  title: '*Ulysses*, Order, and Myth'
  container-title: The Dial
  page: 480-483
  volume: LXXV
  date: 1923-11
editor: Chris Forster
source:
  - http://summit.syr.edu/vwebv/holdingsInfo?bibId=911529
  - http://people.virginia.edu/~jdk3t/eliotulysses.htm
  - http://onlinebooks.library.upenn.edu/webbin/serial?id=thedial
---
Note that the YAML format consists of a set of “keys” and “values” (separated by a colon); they are grouped together by indenting them. So the citation above includes title (with the value '*Ulysses*, Order, and Myth'), as well as container-title, page, volume, and date (each with its own value).
For details on fields, see below.
The name of an author is stored in two fields, a family name and a given name.
author:
  given: Charlie
  family: Chaplin
In addition to an author’s name, it is possible to include other information about an author. If you wish to include the dates of an author’s birth and death, they may be included as birth and death, in the format YYYY-MM-DD. Memento mori.
There are two title fields in the metadata block, one for the encoded document and another as part of its citation. In most cases these two titles will be identical (simply encode the same title in two places). There are instances, however, where the title for a document in the Open Modernisms collection may differ (even if only slightly) from the title of its source text. For instance, if one wished to excerpt a text, one would use its original title in the citation, but provide a title such as “Excerpt from *Ulysses*, Order, and Myth” in the main title field. Markdown markup (such as italics) can be used in titles. Please be sure to enclose the title in single quotation marks.2
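As a rough illustration of why the single quotation marks matter, here is a minimal Python sketch of a helper that quotes a title for the YAML block. The name yaml_single_quote is hypothetical and is not part of the Open Modernisms scripts. The only YAML rule it relies on is that a literal single quote inside a single-quoted value is written as two single quotes.

```python
def yaml_single_quote(title):
    """Wrap a title in single quotes for use as a YAML value.

    Hypothetical helper for illustration only. Inside a single-quoted
    YAML scalar, a literal single quote is escaped by doubling it.
    """
    return "'" + title.replace("'", "''") + "'"


# Colons and asterisks are safe once the title is quoted.
print("title: " + yaml_single_quote("*Ulysses*, Order, and Myth"))
print("title: " + yaml_single_quote("Hogarth's Line: A Note"))
```

The second title is invented for the example; it shows both an apostrophe and a colon, the two characters most likely to trip up an unquoted value.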
The citation section of the metadata header provides complete citation information about the source of the transcribed document.
Fields that may be included in a citation are:
title: the title of the work as it appears in the source that is being transcribed or represented.
container-title: for works that are part of a larger work (essays in a collection, chapters of a book, articles in a periodical), the container-title is the title of the larger item.
date: dates must be encoded as hyphen-separated numbers in the format YYYY-MM-DD, YYYY-MM, or YYYY. This date should reflect the stated publication date in the source document.
volume: for periodical items published with volume information.
issue: for periodical items published with issue information.
publisher: for sources that list a publisher (usually books), this field will contain the name of the publisher.
publisher-place: for sources that list the location of the publisher, that information can be included here.
page: a page range for the pages on which the essay or article occurs.
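The three accepted date shapes can be checked mechanically. The following is a minimal Python sketch, assuming one simply wants to catch malformed values before running pandoc; the name is_valid_citation_date is hypothetical and not part of the project's scripts. It checks only the shape of the value (for example, it does not know how many days a given month has).

```python
import re

# Hypothetical pattern: YYYY, YYYY-MM, or YYYY-MM-DD,
# with months 01-12 and days 01-31.
DATE_PATTERN = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def is_valid_citation_date(value):
    """Return True if value has one of the three accepted date shapes."""
    return DATE_PATTERN.fullmatch(value) is not None


for candidate in ["1923-11", "1892-07-08", "1923", "November 1923", "1923-13"]:
    print(candidate, is_valid_citation_date(candidate))
```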
The source is a list of resources used to create the document, usually a list of links to electronic resources, image files, or library records of print items consulted for the transcription and encoding. It should be formatted:
source:
  - [Library of Congress Copy, digitized at archive.org PDF](https://archive.org/details/harlemshadows00mcka)
  - [University of Toronto Copy, Digitized at archive.org](https://archive.org/details/harlemshadowspoe00mckauoft)
  - [University of Virginia, Special Collection](http://search.lib.virginia.edu/catalog/u1282785)
Be sure to include the name of the person responsible for putting together this transcription file under editor. This field is a list of names; for convenience, also include an email address. It should be formatted:
editor:
  - Chris Forster <cforster@syr.edu>
Any other information about the document or its transcription (including problems with encoding, justifications for edition chosen, and so on) can be included in a note.
By convention we reserve first level headers (what in HTML would be <h1>s) for titles of works. If the work you are encoding has subdivisions of some kind, please indicate them using level two headers, encoded in markdown as ##; if, in turn, those sections are further subdivided, use level three headers (###) and so on.3
Example: An essay that had been divided into sections, numbered simply with Roman Numerals, could be encoded:
## I
Lorem ipsum, *et cetera*.
## II
Return of the mack...
## III
And so, in conclusion...
Most dashes one encounters are either em-dashes or hyphens. Hyphens may simply be transcribed as is. Em-dashes may be transcribed as unicode em-dashes (—) or, LaTeX-style, as three hyphens (---). The latter encoding may be preferable because it is easier to type and easier to proofread.
Should you find yourself confronted by the rare en-dash (–), typically used to separate numbers in a range, it may be encoded either with the appropriate unicode character or LaTeX-style, with two hyphens.
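To make the equivalence between the two encodings concrete, here is a minimal Python sketch that rewrites LaTeX-style dashes as their unicode counterparts. It is an illustration only, not how pandoc or the Open Modernisms scripts perform the conversion, and the name to_unicode_dashes is hypothetical. It is meant for lines of body text: it leaves the hyphens of HTML comments alone, but it would also rewrite the three-hyphen delimiter of the YAML block, so do not feed it the metadata header.

```python
import re

EM_DASH = "\u2014"  # unicode em-dash
EN_DASH = "\u2013"  # unicode en-dash


def to_unicode_dashes(text):
    """Replace '---' with an em-dash and '--' with an en-dash."""
    # Three hyphens first, so they are not read as two hyphens plus one.
    text = re.sub(r"(?<!-)---(?!-)", EM_DASH, text)
    # Two hyphens, skipping those that open (<!--) or close (-->) a comment.
    text = re.sub(r"(?<![-!])--(?![->])", EN_DASH, text)
    return text


print(to_unicode_dashes("The Dial, pp. 480--483 --- a short essay."))
```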
Nota Bene: End-of-line hyphenation may be silently removed from a transcription. We are not monsters.
To encode accented or other characters from the extended Latin alphabet, simply use the appropriate UTF-8 character. Voilà! Très facile.
Comments can be left in markdown using HTML style comments: <!-- Comments. -->. Comments will be ignored when the source file is processed and will not appear in the output.
Such comments can be useful as a way of leaving notes or other information in a document that would be valuable to yourself or someone else working on the encoding. For instance, one could use comments to mark the location of page-breaks in a source image/text:
What facts, then, let us ask ourselves,
what elements of the spectacle before
us, will naturally be most interesting to
a highly developed age like our own, to
<!-- pb n='306' -->an age making the demand which we
have described for an intellectual deliverance
by means of the complete intelligence
of its own situation?
This encoding (which takes a cue from the TEI guidelines) marks the location in the running text where the page breaks, from 305 to 306.
Also note that this encoding respects the line breaks of the original source document/image. This makes the encoding more readable, but (because pandoc eliminates line breaks) does not affect output.
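Because page-break comments of this kind follow a regular shape, they are easy to recover later. The following minimal Python sketch lists the page numbers recorded in a file. It assumes comments written exactly in the form shown in the example above, with the page number in single quotes; the name page_breaks is hypothetical and is not part of the project's tooling.

```python
import re

# Matches comments shaped like: <!-- pb n='306' -->
PAGE_BREAK = re.compile(r"<!--\s*pb\s+n='(\d+)'\s*-->")


def page_breaks(markdown):
    """Return the page numbers recorded in page-break comments, in order."""
    return [int(number) for number in PAGE_BREAK.findall(markdown)]


sample = (
    "a highly developed age like our own, to\n"
    "<!-- pb n='306' -->an age making the demand which we\n"
    "have described for an intellectual deliverance\n"
)
print(page_breaks(sample))
```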
A block quote is best encoded by enclosing it in <span class='blockquote'> and </span> tags. Do not use pandoc’s blockquote syntax.4 For example:
I am here saying something wise and insightful about an
important piece of prose. Let me now show you that prose:
<span class='blockquote'>Hamster ipsum, dolorous spit.</span>
And now we return.
Nota bene: Unlike with custom divs (used for epigraphs and poetry), be sure not to include an empty line. These items should appear, essentially, inline in the markdown and will be processed as necessary for output formats.
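Marking up poetry requires two things:
1. Each line of poetry must begin with a pipe character (|).
2. The poetry must be enclosed in <div class='poetry'> and </div> tags, with an empty line after the opening tag and before the closing tag.
For instance: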
Hardy begins his meditation on the Titanic's fate:
<div class='poetry'>

| In a solitude of the sea
| Deep from human vanity,
| And the Pride of Life that planned her, stilly couches her.

</div>
The sea's solitude separates the ship from
the hubris, the "human vanity," which brought
such a technological marvel into existence.
We use spaces within a poetic line to capture the spacing of the original; this spacing will be preserved in output formats.
Note the blank lines: between the <div class='poetry'> and the first line of poetry, and between the last line of poetry and the closing </div>. These are necessary for any <div> to be correctly processed by pandoc.
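Missing blank lines are easy to overlook when proofreading, so a small check can help. The following is a minimal Python sketch, not part of the Open Modernisms scripts, that reports the line numbers of opening and closing div tags lacking the blank line described above. It assumes each tag sits alone on its own line; the name find_div_spacing_problems is hypothetical.

```python
def find_div_spacing_problems(markdown):
    """Return 1-based line numbers of div tags missing their blank line.

    An opening <div ...> must be followed by a blank line; a closing
    </div> must be preceded by one.
    """
    lines = markdown.splitlines()
    problems = []
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("<div") and stripped.endswith(">"):
            # The line after an opening tag must exist and be empty.
            if index + 1 >= len(lines) or lines[index + 1].strip() != "":
                problems.append(index + 1)
        elif stripped == "</div>":
            # The line before a closing tag must exist and be empty.
            if index == 0 or lines[index - 1].strip() != "":
                problems.append(index + 1)
    return problems


good = "<div class='poetry'>\n\n| In a solitude of the sea\n\n</div>"
bad = "<div class='poetry'>\n| In a solitude of the sea\n</div>"
print(find_div_spacing_problems(good))
print(find_div_spacing_problems(bad))
```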
If a document contains an image, you must first obtain a high quality (300dpi or higher) image file of the image (preferably as a PNG) and place it in the same directory as your markdown file. Be sure to give the image file a unique filename, relating it to the file of which it is a part (something like [author name]_[essay title]_image_001.png would work well). To include it in the file, add the following markdown:
![A short description of the image](author-name_essay-title_image_001.png)
Example: In Ford Madox Ford’s essay “On Impressionism,” Ford offers “Hogarth’s drawing of the watchman with the pike over his shoulder and the dog at his heels going in at a door” executed in four lines. For the encoding of that essay, an image was made from the source text. It was named ford_on-impressionism_01.png, and the appearance of the image was encoded like this:
do you know, for instance, Hogarth's drawing
of the watchman with the pike over his shoulder
and the dog at his heels going in at a door,
the whole being executed in four lines? Here
it is:
![Hogarth's drawing of the watchman](ford_on-impressionism_01.png)
Now, that is the high-watermark of Impressionism;
since, if you look at those lines for long enough,
you will begin to see the watchman with his slouch
hat, the handle of the pike coming well down into
the cobble-stones, the knee-breeches, the leathern
garters strapped round his stocking, and the surly
expression of the dog, which is bull-hound
with a touch of mastiff in it.
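The filename pattern suggested above for images, [author name]_[essay title]_image_001.png, can be generated rather than typed by hand. This is a minimal Python sketch under that assumption; the name image_filename and the exact rules for lower-casing and hyphenating are illustrative choices, not a project requirement.

```python
import re


def image_filename(author, title, index):
    """Build a name like ford_on-impressionism_image_001.png."""

    def slug(text):
        # Lower-case, and collapse anything that is not a letter or
        # digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(author)}_{slug(title)}_image_{index:03d}.png"


print(image_filename("Ford", "On Impressionism", 1))
```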
To mark an epigraph, please mark it as a div of class epigraph; this will allow it to be typeset or styled appropriately in output formats. Epigraphs themselves may contain multiple paragraphs, or lines of verse, and should be marked up accordingly. (As with all other divs, be sure to include a blank line after the opening div tag and before the closing one.)
Example: Consider this page image.
This material (including complete metadata, the details of which cannot be inferred from the image above alone) could be marked up as follows.
---
title: 'The Influence of Mr. James Joyce'
author:
  family: Aldington
  given: Richard
  birth: 1892-07-08
  death: 1962-07-27
citation:
  title: 'The Influence of Mr. James Joyce'
  container-title: The English Review
  page: 333-341
  date: 1921-04
source:
  - http://search.proquest.com/docview/2441624?accountid=14214
editor: Chris Forster
---
<div class='epigraph'>

*"La vie n'est de soy ny bien ny mal; c'est la place du bien et du mal, selon
que vous la leur faictes."* --- Montaigne.

</div>
## I
Obviously no valid criticism can be made of Mr. Joyce's
*Ulysses* until the whole work has been published in book
form. It seems to me that the serial publication has lasted
an abnormally long time, and that there is some excuse
for my impatience in speaking of *Ulysses* while it is
still fragmentary. Mr. Joyce's attempt is most interesting
1. Designed originally as an easier, more economical way to write HTML, markdown (unlike, for instance, TEI) is not entirely suited to the challenge of richly remediating print texts. But we think its advantages (relative simplicity; a universal source format for pandoc) outweigh its lack of descriptive richness. Much of this project represents a tactical attempt to add back in some of the descriptive richness markdown lacks, without too seriously compromising its advantages.
2. This will solve YAML parsing problems that can be caused, for instance, by colons and other punctuation in a title.
3. Pandoc markdown supports two styles of headers, “ATX headers” and “Setext headers.” For consistency, please encode all headers in ATX style (that is, with hashmarks, #s); avoid Setext headers completely.
4. The technical reason for this is somewhat complicated. Most markdown processors treat “blockquotes” as “block elements” which, therefore, cannot be included within a paragraph. We therefore encode block quotes as inline; this gives us a broader range of options when transforming to output formats.