Materials
Preliminary list of recommended materials for the Developers’ Challenge (items may be added). (Full license and availability information.)
dataset | format(s) | size | availability | license |
Archimedes Palimpsest transcriptions | XML: TEI P5 | 5.6 MB | www.archimedespalimpsest.net/ | CC-BY |
Archimedes Palimpsest images | TIFF | approx 1 TB | www.archimedespalimpsest.net/ or on HD | CC-BY |
British Prints Database: www.bpi1700.org.uk | MySQL dump + online images | MySQL dump of metadata: 21.7 MB | CD; images at images.cch.kcl.ac.uk/bpi/ (not to be redistributed) | CC-BY-NC |
Centre for History and Analysis of Recorded Music (CHARM) catalogues | bespoke XML + METS | 145 MB | CD | CC-BY-NC |
Clergy of the Church of England: www.theclergydatabase.org.uk/index.html | MySQL dump | 474 MB | CD | CC-BY |
DEMOS (text of articles) | XML: TEI P5 | 2.4 MB | CD | CC-BY-NC-SA |
Domesday/Prosopography of Anglo-Saxon England project | Spreadsheet | | CD | CC-BY-NC |
Duke Databank + HGV + APIS (papyri transcriptions, translations + metadata) | XML: TEI P5 (EpiDoc) | 2.2 GB | git clone idp.atlantides.org/git/idp.data.git/ | CC-BY (except APIS) |
Euripides Scholia | pseudo-TEI P5 | 500 KB | euripidesscholia.org/sourceFiles/ | CC-BY-NC-SA |
Greek, Roman and Byzantine Pottery at Ilion | HTML, JPG, RDFa (+KML) | 345 MB | classics.uc.edu/troy/grbpottery/ | CC-BY-NC-ND |
Hofmeister | TEI XML + Authority files | 115 MB+ | www.hofmeister.rhul.ac.uk/2008/content/reference/thesaurus_download.html | CC-BY-NC-SA |
Homer Multitext images | TIFF, JPEG2000, JPG, Pyramid TIFF, etc. | >500 GB (TIFFs alone), several TB total | amphoreus.hpcc.uh.edu/ | CC-BY-NC-SA |
Inscriptions of Aphrodisias | XML: TEI P4 (EpiDoc) | 6.6 MB | insaph.kcl.ac.uk/iaph2007/xml/inscriptions.zip | CC-BY |
Inscriptions of Aphrodisias: feeds | Atom | 2.2 MB | concordia.atlantides.org/examples/iaph2007.atom | CC-BY |
Inscriptions of Roman Tripolitania | XML: TEI P4 (EpiDoc) | 10.2 MB | irt.kcl.ac.uk/irt2009/redist/inscr/irt2009_inscriptions.zip | CC-BY |
Inscriptions of Roman Tripolitania: feeds | Atom | 2.2 MB | irt.kcl.ac.uk/irt2009/index.atom | CC-BY |
Inscriptions of Roman Tripolitania: geodata | KML | 400 KB | irt.kcl.ac.uk/irt2009/redist/maps/tripolitania_earth.kml | CC-BY |
Jonathan Swift Archive | bespoke XML | 35 MB | CD | CC-BY-NC |
Khirbat al-Mudayna al-Aliya excavations | Atom + images + structured data | | opencontext.org/sets/Jordan/Khirbat+al-Mudayna+al-Aliya | CC-BY |
Nineteenth Century Serials Edition | Plain text | 2.6 GB | DVD | CC-BY |
Nomisma.org (ancient coins) | RDFa (+KML) | 2.3 MB | nomisma.org/nomisma.org.xml | CC-BY-NC |
Old Bailey Transcripts | bespoke XML | >1 GB | FTP | non-commercial (license required) |
Perseus Greek and Roman texts | XML: TEI P4 | 340 MB | nlp.perseus.tufts.edu/hopper/opensource | CC-BY-NC-SA |
Perseus Treebanks (grammatical markup) | XML | 10 MB | nlp.perseus.tufts.edu/syntax/treebank/ | CC-BY-NC-SA |
Petra Great Temple Excavations | Images + KML + Atom | | opencontext.org/sets/Jordan/Petra+Great+Temple | CC-BY |
Stormont Papers (Hansard): text | XML | 47 MB | CD | non-commercial (license attached) |
Stormont Papers (Hansard): geodata | KML | 78 MB | CD | non-commercial (license attached) |
Victoria and Albert Museum Collections | JSON via webservice | | API doc: www.vam.ac.uk/api | non-commercial (terms online) |
Vision of Britain relational data (www.visionofbritain.org.uk) | postgres dump | 2 GB | DVD | CC-BY-NC-SA |
Vision of Britain historic mapping | georeferenced rasters | | www.visionofbritain.org.uk/maps | (images not for redistribution) |
WGBH OpenVault metadata records | Dublin Core and PBCore | 3000 records | online via OAI-PMH from the Fedora repository (openvault.wgbh.org/fedora/oai) and via a Solr request handler (openvault.wgbh.org/solr/select) | non-commercial (terms online) |
WGBH OpenVault Vietnam interview transcripts | TEI with SMIL & RDF | 230 records | openvault.wgbh.org/api/dhdev | non-commercial (terms online) |
WW1 Poetry Archive | JPG + metadata CSV | 60 MB sample; full >10 GB | sample on CD; remainder scrapable from www.oucs.ox.ac.uk/ww1lit | non-commercial (license attached) |
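
Several of the datasets above are served over standard protocols rather than shipped on disc. As an illustration, the WGBH OpenVault metadata records are listed as available via OAI-PMH. The following is a minimal sketch of how a harvester might build a ListRecords request and read Dublin Core titles from a response. The `http://` scheme, the helper names `list_records_url` and `parse_records`, and the sample response are assumptions made for illustration; the table gives only the host and path, and the endpoint may no longer be live. The sketch does no network access itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: plain HTTP; the table lists only "openvault.wgbh.org/fedora/oai".
OAI_ENDPOINT = "http://openvault.wgbh.org/fedora/oai"

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(endpoint, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH 2.0 a resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return endpoint + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": ident, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)


# Made-up response used only to exercise the parser; not real OpenVault data.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Made-up title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url(OAI_ENDPOINT))
    records, token = parse_records(SAMPLE_RESPONSE)
    print(records, token)
```

To harvest for real, one would fetch each URL (for example with `urllib.request.urlopen`), pass the body to `parse_records`, and repeat with the returned token until it is `None`. The licence terms in the table still apply to anything retrieved this way.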