Attending Linked Open Data Library and Archives Summit 2013

  • Posted on: 3 June 2013
  • By: warren


I'll be attending the LODLAM summit this June 19th and 20th, which focuses on using linked open data for archives and museums. New this year is a series of Challenges between the different projects; Muninn won't have an entry, since grant deadlines are in the way.

There will be an early meetup on the evening of the 18th at the Kam Fung restaurant, with talks on linking data between Great War linked open data projects; more details here.


Paper at SEXI2013: Sex, Privacy and Ontologies

  • Posted on: 30 January 2013
  • By: warren

Sex, Privacy and Ontologies
SEXI-2013 workshop at WSDM-2013
Tuesday, February 5, 2013 at 11am
Presented by Adriel Dean-Hall
Personal profiling has long had negative connotations because of its historical association with societal discrimination. Here we revisit the topic with an ontology-driven approach to personal profiling that explicitly describes preferences and appearances. We argue that explicit methods are superior to vendor-side inferences and suggest that privacy can be maintained both by exchanging preferences independently from identity and by sharing only the preferences relevant to the transaction. Furthermore, this method is an opportunity for additional sales through the support of anonymous 'drive-by' shopping that preserves privacy. We close by reviewing the computational advantages of accurate profiling and how the ontology can be applied to complex real-world situations.
Paper is here
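As a rough illustration of the preference-exchange idea (the paper's actual ontology and protocol are not reproduced here; the class names and fields below are hypothetical):

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class Profile:
    """A personal profile kept on the client side (hypothetical structure)."""
    name: str                                        # identifying information, never shared
    preferences: dict = field(default_factory=dict)  # explicit, self-described preferences

def share_preferences(profile: Profile, relevant: set) -> dict:
    """Release only transaction-relevant preferences, decoupled from identity.

    The vendor sees an anonymous one-time token plus the requested
    preferences; the profile owner's identity is never transmitted.
    """
    return {
        "token": secrets.token_hex(8),  # anonymous, per-transaction identifier
        "preferences": {k: v for k, v in profile.preferences.items() if k in relevant},
    }

p = Profile(name="Alice",
            preferences={"shoe_size": "8", "colour": "blue", "diet": "vegetarian"})
shared = share_preferences(p, relevant={"shoe_size", "colour"})
print(shared["preferences"])  # identity and irrelevant preferences are withheld
```

The point of the sketch is the separation of concerns: the vendor can act on accurate, explicitly stated preferences without ever learning who the shopper is.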

Creating specialized ontologies using Wikipedia: The Muninn Experience.

  • Posted on: 25 June 2012
  • By: warren


Creating specialized ontologies using Wikipedia: The Muninn Experience.
Paper Session III, Saturday June 30, 10:30-11:30
This paper reports on the experiences of the Muninn project in creating specialized ontologies for historical governmental and military organizations using the Wikipedia data set and its linked open data companion DBpedia. The motivation for the ontologies and the extraction methods used are explained and their performance reviewed. Overall, Wikipedia is a very accurate knowledge base from which multilingual concepts can be extracted. The caveat is that while the information is almost always present, it is not always straightforward to retrieve because of missing structures or categorization information. Hence, an iterative methodology has been found to work best in extracting information from Wikipedia.
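To give a flavour of what extraction from DBpedia looks like in practice, here is a minimal sketch of the kind of SPARQL query such a pipeline might start from. The category name is illustrative, and the paper's actual methods are more involved (and iterative):

```python
# Build a request URL for DBpedia's public SPARQL endpoint (no network
# access happens here; we only assemble the query).
from urllib.parse import urlencode

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?org ?label WHERE {
  ?org dcterms:subject <http://dbpedia.org/resource/Category:Military_units_and_formations_of_World_War_I> ;
       rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 50
"""

endpoint = "http://dbpedia.org/sparql"
url = endpoint + "?" + urlencode({"query": query,
                                  "format": "application/sparql-results+json"})
print(url[:80])
```

Queries like this recover concepts by category membership; the "missing categorization" caveat in the abstract is exactly what forces an iterative refinement of such queries.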

A Social Networking Approach to the Legal Learning Track TREC 2011

  • Posted on: 13 November 2011
  • By: warren


A Social Networking Approach to the Legal Learning Track
TREC 2011, Legal Learning Track
Legal Track, Tuesday November 15, 15:45-16:00
Plenary, Thursday November 17, 10:00-10:30
This presentation reports on the University of Waterloo experience with the Legal Learning track where three different methods were used to approach the retrieval task.  Two are based on previously used methods and the last is a novel method based on modifying the responsiveness probability using social network analysis.

Attending Linked Open Data Library and Archives Meeting in San Francisco

  • Posted on: 17 May 2011
  • By: warren


I'll be attending the #LODLAM meeting in San Francisco this June 2nd and 3rd focusing on using linked open data for archives and museums. The topic is close to some of my own interests, including those of The Muninn Project which has fairly complex modelling requirements.

Do a billion documents change the First World War?

  • Posted on: 1 February 2011
  • By: warren


Do a billion documents change the First World War?
Wednesday, March 30th, 2011, 19:00-21:00
Waterloo Stratford Campus Digital Media Series
Presented by Rob Warren and Shelley Hulan


The First World War has come alive for later generations through the close reading of individual works on the war. But this war was also the first lengthy international conflict to keep records on hundreds of thousands of displaced people and military personnel as they moved around the globe, and the documents they generated provide a rich source of insight into the times. In the wake of the large-scale digitization of paper-based data from pre-digital periods, First World War records have the potential to touch readers anew.
Where soldiers' journals and longer accounts bring the conflict to light in a very personal way, the digitization of millions of forms and official documents concerning the "war to end all wars" allows for the detection of global patterns of migration, communication, and disease that were previously impossible to find using manual research methods. One might fear that mining Great War data robs the war of its power to illuminate the costs of modern conflict, a power that has historically lain in the personal tragedies and triumphs identified with it and in the revelations they offer about human suffering and human potential, rather than in the more anonymous and repetitive information on official forms. In a discussion of the patterns and trends detectable by analyzing millions of data-mineable Red Cross files, however, we will suggest that data mining both significantly alters our understanding of the war and continues to move us in surprising ways.

Presentation at CASBS 2010: Muninn Project

  • Posted on: 1 June 2010
  • By: warren


The Muninn Project
Tracking, Transcribing, and Tagging Government: Building Digital Records for Computational Social Science
Tuesday June 22, 2010, 14:15-15:15
Center for Advanced Study in the Behavioral Sciences
The Muninn Project is a multidisciplinary, multinational, academic research project investigating millions of records pertaining to the First World War in archives around the world.

In this talk I will review some of the methods being used in the Muninn project to extract information from the scanned documents of historical archives. Previous data extraction efforts for historical research were done through the human review of documents, one at a time. We employ an approach where computing power is used to collate similar document types to extract the information from them.

The Great War era produced a mix of hand-written and type-written documents that require processing using computer extraction methods assisted by the manual review of specific cases by human volunteers. I will contrast this with previous methods that have been used to digitize documents, such as reCAPTCHA, and close with some observations about managing archival data in a high-volume setting.


Paper at MEM2010: Canopener: Recycling Old and New Data

  • Posted on: 16 April 2010
  • By: warren


MEM2010, Monday, April 26th, 2010, 11:00am-11:30am
Presented by Cosmin Basca
The advent of social markup languages and lightweight public data access methods has created an opportunity to share, as a mashup, the social, documentary and system information locked away in most servers. Whereas solutions already exist for creating and managing mashups from network sources, we propose here a mashup framework whose primary information sources are the applications and user files of a server. This enables us to use server legacy data sources that are already maintained as part of basic administration to semantically link user documents and accounts using social web constructs.
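The core move the abstract describes, i.e. re-expressing legacy server administration data in social web vocabularies, can be sketched as follows. This is a toy illustration, not the Canopener implementation; the passwd-style records below are made up:

```python
# Turn passwd-like account records into FOAF "Person" descriptions in
# Turtle syntax, the kind of social-web markup the abstract refers to.
passwd_lines = [
    "warren:x:1001:1001:Rob Warren:/home/warren:/bin/bash",
    "basca:x:1002:1002:Cosmin Basca:/home/basca:/bin/bash",
]

def account_to_turtle(line: str) -> str:
    """Map one passwd-style record to a FOAF Person in Turtle."""
    user, _, _, _, gecos, home, _ = line.split(":")
    return (f"<http://example.org/user/{user}> a foaf:Person ;\n"
            f"    foaf:name \"{gecos}\" ;\n"
            f"    foaf:homepage <file://{home}> .")

turtle = ("@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n\n"
          + "\n\n".join(account_to_turtle(l) for l in passwd_lines))
print(turtle)
```

Once accounts are expressed this way, linking them to user documents and to profiles harvested from network sources becomes an ordinary linked-data merge rather than a bespoke integration job.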