What is going wrong with the Semantic Web?

  • Posted on: 10 April 2018
  • By: warren

The US Semantic Technologies Symposium was held at Wright State University a month ago, where there were great discussions: with Craig Knoblock about SPARQL server reliability, with Eric Kansa about storing archeology data and Open Context, with Eric Miller about the workings of the W3C, Midwest farmers, and old bikes, with Matthew Lange about tracking crops with LOD, and a 'fruitful' talk with Evan Wallace about farm data storage standards.

Thinking through these conversations, I decided to outline what I think are troubling conclusions for our area: a) Semantic Web adoption is lagging, b) we keep rehashing old problems without moving on, and c) we continually fail to support our own projects. I'll then suggest a few solutions.


Talk: Ontologies, Semantic Web, and Linked Data for Business

  • Posted on: 28 November 2017
  • By: warren

13 February 2018, from 10am to 2pm.

This is a half-day workshop on current business uses of the semantic web. It is targeted at executives, project managers, and subject matter experts who want to understand what problems the technology can solve. The workshop covers the basic building blocks of the semantic web and the solutions that each brings to an organization. The objective is not to provide in-depth technical training; rather, we wish to present an overview that will enable a varied audience to determine what this technology offers their organization. Specific topics include recent standards such as FIBO and schema.org, the recruitment and training of staff, opportunities for localization to different markets, and lowering the cost of regulatory reporting. To anchor the discussions, the tribulations of a fictional company, "The Triples Coffee Company", will be used to present business cases within different areas of an enterprise. Example solutions using a semantic web approach will then be outlined for each business case.

Jointly presented by Robert Warren, Ph.D., Jennifer Schellinck, Ph.D., and Patrick Boily, Ph.D.
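As a hedged illustration of the schema.org standard mentioned above (this is not workshop material, and the product and price details are invented), a minimal schema.org description of the fictional "Triples Coffee Company" might be expressed as JSON-LD, built here as a plain Python dictionary:

```python
import json

# Hypothetical, minimal schema.org/JSON-LD description of the fictional
# "Triples Coffee Company" used as the workshop's running example.
# All values below are invented for illustration.
triples_coffee = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "The Triples Coffee Company",
    "description": "A fictional coffee roaster used as a workshop case study.",
    "makesOffer": {
        "@type": "Offer",
        "itemOffered": {"@type": "Product", "name": "Espresso Blend"},
        "priceCurrency": "CAD",
        "price": "14.99",
    },
}

# Serialize to JSON-LD text, ready to embed in a web page or publish as data.
print(json.dumps(triples_coffee, indent=2))
```

Because JSON-LD is ordinary JSON, a description like this can be embedded in the company's web pages, where search engines and other consumers can pick it up without any dedicated triple store.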


Presentation at the Canadian Linked Data Summit: Operationalizing Linked Open Data

  • Posted on: 12 October 2016
  • By: warren
Operationalizing Linked Open Data
Venue: University of Montreal, 3200 Jean-Brillant, Room B-2245
Monday, October 24th 2016, 14:10 - 14:30
This talk summarizes the combined experiences of the Muninn Project and the Canadian Writing Research Collaboratory in operating large linked open data projects. Topics include best operating practices, known pitfalls, and realizing the promise of the semantic web for researchers.
Presentation slides in English and in French.

Presentation: Bridging Communities of Practice: Emerging Technologies for Content-Centered Linking

  • Posted on: 1 April 2014
  • By: warren
Thursday, April 03, 2014 - 1:30pm - 3:00pm
Watertable Ballroom (ABC), Renaissance Baltimore Harborplace Hotel
Baltimore, MD, USA
Presented by Douglas W. Oard
This paper describes the potential of new technologies for linking content among cultural heritage collections and between those collections and collections created for other purposes. In recent years, museum professionals, archivists, librarians, and digital humanists have worked to render cultural heritage metadata in an interoperable form as linked open data. Concurrently, computer and information scientists have been developing automated techniques that have significant implications for this effort. Some of these automated techniques focus on linking related materials in more nuanced ways than have heretofore been practical. Other techniques seek to automatically represent some aspects of the content of those materials in a form that is directly compatible with linked open data. Bringing these complementary communities together offers new opportunities for leveraging the large, diverse, and distributed collections of computationally accessible content to which many of us now contribute.

Workshop: Computational Linguistics for Libraries, Archives and Museums at CODE4LIB

  • Posted on: 12 March 2014
  • By: warren


CLLAM Workshop (Computational Linguistics for Libraries, Archives and Museums)
Code4Lib Conference 2014, Raleigh, NC, USA
Monday, March 24
Joint presentations with Corey Harper, Amalia Levi, Douglas W. Oard and Robert Warren.
We will hack at the intersection of diverse content from Libraries, Archives and Museums and bleeding edge tools from computational linguistics for slicing and dicing that content. Did you just acquire the email archives of a start-up company? Maybe you can automatically build an org chart. Have you got metadata in a slew of languages? Perhaps you can search it all using one query. Is name authority control for e-resources getting too costly? Let's see if entity linking techniques can help. These are just a few teasers.

There will be plenty of content and tools supplied, but please bring your own [data] too -- you'll hack with it in new ways throughout the day. We'll get started with some lightning talks on what we've brought, then we'll break up into groups to experiment and work on the ideas that appeal. Three guaranteed outcomes: you'll walk away with new ideas, new tools, and new people you'll have met.


Presentation at LDG2014: From the trenches - API issues in Linked Geo Data

  • Posted on: 1 March 2014
  • By: warren


5th - 6th March 2014, Campus London, Shoreditch, UK
Joint work with David Evans
This paper reports on the experience of building a linked geo data coordinate translation API and some of the issues that arose in the process.  Beyond the basic capabilities of SPARQL, a specialized API was constructed to translate obsolete British Trench Map coordinates from the Great War into the modern WGS84 reference system.  Concerns over current methods of recording geographic information, along with the accuracy and precision of that information, are discussed.  Open questions about managing the opportunistic enrichment of geographical instances are also raised, as are the scalability pitfalls therein.
Note: The final report on the workshop can be read here.
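The core of such a translation is a mapping from a local planar grid into WGS84 longitude/latitude. As a minimal sketch only (the real API must parse historical British trench map grid references, and the coefficients below are invented for illustration, not calibrated values), a first-order planar affine transform looks like this:

```python
# Hypothetical sketch: mapping a local planar grid (such as a Great War
# trench map grid) into WGS84 longitude/latitude with a planar affine
# transform. In practice such coefficients would be fitted by least
# squares from surveyed control points; the numbers here are invented.

def make_affine(a, b, c, d, e, f):
    """Return a function mapping grid (x, y) -> (lon, lat) via
    lon = a*x + b*y + c  and  lat = d*x + e*y + f."""
    def transform(x, y):
        return (a * x + b * y + c, d * x + e * y + f)
    return transform

# Invented coefficients for a grid whose origin sits near 2.7 degrees E,
# 50.4 degrees N, with roughly kilometre-scale grid units.
grid_to_wgs84 = make_affine(0.0000142, 0.0, 2.7, 0.0, 0.0000090, 50.4)

# Translate a point one grid unit east and north of the origin.
lon, lat = grid_to_wgs84(1000.0, 1000.0)
```

A planar affine is only an approximation over small areas; handling the accuracy and precision concerns the abstract raises would require the historical datum and projection details, which is exactly where the API work becomes non-trivial.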

Presentation at ACAT - Ask not what you can do for Linked Open Data but what Linked Open Data can do for you.

  • Posted on: 5 December 2013
  • By: warren


Monday, 9th December 2013 - 12PM
Centre for Aboriginal Studies Boardroom, Building 211, Curtin University
Presented to the Centre for Culture and Technology (CCAT)
Digital Humanities scholars have long been hampered by the twin problems of getting data into digital form and then managing ever-increasing amounts of it. Too often, the data behind the research becomes a prisoner of a 'research portal' or is lost on someone's laptop. In many ways the most successful data management tool so far is the spreadsheet - a 40-year-old technology!
This talk is about linked open data, or the semantic web, an approach to the management of data that is showing promise for researchers, libraries and archives. The talk is non-technical and focuses on explaining how real-world research data problems can be solved. These include the identity of historical persons, dealing with incomplete or false data; identifying or referencing lost geographical locations and encouraging the serendipitous reuse of data in other projects. Real-world examples of problematic data from the Great War will be shown from the Muninn Project and the solutions using linked open data approaches.

Creating specialized ontologies using Wikipedia: The Muninn Experience.

  • Posted on: 25 June 2012
  • By: warren


Paper Session III, Saturday June 30, 10:30-11:30
This paper reports on the experiences of the Muninn project in creating specialized ontologies for historical governmental and military organizations using the Wikipedia data set and its linked open data companion, DBpedia.  The motivation for the ontologies and the extraction methods used are explained, and their performance is reviewed.  Overall, Wikipedia is a very accurate knowledge base from which multilingual concepts can be extracted.  The caveat is that while the information is almost always present, it is not always straightforward to retrieve because of missing structures or categorization information. Hence, an iterative methodology has been found to work best in extracting information from Wikipedia.
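Extraction of this kind typically starts from category-membership queries against DBpedia. As a minimal sketch (the category name is invented for illustration, and no request is actually sent here; the `dct:` and `dbc:` prefixes are among those predefined by the public DBpedia endpoint), such a query might be constructed like this:

```python
# Hypothetical sketch of a DBpedia SPARQL query of the kind the paper
# describes: pulling labels in a given language for the members of a
# Wikipedia category. Only the query string is built; nothing is sent.

def category_labels_query(category: str, lang: str = "fr") -> str:
    """Build a SPARQL query for labelled members of a DBpedia category."""
    return f"""
SELECT ?member ?label WHERE {{
  ?member dct:subject dbc:{category} .
  ?member rdfs:label ?label .
  FILTER (lang(?label) = "{lang}")
}}
"""

# Example: French labels for a (hypothetical) Great War category.
query = category_labels_query(
    "Military_units_and_formations_of_Canada_in_World_War_I")
```

The iterative methodology the abstract mentions fits naturally here: when a category turns out to be missing or inconsistently applied, the query is reformulated and re-run against neighbouring categories or infobox properties.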

Do a billion documents change the First World War?

  • Posted on: 1 February 2011
  • By: warren


Wednesday, March 30th, 2011, 19:00-21:00
Waterloo Stratford Campus Digital Media Series
Presented by Rob Warren and Shelley Hulan


The First World War has come alive for later generations through close readings of individual works on the war. But this war was also the first lengthy international conflict to keep records on hundreds of thousands of displaced people and military personnel as they moved around the globe, and the documents they generated provide a rich source of insight into the times. In the wake of the large-scale digitization of paper-based records from pre-digital periods, First World War records have the potential to touch readers anew.
Where soldiers' journals and longer accounts bring the conflict to light in a very personal way, the digitization of millions of forms and official documents concerning the "war to end all wars" allows for the detection of global patterns of migration, communication, and disease previously impossible to find using manual research methods. One might fear that mining Great War data would rob the war of its power to illuminate the costs of modern conflict, a power that has historically lain in the personal tragedies and triumphs identified with it and the revelations they offer about human suffering and human potential, rather than in the more anonymous and repetitive information on official forms. In a discussion of the patterns and trends detectable by analyzing millions of data-mineable Red Cross files, however, we will suggest that data mining both significantly alters our understanding of the war and continues to move us in surprising ways.

Presentation at CASBS 2010: Muninn Project

  • Posted on: 1 June 2010
  • By: warren


The Muninn Project
Tracking, Transcribing, and Tagging Government: Building Digital Records for Computational Social Science
Tuesday June 22, 2010, 14:15-15:15
Center for Advanced Study in the Behavioral Sciences
The Muninn Project is a multidisciplinary, multinational, academic research project investigating millions of records pertaining to the First World War in archives around the world.

In this talk I will review some of the methods being used in the Muninn project to extract information from the scanned documents of historical archives. Previous data extraction efforts for historical research were done through the human review of documents, one at a time. We employ an approach where computing power is used to collate similar document types to extract the information from them.

The Great War era produced a mix of hand-written and type-written documents that require processing using computer extraction methods assisted by the manual review of specific cases by human volunteers. I will contrast this with previous methods that have been used to digitize documents, such as reCAPTCHA, and close with some observations about managing archival data in a high-volume setting.