Imagine if computers could read and interpret documents. Humans could then focus their efforts on understanding what analysis results mean and on making better decisions. At Dynamic Risk, our interest is to improve the safety and reliability of energy pipeline networks by taking full advantage of the vast amounts of data locked in cumbersome formats, handwritten documents, drawings, photographs, and paper archives.
We want to fundamentally change how we ask questions and receive answers. Today, we ask questions based on the data we have available in structured databases. In the future, we want to ask questions without worrying about whether the data is readily available in a structured format.
To move towards this grand vision, we are sponsoring a challenge to solve the first part of this puzzle. We want a solution that can learn not only what data to extract, and how, from sources such as spreadsheets, word processor files, and computer-generated PDF files, but also how to map the data found in these documents to specified target fields in a database.
The Problem
The user will be presented with a document to be "read". They will manually identify the locations in the document that contain the required data and teach the system how each item maps to the target fields in the database, including any necessary manipulations of the data (e.g., unit conversions). The software will need to learn from a series of documents that present the same target content in varying formats. For example, the system will need to learn from reports that contain the same target information but were produced by different service providers, so their formats can be quite different.
Ideally, the solution should be able to empirically rate its level of confidence in its results and identify what it cannot process.
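As an illustration only, the teaching step can be pictured as a set of user-defined rules, each tying a label found in a source document to a target database field plus a transformation. The `FieldMapping` structure, the labels, the field names, and the inch-to-millimeter conversion below are hypothetical assumptions, not part of the challenge specification; a real solution would have to learn such rules from the user's examples rather than have them hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FieldMapping:
    """Hypothetical rule taught by the user: a label seen in a source
    document, the target database field it feeds, and a transform."""
    source_label: str
    target_field: str
    transform: Callable[[str], object]


def inches_to_mm(raw: str) -> float:
    # Example unit conversion: 1 inch is exactly 25.4 mm.
    return float(raw) * 25.4


# Two (hypothetical) service providers report the same quantity
# under different labels and units; both rules feed one target field.
MAPPINGS = [
    FieldMapping("Wall Thickness (in)", "wall_thickness_mm", inches_to_mm),
    FieldMapping("WT [mm]", "wall_thickness_mm", float),
    FieldMapping("Pipe Grade", "grade", str.strip),
]


def extract(doc: Dict[str, str],
            mappings: List[FieldMapping]) -> Tuple[Dict[str, object], List[str]]:
    """Apply the first matching rule for each target field. Return the
    record plus the target fields no rule could fill (what the system
    could not process)."""
    record: Dict[str, object] = {}
    for m in mappings:
        if m.target_field not in record and m.source_label in doc:
            record[m.target_field] = m.transform(doc[m.source_label])
    targets = {m.target_field for m in mappings}
    missing = sorted(targets - set(record))
    return record, missing


provider_a = {"Wall Thickness (in)": "0.5", "Pipe Grade": " X52 "}
provider_b = {"WT [mm]": "12.7"}
print(extract(provider_a, MAPPINGS))  # ({'wall_thickness_mm': 12.7, 'grade': 'X52'}, [])
print(extract(provider_b, MAPPINGS))  # ({'wall_thickness_mm': 12.7}, ['grade'])
```

Returning the unfilled target fields alongside the record is one simple way to surface what a document did not provide, rather than silently loading an incomplete row.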
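One crude way to attach a confidence value to an extraction, sketched here as an assumption rather than a recommended method, is to score how closely a label found in a new document matches a label the user taught, and to report the field as unprocessable when the score falls below a threshold. The function name `match_label` and the 0.8 threshold are hypothetical; the similarity measure is the ratio from Python's standard `difflib.SequenceMatcher`, which is a string heuristic, not an empirically calibrated confidence.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def match_label(taught: str, found: List[str],
                threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Find the label in a new document most similar to a taught label.

    The similarity ratio (0.0 to 1.0) serves as a crude confidence score;
    below `threshold` the field is reported as unprocessable (None)."""
    best_label: Optional[str] = None
    best_score = 0.0
    for label in found:
        score = SequenceMatcher(None, taught.lower(), label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


print(match_label("Wall Thickness", ["Wall thickness", "Pipe Grade"]))  # ('Wall thickness', 1.0)
label, score = match_label("Wall Thickness", ["Pipe Grade"])
print(label)  # None: flag this field for human review
```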
The Challenge Breakthrough
If computers could read and analyse vast amounts of information, we could focus our efforts on understanding what the results mean and on making better decisions. The Cognitive Computing Challenge will break down the barriers that prevent us from accessing most of the world's information, which is locked in written documents that require humans to interpret them and extract what is useful. Our interest is to change how energy pipeline networks are managed, improve their safety, and ultimately save lives. However, this technology has broad applications across multiple industries.
These types of tasks are currently done manually by humans. A solution to this challenge will automate the repetitive tasks that complex analysis requires, eliminating most of the time spent on them and minimizing errors. Ideally, there should be as little human intervention as possible.
The winning solution for this challenge will be the one that is not only the most accurate but also the most flexible and easy to use, and that can be trained with a minimal number of documents to extract any type of required data.
Challenge Structure
The Cognitive Computing Challenge has four stages:
1. Qualifying Challenge
This qualifying problem requires you to extract the data from 300 MLS (Multiple Listing Service) training records and load it into a database. You will be provided with a document summarizing the correct format of each field to be extracted.
The files required for this Qualifying Challenge can be downloaded here:
URL: ftp.dynamicrisk.net
User: CCChallenge
PW: @@@halleng@
Filename: DRAS_sample_v1_20150605.zip
The document format is identical for all of the training documents. You must train your system to process and load the data into a database with a structure of your choice. Teams must submit a successful solution for this Qualifying Challenge before they can compete for the Cognitive Computing Challenge.
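Since the database structure is left to each team, the loading step might look something like the following sketch, which uses Python's built-in `sqlite3` module and an in-memory database. The table name `listings`, its columns, and the sample MLS numbers, cities, and prices are hypothetical placeholders; the actual target fields are defined in the format document supplied with the training records.

```python
import sqlite3
from typing import Iterable, Tuple


def load_listings(conn: sqlite3.Connection,
                  rows: Iterable[Tuple[str, str, int]]) -> int:
    """Create a hypothetical listings table, bulk-insert extracted rows,
    and return the resulting row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        " mls_number TEXT PRIMARY KEY,"
        " city TEXT NOT NULL,"
        " list_price INTEGER)"
    )
    # Parameterized INSERT avoids quoting problems with extracted text;
    # OR REPLACE makes re-running the load on the same records idempotent.
    conn.executemany("INSERT OR REPLACE INTO listings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]


conn = sqlite3.connect(":memory:")
extracted = [("C1234567", "Calgary", 450000), ("E7654321", "Edmonton", 380000)]
print(load_listings(conn, extracted))  # 2
```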
The submitted system will be tested by Dynamic Risk by processing a separate set of documents in the same format. The resulting data will be scored, and each team's total score and its components will be posted on a leaderboard visible to the public. Only scores will be published; teams' methodologies and questionnaire responses will not be posted. All teams' submissions will be tested with the same documents to derive their scores.
Feedback will be provided to all teams regarding the scoring of their submission with suggestions on where to focus their efforts to improve their scores. Teams whose approaches, in our opinion, are not suitable for the Cognitive Computing Challenge will be encouraged to resubmit an entry for the Qualifying Challenge with a different approach.
2. Cognitive Computing Challenge - Similar to the Qualifying Challenge, a set of training documents and a full description of the target attributes will be provided. This challenge differs in the following ways:
- A smaller number of training documents (approx. 100)
- Greater variance in the values of each target field, which will require greater emphasis on cleaning the extracted data
- Variance in document format and structure: unlike the Qualifying Challenge, the documents will not share one consistent format
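Two of the cleaning steps this stage emphasizes, expanding abbreviations with consistent capitalization and rounding to a fixed number of significant digits, can be sketched as below. The abbreviation table and function names are hypothetical examples chosen for illustration; they are not taken from the challenge's target-field description, and a real system would need per-field rules learned from the training documents.

```python
import math

# Hypothetical abbreviation table; a real system would learn or
# configure one per target field.
ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road", "dr": "Drive"}


def expand_and_capitalize(raw: str) -> str:
    """Drop periods, expand known abbreviations, and capitalize each word."""
    words = raw.replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(w.lower(), w.capitalize()) for w in words)


def round_sig(x: float, digits: int) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))


print(expand_and_capitalize("123 main st."))  # 123 Main Street
print(round_sig(1234.5678, 3))               # 1230.0
```

Note that `str.capitalize` lowercases the rest of each word, so names such as "McLeod" would need special handling.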
Materials required to complete this stage of the challenge can be downloaded here:
URL: ftp.dynamicrisk.net
User: CCChallenge
PW: @@@halleng@
Filename: DRAS_sample_v2_20151104.rar
3. Submissions for the Cognitive Computing Challenge
4. Judging and announcement of the winner
Challenge Criteria and Challenge Prize award
One prize will be awarded at the end of the Cognitive Computing Challenge. The team or individual whose submission best meets the judging criteria will be the sole winner.
Who Can Participate?
The challenge is open to individuals, teams and organizations globally. To be eligible to compete, innovators must comply with all the terms of the challenge as defined in the Challenge Specific Agreement.
Judging Panel
The challenge will be judged by the official Judging Panel. The Judging Panel holds responsibility for evaluating all submissions against the winning criteria and the guidelines and rules for the challenge. The panel will be responsible for evaluating compliance with these rules and guidelines and will have the sole and absolute discretion to select the challenge prize recipient. All decisions made by the Judging Panel shall be rendered by a majority of the judges and are final and binding on both the competitors and Dynamic Risk Assessment Systems, Inc. and are not subject to review or contest.
A panel of highly qualified individuals will be selected to serve as the judges. All members of the Judging Panel will be required to sign Non-Disclosure Agreements acknowledging that they make no claim to the Intellectual Property developed by individual competitors, teams, team sponsors, or partners.
Submission Requirements and Rules
All submissions must meet the following requirements to be included in the judging process for the challenge. A submission that does not meet these requirements will be considered incomplete and will not be eligible for judging.
- The submission must be original work and the property of the competitor.
- All platform use agreements must be satisfied; if third-party technology is used, the competitor must have the right to use that technology in their submission.
- For all submissions, competitors must assign a royalty free, irrevocable, perpetual, transferable, assignable, worldwide license for the Intellectual Property and intellectual property rights to Dynamic Risk Assessment Systems, Inc. for commercial use. The license assigned will be exclusive for applications in the Energy industry. Competitors will own the Intellectual Property and intellectual property rights for their submissions.
- Award of the prize will be subject to international laws, regulations, withholding and reporting requirements where required.
- If any provision of the Challenge Specific Agreement is held unenforceable, the competitor agrees that such provision shall be modified to the minimum extent necessary to render it enforceable (including the adoption of equivalent terms that are specific to the jurisdiction applicable to the competitor).
- By completing the registration for this challenge, the competitor is deemed to have read, accepted and agreed to be bound by the Challenge Specific Agreement.
Challenge submissions must include:
- URL, remote access instructions, or download link with credentials to utilize the working system
- all setup instructions where necessary
- instructions for the use of the system
- completed questionnaire
All submissions for the Cognitive Computing Challenge must be downloadable and will be installed by Dynamic Risk locally for judging.
Winning Criteria
Judging in each of the two challenge stages (the Qualifying Challenge and the Cognitive Computing Challenge) will be based on:
- Accuracy (60% of the score) - Your trained solution will be tested on a separate set of records reserved for judging. The data set produced by your solution will be compared with a correctly processed reference data set. The comparison will be based on the following parameters:
- Precision*
- Recall
- F-measure, with Precision weighted 2 to 1 relative to Recall
A simple definition of these parameters can be found here:
*Note: Precision will be based on the correct insertion of a clean record into the database, which includes conversion to the appropriate units, the correct number of significant digits, removal of abbreviations in text, correction of spelling errors, correct capitalization, etc.
- Usability Testing (15% of the score) - Internal engineers at Dynamic Risk will assess the user interface and the practical usability of the resulting database when they apply the Judging data set. Processing speed will also be noted. A score from 0 to 10 will be assigned.
- Questionnaire (25% of the score) - The Judging Panel will assess the innovators’ approach to solving the problem and score each submission from 0 to 10 based on its flexibility and extensibility in accommodating other types of documents and different document formats, and on its performance when only limited training documents are available.
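The scoring arithmetic can be sketched as follows, under several assumptions that are not confirmed by the official rules: extracted data is compared as a set of (record id, field, value) facts; "Precision weighted 2 to 1 vs Recall" is read as the F-beta measure with beta = 0.5, the conventional setting for favoring precision (a 2:1 weighted harmonic mean would instead correspond to beta = 1/sqrt(2), about 0.71); and the overall score rescales the two 0 to 10 panel scores to 0 to 1 before applying the 60/15/25 weights. This is not the official judging code, and the field names and values are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[int, str, object]  # (record id, field name, cleaned value)


def precision_recall(produced: Set[Fact], expected: Set[Fact]) -> Tuple[float, float]:
    """A value counts as correct only if it matches the reference exactly,
    so uncleaned values (wrong case, units, etc.) lower precision."""
    correct = len(produced & expected)
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def overall_score(accuracy: float, usability: float, questionnaire: float) -> float:
    """Weighted total: accuracy in [0, 1]; the two panel scores on 0-10."""
    return 0.60 * accuracy + 0.15 * usability / 10 + 0.25 * questionnaire / 10


expected = {(1, "city", "Calgary"), (1, "price", 450000),
            (2, "city", "Edmonton"), (2, "price", 380000)}
produced = {(1, "city", "Calgary"), (1, "price", 450000),
            (2, "city", "edmonton")}  # wrong capitalization counts as an error
p, r = precision_recall(produced, expected)  # p = 2/3, r = 1/2
score = f_beta(p, r)                         # 5/8, up to floating-point rounding
```

With these numbers, precision is 2/3 because one of the three loaded values was not properly cleaned, and recall is 1/2 because one expected value was never extracted.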
Selection of a Winner
Based on the above criteria, the single submission with the highest overall score will be selected as the winning innovation and will receive the prize. In case of a tie, the winner will be selected at the discretion of the Judging Panel.
Challenge Guidelines are subject to change. Registered competitors will receive notification when changes are made; however, we strongly encourage you to visit the Challenge Site often to review updates.
Schedule
Milestone | Date
Challenge is live | May 15, 2015
Registration is open | June 8, 2015
Submission questionnaire is available | June 18, 2015
Leaderboard updates for the Qualifying Challenge | Ongoing (June through December 2015)
Competition closes/Final Challenge submissions due | April 11, 2016
Judging | April 12 to May 31, 2016 (may be extended depending on the number of teams competing)
Winner Announced | June 1, 2016
Questions
Teams/Innovators will have an opportunity to check in with the Challenge Team upon request via video conference. Details will be announced after registration opens.