Thursday, July 22, 2010

Call for coherent, systematic field summaries

The overview

As a Ph.D. student, I have found that figuring out what past researchers have already done is by far the hardest problem I have come across. The current expectation is that every student will browse dozens of conferences and journals going back decades. Even at the end of this long and tedious process, there are several common pitfalls. First, if a single relevant paper in a single conference is missed, a student may spend months working on an idea that was already developed, and that time is wasted. Second, even if every relevant paper is located, students almost always learn what each paper did individually and rarely draw links between papers. Drawing those links is critically important, as described below.

The details

I recently attended the International Computer Vision Summer School (2010). In one session, students were asked to read 3 papers and trace the ideas in these papers as far back in the literature as possible, in the form of a tree. After reviewing the submissions, the organizer pointed out that there was very little overlap among the trees submitted for these 3 papers. He went on to explain that the 3 papers had essentially presented exactly the same concept, just from slightly different angles and with different terminology and notation. The trees for the 3 papers should therefore have been identical! This was an excellent concrete example of the problem: it takes someone who is already an expert to extract these deep and extremely important connections from the literature. Leaving this to every new student is not only fruitless (they will usually not extract the correct connections anyway), it is also an enormous duplication of effort! There should be a system in place where efforts are pooled to do this completely and correctly, once.

Previous attempts

Survey papers are written occasionally, and they come very close to a good solution. However, they suffer from a lack of viewpoints and from infrequency. The following section addresses both issues directly.

The proposal

My proposal for action has two phases: the "catch up" phase, followed by the "maintenance" phase.

"Phase 1: Catch Up"

The "catch up" phase is the longest and most difficult, but also the most helpful.
I propose that the general population of a field nominate and elect a committee of the most qualified, accomplished, and knowledgeable experts in that field. This committee would be charged with two tasks. The first is splitting the field into an appropriate number of subfields. I can only speak for my own field, Computer Vision, which one could break down into "Structure from Motion", "Object Recognition", "SLAM", etc. There should be 3-5 experts from each of these subfields on the committee. The second task is for each sub-committee to produce a survey paper covering the work in its subfield, starting as far back as possible and continuing to the present year. This will certainly be a large document with many, many references; however, it is important not to get lost in the task of listing references. The connections between papers, and the evolution of each idea, are the central point of this whole project. The reward for this exercise is an overwhelming sense of having advanced the state of the art of scientific research procedures, as well as a resume line item indicating that you are a recognized expert.

"Phase 2: Maintenance"

This is the easy phase! It must be performed yearly (or at some other regular interval). Again, a committee must be selected. However, all it must produce is a short review of what has happened in its subfield in the last year. For the most part, references should NOT be more than 1 year old. This keeps the reviews linear and sequential, making them extremely easy to follow.

Potential problems

After some initial conversations with experts in the field, it is apparent that a project like this could raise some political issues. You may get people complaining, "Why is my paper not included in the survey!?" You may also expose parallels that the original authors did not realize, making them feel "foolish". In my opinion, the progress of the field and the rapid absorption of young researchers are much more important than protecting individuals from this type of silly whining.

Potential benefits

If students could read a couple of these documents and be fully caught up on the state of the art of their subfield, many new doors would open. First, people would not be so restricted to a single subfield. It would be easy to stay current in multiple subfields by simply reading these documents when they are released each year. I have seen many cases where a solution from an outside field has been adapted to a problem in my field with amazing results. Second, students could move forward confident that their work is on a track the field is interested in. They could also be certain that their work has not been previously attempted. The time saved, multiplied by the number of students, would be enormous. By applying the right resources (the experts) in the right places (a directed effort to produce these systematic summaries), a much more efficient community can certainly be achieved.

Conference Summary Committees

At every professional conference, hundreds of papers are presented, which can quickly become overwhelming. For attendees, the game plan seems to be to scan the titles in the conference schedule to see which posters and talks seem most interesting and/or most applicable to their individual research objectives. To be sure, a major goal of attending a conference is to network and make new contacts. However, there should be another major goal, one that is often talked about but for the most part overlooked: keeping current with ideas and discoveries in fields related to, but not exactly in, your research area. This is nearly impossible by simply walking around and looking at posters.

Enter the solution. At each conference, a panel should be either appointed or elected. This panel should consist of leading experts in many or most of the subfields represented at the conference. At the end of the conference, these experts should meet to decide what the serious contributions were. It is no big secret that the majority of papers submitted to a conference are incremental improvements on existing methods with mildly better results. However, it is quite tough to pick out the few "serious" papers without a solid background in the subfield they came from. Therefore, it should be up to this panel to write a short document (fewer than 5 pages or so) summarizing the contributions of the conference. This would not only give conference attendees the "take home messages" at the end of the conference, but also give people who were not able to attend the big-picture idea of what they missed. Handing a colleague a DVD with 400 abstracts and papers and saying "here are the proceedings" is almost certain to provoke the same exercise of scanning titles and reading only the papers relevant to his current research. If, instead, one could hand a colleague a 5-page document and say "this is what happened at the conference", the entire field would stay much better informed and up to date.

Students are not being prepared for industry

I can only speak about my own field (computer vision and image processing), but I imagine the situation is similar across the board. What we learn in college are "the fundamentals": the theoretical (often too theoretical) ideas behind many topics. We are seldom asked to implement these ideas in software. When we are, it is done with absolutely no consideration of the process. That is, you can use whichever language you want, whichever method of revision control you like (including none!), work by yourself or in a group of your own size and choosing, and the list goes on. The only thing that matters is the result. In an industrial setting, exactly the opposite is true. Working on a team of programmers is critical. You must understand how to share responsibilities, ideas, and code. These are the most important skills for success in any real-world setting, and they are rarely exercised, and certainly not taught, in college.

After a recent conversation with the hiring manager at GE Global Research Center, I learned that they actually plan for at least an entire year of negative productivity from newly graduated students. That is, new hires are an investment: the company hires a new graduate with the intention of training them for at least a year before they start adding value. It seems to me that this transition should be much smoother. It should (clearly?!) be part of the responsibility of post-secondary institutions to prepare students for their next role in life as employees.