Computational experiments in MIP

7th July 2016

I would like to start by saying that I don't feel that I am the most qualified person to write this blog. I don't have the wealth of experience in computational mixed integer programming of some of my peers. However, I do think that I may be able to provide a perspective to get you thinking about your own research. I have certainly been thinking about what "computational experiments" really means and I would like to present my current thoughts.

The goal of this blog is to discuss what research in computational mixed integer programming looks like. Some of the questions that I want to discuss are: How are papers written? What constitutes a valid set of experiments? What research really falls into the category of mixed integer programming? To be more concise, I want to restrict my focus to how to design experiments and present your results in papers. By the end of this blog I hope to be a little clearer in myself as to what computational MIP research really is.

The science

We really need to look at the science behind computational research. There is a pure computer science approach that looks into the nitty gritty of algorithms and complexity analysis. As we move further away, we start to get into sciences that use computers as tools. Computational MIP is one of those sciences. Other examples can include bioinformatics, the modelling component of climate science and anything that carries the tag of "big data". Computational MIP is a little closer to computer science than those previously mentioned. The main focus of Computational MIP is to use computers to solve problems by employing a variety of different algorithms.

As a science, there are many questions that can be asked. This stems from the fact that we still don't fully understand how to best solve MIP problems. The state-of-the-art in computational MIP is encapsulated in incredibly powerful general purpose solvers.

Pushing the frontier of computational MIP can take on many different forms. Commonly, research involves the development of algorithms aimed at improving the performance of general purpose MIP solvers. This method of research is very successfully displayed by Matteo Fischetti and Andrea Lodi. These two researchers have been behind many performance improvements of CPLEX. Further, many researchers from my current institute have used SCIP as a platform to develop improved solution algorithm for general purpose solvers.

So I am going to write the blog from the perspective of research into algorithm performance improvements. To start with I want to discuss to experimental design.

Experimental design

Computers may be exact, but we don't have a complete understanding of how they will operate in all situations. An example of this is the seminal talk of Emilie Danna regarding performance variability of a MIP solver. It is shown by Danna that using different machines, permuting rows and/or columns, or introducing random perturbations can have significant impacts on the solver performance for a single instance. So to address this we need a good test set and methodology to assess any performance improvements.

Good news is that benchmark test sets exist. There have been three incarnations of the MIPLIB test set, two releases of the MINLPLib test set. Also, there are countless other problem specific test sets, such as those for VRP, TSP and STP.

While benchmark test sets exist, it is important to remember that not all instances are useful for all experiments. For example, heuristics designed for set partitioning problems will not be adequately tested on the MIPLIB benchmark sets. In such cases, special purpose testsets should be used. It is possible to derive special purpose test sets from instances listed on publicly available sources. This is definitely encouraged to allow for independent people to reproduce your results or just to make a comparison. Also, it is encouraged that you publish the test sets used in your experiments online—if it is not already publicly available.

Given the presence of performance variability, it is important to ensure that the results that you report are due to your amazing feature and not just the result of random perturbations in the solving process. Achieving this is difficult. Performance variability is difficult to quantify and it is something that we don't fully understand—yet. Part of the problem is that MIP solvers are large, complex software packages with many interconnected parts. A change to just one of those parts could impact the performance of other components in the solver.

I would like to point towards feature evaluations presented by Matteo Fischetti. I am focusing on the work of Fischetti and others because these are papers that I have read recently and not because I am biased towards or against any other scientist. It is likely that there is not one best approach for evaluating solver performance, however the papers of Matteo Fischetti and others demonstrate some good approaches. One characteristic is the use of a large test set. In the paper Exploiting Erraticism in Search, Operations Research, 2014, 62, 114-122, Fischetti and Monaci use a test set of 344 instances collected from the COR@L and MIPLIB2010 instance collections. Another good set of experiments is presented by Fischetti et al. Improving branch-and-cut performance by random sampling, Mathematical Programming Computation, 2016, 8, 113-132. In this paper, because it is focusing on the impact of performance variability it is important to perform a number of runs with different random seeds. In the presented computational results the authors use 117 instances from the MIPLIB2010 Benchmark and Primal test sets. This provides a nice overview of the developed algorithmic features.

Presentation of results

The presentation of results is a part of a paper that requires a lot of consideration. There are two things that need to be considered here. The first is being fair and ensuring that all results are representative. This means that you do not cherry pick the results or be biased against the competitors. The second is that you want to present a concise set of results that do not over complicate the story that you are trying to tell. It is important to balance these two points.

To be fair to your competitors, it is important to remove any bias. The method to remove bias, which is performed in the above papers, is to separated the results based upon the worst performing solver. This criterion comes from the fact that with any test set it is uncommon for one solver to completely dominate another (if this is the case, then your new feature is really amazing). As such, the winning solver will vary from instance to instance. So to remove bias, if you separate instances by the number of nodes processed, then you must perform this separation based upon the number of nodes processed by the worst solver for each instance. This is also true for any criteria that is used to separate instances.

It is also important to be concise in your results. It is not helpful to include all computational experiments that have been performed throughout the evaluation and testing of your feature. In a lot of cases there are experiments that do not really show anything useful. So, while being fair and accurate in your reporting, conciseness is valuable in writing papers.

Go ahead and compute

This is a blog that I have been thinking about writing for a while. These topics have really become more important to me since being at ZIB. I am finding more and more that the computational experiments that you perform play an important role in the outcomes you achieve. So it is possible to be misleading in your results and but also to miss the value of the research that you have completed. So I want to say that you must be mindful of your experiments and ensure you are fair and accurate in your reporting of results.