Slide 249 -
|
1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University gomes@cs.cornell.edu Ken McAloon and Carol Tretkoff ILOG {mcaloon,tretkoff}@ilog.com 2 AI, OR, and CS AI OR CS 3 Integration of Artificial Intelligence & Operations Research Techniques AI Representations
Constraint Languages
Logic Formalisms
Object-Oriented Prog.
Bayesian Nets
Rule Based Systems
• • •
Tools
Constraint Propagation
Systematic Search
Stochastic Search
• • •
Pros / Cons
Rich Representations
Computational Complexity OR Representations
Mathematical
Modeling Languages
Linear & Non-linear
(In)Equalities
• • •
Tools
Linear Programming
Mixed-Integer Prog.
Non-linear Models
• • •
Pros / Cons
More Tractable (LP)
Primarily Complete Info
Limited Representations Combinatorial Problems Planning Scheduling THE CHALLENGE
AI OR
UNIFY APPROACHES TO: SCALE UP SOLUTIONS
HANDLE UNCERTAINTY
ANALYZE COMPLEXITY (phase transition) EXPLOIT PROBLEM STRUCTURE
INCREASE ROBUSTNESS 4 Outline
I. Short Overview of OR
II. Disjunctive Programming and Hybrid Solvers
III. Exploiting Randomization to Solve Hard Combinatorial Problems
IV. Conclusions 5 I. Short OR Overview 6 Outline for Linear Programming and Integer Programming
Standard Form of LP and a Simple Example
Geometric Interpretation of LP
Complexity issues
MIP
Example: Fast Food
Example: Capacitated Warehouse
Example: 911
7 Outline
1. Short Overview of OR
2. Constraint Programming
3. Cooperating Solvers
4. Disjunctive Programming
5. Exploiting Randomization to Solve Hard Combinatorial Problems
6. Conclusions 8 Optimization Technology Evolution Dispatch
Rules 1960 1970 1980 1990 SA, GA, Tabu CPM
PERT Constraint-based Scheduling 1998 1947 Primal Simplex LP Parallel
LP/MIP Concurrent
Scheduling Interior Point Constraint
Propagation Large IPs MIP Shifting
Bottleneck First CP Systems Cooperating
Solvers (LP/CP) Global constraints Barrier LP Barrier Crossover Dual Simplex Implementation Dual Simplex 9 1. Short OR Overview 10 Outline for Linear Programming and Integer Programming
Standard Form of LP and a Simple Example
Geometric Interpretation of LP
Complexity issues
MIP
Example: Fast Food
Example: Capacitated Warehouse
Example: 911
11 An LP Story A factory can produce n products from m parts
For product j it needs aij units of part i
There are bi units of part i available
Each unit of product j sold earns cj
Amount of each product to make is unknown xj 0
Each part i determines a constraint
ai1 x1 + … + ain xn bi
Obvious solution: do nothing
Better: maximize c1 x1 + … + cn xn
12 Standard Forms of LP A linear program (LP) in standard form (Dantzig 1947)
max cTx
subject to Ax b
x 0
Input data: c (n x 1), A (m x n), b (m x 1).
Variables: x (n x 1)
13 Standard Forms of LP // The objective function
max c1 x1 + … + cn xn
// The constraints
subject to
a11 x1 + … + a1n xn b1
...
am1 x1 + … + amn xn bm
x1 0 , … , xn 0 14 Standard Forms of LP
In OR emphasis is on optimality
Solution means optimal solution
Feasible solution means solution in the ordinary sense 15 Standard Forms of LP Interpretation of standard form:
xj = amount of product j to make
cj = revenue per unit product j
bi = available amount of component i
aij = units of i used per unit of j produced
The constraints “say”:
aijxj = units of i used by j
= units of i used
bi
16 What are models? A model is a data-independent abstraction of a problem
A model lets you write down the mathematical representation of a model independently of the data
Project Model Data One
Problem
Instance 17 Products Could be Jewelry Products: Rings and Earrings
Components: Gold and Diamonds
One ring requires 3 units of Gold, and 1 Diamond
One set of earrings requires 2 units of Gold, and 2 Diamonds
Total Gold and Diamonds are limited
Profit is different for Rings than for Earrings Products = { rings, earrings };
Components = { Gold, Diamonds };
demand = [ [3, 1], [2, 2] ];
stock = [150, 180];
profit = [60, 40];
18 Products: Ammonium Gas = NH3 Ammonium Chloride = NH4Cl
Components: Nitrogen, Hydrogen, Chlorine
One unit of Gas requires 1 unit of Nitrogen, 3 units Hydrogen
One unit of Chloride requires 1 unit of Nitrogen, 4 units Hydrogen, and 1 unit of Chlorine
Total Nitrogen, Hydrogen, Chlorine is limited
Profit is different for Gas than Chloride Products Could be Chemicals Products = { gas, chloride };
Components = { nitrogen, hydrogen, chlorine };
demand = [ [1, 3, 0], [1, 4, 1] ];
stock = [50, 180, 40];
profit = [30, 40]; 19 The Problems Have One Model enum Products ...;
enum Components ...;
float+ demand[Products, Components] = ...;
float+ profit[Products] = ...;
float+ stock[Components] = ...;
var float+ production[Products];
maximize
sum (p in Products) profit[p] * production[p]
subject to {
forall (c in Components)
sum (p in Products)
demand[p, c] * production[p]
<= stock[c]
}; 20 OR Modeling Systems
OPL
AMPL
2LP
AIMMS
GAMS
MPL
ILOG Planner
etc
21 The Dual The dual linear program (von Neumann 1947);
min yTb
subject to yTA c
y 0
Variables y (m x 1)
Awesome Symmetry -
The dual of the dual is the primal
22 Rows and Columns Exchanged
min b1 y1 + … + bm yn
subject to
a11 y1 + … + am1 ym c1
...
a1n y1 + … + amn ym cn
y1 0 , … , ym 0 23 Duality Theorem Theorem: min yTb = max cTx
Consequence: This turns optimality problem into a feasibility problem in x and y
Ax b
x 0
yTA cT
y 0
yTb = cTx
Consequence: Enumeration not needed to verify optimality
24 Duality Theorem
Sensitivity Analysis
Consequence: The solution values y* for the y variables yield the Lagrange multipliers of the primal constraints which measure the rate of change of the objective function with respect to the right hand side bounds b
yi * = Z / bi where Z is the optimum
Reference: McAloon and Tretkoff [1996] Wiley Duality Two different views of the same phenomenon Point vs Set Arc vs Node Momentum vs Position Vector vs Hyperplane Landlord vs Renter 26 Simplex and Barrier
The simplex algorithm turns the feasibility problem into a iterative repair process with a powerful evaluation function
The barrier method transforms the LP into a system of differential equations that describe a vector field of flow on the polytope 27 Geometric Interpretation of LP
X Y Max: X
subject to:
-X + Y <= 4
X + 4*y <= 36
2*X + y <= 23
X + Y >= 4
Y >= X + 10 (0,4) (4,0) (8,0) (10,3) (4,8) Barrier Simplex 28 Complexity of Linear Programming Simplex Method
Worst-case --- exponential (Klee and Minty 72)
Practice --- good performance
Ellipsoid Method
Khachian’s Ellipsoid Method
Worst-case --- polynomial
Practice --- poor performance
29 Complexity of Linear Programming Interior Point Methods or Barrier Methods
“Karmarkar’s” (and variants) Method
Worst-case --- polynomial
Practice --- good performance
30 Complexity of Linear Programming
Despite its worst case exponential time complexity, the simplex method is usually the method of choice since it provides tools for sensitivity analysis and its performance is very competitive in practice.
Which method performs best is problem dependent. 31 Success Stories Industrial Planning
Given current resources, decide what to produce in what quantity
Supply Chain Management
Multiperiod planning models that link flow from one period to the next
Network Flow
How best to route goods across a network 32 Assumptions of Linear Programming
Linearity
when violated: ( xy = 50)
Nonlinear programming
Continuity
when violated: (x integral)
(Mixed) Integer programming
33 Assumptions of Linear Programming - continued
No Disjunctive Constraints
when violated: (x 100 or x 0)
Disjunctive programming
Additional 0-1 variables and Big M constraints
Certainty
when violated: (cost c is a random variable)
Stochastic programming
34 Search and MIP
In order to deal with variables that must have integer values in the solution, a search must be performed.
Mixed Integer Programming problems are combinatorial optimization problems and are NP hard
feasibility is NP-Complete
verifying optimality is co-NP-Complete
35 MIP and Combinatorial Optimization
These problems have been attacked by both the AI and OR communities.
In AI, these problems are attacked as CSPs or as Planning Problems.
In OR, they are done as MIPs and use linear relaxation to help guide the search.
The overriding idea in each case is to limit search.
36 Integer Program: All Integer Points in Region 37 Cut to Create Integer Vertex Integer Vertex 38 Example - Fast Food
Question: Is it possible for a male college student to eat at the local fast food outlet and still meet the requirements of a balanced diet?
If so, what is the least he can do it for? 39 Nutritional Requirements
At least 100% of vitamins A, C, B1, B2, niacin, calcium and iron
At least 55 grams of protein
At most 3000 milligrams of sodium
At most 30% of the calories can come from fat
Nutritional information is available from fast food outlets 40 College Student’s Requirements
At least 2000 calories a day
No more than 3 servings of any one food
Milk only with cereal and not as a stand-alone drink 41 Fast Food - MIP Model
We will have variables Servk to represent the number of servings of item k in the plan.
The variable Servk will have to take an integer value for the solution to be valid.
The objective function: Z for cost 42 Fast Food - MIP Model
Let foodk,j represent the percent of RDA of nutrient j in a serving of item k
The for each nutrient j, we have a constraint
foodk,j Servk 100
k
43 Fast Food - MIP Model
Let sodiumk represent the amount of salt in a serving of item k
For salt we have the constraint
sodiumk Servk 3000
k
Similarly for fat
44 Fast Food - MIP Model
Let costk represent the cost of a serving of item k
For the objective function we have the defining constraint
costk Servk = Z
k
45 Fast Food - Solution
With a MIP solver and a way to input these constraints we ask for
a solution that makes the variables Servk integral
and which minimizes Z 46 MIP Solution Technique
What the MIP solver does is to carry out a branch and bound search guided by
the linear relaxation
the solution to the problem with the integrality requirements relaxed
Initialize the global variable best_so_far to 1000 (or something else very big).
47 At a Node
Compute a solution to the linear relaxation which minimizes Z yielding z*. Prune this node if
z* best_so_far ,
If all values of Servk are integral, this is a solution. Set best_so_far = z*. Save this node.
48 Branching at a node
Choose a variable Servk whose value s* is not integral.
Typical heuristic: most non-integral variable
Create two child nodes,
add Servk floor(s*)
add Servk ceil(s*)
49 Good News
The linear relaxation can prune nodes before all variables Servk are forced to be integral.
Surprisingly often a node “high in the tree” will turn up with all relevant variables integer. Here’s why
A solution to the LP is at a vertex
A vertex is defined as the simultaneous solution of the equality form of n linearly independent constraints
Many of these constraints are integer bounding constraints yielding X = integer
50 Arboreally Speaking
Breadth first search is often preferred - it visits the “smallest” number of nodes needed to find and verify the optimal solution - analogous to A*
If the linear relaxation is tight
| z*linear - z*integral | is relatively small
then z*linear is an excellent evaluation function 51 Answer - Fast Food
Total cost is 8.71
Buy 3 burgers
Buy 2 fries
Buy 3 honeys
Buy 1 yogurt
... 52 Example - Fixed Cost
Warehouses must be rented in order to supply stores and we must decide which to use
For each store j we know its monthly demand dj
For each warehouse i we know its capacity ki
For each warehouse i we know the fixed cost to run it each month fci
For each pair i, j we know the monthly cost cij of supplying j from i 53 Example - Fixed Cost
Xij is the fraction of store j’s demand met by i
Xij 1
Yi is a “fuzzy” boolean
it will be 1 if the warehouse is rented
0 if it is not rented
Yi 1 54 Example - Fixed Cost
Each store must be supplied
X ij = 1
i
Warehouse capacity can not be exceeded
dj Xij ki
j
Tighter
dj Xij ki Yi
j 55 Example - Fixed Cost
Objective function
fci Yi + cij Xij
This yields a MIP with 0-1 variables Yi
56 Branch and Cut: An Enhanced Solution Method
Cuts - redundant constraints for the MIP model but not redundant for the linear relaxation
Xij Yi
Add at a node if violated by solution to linear relaxation
Powerful method - will solve the Imperial College OR lib CW problems very easily 57 Example - Call 911
PCTs answer the phone 24 hours a day, 7 days a week.
It is known how many PCTs should be on duty during each of the 168 hours during the week in order to assure the necessary response rate.
Workers can arrive at any hour and they work for 8 hours except for a one hour break after 4 hours. 58 Example - Call 911 Each PCT has a work week of 5 days followed by 2 days off.
Want to meet the demand with minimal or near-minimal number of PCTs.
So need to determine how many PCTs start their work week at each hour h of the week
59 Modeling 911
A continuous variable Pcth will represent the number of workers who start their work week at hour h, 0 h < 168.
60 Modeling 911
A continuous variable Z will represent the objective function
Pcth = Z
h
There will be a constraint for each hour h to assert that there are enough workers on duty at that time. The rhs of this constraint is bh = the number of workers needed.
61 Modeling 911
For this constraint we need to represent the number of workers who are on duty at time h
Certainly, those who start the week at time h are here, as are those who started the week at time h - 1
And so on back to time h - 7 with the exception of those who started at time h - 4 and who are now on break.
62 Modeling 911
This also applies to the previous 4 days. When the smoke clears, we sum over the workers w who are working at time h
Pctw bh
w
63 Call 911 solved with progressive roundoff int b[168] = { // New York City 911
30,24,18,15,14,14,15,25,34,36,38,40,
41,43,46,57,57,59,61,59,55,50,45,38,
32,25,20,17,15,13,17,25,32,35,38,40,
42,43,47,58,57,57,59,57,55,52,47,41,
33,25,20,17,15,13,15,25,32,33,37,39,
42,43,47,57,56,57,57,56,53,50,47,41,
34,27,22,19,16,15,16,25,31,35,37,40,
44,45,48,57,57,56,58,56,53,53,46,41,
34,28,23,19,16,15,17,25,33,37,39,42,
45,47,51,59,58,60,61,61,57,56,57,55,
48,41,35,30,26,20,18,22,26,32,42,46,
49,53,54,56,56,56,59,59,57,57,56,56,
52,46,41,34,29,23,18,19,25,31,36,41,
46,50,52,53,52,53,54,53,50,49,45,40
}; 64 Modeling 911
Subject to these constraint we want to find a solution which makes the Pcth integer and which makes Z small.
The naïve approach is to compute the minimal linear solution and to round up all the values of Pcth to the nearest integer.
The linear relaxation yields Z = 204.67 “fuzzy workers” but rounding yields a mediocre integral solution of 259 workers.
65 Modeling 911
For this and many other applications, heuristics can be used to develop good solutions
Progressive Roundoff - solve the linear relaxation, round up first variable and freeze it, re-solve etc.
66 Solving the Integer Problem main() // Planner Code
{
IlcInitFloat();
IlcManager m(IlcNoEdit);
IlcLinOpt simplex(m);
IlcFloatVarArray Pct(m,168,0,1000);
IlcFloatArray coeffs(m,168);
int i,j,k,h,n; 67 Solving the Integer Problem
// Pctw bh
w
for(h=0;h<168;h++) { // for each hour of 168 in week
for(j=0;j<168;j++)
coeffs[j] = 0;
for(k=0;k<5;k++) // for each of 5 days
for(j=k*24;j= b[h]);
}
68 Solving the Integer Linear Problem
IlcFloatVar Z = IlcSum(Pct);// Objective
simplex.setObjMin(Z);
for(i=0;i<168;i++) { //Progressive roundoff
n =ceil(simplex.getCurrentValue(Pct[i]));
// Fix variable and re-optimize
simplex.add(Pct[i] == n);
}
m.out() << “Number of Pcts needed is “ << Z << endl;
m.end();
} 69 Solution
This code finds a solution with 208 workers in a couple of seconds. The optimum is 207.
The heuristic works well in part because if there were no lunch breaks, it would find the guaranteed optimal solution
[Bartholdi,Ratliffe,Orlin]
70 2. Constraint Programming LP/MIP is Beautiful, except when Variable domain information is important to the search strategy
especially critical in scheduling The problem variables range over symbolic entities and there are lots of symmetries
timetabling The MIP representation can be too verbose or awkward
configuration There are just too many constraints
e.g. vehicle routing 72 Mathematical Basis of Constraint Programming (CP)
The Constraint Satisfaction Problem:
Suppose a finite set of variables is given and with each variable is associated a non-empty finite domain.
A constraint on k variables X1,…,Xk is a relation R(X1,…,Xk) D1 x …x Dk.
A constraint satisfaction problem (CSP) is given by a finite set of constraints.
A solution to a CSP is an assignment of values to all the variables so that the constraints are satisfied. 73 Domain Reduction In CP, each constraint of a CSP is considered as a subproblem and techniques are developed for handling frequently encountered constraints.
With each constraint is associated a domain reduction algorithm which reduces the domains of the variables that occur in the constraint.
Accelerates convergence toward a solution
Detects infeasibility
74 Constraint Propagation The other key issue is communication among the constraints or subproblems.
The basic method used is called constraint propagation which links the constraints through their shared variables.
The important thing about this setup is that it is very modular and independent of the particular structure of the individual constraints. Monsieur Jordan Phenomenon Like prose, you have been doing constraint propagation all your life.
Crossword puzzles
Incomplete and so backtracking is needed
NY Times Sunday Crossword
Optical Illusions
Origin: Vision analysis (Marr,Waltz et al) 76 Strengths of Constraint Programming Constraint Programming provides a rich Rich
Rich representation language.
CP variables naturally represent problem entities and the constraints do not have to be translated into a specific problem format such as MIP or SAT.
Opportunity to choose a good heuristic for the solution strategy. 77 Which Method for Which App? Product
Mix LP Production
Planning MIP Distribution
Planning MIP Scheduling Constraint
Based
Scheduling Dispatching CP
Local search Configuration CP Technology Application Linear => Disjunctive Constraints
Strategic => Operational Optimization 78 3. Cooperating Solvers First Stop CP/CP 80 Mother of All Examples - N Queens
Do we think in terms of queens
Where do we place this queen ?
Do we think in terms of squares
Will this square contain a queen ?
These views are dual to one other The Primal View For each queen assign it a square Place this queen in this square ? The Dual View For each square decide whether it will have a queen Place a queen in this square ? 83 The Primal Model In which row do we place q[j] - the queen in column j The constraints
q[i] != q[j]
q[i] - q[j] != i - j
q[i] - q[j] != j - i Note: no alldifferent constraint 84 Yet Another duality - rows vs columns In which column do we place qq[i] the queen in row i The constraints are the same
qq[i] != qq[j]
qq[i] - qq[j] != i - j
qq[i] - qq[j] != j - i 85 The Relationship Can link them as inverse functions:
q[qq[i]] = i
qq[q[j]] = j
The constraint propagation
i leaves domain of q[j] iff j leaves domain of qq[i] 86 In this primal/dual model
Apply first-fail to q[i]
Lo and behold
one-third fewer fails
(Example from Jean Jordan’s thesis)
87 88 Q Q X X X X X X X X X X X X X X X 89 Q Q X X X X X X X X X X X X X X X Q X X X X X X X X X X 90 Q Q X X X X X X X X X X X X X X X Q X X X X X X X X X X Q X X X X 91 So …
The cooperating primal-dual formulation “captured” the generalized arc consistency of the alldifferent constraint The arc consistency of this global constraint is non-trivial to maintain
Network flow algorithms
flow goes from values to variables
each variable has unit demand and capacity 92 Remarks
An IP model will encode the first dual solution
Will this square contain a queen
x[i][j] = 0 or x[i][j] = 1 A disaster beyond 30 queens
network structure on rows and columns lost Another example - sports scheduling 93 Constraints and Indices
In IP symbols are represented by indices as opposed to values for variables.
nurses, teams
Paris-St-Germain plays Manchester United on day k
xijk {0-1} to represent “team i plays team j on day k”
You can’t put symmetry breaking and other constraints on indices.
Second Stop CP/IP 95 CP Is Powerful, But ….
Sometimes, inconsistencies can be overlooked
X - Y 12
X + Y 10
X in [1..20]
Y in [1..20]
Domain reduction on each constraint and constraint propagation will not reduce the domains although the system has no linear solution
but an LP solver would spot this 96 2 Dimensional Bin Packing Application for the Automobile Industry built by Greg Glockner
97 2 Dimensional Bin Packing The problem here is to put as many small rectangles in a big rectangle with 90 degree rotation allowed.
The actual application involves circuit boards
There are two complete models, one a CP model and the other an IP model.
The CP model directs the search
The LP relaxation prunes the search space by detecting infeasible nodes
98 2-D Bin Packing
Arrange circuit boards onto raw material
Boards may be rotated
Use same number of each board
Objective: minimize scrap Classic combinatorial optimization problem 99 Solving 2-D Bin Packing Use CP to generate partial solutions (nodes)
Restrict placement to reduce fragmentation of blank space
Use tight LP to test feasibility
If any partial solution is infeasible in the LP, prune the tree immediately CP constraints reduce the tree width
LP allows us to prune quickly 100 2 Dimensional Bin Packing As the search tree is traversed, the two models are in sync.
Note that the variables used in the 2 models are disjoint
The two models are dual to each other
The IP sees the model from the point of view of the board, the large rectangle
The CP sees the model from the point of view of the small rectangles
Solutions are obtained in minutes 101 2DBP: Basic CP Formulation Let (xi, yi) be the location and (wi, hi) be the dimensions of the ith tile
Basic constraints:
Disjunctive constraints to prevent overlapping tiles xi + wi £ xj Ú yi + hi £ yj Ú xj + wj £ xi Ú yj + hj £ yi
Constraints to count the number of each tile type
Tile-oriented formulation 102 2DBP: Basic IP Formulation Let xijnt = 1 if tile n of type t is in position (i, j)
The constraints are:
t n j i x j i x t n x ijnt h j j w i i j j i i t n j i nt j i j i ijnt t t , , , } 1 , 0 { , 1 , 1 , , , : , , , , " Î " £ " = å å + ¢ < + ¢ < £ ¢ £ ¢ ¢ ¢ ¢ ¢
Grid-oriented formulation
103 2DBP: LP Issues
The LP is large
The LP exhibits significant primal degeneracy
The LP exhibits significant dual degeneracy
104 2DBP: LP Issues
The simplex algorithm cannot solve the LP
There is no way for a MIP solver to solve the IP as such
The barrier method can solve the LP 105 2DBP: Summary
CP as master problem
Orders tiles
Places tiles by position, then type
Selects tile type by frequency to scatter tiles throughout the bin
Uses a one-ply lookahead constraint to limit the position of following tile
LP relaxation prunes the CP search space
Checks whether the partial solution will lead to an infeasible instance
Use idiomatic formulations for CP and IP 106 2DBP: Remarks
The CP fixes significant numbers of variables at each node
The LP pre-processor greatly simplifies the LP
Therefore, the lack of incrementality of the barrier method does not cost us 107 2DBP: Cooperative Algorithm Demo 108 Last Stop Constraint Programming and Local Search cooperation Another example of duality in action 109 CP/LS Parallel machines with set-up times Ready times
Dues dates
Splittable jobs
Rogue machines Objectives
meet due dates
minimize setup costs
110 Two Phase Cooperation Phase I - the Primal (Work on first objective)
Configure and schedule the jobs
Use constraint based scheduling 111 Two Phase Cooperation Machines morph into trucks
112 Two Phase Cooperation Phase 2 - the Dual (Work on second objective)
Schedule the trucks
Use Lin, Lin-Kernighan, tabu etc 113 Parallel Machines: Cooperative Algorithm Demo 114 IC Park Example Hoist Scheduling (Rodosek and Wallace)
The original model is an IP
The CP model is “the same”
CP guides search, LP relaxation and CP share pruning duties
No apparent duality 115 Remarks One can get great benefit with CP/IP algorithms CP/CP algorithms and CP/LS algorithms
IP/LS is just around the corner IP/IP cooperation is hard because one can’t formulate truly dual views
either simply not there
or too verbose
counterexamples welcome 116 4. DISJUNCTIVE PROGRAMMING 117 Disjunctive Linear Programming
An extension of Mixed Integer Programming
A union of polyhedral sets (feasible regions) is called a disjunctive set.
118 Disjunctive Set 119 Disjunctive Linear Programming
The problem of determining whether the intersection of a family of disjunctive sets is non-empty is called the disjunctive linear programming problem or simply disjunctive programming problem.
The solution set of the disjunctive programming problem is
Ç È Fij
i= 100
or X == 0
Rather than
X <= BigM*Y ,
X >=100*Y,
Y a 0-1 variable
121 Solution Set Inside Initial Region 122 Disjunctive Linear Programming Examples - continued
Bollapragada, Ghattas and Hooker
Truss structure design problem
Branches directly on alternatives dictated by Hooke’s Law
Wyatt
Disjunctive programming and mean absolute deviation models (MAD) for portfolio optimization
Extends Bender’s decomposition to disjunctive linear programs
123 Disjunctive Linear Programming continued
Balas, Cornuejols and Ceria
Generating cuts for disjunctive programming problems.
McAloon and Tretkoff
Basic mathematical results: Optimization and Computational Logic, Wiley
124 Disjunctive Linear Programming continued
Dealing with the disjunctive part requires search.
This requires an engine which is not available in MIP packages
Also the linear relaxation is not as tight and the evaluation function is not as faithful
The solution is to use a CSP solver and an LP based solver in tandem - cooperating solvers
Beringer and DeBacker for MIP
125 To Keep It Simple GERALD + DONALD = ROBERT An AI classic
Newell and Simon Assignment problem + 1 constraint Surprisingly hard for MIP solvers
CPLEX MIP takes 1 minute and 29048 nodes (on Sun Enterprise) to find a feasible integer solution 126 The Disjunctive Program
One constraint for the equation
100000 G + … + D = 100000 R + … + T
For each variable X among G,…,T
X = 0 or X = 1 or … or X = 9
For each pair X, Y
X Y-1 or Y X-1
127 Solution Set: SOME of the Integer Points in the Region 128 The Twin Variables for Cooperating Solvers
Integer variables for the letters
0 g, e, r, a, l, d, o, n, b, t 9
With continuous doppelgangers
0 G, E, R, A, L, D, O, N, B, T 9
129 The Variables
One multi-variable constraint on the continuous doppelgangers posted to an LP solver and to the CSP solver
100000 G + 10000 E + 1000 R + … + D +
100000 D + 10000 O + 1000 N + … + D
=
100000 R + 10000 O + 1000 B + … + T
130 The Variables
One CSP constraint on the integer variables posted to a discrete constraint propagation engine
AllDifferent(g, e, r, a, l, d, r, n, b, t )
131 The Search
Bounding information from the discrete variables is passed to the continuous doppelgangers and conversely
The branching strategy is guided by the linear relaxation on the continuous variables
if there is a non-integral variable X, branch on it
X floor(X*)
or
X ceil(X*)
132 The Search
If the AllDifferent constraint, the initial bounding constraints and the bounding constraints from branching detect a contradiction on the discrete variables, both sides backtrack
If the linear relaxation is made infeasible by the bounding constraints that come from the discrete computation or from branching, both sides backtrack
133 The Search
New wrinkle
The solution to the linear relaxation might have all variables integral - but the AllDifferent constraint can be violated by this set of values
In this case, branch to keep them apart
either X Y - 1
or Y X - 1
134 The Variables void main()
{
IlcInitFloat();
IlcManager m(IlcNoEdit);
IlcIntVar D(m, 1, 9), O(m, 0, 9), N(m, 0, 9), A(m, 0, 9), L(m, 0, 9),
G(m, 1, 9), E(m, 0, 9), R(m, 1, 9), B(m, 0, 9), T(m, 0, 9);
IlcIntVarArray vars (m, 10, D, O, N, A, L, G, E, R, B, T);
// Continued on next slide
135 The Constraints m.add(IlcAllDiff(vars,IlcWhenValue));
IlcLinOpt simplex(m);
simplex.add(
100000*R + 10000*O + 1000*B + 100*E + 10*R + T
==
100000*G + 10000*E + 1000*R + 100*A + 10*L + D
+
100000*D + 10000*O + 1000*N + 100*A + 10*L + D ,
IlcTrue // Post to Solver as well
); 136 The Search for solutions
m.add(Generate(m,simplex,vars)); // Search strategy
if (m.nextSolution()) { // Find a solution
m.out() << " solution found " << endl;;
}
m.printInformation();
m.end();
} 137 Branch if a variable is non-integer ILCGOAL2(Generate, IlcSimplex, simplex, IlcIntVarArray, vars)
{
IlcInt varIndex = MostNotInteger(vars, simplex);
if (varIndex >= 0) // There is a non-integer variable
return IlcAnd(IlcTryUpwardFirst(vars[varIndex], simplex), this);
138 Is integer relaxation a solution ? IlcManager m = getManager();
if(m.solve(TestIntegerRelaxation(m,simplex)))
return 0;
139 Find two variables with same value IlcInt j;
for(i=0;i100000 183 Heavy-Tailed Distributions … infinite variance … infinite mean
Introduced by Pareto in the 1920’s
--- “probabilistic curiosity.”
Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
Examples: stock-market, earth-quakes, weather,... 184 Decay of Distributions Standard --- Exponential Decay
e.g. Normal:
Heavy-Tailed --- Power Law Decay
e.g. Pareto-Levy:
185 Standard Distribution
(finite mean & variance) Power Law Decay Exponential Decay 186 Normal, Cauchy, and Levy
Normal - Exponential Decay Cauchy -Power law Decay Levy -Power law Decay 187 Tail Probabilities (Standard Normal, Cauchy, Levy)
188 How to Check for “Heavy Tails”? Log-Log plot of tail of distribution
should be approximately linear.
Slope gives value of
infinite mean and infinite variance
infinite variance
189 Example of Heavy Tailed Model (Random Walk) Random Walk:
Start at position 0
Toss a fair coin:
with each head take a step up (+1)
with each tail take a step down (-1) X --- number of steps the random walk takes
to return to position 0. 190 The record of 10,000 tosses of an ideal coin
(Feller) Zero crossing Long periods without
zero crossing 191 Random Walk Heavy-tails vs. Non-Heavy-Tails
Normal
(2,1000000) Normal
(2,1) O,1%>200000 Median=2 1-F(x)
Unsolved fraction X - number of steps the walk takes to return to zero (log scale) 192 Number backtracks (log) 1-F(x)
Unsolved fraction => Infinite mean Heavy-tails in QCP Domain 193 The Log-Log plot shows a linear relation
over many orders of magnitude. This is
clear evidence of heavy-tailed behavior.
194 195
196 Heavy Tailed Cost Distribution 197 The Log-Log plot shows a linear relation
over many orders of magnitude. This is
clear evidence of heavy-tailed behavior.
198 By studying larger problems we discovered that not only does the heavy tail phenomenon occur at the right-hand side of the distribution, but we also observed a high frequency of data points on the left-hand side of the distribution.
Right-hand side: non-negligible fraction of very long runs
Left-hand side: non-negligible fraction of very short runs
199 70%>
250000 15! Sports Scheduling Number backtracks (log) Cumulative Distribution Function 200 Standard Distribution
(finite mean & variance) Power Law Decay Exponential Decay Also, heavy tails on left. (High probability of very short runs.) 201 Consequence for algorithm design:
Use rapid restarts or parallel / inter-leaved runs
Super linear speedups!!! 202 Super-linear Speedups Interleaved (1 machine): 10 x 1 = 10 seconds
5 x speedup 203 Rapid Restarts work particularly well on hard computational problems because of the Heavy Tailed Phenomena in the run time distribution.
RAPID RANDOMIZED RESTARTS strategy avoids the tail on the right and exploits the short runs on the left.
Restarts provably eliminate heavy tails (Gomes, Selman & Crato )
204 Sketch of proof of elimination of heavy tails
Let’s truncate the search procedure
after m backtracks.
Probability of solving problem with truncated version:
Run the truncated procedure and restart it repeatedly.
205
Y - does not have Heavy Tails 206 Restarts
70%
unsolved 250 ~ 62.5 restarts 1-F(x)
Unsolved fraction Number backtracks (log) 207 Example of Rapid Restart Speedup (planning) 20 2000
~100 restarts Cutoff (log) Number backtracks (log) ~10 restarts 100000 208 Deterministic Logistics Planning 108 mins. 95 sec. Scheduling 14 411 sec 250 sec (*) not found after 2 days Scheduling 16 ---(*) 1.4 hours Scheduling 18 ---(*) ~18 hrs Circuit Synthesis 1 ---(*) 165sec. Circuit Synthesis 2 ---(*) 17min. Summary Results R 3 209 Our results provide the first indication of heavy-tailed distri-butions in a computational model.
Overall insight: Randomized tie-breaking with rapid restarts gives powerful search strategy.
210 Heavy-Tailed Distributions in Other Domains
Quasigroup Completion Problem
Graph Coloring
Logistic Planning
Circuit Synthesis 211 Deterministic Logistics Planning 108 mins. 95 sec. Scheduling 14 411 sec 250 sec (*) not found after 2 days Scheduling 16 ---(*) 1.4 hours Scheduling 18 ---(*) ~18 hrs Circuit Synthesis 1 ---(*) 165sec. Circuit Synthesis 2 ---(*) 17min. Summary Results R 3 212 Rapid Restart Speedup 213 Our results provide the first indication of heavy-tailed distri-butions in a computational model.
Overall insight:
Randomized tie-breaking with
rapid restarts gives powerful
search strategy.
214 Heavy-Tailed Distributions in Other Domains
Quasigroup Completion Problem
Graph Coloring
Logistic Planning
Circuit Synthesis 215 Algorithm Portfolio Design 216 Motivation The runtime and performance of randomized algorithms can vary dramatically on the same instance and on different instances.
Goal: Improve the performance of different algorithms by combining them into a portfolio to exploit their relative strengths. 217 Branch & Bound: Best Bound vs. Depth First Search 218 Branch & Bound (Randomized) Standard OR approach for solving Mixed Integer Programs (MIPs)
Solve linear relaxation of MIP
Branch on the integer variables for which the solution of the LP relaxation is non-integer:
apply a good heuristic (e.g., max infeasibility) for variable selection ( + randomization ) and create two new nodes (floor and ceiling of the fractional value)
Once we have found an integer solution, its objective value can be used to prune other nodes, whose relaxations have worse values
219 Branch & Bound Depth First vs. Best bound Critical in performance of Branch & Bound:
the way in which the next node to be expanded is selected.
Best-bound - select the node with the best LP bound
(standard OR approach) --->
this case is equivalent to A*, the LP relaxation provides an admissible search heuristic
Depth-first - often quickly reaches an integer solution
(may take longer to produce an overall optimal value)
220 Portfolio of Algorithms A portfolio of algorithm is a collection of algorithms and / or copies of the same algorithm running interleaved or on different processors.
Goal: to improve on the performance of the component algorithms in terms of:
expected computational cost
“risk” (variance)
Efficient Set or Efficient Frontier: set of portfolios that are best in terms of expected value and risk. 221 Depth-first vs. Best-bound (logistics planning)
Number of nodes Cumulative Frequencies Depth-First ~50% Best-Bound ~30% 222
Depth-First and Best and Bound do not dominate each other overall.
223 Heavy-tailed behavior of Depth-first
224 Portfolio for heavy-tailed search procedures (2 processors)
0 DF / 2 BB 2 DF / 0 BB Standard deviation of run time of portfolios Expected run time of portfolios 225 Portfolio for heavy-tailed search procedures (6 processors)
0 DF / 6 BB 6 DF / 0BB Standard deviation of run time of portfolios Expected run time of portfolios 5 DF / 1BB 3 DF / 3 BB 4 DF / 2 BB Efficient set 226 Portfolio for heavy-tailed search procedures (20 processors)
0 DF / 20 BB 20 DF / 0 BB Standard deviation of run time of portfolios Expected run time of portfolios 227 Portfolio for heavy-tailed search procedures (2-20 processors)
228
A portfolio approach can lead to substantial improvements in the expected cost and risk of stochastic algorithms, especially in the presence of heavy-tailed phenomena.
229 Summary of Randomization Considered randomized backtrack search.
Showed Heavy-Tailed Distributions.
Suggests: Rapid Restart Strategy.
--- cuts very long runs
--- exploits ultra-short runs
Experimentally validated on previously unsolved planning and scheduling problems.
Portfolio of Algorithms for cases where no single heuristic dominates
230 Summary of Randomization Considered randomized backtrack search.
Showed Heavy-Tailed Distributions.
Suggests: Rapid Restart Strategy.
--- exploits ultra-short runs
--- cuts very long runs
Experimentally validated on previously unsolved
planning and scheduling problems.
Portfolio of Algorithms for cases where no single heuristic dominates
231 IV. CONCLUSIONS 232 Important Themes in OR Linear Programming (Mixed) Integer Programming
Exploit Structure e.g., Network Flow Problems
Duality very elegant theory in LP
sensitivity analysis
233 Opportunities for Integration of AI/OR OR methods:
Have focused on tractable representations (LP)
Have demonstrated the ability to identify optimal and locally optimal solutions
LIMITATION: Restricted to rigid models with limited expressive power
AI methods:
Richer and more flexible representations,supporting constraint-based reasoning mechanisms as well as mixed initiative frameworks, allowing the human expertise to be in the loop.
LIMITATION: Rich representations in general lead to intractable problems
CHALLENGE: good representations / fast & good solutions
,
234 Opportunities for Integration of AI/OR AI methods are becoming competitive
AI methods used to be considered not suitable for realworld scheduling problems. Recent developments have shown they can be competitive. Examples:
SAP, Peoplesoft, I2, … -> provide solutions for scheduling combining constraint programming and mathematical programming approaches.
ILOG (CP language) has several fielded applications in different scheduling areas; ILOG has integrated a CSP solver with CPLEX.
OR people have acknowledge the benefits of combining OR and AI methods
235 Opportunities for Integration of AI/OR Exploiting Duality in CSP frameworks
Exploiting Randomization
Hybrid Solvers
236 Opportunities for Integration of AI/OR Hybrid Solvers - emerging area of research (CSP+OR); it started with CLP(R), Prolog III and CHIP; ILOG integrates a CSP solver with CPLEX
local constraint propagation - local consistency algs
global constraint propagation - LP relaxations
Only a hybrid approach could prove optimality, e.g.:
Hoist scheduling (Rodosek & Wallace 1998)
Multicommodity integer network flow problem (Dutch Railways) (McAloon, Tretkoff, Wetzel 1998)
237 Updated version of tutorial slides www.cs.cornell.edu/gomes/ Talks Demos 238 Appendix 239 Portfolio of Algorithms A portfolio of algorithm is a collection of algorithms and / or copies of the same algorithm running interleaved or on different processors.
A portfolio has an expected computational cost and a standard deviation, a measure of the dispersion of the computational cost.
The standard deviation of the portfolio is a measure of the risk inherent to the portfolio. 240 Portfolio of Algorithms Goal: to improve on the performance of the component algorithms in terms of:
expected computational cost
“risk” (variance)
Efficient Set or Efficient Frontier: set of portfolios that are best in terms of expected value and risk. 241 Appendix Portfolio of Algorithms Goal: to improve on the performance of the component algorithms in terms of:
expected computational cost;
risk;
Efficient Set or Efficient Frontier - set of portfolios that are the best in terms of expected value and risk.
Within the efficient set, in order to minimize the risk, one has to deteriorate the expected value or, in order to improve the expected value, one has to increase the risk. 242 Appendix Portfolio of Two Algorithms Let us consider the random variables:
A1 - the number of backtracks that algorithm 1 takes to find a solution or prove that a solution doesn’t exist;
A2 - the number of backtracks that algorithm 2 takes to find a solution or prove that a solution doesn’t exist; 243 Appendix Portfolio of Two Algorithms Let us consider that we have N processors and we design a portfolio using n1 processors with algorithm 1 and n2 processors with algorithm2 (N = n1 + n2).
Let us consider the random variable:
X - the number of backtracks that the portfolio takes to find a solution or prove that a solution doesn’t exist;
244 Appendix Portfolio of Two Algorithms
Given N processors, and
245 Appendix Portfolio of Algorithms Given N processors, such that and
and the term in the summation is 0 when 246 Preliminary Research on Structure of Search Spaces 247 Fringe of Search Tree
248 Fractal Dimension
249 Fractal Dimension When plotting the length of a curve as a function of the measuring tool on a log-log plot, one obtains a linear relationship:
L - the measured length;
s - length of the yardstick;
c and d are constants;
Mandelbrot introduced the fractal dimension D = d +1;
A straight line has D = 1.0;
The coast of Britain has fractal dimension 1.22;
The higher D the more fractal the curve is.
|