Approximate Dynamic Programming: Solving The Curses Of Dimensionality, Second Edition

Product Information

List price: NT$5,926
Sale price: NT$5,333 (90% of list)
To order this book, please call customer service at 02-25006600 (ext. 130, 131).
Product Description

Understanding approximate dynamic programming (ADP) in large industrial settings helps develop practical and high-quality solutions to problems that involve making decisions in the presence of uncertainty. With a focus on modeling and algorithms in conjunction with the language of mainstream operations research, artificial intelligence, and control theory, this second edition of Approximate Dynamic Programming: Solving the Curses of Dimensionality uniquely integrates four distinct disciplines (Markov decision processes, mathematical programming, simulation, and statistics) to show students, practitioners, and researchers how to successfully model and solve a wide range of real-life problems using ADP.
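
As a rough illustration of the forward-simulation flavor of ADP described above (sampled one-step lookaheads combined with a smoothed update of a value function approximation), the minimal Python sketch below works through a toy inventory problem. It is not taken from the book; every name and parameter (MAX_INV, ORDER_COST, the stepsize ALPHA, and so on) is a hypothetical choice made only for this example.

# Illustrative sketch only (not from the book): forward approximate value
# iteration with a lookup-table value function on a toy inventory problem.
# All problem data below are hypothetical choices for this example.
import random

random.seed(0)

MAX_INV = 10            # inventory levels 0..MAX_INV (the state)
ACTIONS = range(0, 6)   # order quantities (the decision)
PRICE, ORDER_COST, HOLD_COST = 4.0, 2.0, 0.5
GAMMA, ALPHA, N_ITERS = 0.9, 0.1, 20000

V = [0.0] * (MAX_INV + 1)   # value function approximation (lookup table)

def step(inv, order):
    """Simulate one period: random demand is the exogenous information."""
    demand = random.randint(0, 5)
    stocked = min(inv + order, MAX_INV)
    sales = min(stocked, demand)
    next_inv = stocked - sales
    reward = PRICE * sales - ORDER_COST * order - HOLD_COST * next_inv
    return next_inv, reward

state = 0
for _ in range(N_ITERS):
    # Pick the order that maximizes a sampled one-step estimate of value.
    best_a, best_v = 0, float("-inf")
    for a in ACTIONS:
        nxt, r = step(state, a)
        v_hat = r + GAMMA * V[nxt]
        if v_hat > best_v:
            best_a, best_v = a, v_hat
    # Smoothed update (stepsize ALPHA) of the current state's value,
    # then step forward by simulating the chosen decision.
    V[state] = (1 - ALPHA) * V[state] + ALPHA * best_v
    state, _ = step(state, best_a)

print([round(v, 1) for v in V])   # learned values for inventory levels 0..10

The point of the sketch is the algorithmic loop, not the numbers: the value function is estimated from simulated trajectories rather than by enumerating all states, which is the basic device the book uses to sidestep the curses of dimensionality.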

Table of Contents

Preface.

Acknowledgments.

1. The challenges of dynamic programming.

1.1 A dynamic programming example: a shortest path problem.

1.2 The three curses of dimensionality.

1.3 Some real applications.

1.4 Problem classes.

1.5 The many dialects of dynamic programming.

1.6 What is new in this book?

1.7 Pedagogy.

1.8 Bibliographic notes.

2. Some illustrative models.

2.1 Deterministic problems.

2.2 Stochastic problems.

2.3 Information acquisition problems.

2.4 A simple modeling framework for dynamic programs.

2.5 Bibliographic notes.

Problems.

3. Introduction to Markov decision processes.

3.1 The optimality equations.

3.2 Finite horizon problems.

3.3 Infinite horizon problems.

3.4 Value iteration.

3.5 Policy iteration.

3.6 Hybrid value-policy iteration.

3.7 Average reward dynamic programming.

3.8 The linear programming method for dynamic programs.

3.9 Monotone policies.

3.10 Why does it work?

3.11 Bibliographic notes.

Problems.

4. Introduction to approximate dynamic programming.

4.1 The three curses of dimensionality (revisited).

4.2 The basic idea.

4.3 Q-learning and SARSA.

4.4 Real-time dynamic programming.

4.5 Approximate value iteration.

4.6 The post-decision state variable.

4.7 Low-dimensional representations of value functions.

4.8 So just what is approximate dynamic programming?

4.9 Experimental issues.

4.10 But does it work?

4.11 Bibliographic notes.

Problems.

5. Modeling dynamic programs.

5.1 Notational style.

5.2 Modeling time.

5.3 Modeling resources.

5.4 The states of our system.

5.5 Modeling decisions.

5.6 The exogenous information process.

5.7 The transition function.

5.8 The objective function.

5.9 A measure-theoretic view of information.

5.10 Bibliographic notes.

Problems.

6. Policies.

6.1 Myopic policies.

6.2 Lookahead policies.

6.3 Policy function approximations.

6.4 Value function approximations.

6.5 Hybrid strategies.

6.6 Randomized policies.

6.7 How to choose a policy?

6.8 Bibliographic notes.

Problems.

7. Policy search.

7.1 Background.

7.2 Gradient search.

7.3 Direct policy search for finite alternatives.

7.4 The knowledge gradient algorithm for discrete alternatives.

7.5 Simulation optimization.

7.6 Why does it work?

7.7 Bibliographic notes.

Problems.

8. Approximating value functions.

8.1 Lookup tables and aggregation.

8.2 Parametric models.

8.3 Regression variations.

8.4 Nonparametric models.

8.5 Approximations and the curse of dimensionality.

8.6 Why does it work?

8.7 Bibliographic notes.

Problems.

9. Learning value function approximations.

9.1 Sampling the value of a policy.

9.2 Stochastic approximation methods.

9.3 Recursive least squares for linear models.

9.4 Temporal difference learning with a linear model.

9.5 Bellman’s equation using a linear model.

9.6 Analysis of TD(0), LSTD and LSPE using a single state.

9.7 Gradient-based methods.

9.8 Least squares temporal differencing with kernel regression.

9.9 Value function approximations based on Bayesian learning.

9.10 Why does it work?

9.11 Bibliographic notes.

Problems.

10. Optimizing while learning.

10.1 Overview of algorithmic strategies.

10.2 Approximate value iteration and Q-learning using lookup tables.

10.3 Statistical bias in the max operator.

10.4 Approximate value iteration and Q-learning using linear models.

10.5 Approximate policy iteration.

10.6 The actor-critic paradigm.

10.7 Policy gradient methods.

10.8 The linear programming method using basis functions.

10.9 Approximate policy iteration using kernel regression.

10.10 Finite horizon approximations for steady-state applications.

10.11 Bibliographic notes.

Problems.

11. Adaptive estimation and stepsizes.

11.1 Learning algorithms and stepsizes.

11.2 Deterministic stepsize recipes.

11.3 Stochastic stepsizes.

11.4 Optimal stepsizes for nonstationary time series.

11.5 Optimal stepsizes for approximate value iteration.

11.6 Convergence.

11.7 Guidelines for choosing stepsize formulas.

11.8 Bibliographic notes.

Problems.

12. Exploration vs. exploitation.

12.1 A learning exercise: the nomadic trucker.

12.2 An introduction to learning.

12.3 Heuristic learning policies.

12.4 Gittins indices for online learning.

12.5 The knowledge gradient policy.

12.6 Learning with a physical state.

12.7 Bibliographic notes.

Problems.

13. Value function approximations for resource allocation problems.

13.1 Value functions versus gradients.

13.2 Linear approximations.

13.3 Piecewise linear approximations.

13.4 Solving a resource allocation problem using piecewise linear functions.

13.5 The SHAPE algorithm.

13.6 Regression methods.

13.7 Cutting planes.

13.8 Why does it work?

13.9 Bibliographic notes.

Problems.

14. Dynamic resource allocation problems.

14.1 An asset acquisition problem.

14.2 The blood management problem.

14.3 A portfolio optimization problem.

14.4 A general resource allocation problem.

14.5 A fleet management problem.

14.6 A driver management problem.

14.7 Bibliographic notes.

Problems.

15. Implementation challenges.

15.1 Will ADP work for your problem?

15.2 Designing an ADP algorithm for complex problems.

15.3 Debugging an ADP algorithm.

15.4 Practical issues.

15.5 Modeling your problem.

15.6 Online vs. offline models.

15.7 If it works, patent it!

Index.

Purchasing Notes

Cover images shown for foreign-language books are samples provided by the publisher; the item actually shipped will be the edition currently supplied by the publisher. For some titles, prices may be adjusted in line with the publisher's supply situation and prevailing exchange rates.

Items not in stock will be ordered and air-freighted on your behalf after you complete the order process. To shorten the waiting time, we recommend placing foreign-language books in a separate order from other items so you receive them as quickly as possible; average procurement time is 1 to 2 months.

To protect your rights, 三民網路書店 (San Min Online Bookstore) offers members a seven-day product inspection period, beginning on the day the item is received.

To return an item, please send it back within the inspection period. The item must be in brand-new condition and in its complete original packaging (product, accessories, invoice, free gifts, etc.); otherwise the return cannot be accepted.
