A program of a DP (dynamic programming) for pretty printing


The problem.

Input:
A text document (English), the number of characters in a line, a cost
function for redundant spaces.

Output:
A layout of the given document, right-justified, such that the total cost
is minimum.

Note: The output should be viewed with a monospaced font (i.e., fixed pitch
or non-proportional font).


An input example ----------------------------------------------------------

Metaheuristic algorithms are widely recognized as one of the most
practical approaches for combinatorial optimization problems.
Among representative metaheuristics are genetic algorithm,
simulated annealing, tabu search and so on. In this paper, we
explain essential ideas used in such metaheuristic algorithms
within a generalized framework of local search. We then conduct
numerical experiment of metaheuristic algorithms using rather
simple implementations, to observe general tendencies of their
performance. From these results, we propose a few recommendations
about the use of metaheuristics as simple optimization tools. We
also mention some advanced techniques to enhance the ability of
metaheuristics. Finally, we summarize some theoretical results on
metaheuristic algorithms.

The number of characters in a line: 75

Cost for redundant space: When k redundant spaces are placed after a word,
a cost of k^5 is added (one space is not redundant, so k=0 in such a case).

An output example ---------------------------------------------------------

Metaheuristic algorithms are widely recognized as one of the most practical
approaches for  combinatorial optimization  problems. Among  representative
metaheuristics  are  genetic  algorithm,  simulated  annealing, tabu search
and  so  on.  In  this  paper,  we  explain  essential  ideas  used in such
metaheuristic algorithms within a generalized framework of local search. We
then conduct numerical experiment of  metaheuristic algorithms using rather
simple implementations, to observe general  tendencies of their performance.
From  these  results,  we  propose  a  few recommendations about the use of
metaheuristics as simple optimization tools.  We also mention some advanced
techniques to enhance the ability  of metaheuristics. Finally, we summarize
some theoretical results on metaheuristic algorithms.

Total cost: 28
-----------------------------------------------------------------------------


Please see dp_eng.pdf to understand the idea of the DP algorithm.


Usage.

To compile, just type "make", or you can type:

  gcc -Wall -O2 -o typeset typeset.c -lm

(necessary files for this: typeset.c, cpu_time.c, makefile).

Input a document from stdin, then the result is output to stdout.
The number of characters can be specified as a parameter.
If the document has two paragraphs or more, please put an empty line
between two paragraphs.

For example, if you have a document whose file name is "sample.txt", you
can get a layout by

 cat sample.txt | ./typeset

If you would like to specify some parameters, type for example

 cat sample.txt | ./typeset (parameter name) (parameter value) ...
 cat sample.txt | ./typeset lmin 65
 cat sample.txt | ./typeset lmin 60 lmax 70 outlvl 1 


Parameters you can specify:
---------------------------------------------------------------------------
parameter	default		explanation
name		value
---------	-------		-------------------------------------------
lmax		80 		max number of characters in a line
				(if it's less than lmin -> modified to lmin)
lmin		70 		min number of characters in a line
				(if it's less than min word length
				-> set to min word length)
ind1		0		indent (#spaces) of the 1st paragraph
ind2		0		indent of 2nd paragraph and later
lines		1		number of empty lines between paragraphs
ctype		2		cost type ("exp" is a parameter)
				1: (#spaces in a line)^{exp}.
				   if this cost is specified, the layout is
				   not right-justified.
				2: When k redundant spaces are placed after
				   a word, a cost of k^{exp} is added (one
				   space is considered k=0).
exp		5 		The exponent of cost function
pslide		1		1: '.' and ',' at the end of a line is
				   placed one-character outside of
				   the text (available only if ctype=2).
				0: a punctuation is considered as a part of
				   a word.
outlvl		0		Output level.
				0: Text only.
				1: Text, adopted text width, optimal cost,
				   and computation time.
				2: In addition, optimal cost for each value
				   of the number of characters in a line
				   between lmin and lmax.

pslide = 1 was prepared because when "." or "," is at the end a line, I
felt such a line seems like a dent, and I felt it nicer to have them
one-character to the right. This will depend on the font set, so if you do
not like it, please set pslide = 0.


Usage on emacs:
You can use it from emacs. Specify a region (from the place marked by C-spc
to the current place of the cursor).
Then type "M-|" (or "M-x shell-command-on-region") and "typeset [option(s)]".


If you have comments (e.g., bugs), please contact me via e-mail
(yagiura "at" nagoya-u.jp).