
How can I convert PostScript to ASCII?

In general, when you say ``I want to convert PostScript to ASCII'' what you really mean is ``I want to convert MacWrite (which makes PostScript output) to ASCII'' or ``I want to convert somebody's TeX document (which I have in PostScript) to ASCII''.

Unfortunately, programs like these (if they're smart) do a lot of fancy stuff like kerning, which means that where they would normally execute the PostScript command for

    ``print water fountain''

instead they execute the PostScript commands for

    ``print wat''      (move a little to get the spacing *just* right)
    ``print er''       (move a little to get the spacing *just* right)
    ``print foun''     (move a little to get the spacing *just* right)
    ``print tain''     (move a little to get the spacing *just* right)

So if I write a program that looks through a PostScript file for strings, like ps2ascii.pl, it can't tell where the words really end. Here my program would see four strings:

``wat'' ``er'' ``foun'' ``tain''

And it doesn't see any difference between the spacing between ``foun'' and ``tain'' (not a word break) and the spacing between ``er'' and ``foun'' (a real word break).
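
The effect can be reproduced with a minimal sketch in Python. This is not how ps2ascii.pl works internally; it is just an illustration of naive string extraction. The regular expression, the function name extract_strings and the kerned PostScript fragment are all assumptions made up for this example, and the pattern only handles simple strings with no nested parentheses or backslash escapes.

```python
import re

# Matches a simple PostScript string literal followed by the show operator.
# Assumption: no nested parentheses or backslash escapes inside the string.
SHOW_RE = re.compile(r"\(([^()\\]*)\)\s*show\b")


def extract_strings(ps_text):
    """Return the string operand of every `show` call, in document order."""
    return SHOW_RE.findall(ps_text)


# Hypothetical kerned output for "water fountain" (offsets are invented).
kerned = """
72 700 moveto
(wat) show -0.4 0 rmoveto
(er) show 3.1 0 rmoveto
(foun) show -0.3 0 rmoveto
(tain) show
"""

fragments = extract_strings(kerned)
print(fragments)            # ['wat', 'er', 'foun', 'tain']
print("".join(fragments))   # waterfountain
print(" ".join(fragments))  # wat er foun tain
```

Neither way of joining the fragments gives back ``water fountain'': the extractor sees the strings but not the meaning of the movements between them.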

The problem is that PostScript for text is usually machine-generated by a text formatter. A PostScript generator like dvips might have a special command like ``boop'' that differentiates between a real word break and a fake one. But every text formatter that generates PostScript has its own name for the ``boop'' command.

So you really want a ``PostScript to ASCII converter for dvips output''.

The only general solution I can see would be to redefine the show operator to print out the currentpoint for every letter being printed, like gs2asc, and then make up an ASCII page from this by sticking ASCII characters where they go in a two-dimensional array. That would give a ``formatted'' ASCII version of the PostScript page.
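
The second half of that idea, building the two-dimensional array, can be sketched in Python. The sketch assumes the positions have already been captured somehow (for example by a redefined show operator) as (x, y, character) records in PostScript points; it does not show that capture step and does not reflect how gs2asc is actually written. The function name render_page and the default cell size of 6 by 12 points on a 792-point page are assumptions chosen for illustration.

```python
def render_page(glyphs, char_width=6.0, line_height=12.0, page_height=792.0):
    """Place (x, y, char) records on a character grid and return the text.

    x and y are in PostScript points, with the origin at the bottom left
    of the page, so larger y values end up on earlier text rows.
    """
    cells = {}
    for x, y, ch in glyphs:
        col = int(round(x / char_width))
        row = int(round((page_height - y) / line_height))
        cells[(row, col)] = ch  # a later glyph overwrites an earlier one

    if not cells:
        return ""

    n_rows = max(row for row, _ in cells) + 1
    lines = []
    for r in range(n_rows):
        cols = [c for (row, c) in cells if row == r]
        line = [" "] * (max(cols) + 1 if cols else 0)
        for c in cols:
            line[c] = cells[(r, c)]
        lines.append("".join(line))
    return "\n".join(lines)


# Hypothetical captured positions: "Hi" on the top row, "ok" one row down.
glyphs = [(0, 792, "H"), (6, 792, "i"), (12, 780, "o"), (18, 780, "k")]
print(render_page(glyphs))
```

A fixed cell size is the weak point of this approach: with proportional fonts, two letters can round to the same cell, and the later one silently replaces the earlier one.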

But even that wouldn't solve the problem, because special bitmap fonts and standard fonts like Symbol don't always print a ``P'' when you say the letter ``P''. Sometimes they print the Greek Pi symbol or a chess piece or a Zapf Dingbats character.

In practice, try one of the existing tools: ps2a, ps2ascii, ps2txt, ps2ascii.ps or ps2ascii.pl.

Allen B
2/2/1998