degrotesque

Command Line

Running on the Command Line

degrotesque is implemented in Python. It is started on the command line.

Description

The option -i <PATH> / --input <PATH> tells the script which file(s) to read; you may name either a single file or a folder. If the option -r / --recursive is set, a given folder is processed recursively, including all of its subfolders.

The tool processes text files, markdown files, HTML files, XML files, and their derivatives. By default, all files are processed when -i points to a folder. You can limit the files to process by their extension using the -e <EXTENSION>[,<EXTENSION>]* / --extensions <EXTENSION>[,<EXTENSION>]* option. The files are assumed to be UTF-8 encoded by default; you can change the encoding using the option -E <ENCODING> / --encoding <ENCODING>.
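The selection rules for -i, -r and -e can be illustrated with a short Python sketch. It is not degrotesque's own code: the function name collect_files and the case-insensitive extension comparison are assumptions made for this example.

```python
from pathlib import Path


def collect_files(path, recursive=False, extensions=None):
    """Return the files selected by -i/--input, -r/--recursive and -e/--extensions.

    Hypothetical helper that mirrors the documented behaviour only.
    """
    p = Path(path)
    if p.is_file():
        # A single file is processed regardless of -r and -e in this sketch
        return [p]
    # -r decides whether subfolders are descended into
    candidates = p.rglob("*") if recursive else p.glob("*")
    files = [f for f in candidates if f.is_file()]
    if extensions:
        # "-e html,md" -> {".html", ".md"}; case-insensitive matching is an assumption
        wanted = {"." + e.strip().lstrip(".").lower() for e in extensions.split(",")}
        files = [f for f in files if f.suffix.lower() in wanted]
    return sorted(files)
```

Each selected file would then be read with the encoding given by -E, e.g. `f.read_text(encoding="utf-8")` for the default.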

The files are read one by one, and plain ASCII characters are replaced by their typographic counterparts according to a chosen set of “actions”. All known actions, including the default ones, are listed in Appendix A. You can select the actions to apply using the -a <ACTION_NAME>[,<ACTION_NAME>]* / --actions <ACTION_NAME>[,<ACTION_NAME>]* option. The default actions are ‘masks’, ‘quotes.english’, ‘dashes’, ‘ellipsis’, ‘math’, ‘apostrophe’, and ‘commercial’.
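Conceptually, an action is a named set of pattern/replacement rules that is applied to the text in turn. The sketch below assumes that model; the three rules in ACTIONS are simplified, hypothetical stand-ins, and the real definitions are the ones listed in Appendix A.

```python
import re

# Hypothetical, heavily simplified action sets (the real ones are in Appendix A)
ACTIONS = {
    "ellipsis": [(re.compile(r"\.\.\."), "\u2026")],            # ...   -> …
    "dashes": [(re.compile(r"(?<= )-(?= )"), "\u2013")],        # " - " -> " – "
    "apostrophe": [(re.compile(r"(?<=\w)'(?=\w)"), "\u2019")],  # don't -> don’t
}


def apply_actions(text, names):
    """Apply the named action sets to `text`, in the given order."""
    for name in names:
        for pattern, replacement in ACTIONS[name]:
            text = pattern.sub(replacement, text)
    return text
```

A value such as `-a ellipsis,dashes` would map to `apply_actions(text, "ellipsis,dashes".split(","))`.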

By default, numeric Unicode entities are inserted (e.g. ‘&#8211;’ for ‘–’). You can change this using the option -f <FORMAT> / --format <FORMAT>. The following formats are currently supported:

  • ‘unicode’: uses numeric entities (e.g. ‘&#8211;’ for ‘–’);
  • ‘html’: uses named entities (e.g. ‘&ndash;’ for ‘–’);
  • ‘text’: uses plain (UTF-8) characters (e.g. ‘–’ for ‘–’).
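The difference between the three formats comes down to how a single replacement character is written out. The following sketch uses the standard library's html.entities table; the function name encode_char and the fallback to a numeric entity when HTML defines no name are assumptions of this example, not documented degrotesque behaviour.

```python
from html.entities import codepoint2name


def encode_char(ch, fmt="unicode"):
    """Render one replacement character according to -f/--format."""
    if fmt == "unicode":
        return f"&#{ord(ch)};"  # numeric entity, e.g. "&#8211;"
    if fmt == "html":
        # Named entity where HTML defines one; the numeric fallback is an assumption
        name = codepoint2name.get(ord(ch))
        return f"&{name};" if name else f"&#{ord(ch)};"
    if fmt == "text":
        return ch  # the character itself, written to the file as UTF-8
    raise ValueError(f"unknown format: {fmt}")
```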

degrotesque tries to determine from the files' extensions and contents whether the files it reads are plain text files, markdown files, or XML/HTML derivatives. Appendix B lists the extensions by which files are recognized as HTML / markdown files. To be on the safe side, you may set -H / --html when processing HTML files, -M / --markdown when processing markdown files, or -T / --text when processing plain text files.
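A detection step of this kind might look like the sketch below. The extension sets are hypothetical placeholders for the lists in Appendix B, and the content check (looking for anything tag-like) is an assumed heuristic; the `forced` parameter stands in for -H / -M / -T, which skip guessing altogether.

```python
import re
from pathlib import Path

# Hypothetical placeholders; the authoritative lists are in Appendix B
HTML_EXTENSIONS = {".html", ".htm", ".xhtml", ".xml"}
MARKDOWN_EXTENSIONS = {".md", ".markdown"}


def guess_type(filename, content, forced=None):
    """Return 'html', 'markdown' or 'text' for a file."""
    if forced:
        # -H / -M / -T: trust the user, no guessing
        return forced
    suffix = Path(filename).suffix.lower()
    if suffix in HTML_EXTENSIONS:
        return "html"
    if suffix in MARKDOWN_EXTENSIONS:
        return "markdown"
    # Assumed content heuristic: anything that looks like a tag means markup
    if re.search(r"<[A-Za-z!/][^>]*>", content):
        return "html"
    return "text"
```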

When parsing XML/HTML files, the script naturally leaves quotation marks inside tags (e.g. around attribute values) unchanged. In addition, the contents of several elements, such as <code> or <pre>, are skipped. You can change the list of elements whose contents shall not be processed using the option -s <ELEMENT_NAME>[,<ELEMENT_NAME>]* / --skip <ELEMENT_NAME>[,<ELEMENT_NAME>]*. The list of elements skipped by default is given in Appendix C.
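The two rules (never touch tags, never touch skipped elements) can be sketched by splitting the markup into tags and text and counting how many skipped elements are currently open. This is a simplified, regex-based illustration rather than degrotesque's parser; DEFAULT_SKIP holds only two assumed entries of the Appendix C list, and a `>` inside an attribute value would confuse it.

```python
import re

# Assumption: only two of the default skip elements from Appendix C
DEFAULT_SKIP = {"code", "pre"}

TAG = re.compile(r"(<[^>]*>)")  # capturing group keeps the tags in split()
TAG_NAME = re.compile(r"<\s*(/?)\s*([A-Za-z][\w:-]*)")


def replace_outside_skipped(markup, replace, skip=DEFAULT_SKIP):
    """Apply `replace` to text between tags only.

    Tags are copied verbatim (so quotes around attribute values survive),
    and text inside any element named in `skip` is left untouched.
    """
    depth = 0  # number of currently open skipped elements
    out = []
    for part in TAG.split(markup):
        if part.startswith("<"):
            m = TAG_NAME.match(part)
            if m and m.group(2).lower() in skip and not part.endswith("/>"):
                # closing tag decrements, opening tag increments
                depth = max(0, depth - 1) if m.group(1) else depth + 1
            out.append(part)
        elif depth:
            out.append(part)  # inside <code>, <pre>, ...
        else:
            out.append(replace(part))
    return "".join(out)
```

Passing a different `skip` set corresponds to overriding the list with -s.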

When parsing markdown files, code is skipped, both indented code blocks and code enclosed in backticks (`). Quotes are skipped as well.
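For markdown, the same idea applies to code. The sketch below is a simplification of the CommonMark rules and an assumption about the approach: it recognizes ``` fences, lines indented by four spaces or a tab, and single-line inline code spans, and does not model quotes.

```python
import re

CODE_SPAN = re.compile(r"(`+).+?\1")  # inline code on a single line, e.g. `x...y`


def replace_outside_code(markdown, replace):
    """Apply `replace` to markdown text while leaving code untouched."""
    out = []
    in_fence = False
    for line in markdown.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # opening or closing fence
            out.append(line)
        elif in_fence or line.startswith(("    ", "\t")):
            out.append(line)  # fenced or indented code
        else:
            # Replace only the text between inline code spans
            pos = 0
            for m in CODE_SPAN.finditer(line):
                out.append(replace(line[pos:m.start()]))
                out.append(m.group(0))
                pos = m.end()
            out.append(replace(line[pos:]))
    return "".join(out)
```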

After the actions have been applied to its contents, the file is saved. By default, a backup of the original file is kept under the same name with “.orig” appended. You can suppress the creation of these backup files using the option -B / --no-backup.
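The save step with its optional backup could be written as follows. This is a minimal sketch; the function name write_with_backup is hypothetical, and `backup=False` plays the role of -B / --no-backup.

```python
import shutil
from pathlib import Path


def write_with_backup(path, new_text, encoding="utf-8", backup=True):
    """Save processed text, keeping the original as <name>.orig unless backup is False."""
    path = Path(path)
    if backup:
        # "page.html" -> "page.html.orig": the suffix is appended, not substituted
        shutil.copyfile(path, path.with_name(path.name + ".orig"))
    path.write_text(new_text, encoding=encoding)
```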

The option -h / --help prints a help screen. The option --version prints degrotesque's version number.

Please note that “masks” is a special action set that prevents some other actions from being applied, so that, e.g., the hyphens in ISBN numbers are not replaced by &ndash;. The masks action set is given in Appendix D.
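One way to realize such masking is to find the protected spans first and then skip every match of another action that starts inside one of them. Both the ISBN mask and the digit-dash rule below are simplified, hypothetical stand-ins for the sets in Appendix D and Appendix A.

```python
import re

# Hypothetical stand-ins: one mask (ISBN numbers) and one dash rule
MASKS = [re.compile(r"ISBN:? ?[\dX-]+")]
DIGIT_DASH = re.compile(r"(?<=\d)-(?=\d)")  # e.g. "10-20" -> "10–20"


def apply_with_masks(text, pattern, replacement, masks=MASKS):
    """Substitute `pattern` everywhere except inside spans matched by a mask."""
    protected = [m.span() for mask in masks for m in mask.finditer(text)]

    def substitute(m):
        masked = any(start <= m.start() < end for start, end in protected)
        return m.group(0) if masked else replacement

    return pattern.sub(substitute, text)
```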

Examples

degrotesque -i my_page.html -a quotes.german

Replaces single and double quotes within the file “my_page.html” by their typographic German counterparts.

degrotesque -i my_folder -r --no-backup

Applies the default actions to all files in the folder “my_folder” and all of its subfolders. No backup files are generated. The format of each file is determined from its extension.

Command line arguments

The script can be started on the command line with the following options:

  • --input/-i <PATH>: The file or folder to process
  • --recursive/-r: Set if a given folder shall be processed recursively
  • --extensions/-e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed
  • --encoding/-E <ENCODING>: The assumed encoding of the files
  • --html/-H: Files are HTML/XML-derivatives
  • --text/-T: Files are plain text files
  • --markdown/-M: Files are markdown files
  • --format/-f <FORMAT>: Define the format of the replacements [‘html’, ‘unicode’, ‘text’]
  • --no-backup/-B: Set if no backup files shall be generated
  • --skip/-s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed
  • --actions/-a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied
  • --help/-h: Prints the help screen
  • --version: Prints the version
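The option list above maps naturally onto Python's argparse module. The parser below is an illustrative re-creation, not degrotesque's own parser. Several choices are assumptions: that -i is required, that -H / -T / -M are mutually exclusive, and that -e, -s and -a are taken as single comma-separated strings.

```python
import argparse

DEFAULT_ACTIONS = "masks,quotes.english,dashes,ellipsis,math,apostrophe,commercial"


def build_parser():
    """Re-create the documented options (illustrative only; -h/--help is added by argparse)."""
    p = argparse.ArgumentParser(prog="degrotesque")
    p.add_argument("-i", "--input", required=True)  # required: an assumption
    p.add_argument("-r", "--recursive", action="store_true")
    p.add_argument("-e", "--extensions")  # e.g. "html,md"
    p.add_argument("-E", "--encoding", default="utf-8")
    kind = p.add_mutually_exclusive_group()  # exclusivity: an assumption
    kind.add_argument("-H", "--html", action="store_true")
    kind.add_argument("-T", "--text", action="store_true")
    kind.add_argument("-M", "--markdown", action="store_true")
    p.add_argument("-f", "--format", choices=["html", "unicode", "text"], default="unicode")
    p.add_argument("-B", "--no-backup", action="store_true")
    p.add_argument("-s", "--skip")  # comma-separated element names
    p.add_argument("-a", "--actions", default=DEFAULT_ACTIONS)
    # The real tool prints its actual version number here
    p.add_argument("--version", action="version", version="%(prog)s")
    return p
```

For example, `build_parser().parse_args(["-i", "my_folder", "-r", "--no-backup"])` corresponds to the second example above.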