degrotesque

API

Degrotesque

A tiny web type setter.

The main method "prettify" uses the list of actions to change the contents of the given HTML page.

XML-elements are skipped as well as the contents of specific elements. Additional methods support parsing and setting new values for actions and elements to skip.

Some internal methods determine which parts of the document shall be processed and which ones shall be skipped.

__init__()

Sets defaults for the elements whose contents shall not be processed.

Sets defaults for actions to perform.

_restoreDefaultActions()

Instantiates default actions

setActions(actNames)

Sets the actions to apply.

If the given names of actions are None or empty, the default actions are used.

Otherwise, the actions matching the given names are retrieved from the internal database and their list is returned.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| actNames | List[str] | The names of the actions to use (or None if default actions shall be used) | required |
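The lookup described for setActions can be sketched as follows. This is a minimal illustration, not the library's code: the database `ACTIONS_DB`, the action names, the default list, and the error raised for unknown names are all assumptions made for the example.

```python
# Hypothetical action database: name -> list of (search, replacement) pairs.
# The real library ships its own table with its own action names.
ACTIONS_DB = {
    "dashes": [(" - ", " \u2013 ")],
    "ellipsis": [("...", "\u2026")],
}
# Hypothetical default selection.
DEFAULT_ACTION_NAMES = ["dashes", "ellipsis"]


def select_actions(act_names):
    """Return the actions for the given names, or the defaults if none are given."""
    names = act_names or DEFAULT_ACTION_NAMES
    selected = []
    for name in names:
        if name not in ACTIONS_DB:
            # Assumption: unknown names are reported as an error.
            raise ValueError("Action '%s' is not known." % name)
        selected.extend(ACTIONS_DB[name])
    return selected
```

Passing None or an empty list falls back to the defaults, which mirrors the behaviour described above.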

_restoreDefaultElementsToSkip()

Instantiates default elements to skip

setToSkip(toSkipNames)

Sets the elements whose contents shall not be changed.

If the given names of elements are None or empty, the default elements to skip are used.

Otherwise, a list with the elements to skip is built.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| toSkipNames | List[str] | The names of elements which shall not be changed | required |

Todo

Warn user if a non-XML-character occurs?

setFormat(formatS)

Sets the target character representation

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| formatS | str | The format to use, one of "html", "unicode", "text" | required |

_getTagName(html)

Returns the name of the tag that starts at the beginning of the given string.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| html | str | The HTML-subpart | required |

Returns:

| Type | Description |
|---|---|
| str | The name of the tag |
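One way to extract a tag name from the start of a string is a single regular expression. The sketch below is an assumption about how such a helper could work; the function name `get_tag_name` and the choice to return an empty string when no tag is found are not taken from the library.

```python
import re

# Matches "<name" or "</name" at the start of the string and captures the name.
_TAG_NAME = re.compile(r"</?\s*([A-Za-z][\w:.\-]*)")


def get_tag_name(html):
    """Return the name of the tag that starts at the beginning of `html`."""
    match = _TAG_NAME.match(html)
    # Assumption: an empty string signals "no tag at this position".
    return match.group(1) if match else ""
```

Both opening and closing tags yield the bare name, so a caller can compare it against the list of elements to skip.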

_markHTML(html)

Returns a string where all HTML-elements are denoted as '1' and plain content as '0'.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| html | str | The HTML document (contents) to process | required |

Returns:

| Type | Description |
|---|---|
| str | Annotation of the HTML document. |
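The annotation string can be pictured as a mask with one character per input character. The following is a simplified sketch under the assumption that only the tags themselves are masked; it does not mask the contents of elements to skip, which the real method also has to handle. The name `mark_html` is hypothetical.

```python
def mark_html(html):
    """Return a mask: '1' for characters inside tags, '0' for plain content."""
    marks = []
    in_tag = False
    for ch in html:
        if ch == "<":
            in_tag = True
        # The angle brackets themselves belong to the tag.
        marks.append("1" if in_tag else "0")
        if ch == ">":
            in_tag = False
    return "".join(marks)
```

The mask always has the same length as the input, so position i in the mask describes character i of the document.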

_markMarkdown(document)

Returns a string where all code and quotes are denoted as '1' and plain content as '0'.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| document | str | The markdown document (contents) to process | required |

Returns:

| Type | Description |
|---|---|
| str | Annotation of the markdown document. |
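A comparable mask for markdown could look like the sketch below. It assumes that "code" means fenced blocks and single-backtick inline spans and that "quotes" means block-quote lines starting with `>`; the real method may recognise more or different constructs. The name `mark_markdown` is hypothetical.

```python
def mark_markdown(document):
    """Return a mask: '1' for code and block quotes, '0' for plain content."""
    marks = []
    in_fence = False
    for line in document.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("```"):
            # The fence line itself is masked, then the fence state flips.
            marks.append("1" * len(line))
            in_fence = not in_fence
        elif in_fence or stripped.startswith(">"):
            # Lines inside a fenced block and block-quote lines are masked whole.
            marks.append("1" * len(line))
        else:
            # Mask inline code spans delimited by single backticks.
            in_code = False
            for ch in line:
                if ch == "`":
                    marks.append("1")
                    in_code = not in_code
                else:
                    marks.append("1" if in_code else "0")
    return "".join(marks)
```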

prettify(document, isHTML, isMarkdown=False)

Prettifies (degrotesques) the given document.

It is assumed that the input is given in utf-8.

The result is returned in utf-8 as well.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| document | str | The document (contents) to process. | required |
| isHTML | bool | Whether the document is an HTML document | required |
| isMarkdown | bool | Whether the document is a markdown document | False |

Returns:

| Type | Description |
|---|---|
| str | The processed (prettified / degrotesqued) document. |
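To illustrate how a '0'/'1' annotation can restrict replacements to plain content, here is a minimal sketch. It is not the library's implementation: the two actions in `ACTIONS` and the helper name `apply_masked` are made up for the example, and the mask is passed in by hand.

```python
import re
from itertools import groupby

# Hypothetical actions: (compiled pattern, replacement) pairs.
ACTIONS = [
    (re.compile(r"\.\.\."), "\u2026"),  # three dots -> ellipsis
    (re.compile(r" - "), " \u2013 "),   # spaced hyphen -> en dash
]


def apply_masked(document, marks, actions=ACTIONS):
    """Apply the actions only to the runs of `document` that are marked '0'."""
    if len(document) != len(marks):
        raise ValueError("document and marks must have the same length")
    out = []
    pos = 0
    for flag, run in groupby(marks):
        length = len(list(run))
        chunk = document[pos:pos + length]
        if flag == "0":
            for pattern, replacement in actions:
                chunk = pattern.sub(replacement, chunk)
        out.append(chunk)  # masked runs are copied unchanged
        pos += length
    return "".join(out)
```

Processing run by run keeps tags and skipped contents byte-for-byte identical while the text between them is rewritten.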

_replFunc_KEEP(matchobj)

Keeps a Unicode number as it is (converts it to itself).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| matchobj | Match | The match object to get a new representation for | required |

Returns:

| Type | Description |
|---|---|
| str | The converted string (here: as Unicode number) |

_replFunc_HTML(matchobj)

Converts a Unicode number to an HTML entity.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| matchobj | Match | The match object to get a new representation for | required |

Returns:

| Type | Description |
|---|---|
| str | The converted string (here: as HTML entity) |

_replFunc_UNICODE(matchobj)

Converts a Unicode number to a Unicode character.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| matchobj | Match | The match object to get a new representation for | required |

Returns:

| Type | Description |
|---|---|
| str | The converted string (here: as Unicode character) |
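The three replacement functions follow the callback pattern of `re.sub`, where a function receives a match object and returns the replacement text. The sketch below assumes that a "Unicode number" is a numeric character reference such as `&#8211;`; that form, the pattern `NUMERIC_REF`, and the function names are assumptions for illustration only.

```python
import re
from html.entities import codepoint2name

# Assumed form of a "Unicode number": a decimal numeric character reference.
NUMERIC_REF = re.compile(r"&#(\d+);")


def repl_keep(matchobj):
    """Leave the numeric reference untouched."""
    return matchobj.group(0)


def repl_html(matchobj):
    """Use a named HTML entity if one exists, else keep the numeric form."""
    name = codepoint2name.get(int(matchobj.group(1)))
    return "&" + name + ";" if name else matchobj.group(0)


def repl_unicode(matchobj):
    """Replace the numeric reference by the character itself."""
    return chr(int(matchobj.group(1)))
```

A call such as `NUMERIC_REF.sub(repl_html, text)` then rewrites every reference in `text` with the chosen representation.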

getExtensions(extNames)

Returns the list of extensions of files to process.

If the given names of extensions are None or empty, the default extensions are used.

Otherwise, the given string is split and returned as a list.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| extNames | List[str] | The names of extensions to process (or None if default extensions shall be used) | required |

Returns:

| Type | Description |
|---|---|
| List[str] | The list of extensions to use. |

Todo

What about removing dots?
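The split-or-default behaviour can be sketched in a few lines. The default list below is hypothetical, and the sketch assumes the names arrive as one comma-separated string, as on the command line.

```python
# Hypothetical defaults; the library's real default list may differ.
DEFAULT_EXTENSIONS = ["html", "htm", "xml"]


def get_extensions(ext_names):
    """Return the extensions to process, or the defaults if none are given."""
    if not ext_names:
        return list(DEFAULT_EXTENSIONS)
    # Assumption: a comma-separated string such as "md, txt".
    return [ext.strip() for ext in ext_names.split(",")]
```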

getFiles(name, recursive, extensions)

Returns the files to process.

If a file name is given, a list with only this file name is returned.

If a folder name is given, the files to process are determined by walking through the folder (recursively, if requested) and collecting all files that match the extensions.

The list of collected files is returned.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the file/folder | required |
| recursive | bool | Whether the folder (if given) shall be processed recursively | required |
| extensions | List[str] | The extensions of the files to process | required |

Returns:

| Type | Description |
|---|---|
| List[str] | The list of collected files. |
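The collection logic described above maps naturally onto `os.walk`. This is a sketch under assumptions, not the library's code: extensions are compared without the leading dot, and an empty extension list is taken to mean "all files".

```python
import os


def get_files(name, recursive, extensions):
    """Return `name` itself if it is a file, else the matching files below it."""
    if os.path.isfile(name):
        return [name]
    files = []
    for root, _dirs, filenames in os.walk(name):
        for filename in sorted(filenames):
            ext = os.path.splitext(filename)[1].lstrip(".")
            # Assumption: no extensions given means every file is accepted.
            if not extensions or ext in extensions:
                files.append(os.path.join(root, filename))
        if not recursive:
            break  # os.walk yields the top folder first; stop after it
    return files
```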

main(arguments=None)

The main method, using parameters from the command line.

The application reads the file, or the files from the folder, named by the -i/--input option. If the -r/--recursive option is set, the input folder is scanned recursively. By default all files are processed; the -e/--extensions option limits processing to files with matching extensions. The default encoding for the files is utf-8. This can be changed using the -E/--encoding option.

The default actions or those named using the -a/--actions option are applied. When parsing HTML documents, the elements themselves are skipped. The contents of the default elements to skip, or of those defined using -s/--skip, are skipped as well. degrotesque tries to determine the file type from the file's extension. The options -T/--text, -H/--html, and -M/--markdown override this behaviour.

By default, the target format of the replacements is unicode entity; it may be changed using the -f/--format option.

The files are saved under their original name. Unless the option -B/--no-backup is given, a backup of each original file is created, named like the original file with the suffix ".orig" appended.
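The save-with-backup step can be sketched as follows. The helper name `write_with_backup` and its parameters are hypothetical; only the ".orig" suffix and the backup switch come from the description above.

```python
import shutil


def write_with_backup(path, new_contents, encoding="utf-8", no_backup=False):
    """Overwrite `path` with `new_contents`, keeping a copy as `path + '.orig'`."""
    if not no_backup:
        # Copy the untouched original before it is overwritten.
        shutil.copy(path, path + ".orig")
    with open(path, "w", encoding=encoding) as handle:
        handle.write(new_contents)
```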

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| arguments | List[str] | The command line arguments, parsed as options using OptionParser. | None |

Options

The following options must be set:

--input / -i <FILE or FOLDER NAME>: the file or the folder to process

The following options are optional:

--recursive / -r: Set if the folder (if given) shall be processed recursively

--extensions / -e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed

--encoding / -E <ENCODING>: File encoding (default: 'utf-8')

--html / -H: Files are HTML/XML-derivatives

--text / -T: Files are plain text files

--markdown / -M: Files are markdown files

--no-backup / -B: Set if no backup files shall be generated

--format / -f <FORMAT>: Define the format of the replacements ['html', 'unicode', 'text']

--skip / -s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed

--actions / -a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied

--help / -h: Prints the help screen

--version / -v: Prints the version
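The option list above can be expressed with the standard library's `optparse.OptionParser`, which the description of main names as the parser. The sketch below only mirrors the documented flags; the `dest` names, help texts, and defaults other than the encoding are assumptions, and `--help` is provided by optparse automatically. The `--version / -v` option is left out of the sketch.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the options documented above (illustrative only)."""
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option("-i", "--input", dest="input", default=None,
                      help="the file or the folder to process")
    parser.add_option("-r", "--recursive", dest="recursive",
                      action="store_true", default=False)
    parser.add_option("-e", "--extensions", dest="extensions", default=None)
    parser.add_option("-E", "--encoding", dest="encoding", default="utf-8")
    parser.add_option("-H", "--html", dest="html",
                      action="store_true", default=False)
    parser.add_option("-T", "--text", dest="text",
                      action="store_true", default=False)
    parser.add_option("-M", "--markdown", dest="markdown",
                      action="store_true", default=False)
    parser.add_option("-B", "--no-backup", dest="no_backup",
                      action="store_true", default=False)
    # Assumption: no default is set here; the application picks its own.
    parser.add_option("-f", "--format", dest="format", type="choice",
                      choices=["html", "unicode", "text"], default=None)
    parser.add_option("-s", "--skip", dest="skip", default=None)
    parser.add_option("-a", "--actions", dest="actions", default=None)
    return parser
```

Calling `build_parser().parse_args(["-i", "docs", "-r"])` returns the parsed options together with any remaining positional arguments.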