degrotesque

API

`Degrotesque`

A tiny web type setter.

The main method "prettify" uses the list of actions to change the contents of the given HTML page.

XML elements are skipped, as are the contents of specific elements. Additional methods support parsing and setting new values for the actions and for the elements to skip.

Some internal methods exist for determining which parts of the document shall be processed and which ones shall be skipped.

`__init__()`

Sets defaults for the elements whose contents shall not be processed.

Sets defaults for actions to perform.

`_restoreDefaultActions()`

Instantiates the default actions.

`setActions(actNames)`

Sets the actions to apply.

If the given names of actions are None or empty, the default actions are used.

Otherwise, the actions matching the given names are retrieved from the internal database and set as the actions to apply.

Parameters:

Name	Type	Description	Default
`actNames`	`List[str]`	The names of the actions to use (or None if default actions shall be used)	required
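The lookup described above can be pictured with a small, self-contained sketch. It does not use degrotesque itself: `ACTIONS_DB`, `DEFAULT_ACTION_NAMES`, and `select_actions` are hypothetical names, and the two entries are illustrative rules rather than the library's actual action database.

```python
import re

# Hypothetical action database: action name -> list of (pattern, replacement).
# These entries are illustrative only, not the library's real rules.
ACTIONS_DB = {
    "ellipsis": [(re.compile(r"\.\.\."), "\u2026")],
    "dashes": [(re.compile(r" - "), " \u2013 ")],
}
DEFAULT_ACTION_NAMES = ["ellipsis", "dashes"]


def select_actions(act_names):
    """Return the rules for the named actions, or for the defaults."""
    if not act_names:  # None or empty -> fall back to the defaults
        act_names = DEFAULT_ACTION_NAMES
    rules = []
    for name in act_names:
        if name not in ACTIONS_DB:
            raise ValueError(f"Action '{name}' is not known.")
        rules.extend(ACTIONS_DB[name])
    return rules
```

Raising an error for an unknown name is a design choice of this sketch; whether the library reports unknown action names in the same way is not stated here.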

`_restoreDefaultElementsToSkip()`

Instantiates the default elements to skip.

`setToSkip(toSkipNames)`

Sets the elements whose contents shall not be changed.

If the given names of elements are None or empty, the default elements to skip are used.

Otherwise, a list with the elements to skip is built.

Parameters:

Name	Type	Description	Default
`toSkipNames`	`List[str]`	The names of elements which shall not be changed	required

Todo

Warn user if a non-XML-character occurs?

`setFormat(formatS)`

Sets the target character representation.

Parameters:

Name	Type	Description	Default
`formatS`	`str`	The format to use, one of "html", "unicode", "text"	required

`_getTagName(html)`

Returns the name of the tag that starts at the beginning of the given string.

Parameters:

Name	Type	Description	Default
`html`	`str`	The HTML-subpart	required

Returns:

Type	Description
`str`	The name of the tag
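As an illustration of what extracting a tag name from the start of a string can look like, here is a minimal regex-based sketch. It is not the library's implementation; `tag_name_sketch` is a hypothetical helper, and treating closing tags like opening ones is an assumption.

```python
import re

# Assumption: a tag name is the run of name characters right after "<" or "</".
_TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9:._-]*)")


def tag_name_sketch(html):
    """Return the tag name at the start of `html`, or "" if there is none."""
    match = _TAG_NAME.match(html)
    return match.group(1) if match else ""
```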

`_markHTML(html)`

Returns a string where all HTML-elements are denoted as '1' and plain content as '0'.

Parameters:

Name	Type	Description	Default
`html`	`str`	The HTML document (contents) to process	required

Returns:

Type	Description
`str`	Annotation of the HTML document.
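The annotation can be thought of as a mask with one character per input character. The following sketch shows the idea for tags only; it is a simplified stand-in (the name `mark_html_sketch` is hypothetical) and deliberately ignores comments, `>` inside attribute values, and the contents of elements to skip.

```python
def mark_html_sketch(html):
    """Return a mask of the same length: '1' inside tags, '0' elsewhere."""
    mask = []
    in_tag = False
    for ch in html:
        if ch == "<":
            in_tag = True  # the opening bracket belongs to the tag
        mask.append("1" if in_tag else "0")
        if ch == ">":
            in_tag = False  # the closing bracket is still part of the tag
    return "".join(mask)


print(mark_html_sketch("<b>hi</b>"))  # prints 111001111
```

Keeping the mask the same length as the document means any later step can look up, by index, whether a character may be changed.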

`_markMarkdown(document)`

Returns a string where all code and quotes are denoted as '1' and plain content as '0'.

Parameters:

Name	Type	Description	Default
`document`	`str`	The markdown document (contents) to process	required

Returns:

Type	Description
`str`	Annotation of the markdown document.
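A line-based sketch of such a mask is shown below. It assumes that "code" means fenced code blocks and that "quotes" means block-quote lines starting with `>`; inline code spans and indented code are not handled. `mark_markdown_sketch` is a hypothetical name and not part of the library.

```python
def mark_markdown_sketch(document):
    """Mask fenced code blocks and block-quote lines with '1', the rest with '0'."""
    mask = []
    in_fence = False
    for line in document.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("```"):
            protected = True          # the fence line itself is protected
            in_fence = not in_fence   # toggle on opening and closing fences
        else:
            protected = in_fence or stripped.startswith(">")
        # One mask character per input character, line endings included.
        mask.append(("1" if protected else "0") * len(line))
    return "".join(mask)
```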

`prettify(document, isHTML, isMarkdown=False)`

Prettifies (degrotesques) the given document.

It is assumed that the input is given in utf-8.

The result is returned in utf-8 as well.

Parameters:

Name	Type	Description	Default
`document`	`str`	The document (contents) to process.	required
`isHTML`	`bool`	Whether the document is an HTML document	required
`isMarkdown`	`bool`	Whether the document is a markdown document	`False`

Returns:

Type	Description
`str`	The processed (prettified / degrotesqued) document.
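To make the interplay between a mask and the actions concrete, here is a minimal sketch that applies replacement rules only to unmasked runs of a document. It is an assumption about how such a step could work, not the library's code: `apply_where_unmasked` is a hypothetical helper, the single rule is illustrative, and the mask is written by hand.

```python
import re
from itertools import groupby


def apply_where_unmasked(document, mask, rules):
    """Apply each (pattern, replacement) rule only to runs whose mask is '0'."""
    out = []
    pos = 0
    for flag, run in groupby(mask):
        length = len(list(run))
        chunk = document[pos:pos + length]
        if flag == "0":
            for pattern, replacement in rules:
                chunk = pattern.sub(replacement, chunk)
        out.append(chunk)
        pos += length
    return "".join(out)


rules = [(re.compile(r"\.\.\."), "\u2026")]  # illustrative rule: "..." -> ellipsis
document = '<a title="...">Wait...</a>'
mask = "1" * 15 + "0" * 7 + "1" * 4  # tag, text, tag
result = apply_where_unmasked(document, mask, rules)
# The dots inside the attribute stay untouched; only the text is changed.
```

Because each unmasked run is processed separately, a pattern in this sketch cannot match across a tag boundary.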

`_replFunc_KEEP(matchobj)`

Converts Unicode numbers to themselves, that is, keeps them unchanged.

Parameters:

Name	Type	Description	Default
`matchobj`	`Match`	The match object to get a new representation for	required

Returns:

Type	Description
`str`	The converted string (here: as Unicode number)

`_replFunc_HTML(matchobj)`

Converts Unicode numbers to HTML entities.

Parameters:

Name	Type	Description	Default
`matchobj`	`Match`	The match object to get a new representation for	required

Returns:

Type	Description
`str`	The converted string (here: as HTML entity)

`_replFunc_UNICODE(matchobj)`

Converts Unicode numbers to Unicode characters.

Parameters:

Name	Type	Description	Default
`matchobj`	`Match`	The match object to get a new representation for	required

Returns:

Type	Description
`str`	The converted string (here: as Unicode character)
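The three replacement functions above follow the usual pattern of callables passed to `re.sub`. The sketch below illustrates that pattern under the assumption that Unicode numbers are written as decimal references such as `&#8212;`; the regular expression and the `*_sketch` names are hypothetical and may differ from what the library uses.

```python
import re
from html.entities import codepoint2name

# Assumption: replacements are written as decimal references such as "&#8212;".
NUMERIC_REF = re.compile(r"&#(\d+);")


def keep_sketch(matchobj):
    """Leave the numeric reference untouched."""
    return matchobj.group(0)


def to_html_sketch(matchobj):
    """Use a named HTML entity when one exists, else keep the numeric form."""
    name = codepoint2name.get(int(matchobj.group(1)))
    return f"&{name};" if name else matchobj.group(0)


def to_unicode_sketch(matchobj):
    """Replace the numeric reference with the character itself."""
    return chr(int(matchobj.group(1)))


text = "a&#8212;b"
as_kept = NUMERIC_REF.sub(keep_sketch, text)        # "a&#8212;b"
as_html = NUMERIC_REF.sub(to_html_sketch, text)     # "a&mdash;b"
as_char = NUMERIC_REF.sub(to_unicode_sketch, text)  # "a\u2014b"
```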

`getExtensions(extNames)`

Returns the list of extensions of files to process.

If the given names of extensions are None or empty, the default extensions are used.

Otherwise, the given string is split and returned as a list.

Parameters:

Name	Type	Description	Default
`extNames`	`List[str]`	The names of extensions to process (or None if default extensions shall be used)	required

Returns:

Type	Description
`List[str]`	The list of extensions to use.

Todo

What about removing dots?
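A possible shape of this splitting step is sketched below. The default list, the comma separator, and the helper name `split_extensions_sketch` are assumptions for illustration. Stripping a leading dot is one conceivable answer to the question raised in the todo, not documented behaviour.

```python
# Hypothetical defaults; the library's actual default list may differ.
DEFAULT_EXTENSIONS = ["html", "htm", "md", "txt"]


def split_extensions_sketch(ext_names):
    """Split a comma-separated string of extensions, or return the defaults."""
    if not ext_names:  # None or empty -> fall back to the defaults
        return list(DEFAULT_EXTENSIONS)
    # Strip whitespace and a leading dot so ".html" and "html" compare equal.
    return [e.strip().lstrip(".") for e in ext_names.split(",") if e.strip()]
```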

`getFiles(name, recursive, extensions)`

Returns the files to process.

If a file name is given, a list with only this file name is returned.

If a folder name is given, the files to process are determined by walking through the folder — recursively if wished — and collecting all files that match the extensions.

The list of collected files is returned.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the file/folder	required
`recursive`	`bool`	Whether the folder (if given) shall be processed recursively	required
`extensions`	`List[str]`	The extensions of the files to process	required

Returns:

Type	Description
`List[str]`	The list of collected files.
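The collection logic described above can be sketched with `os.walk` from the standard library. This is an illustrative stand-in rather than the library's code; `collect_files_sketch` is a hypothetical name, and comparing extensions without the leading dot is an assumption.

```python
import os


def collect_files_sketch(name, recursive, extensions):
    """Return [name] for a file, or the matching files found in a folder."""
    if os.path.isfile(name):
        return [name]
    files = []
    for root, _dirs, filenames in os.walk(name):
        for filename in sorted(filenames):
            # Assumption: extensions are compared without the leading dot.
            ext = os.path.splitext(filename)[1].lstrip(".")
            if ext in extensions:
                files.append(os.path.join(root, filename))
        if not recursive:
            break  # os.walk yields the top folder first; stop after it
    return files
```

For example, `collect_files_sketch("docs", True, ["html", "md"])` would return every `.html` and `.md` file below a folder named `docs`, where `docs` is only a placeholder path.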

`main(arguments=None)`

The main method, using parameters from the command line.

The application reads the given file, or the files from the folder, defined by the -i/--input option. If the -r/--recursive option is set, the input folder is scanned recursively. All files are processed unless the selection is limited to those matching the extensions given using the -e/--extensions option. The default encoding for the files is utf-8. This can be changed using the -E/--encoding option.

The default actions or those named using the -a/--actions option are applied. When parsing HTML documents, elements are skipped. The contents of default elements to skip or those defined using -s/--skip are skipped as well. degrotesque tries to determine the file type using the respective extension. The options -T/--text, -H/--html, and -M/--markdown override this behaviour.

By default, the replacements are written as Unicode entities; this may be changed using the -f/--format option.

The files are saved under their original name. If the option -B/--no-backup is not given, a backup of each original file is generated, named as the original file with the suffix ".orig".
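The save-with-backup behaviour can be sketched as follows. The helper `save_with_backup_sketch` and its parameters are hypothetical; only the ".orig" naming and the utf-8 default are taken from the description above.

```python
import shutil


def save_with_backup_sketch(path, new_text, encoding="utf-8", backup=True):
    """Overwrite `path` with `new_text`, first copying the original to `path + ".orig"`."""
    if backup:
        # Copy before writing so the original contents are preserved.
        shutil.copyfile(path, path + ".orig")
    with open(path, "w", encoding=encoding) as handle:
        handle.write(new_text)
```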

Parameters:

Name	Type	Description	Default
`arguments`	`List[str]`	The command line arguments, parsed as options using OptionParser.	`None`

Options

The following options must be set:

--input / -i <FILE or FOLDER NAME>: the file or the folder to process

The following options are optional:

--recursive / -r: Set if the folder — if given — shall be processed recursively

--extensions / -e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed

--encoding / -E <ENCODING>: File encoding (default: 'utf-8')

--html / -H: Files are HTML/XML-derivatives

--text / -T: Files are plain text files

--markdown / -M: Files are markdown files

--no-backup / -B: Set if no backup files shall be generated

--format / -f <FORMAT>: Define the format of the replacements ['html', 'unicode', 'text']

--skip / -s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed

--actions / -a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied

--help / -h: Prints the help screen

--version / -v: Prints the version
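Since the arguments are parsed with `OptionParser`, a subset of the options listed above could be declared roughly as in the following sketch. The option strings come from the list; the `dest` names, the defaults other than 'utf-8', and the function `build_parser_sketch` are assumptions made for illustration.

```python
from optparse import OptionParser


def build_parser_sketch():
    """Declare a few of the documented options (hypothetical dest names)."""
    parser = OptionParser()
    parser.add_option("-i", "--input", dest="input", default=None)
    parser.add_option("-r", "--recursive", dest="recursive",
                      action="store_true", default=False)
    parser.add_option("-e", "--extensions", dest="extensions", default=None)
    parser.add_option("-E", "--encoding", dest="encoding", default="utf-8")
    parser.add_option("-B", "--no-backup", dest="no_backup",
                      action="store_true", default=False)
    parser.add_option("-f", "--format", dest="format", default=None)
    return parser


# Passing an explicit list mirrors calling main(arguments=[...]).
options, remaining = build_parser_sketch().parse_args(
    ["-i", "docs", "-r", "-e", "html,md"]
)
```

Here `docs` is a placeholder folder name. After parsing, `options.input` holds the input path and `options.recursive` is `True`.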
