API
Degrotesque
A tiny web typesetter.
The main method "prettify" applies the list of actions to the contents of the given HTML page.
XML elements are skipped, as are the contents of specific elements. Additional methods support parsing and setting new values for the actions and for the elements to skip.
Some internal methods determine which parts of the document shall be processed and which shall be skipped.
__init__()
Sets the defaults for the elements whose contents shall not be processed.
Sets the defaults for the actions to perform.
_restoreDefaultActions()
Instantiates the default actions.
setActions(actNames)
Sets the actions to apply.
If the given names of actions are None or empty, the default actions are used.
Otherwise, the actions matching the given names are retrieved from the internal database and their list is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
actNames | List[str] | The names of the actions to use (or None if default actions shall be used) | required |
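The name lookup with a fallback to defaults, as described above, can be sketched as follows. This is a minimal illustration, not the library's code: `ACTIONS_DB`, `DEFAULT_ACTION_NAMES`, `select_actions`, the two action names, the representation of an action as (pattern, replacement) pairs, and the error raised for unknown names are all assumptions made for the example.

```python
import re

# Hypothetical action database: name -> list of (pattern, replacement) pairs.
ACTIONS_DB = {
    "ellipsis": [(re.compile(r"\.\.\."), "\u2026")],
    "dashes": [(re.compile(r"(?<=\s)-(?=\s)"), "\u2014")],
}
# Hypothetical list of actions used when no names are given.
DEFAULT_ACTION_NAMES = ["ellipsis", "dashes"]


def select_actions(act_names):
    """Return the actions for the given names, or the defaults if none are given."""
    if not act_names:  # covers both None and an empty list
        act_names = DEFAULT_ACTION_NAMES
    actions = []
    for name in act_names:
        if name not in ACTIONS_DB:
            # Assumption: unknown names are reported as an error.
            raise ValueError(f"Action '{name}' is not known.")
        actions.extend(ACTIONS_DB[name])
    return actions


print(len(select_actions(None)))        # defaults: both actions
print(len(select_actions(["dashes"])))  # only the named action
```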
_restoreDefaultElementsToSkip()
Instantiates the default elements to skip.
setToSkip(toSkipNames)
Sets the elements whose contents shall not be changed.
If the given names of elements are None or empty, the default elements to skip are used.
Otherwise, a list with the elements to skip is built.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
toSkipNames | List[str] | The names of elements which shall not be changed | required |
Todo
Warn user if a non-XML-character occurs?
setFormat(formatS)
Sets the target character representation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
formatS | str | The format to use, one of "html", "unicode", "text" | required |
_getTagName(html)
Returns the name of the tag that starts at the beginning of the given string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
html | str | The HTML-subpart | required |
Returns:
Type | Description |
---|---|
str | The name of the tag |
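As an illustration of what such a helper might do, the sketch below reads the tag name from a string that begins with a tag. The function name `get_tag_name`, the regular expression, the handling of closing tags, and the empty-string result for non-tags are assumptions for this example; the actual method may behave differently.

```python
import re

# Optional '/' (closing tag), then the tag name itself.
_TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9:_-]*)")


def get_tag_name(html):
    """Return the name of the tag at the start of `html`, or '' if there is none."""
    match = _TAG_NAME.match(html)
    return match.group(1) if match else ""


print(get_tag_name("<pre class='x'>code</pre>"))
print(get_tag_name("</code> and more"))
```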
_markHTML(html)
Returns a string where all HTML-elements are denoted as '1' and plain content as '0'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
html | str | The HTML document (contents) to process | required |
Returns:
Type | Description |
---|---|
str | Annotation of the HTML document. |
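The idea of the annotation string can be shown with a simplified sketch: it has the same length as the input and holds '1' for every character that belongs to a tag and '0' for plain content. The function `mark_html` below is hypothetical and deliberately naive; it does not handle comments, quoted '>' characters inside attributes, or the contents of elements to skip, which the real method may well cover.

```python
def mark_html(html):
    """Return a mask of the same length as `html`: '1' inside tags, '0' elsewhere."""
    mask = []
    in_tag = False
    for ch in html:
        if ch == "<":
            in_tag = True
        mask.append("1" if in_tag else "0")
        if ch == ">":
            in_tag = False
    return "".join(mask)


print(mark_html("<b>Hi</b>"))  # 111001111
```

Keeping the mask aligned character by character with the document means a later step can test any position with a simple index lookup.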
_markMarkdown(document)
Returns a string where all code and quotes are denoted as '1' and plain content as '0'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
document | str | The markdown document (contents) to process | required |
Returns:
Type | Description |
---|---|
str | Annotation of the markdown document. |
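A comparable sketch for markdown is given below. It assumes that "code" means fenced blocks and inline code spans delimited by single backticks, and that "quotes" means block quote lines starting with '>'. The function name `mark_markdown` and these rules are assumptions; indented code blocks and nested constructs are ignored here.

```python
FENCE = "`" * 3  # a fence marker, built this way to keep the example readable


def mark_markdown(document):
    """Return a mask of the same length: '1' for code and quotes, '0' elsewhere."""
    mask = []
    in_fence = False
    for line in document.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(FENCE):
            # The fence line itself is protected and toggles the fenced state.
            mask.append("1" * len(line))
            in_fence = not in_fence
        elif in_fence or stripped.startswith(">"):
            mask.append("1" * len(line))
        else:
            # Protect inline code spans, including their backticks.
            in_code = False
            for ch in line:
                if ch == "`":
                    in_code = not in_code
                    mask.append("1")
                else:
                    mask.append("1" if in_code else "0")
    return "".join(mask)


print(mark_markdown("a `b` c\n> q\n"))  # 001110001111
```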
prettify(document, isHTML, isMarkdown=False)
Prettifies (degrotesques) the given document.
It is assumed that the input is given in utf-8.
The result is returned in utf-8 as well.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
document | str | The document (contents) to process. | required |
isHTML | bool | Whether the document is an HTML document | required |
isMarkdown | bool | Whether the document is a markdown document | False |
Returns:
Type | Description |
---|---|
str | The processed (prettified / degrotesqued) document. |
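How an annotation string can restrict replacements to plain content is sketched below. This is not the library's implementation: `apply_masked`, the hand-written mask, and the single ellipsis rule are assumptions chosen to keep the example short. Because each unmasked run is processed separately, a pattern cannot match across a protected span in this sketch.

```python
import re


def apply_masked(document, mask, pattern, repl):
    """Apply `pattern` -> `repl` only to the parts of `document` masked with '0'."""
    out = []
    # Walk over maximal runs of identical mask characters.
    for run in re.finditer(r"0+|1+", mask):
        chunk = document[run.start():run.end()]
        if run.group()[0] == "0":
            chunk = re.sub(pattern, repl, chunk)
        out.append(chunk)
    return "".join(out)


doc = "<p title='...'>Wait...</p>"
# '1' for the opening tag (15 chars), '0' for the text (7), '1' for the closing tag (4).
mask = "1" * 15 + "0" * 7 + "1" * 4
result = apply_masked(doc, mask, r"\.\.\.", "\u2026")
print(result)
```

The dots inside the `title` attribute stay untouched, while those in the text are replaced by a single ellipsis character.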
_replFunc_KEEP(matchobj)
Converts Unicode numbers to themselves, leaving them unchanged.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matchobj | Match | The match object to get a new representation for | required |
Returns:
Type | Description |
---|---|
str | The converted string (here: as Unicode number) |
_replFunc_HTML(matchobj)
Converts Unicode numbers to HTML entities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matchobj | Match | The match object to get a new representation for | required |
Returns:
Type | Description |
---|---|
str | The converted string (here: as HTML entity) |
_replFunc_UNICODE(matchobj)
Converts Unicode numbers to Unicode characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matchobj | Match | The match object to get a new representation for | required |
Returns:
Type | Description |
---|---|
str | The converted string (here: as Unicode character) |
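The three replacement functions follow the pattern of a callable passed to `re.sub`. The sketch below assumes that a "Unicode number" is written as a decimal numeric character reference such as `&#8211;`; that notation, the pattern `NUMERIC_REF`, and the function names are assumptions for illustration only.

```python
import re
from html.entities import codepoint2name

# Assumed notation for a "Unicode number": a decimal numeric character reference.
NUMERIC_REF = re.compile(r"&#(\d+);")


def repl_keep(match):
    """Keep the numeric reference as it is."""
    return match.group(0)


def repl_html(match):
    """Use the named HTML entity if one exists, else keep the numeric reference."""
    name = codepoint2name.get(int(match.group(1)))
    return f"&{name};" if name else match.group(0)


def repl_unicode(match):
    """Replace the numeric reference with the character itself."""
    return chr(int(match.group(1)))


text = "1914&#8211;1918"
kept = NUMERIC_REF.sub(repl_keep, text)
as_html = NUMERIC_REF.sub(repl_html, text)
as_unicode = NUMERIC_REF.sub(repl_unicode, text)
print(kept, as_html, as_unicode)
```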
getExtensions(extNames)
Returns the list of extensions of files to process.
If the given names of extensions are None or empty, the default extensions are used.
Otherwise, the given string is split and returned as a list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
extNames | List[str] | The names of extensions to process (or None if default extensions shall be used) | required |
Returns:
Type | Description |
---|---|
List[str] | The list of extensions to use. |
Todo
What about removing dots?
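The behaviour described for getExtensions can be sketched as follows, assuming the extensions arrive as one comma-separated string, as the description and the -e option syntax suggest. The function name `parse_extensions` and the default list are hypothetical, and leading dots are kept as given, since that point is still open in the Todo above.

```python
# Hypothetical defaults; the real default list may differ.
DEFAULT_EXTENSIONS = ["html", "htm", "md", "txt"]


def parse_extensions(ext_names):
    """Split a comma-separated string of extensions, or return the defaults."""
    if not ext_names:  # None or empty string
        return list(DEFAULT_EXTENSIONS)
    return [ext.strip() for ext in ext_names.split(",")]


print(parse_extensions(None))
print(parse_extensions("html, php"))
```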
getFiles(name, recursive, extensions)
Returns the files to process.
If a file name is given, a list with only this file name is returned.
If a folder name is given, the files to process are determined by walking through the folder (recursively, if requested) and collecting all files that match the extensions.
The list of collected files is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the file/folder | required |
recursive | bool | Whether the folder (if given) shall be processed recursively | required |
extensions | List[str] | The extensions of the files to process | required |
Returns:
Type | Description |
---|---|
List[str] | The list of collected files. |
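The file collection described above can be approximated with `os.walk`. The sketch below is an assumption about how such a function might look, not the library's code: `collect_files` is a hypothetical name, extensions are compared without the leading dot, and the result is sorted only to make the demonstration deterministic.

```python
import os
import tempfile


def collect_files(name, recursive, extensions):
    """Return [name] for a file, else the matching files found in the folder."""
    if os.path.isfile(name):
        return [name]
    files = []
    for root, _dirs, filenames in os.walk(name):
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lstrip(".")
            if ext in extensions:
                files.append(os.path.join(root, filename))
        if not recursive:
            break  # os.walk yields the top folder first; stop after it
    return sorted(files)


# Demonstration on a temporary folder with one nested subfolder.
with tempfile.TemporaryDirectory() as folder:
    os.mkdir(os.path.join(folder, "sub"))
    for rel in ("a.html", "b.txt", os.path.join("sub", "c.html")):
        with open(os.path.join(folder, rel), "w", encoding="utf-8") as fd:
            fd.write("x")
    flat = [os.path.relpath(f, folder) for f in collect_files(folder, False, ["html"])]
    deep = [os.path.relpath(f, folder) for f in collect_files(folder, True, ["html"])]

print(flat)
print(deep)
```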
main(arguments=None)
The main method, using parameters from the command line.
The application reads the file, or the files from the folder, given by the -i/--input option. If the -r/--recursive option is set, the input folder is scanned recursively. By default all files are processed; processing can be limited to files whose extension matches one given with the -e/--extensions option. The default file encoding is utf-8; this can be changed using the -E/--encoding option.
The default actions, or those named using the -a/--actions option, are applied. When parsing HTML documents, the elements themselves are skipped. The contents of the default elements to skip, or of those defined using -s/--skip, are skipped as well. degrotesque tries to determine the file type from the file extension. The options -T/--text, -H/--html, and -M/--markdown override this behaviour.
The default target format of the replacements is Unicode entities; this may be changed using the -f/--format option.
The files are saved under their original names. Unless the option -B/--no-backup is given, a backup of each original file is created, named like the original file with the suffix ".orig".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arguments | List[str] | The command line arguments, parsed as options using OptionParser. | None |
Options
The following options must be set:
--input / -i <FILE or FOLDER NAME>: the file or the folder to process
The following options are optional:
--recursive / -r: Set if the folder, if given, shall be processed recursively
--extensions / -e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed
--encoding / -E <ENCODING>: File encoding (default: 'utf-8')
--html / -H: Files are HTML/XML-derivatives
--text / -T: Files are plain text files
--markdown / -M: Files are markdown files
--no-backup / -B: Set if no backup files shall be generated
--format / -f <FORMAT>: Define the format of the replacements ['html', 'unicode', 'text']
--skip / -s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed
--actions / -a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied
--help / -h: Prints the help screen
--version / -v: Prints the version
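Since the arguments are parsed with OptionParser, a subset of the options listed above could be declared roughly as follows. The flags are taken from the list; the `dest` names, help texts, and the helper `build_parser` are assumptions for this sketch, and the remaining options are omitted for brevity.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for a subset of the documented options (illustrative only)."""
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option("-i", "--input", dest="input", default=None,
                      help="the file or the folder to process")
    parser.add_option("-r", "--recursive", dest="recursive",
                      action="store_true", default=False,
                      help="process the folder recursively")
    parser.add_option("-e", "--extensions", dest="extensions", default=None,
                      help="comma-separated extensions of files to process")
    parser.add_option("-E", "--encoding", dest="encoding", default="utf-8",
                      help="file encoding")
    parser.add_option("-B", "--no-backup", dest="no_backup",
                      action="store_true", default=False,
                      help="do not generate backup files")
    return parser


# Parse an explicit argument list instead of sys.argv, as main(arguments) allows.
options, _remaining = build_parser().parse_args(["-i", "site", "-r", "-e", "html,md"])
print(options.input, options.recursive, options.extensions, options.encoding)
```

Passing an explicit list, as in the last lines, mirrors the optional `arguments` parameter of main and makes the parsing easy to exercise from tests.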