- builtins.ValueError(builtins.Exception)
  - UrlError
- builtins.object
  - WebSiteCloner
- html.parser.HTMLParser(_markupbase.ParserBase)
  - GetURLsFromHTML
class GetURLsFromHTML(html.parser.HTMLParser)

GetURLsFromHTML(master)
This class gets URLs from HTML.

- Method resolution order:
- GetURLsFromHTML
- html.parser.HTMLParser
- _markupbase.ParserBase
- builtins.object
Methods defined here:
- __init__(self, master)
- Initialize and reset this instance.
If convert_charrefs is True (the default), all character references
are automatically converted to the corresponding Unicode characters.
- handle_starttag(self, tag: str, attributes: List[Tuple[str, str]]) -> None
- This function gets URL attributes and their values.
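
As a rough illustration of how a `handle_starttag` override can collect URL-bearing attributes, here is a minimal sketch. The class name `LinkCollector`, the `urls` list, and the choice of `href` and `src` are assumptions made for this example; the documented `GetURLsFromHTML` takes a `master` argument and may track different attributes.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Hypothetical parser that records href/src attribute values."""

    URL_ATTRIBUTES = ("href", "src")  # assumption: attributes treated as URLs

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(
        self, tag: str, attributes: List[Tuple[str, Optional[str]]]
    ) -> None:
        for name, value in attributes:
            # Valueless attributes (e.g. <a href>) arrive with value None.
            if name in self.URL_ATTRIBUTES and value:
                self.urls.append(value)


collector = LinkCollector()
collector.feed('<a href="/about">About</a><img src="logo.png">')
collector.close()
print(collector.urls)
```

The parser lowercases attribute names before calling `handle_starttag`, so matching against lowercase names is enough.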
Methods inherited from html.parser.HTMLParser:
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- feed(self, data)
- Feed data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n').
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_charref(self, name)
- # Overridable -- handle character reference
- handle_comment(self, data)
- # Overridable -- handle comment
- handle_data(self, data)
- # Overridable -- handle data
- handle_decl(self, decl)
- # Overridable -- handle declaration
- handle_endtag(self, tag)
- # Overridable -- handle end tag
- handle_entityref(self, name)
- # Overridable -- handle entity reference
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- handle_startendtag(self, tag, attrs)
- # Overridable -- finish processing of start+end tag: <tag.../>
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unknown_decl(self, data)
Data and other attributes inherited from html.parser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
Methods inherited from _markupbase.ParserBase:
- error(self, message)
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.
Data descriptors inherited from _markupbase.ParserBase:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
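
The inherited `feed` method can be called repeatedly with partial input, which is convenient when a response is read in blocks. The sketch below uses only the standard `html.parser.HTMLParser`; the `TagLogger` class and the chunk boundaries are made up for illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical parser that records the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


parser = TagLogger()
# A start tag split across two chunks stays buffered until it is complete.
for chunk in ("<html><bo", "dy><p cla", 'ss="x">text</p></body></html>'):
    parser.feed(chunk)
parser.close()  # handle any data still buffered
print(parser.tags)
```

Calling `reset()` instead of `close()` would discard whatever is still buffered.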
|
class UrlError(builtins.ValueError)

- Method resolution order:
- UrlError
- builtins.ValueError
- builtins.Exception
- builtins.BaseException
- builtins.object
Data descriptors defined here:
- __weakref__
- list of weak references to the object (if defined)
Methods inherited from builtins.ValueError:
- __init__(self, /, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
Static methods inherited from builtins.ValueError:
- __new__(*args, **kwargs) from builtins.type
- Create and return a new object. See help(type) for accurate signature.
Methods inherited from builtins.BaseException:
- __delattr__(self, name, /)
- Implement delattr(self, name).
- __getattribute__(self, name, /)
- Return getattr(self, name).
- __reduce__(...)
- Helper for pickle.
- __repr__(self, /)
- Return repr(self).
- __setattr__(self, name, value, /)
- Implement setattr(self, name, value).
- __setstate__(...)
- __str__(self, /)
- Return str(self).
- with_traceback(...)
- Exception.with_traceback(tb) --
set self.__traceback__ to tb and return self.
Data descriptors inherited from builtins.BaseException:
- __cause__
- exception cause
- __context__
- exception context
- __dict__
- __suppress_context__
- __traceback__
- args
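
Because `UrlError` derives from `ValueError`, callers can catch it under either name. The documentation above does not say when the module raises it, so the sketch below is only an assumption: `check_url` is a hypothetical validator, and the `UrlError` class is a local stand-in with the same base class.

```python
from urllib.parse import urlparse


class UrlError(ValueError):
    """Local stand-in for the documented exception (same base class)."""


def check_url(url: str) -> str:
    """Hypothetical validator: require an http(s) scheme and a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise UrlError(f"invalid URL: {url!r}")
    return url


try:
    check_url("example.com/page")  # no scheme, so validation fails
except ValueError as error:  # UrlError is also caught as ValueError
    caught = error
```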
|
class WebSiteCloner(builtins.object)

WebSiteCloner(url: str, directory: str = 'CloneWebSite', recursive: bool = False, replace_domain: str = None, loglevel: int = 20, logfile: str = None, replace_scheme: str = None)
This class clones a page and its resources.

Methods defined here:
- __init__(self, url: str, directory: str = 'CloneWebSite', recursive: bool = False, replace_domain: str = None, loglevel: int = 20, logfile: str = None, replace_scheme: str = None)
- Initialize self. See help(type(self)) for accurate signature.
- add_new_url(self, url: str) -> None
- This function adds a new URL to parse.
- get_complete_url(self, url: str) -> str
- This function builds a complete URL.
- get_data(self, url: str) -> http.client.HTTPResponse
- This function returns the HTTP response.
- get_directory_from_url(self, url: str, url_parsed: urllib.parse.ParseResult) -> str
- This function returns the directory in which to write the file.
- get_full_path(self, url: str, directory: str, url_parsed: urllib.parse.ParseResult) -> str
- This function returns the full path (directory + filename) to which the file is written.
- launch(self) -> None
- Launcher to copy a website.
- write_file(self, url: str, data: bytes) -> bytes
- This function gets the URL path and writes the file to the correct location.
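
The URL and path helpers listed above are documented only by their signatures. As one possible reading, the sketch below resolves a relative link against its page and maps the result to a local file path using `urllib.parse`. The function names `complete_url` and `local_path`, the `root/<host>/<path>` layout, and the `index.html` fallback are all assumptions, not the behavior of `WebSiteCloner`.

```python
import os
from urllib.parse import urljoin, urlparse


def complete_url(base_url: str, link: str) -> str:
    """Resolve a possibly relative link against the page it came from."""
    return urljoin(base_url, link)


def local_path(root: str, url: str) -> str:
    """Hypothetical mapping of a URL to root/<host>/<path>."""
    parsed = urlparse(url)
    path = parsed.path
    if not path or path.endswith("/"):
        # Assumption: directory-style URLs are stored as an index file.
        path += "index.html"
    return os.path.join(root, parsed.netloc, *path.strip("/").split("/"))


page = "https://example.com/docs/index.html"
stylesheet = complete_url(page, "../css/site.css")
print(stylesheet)
print(local_path("CloneWebSite", stylesheet))
```

Keeping the host name as the first directory level avoids collisions when a page pulls resources from several domains, though a real cloner may prefer a flatter layout.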
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)