URLResolver
Overview
The URLResolver is a Custom Transformer that resolves relative URLs against a base URL with input validation. It ensures that the base URL is absolute and that the relative URL is relative before combining them using urllib.parse.urljoin.
Key Features
URL validation (base_url must be absolute, relative_url must be relative)
RFC 3986 compliant URL resolution using urllib.parse.urljoin
Handles all relative URL types (root-relative, parent-relative, document-relative, protocol-relative, query-only, fragment-only)
Automatic path navigation resolution (.. and . segments)
Error reporting with detailed messages
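As a brief illustration of the path navigation feature, the sketch below calls urllib.parse.urljoin directly from the Python standard library. The URLs are made-up examples, and the transformer's own validation and error handling are not shown.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://example.com/a/b/c.html"

# "." refers to the directory of the base document; ".." moves one level up.
same_dir = urljoin(base, "./d.html")
mixed = urljoin(base, "../x/./y.html")

print(same_dir)  # https://example.com/a/b/d.html
print(mixed)     # https://example.com/a/x/y.html
```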
Input Attributes
The transformer expects the following attributes on input features:
| Attribute | Required | Validation | Description | Example |
| --- | --- | --- | --- | --- |
| _base_url | Yes | Must be absolute with scheme | The base URL (must include protocol) | https://example.com/path/page.html |
| _relative_url | Yes | Must be relative without scheme | The relative URL to resolve | ../other.html, /path/file.json, ?query=value |
Output Attributes
The transformer adds the following attributes to output features:
| Attribute | Type | Description |
| --- | --- | --- |
| _resolved_url | String | The absolute resolved URL |
| _url_error | String | Error message if resolution failed |
URL Validation
base_url - Must be an absolute URL with a scheme:
Allowed: https://example.com, http://localhost:8080/path/, ftp://server.com/file.txt
Allowed: URLs with paths: https://example.com/folder/page.html
Allowed: URLs with query strings: https://example.com/page?param=value
NOT allowed: Relative URLs such as example.com, //example.com, /path/page
relative_url - Must be a relative URL without a scheme:
Allowed: Document-relative: page.html, folder/file.json
Allowed: Parent-relative: ../page.html, ../../data/file.json
Allowed: Root-relative: /path/page, /api/data.json
Allowed: Query-only: ?param=value&other=test
Allowed: Fragment-only: #section1
Allowed: Protocol-relative: //cdn.example.com/script.js
NOT allowed: Absolute URLs like https://other.com/page, http://example.com
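The rules above could be checked along the following lines. This is a minimal sketch rather than the transformer's actual code: the function names are hypothetical, and requiring a network location in addition to a scheme for the base URL is an assumption.

```python
from urllib.parse import urlsplit


def is_absolute_url(url: str) -> bool:
    """True when the URL has a scheme and a network location (assumed rule)."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


def is_relative_url(url: str) -> bool:
    """True when the URL has no scheme; protocol-relative URLs pass."""
    return urlsplit(url).scheme == ""


# Base URL candidates taken from the lists above.
print(is_absolute_url("http://localhost:8080/path/"))   # True
print(is_absolute_url("//example.com"))                 # False

# Relative URL candidates taken from the lists above.
print(is_relative_url("//cdn.example.com/script.js"))   # True
print(is_relative_url("https://other.com/page"))        # False
```

Because "//example.com" has a network location but no scheme, it fails the base check yet passes the relative check, which matches how the two lists treat protocol-relative URLs.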
URL Resolution Examples
Document-Relative URLs
| Base URL | Relative URL | Resolved URL |
| --- | --- | --- |
| https://example.com/docs/guide.html | page.html | https://example.com/docs/page.html |
| https://example.com/docs/guide.html | images/logo.png | https://example.com/docs/images/logo.png |
Parent-Relative URLs
| Base URL | Relative URL | Resolved URL |
| --- | --- | --- |
| https://example.com/docs/api/guide.html | ../index.html | https://example.com/docs/index.html |
| https://example.com/docs/api/guide.html | ../../home.html | https://example.com/home.html |
Root-Relative URLs
| Base URL | Relative URL | Resolved URL |
| --- | --- | --- |
| https://example.com/docs/guide.html | /api/data.json | https://example.com/api/data.json |
| https://example.com/docs/guide.html | / | https://example.com/ |
Query and Fragment URLs
| Base URL | Relative URL | Resolved URL |
| --- | --- | --- |
| https://example.com/page.html | ?search=test | https://example.com/page.html?search=test |
| https://example.com/page.html?old=value | ?new=value | https://example.com/page.html?new=value |
| https://example.com/page.html | #section1 | https://example.com/page.html#section1 |
Protocol-Relative URLs
| Base URL | Relative URL | Resolved URL |
| --- | --- | --- |
| https://example.com/page.html | //cdn.example.com/script.js | https://cdn.example.com/script.js |
| http://example.com/page.html | //cdn.example.com/script.js | http://cdn.example.com/script.js |
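The resolution examples in this section can be reproduced with urllib.parse.urljoin alone. The sketch below takes one row from each category and compares the computed result with the documented one; the names CASES and RESULTS are illustrative only.

```python
from urllib.parse import urljoin

# (base URL, relative URL, documented result), one row per category.
CASES = [
    ("https://example.com/docs/guide.html", "images/logo.png",
     "https://example.com/docs/images/logo.png"),      # document-relative
    ("https://example.com/docs/api/guide.html", "../../home.html",
     "https://example.com/home.html"),                 # parent-relative
    ("https://example.com/docs/guide.html", "/api/data.json",
     "https://example.com/api/data.json"),             # root-relative
    ("https://example.com/page.html?old=value", "?new=value",
     "https://example.com/page.html?new=value"),       # query-only
    ("https://example.com/page.html", "#section1",
     "https://example.com/page.html#section1"),        # fragment-only
    ("http://example.com/page.html", "//cdn.example.com/script.js",
     "http://cdn.example.com/script.js"),              # protocol-relative
]

RESULTS = [urljoin(base, relative) for base, relative, _ in CASES]

for (base, relative, expected), resolved in zip(CASES, RESULTS):
    status = "ok" if resolved == expected else "MISMATCH"
    print(f"{status}: {base} + {relative} -> {resolved}")
```

Note that a query-only reference replaces the base URL's existing query string, and a protocol-relative reference inherits only the scheme of the base URL.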
Invalid Input Examples
| Base URL | Relative URL | Result | Error Message |
| --- | --- | --- | --- |
| example.com/page | other.html | Error | "base_url must be an absolute URL with a scheme..." |
| //example.com/page | other.html | Error | "base_url must be an absolute URL with a scheme..." |
| https://example.com | https://other.com/page | Error | "relative_url must be a relative URL without a scheme..." |
| (empty) | page.html | Error | "Missing or empty _base_url attribute" |
| https://example.com | (empty) | Error | "Missing or empty _relative_url attribute" |
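To show how the invalid input cases might map onto the output attributes, here is a minimal sketch that treats a feature as a plain Python dict. It is not the transformer's actual implementation: the function name resolve_feature, the order of the checks, and the choice to set only one of _resolved_url or _url_error are assumptions. The two validation messages are kept exactly as quoted in the documentation, including the trailing "..." where the documentation truncates them.

```python
from urllib.parse import urljoin, urlsplit

# Message text as quoted in the documentation; the "..." is part of the
# quoted text and marks wording that the documentation leaves out.
ERR_BASE = "base_url must be an absolute URL with a scheme..."
ERR_RELATIVE = "relative_url must be a relative URL without a scheme..."


def resolve_feature(feature: dict) -> dict:
    """Return a copy of the feature with _resolved_url or _url_error set."""
    out = dict(feature)
    base = (feature.get("_base_url") or "").strip()
    relative = (feature.get("_relative_url") or "").strip()

    if not base:
        out["_url_error"] = "Missing or empty _base_url attribute"
    elif not relative:
        out["_url_error"] = "Missing or empty _relative_url attribute"
    else:
        base_parts = urlsplit(base)
        if not (base_parts.scheme and base_parts.netloc):
            # Assumed rule: the base needs a scheme and a network location.
            out["_url_error"] = ERR_BASE
        elif urlsplit(relative).scheme:
            # A scheme on the relative URL means it is already absolute.
            out["_url_error"] = ERR_RELATIVE
        else:
            out["_resolved_url"] = urljoin(base, relative)
    return out


ok = resolve_feature({"_base_url": "https://example.com/docs/guide.html",
                      "_relative_url": "page.html"})
bad = resolve_feature({"_base_url": "example.com/page",
                       "_relative_url": "other.html"})
print(ok["_resolved_url"])  # https://example.com/docs/page.html
print(bad["_url_error"])    # base_url must be an absolute URL with a scheme...
```

Reporting failures through an attribute rather than an exception keeps every input feature flowing to the output, so invalid rows can be filtered downstream by testing for _url_error.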
