Since version 7.62.0, libcurl offers an API for parsing, updating and generating URLs. Applications can use libcurl's URL parser for their own purposes. When an application uses the same parser as libcurl, security problems caused by two parsers interpreting a URL differently can be avoided.
You'd still only include <curl/curl.h> in your code.
Create a handle that holds URL info and resources:
CURLU *h = curl_url();
When done with it, clean it up:
curl_url_cleanup(h);
When you need a copy of a handle, just duplicate it:
CURLU *nh = curl_url_dup(h);
Parse a full URL by setting the CURLUPART_URL part in the handle:
rc = curl_url_set(h, CURLUPART_URL, "https://example.com:449/foo/bar?name=moo", 0);
(The zero in the function call is a bitmask for changing specific features.)
If successful, this stores the URL in its individual parts within the handle.
When the handle has already parsed a URL, setting a relative URL makes it "redirect" to adapt to it.
rc = curl_url_set(h, CURLUPART_URL, "../test?another", 0);
The CURLU handle represents a URL and you can easily extract that:
char *url;
rc = curl_url_get(h, CURLUPART_URL, &url, 0);
curl_free(url);
(The zero in the function call is a bitmask for changing specific features.)
When a URL has been parsed or parts have been set, you can extract those pieces from the handle at any time.
rc = curl_url_get(h, CURLUPART_HOST, &host, 0);
rc = curl_url_get(h, CURLUPART_SCHEME, &scheme, 0);
rc = curl_url_get(h, CURLUPART_USER, &user, 0);
rc = curl_url_get(h, CURLUPART_PASSWORD, &password, 0);
rc = curl_url_get(h, CURLUPART_PORT, &port, 0);
rc = curl_url_get(h, CURLUPART_PATH, &path, 0);
rc = curl_url_get(h, CURLUPART_QUERY, &query, 0);
rc = curl_url_get(h, CURLUPART_FRAGMENT, &fragment, 0);
Extracted parts are not URL decoded unless the user asks for it with the
CURLU_URLDECODE flag.
Remember to free the returned string with curl_free when you are done with
it!
A user can opt to set individual parts, either after having parsed a full URL or instead of parsing one.
rc = curl_url_set(urlp, CURLUPART_HOST, "www.example.com", 0);
rc = curl_url_set(urlp, CURLUPART_SCHEME, "https", 0);
rc = curl_url_set(urlp, CURLUPART_USER, "john", 0);
rc = curl_url_set(urlp, CURLUPART_PASSWORD, "doe", 0);
rc = curl_url_set(urlp, CURLUPART_PORT, "443", 0);
rc = curl_url_set(urlp, CURLUPART_PATH, "/index.html", 0);
rc = curl_url_set(urlp, CURLUPART_QUERY, "name=john", 0);
rc = curl_url_set(urlp, CURLUPART_FRAGMENT, "anchor", 0);
Set parts are not URL encoded unless the user asks for it with the
CURLU_URLENCODE flag.
An application can append a string to the right end of the query part with the
CURLU_APPENDQUERY flag.
Imagine a handle that holds the URL https://example.com/?shoes=2. An
application can then add the string hat=1 to the query part like this:
rc = curl_url_set(urlp, CURLUPART_QUERY, "hat=1", CURLU_APPENDQUERY);
It will even notice the lack of an ampersand (&) separator so it will inject
one too, and the handle's full URL would then equal
https://example.com/?shoes=2&hat=1.
The appended string can of course also get URL encoded on add, and if asked,
the encoding will skip the '=' character. For example, append candy=M&M to
what we already have, and URL encode it to deal with the ampersand in the
data:
rc = curl_url_set(urlp, CURLUPART_QUERY, "candy=M&M", CURLU_APPENDQUERY | CURLU_URLENCODE);
Now the URL looks like https://example.com/?shoes=2&hat=1&candy=M%26M.
libcurl 7.63.0 or later allows applications to pass in a CURLU handle
instead of a URL string to tell curl what to transfer to or from. This is
particularly convenient for applications that already parse the URL and might
have it stored in such a handle already.