
Commit f1d81e5

Update Public Suffix API proposal
1 parent 5ba391f commit f1d81e5

1 file changed

+80
-76
lines changed

proposals/public-suffix.md

Lines changed: 80 additions & 76 deletions
Original file line number | Diff line number | Diff line change
@@ -179,11 +179,11 @@ This may save a few CPU cycles for every candidate domain lookup.
179179

180180
Example candidate domain: `foo.bar.baz`
181181

182-
| Step | Domain | Search in PSL? |
183-
|:----:|:------:|:------:|
184-
| 1 | `foo.bar.baz` | yes |
185-
| 2 | `bar.baz` | yes |
186-
| 3 | `baz` | no |
182+
| Step | Domain | Search in PSL? |
183+
|:----:|:-------------:|:--------------:|
184+
| 1 | `foo.bar.baz` | yes |
185+
| 2 | `bar.baz` | yes |
186+
| 3 | `baz` | no |
187187

188188
It is unclear how much of a performance benefit such an optimization would give
189189
in practice.
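The candidate table above can be sketched as follows. This is a minimal illustration only: `listCandidates` and the `Candidate` shape are hypothetical names, not part of the proposed API, and the sketch assumes the optimization amounts to skipping the PSL search for the final, single-label candidate.

```typescript
// Hypothetical helper, not part of the proposed API.
interface Candidate {
  domain: string;
  searchInPSL: boolean;
}

// Lists candidate domains from longest to shortest.
// Assumption: only the last (single-label) candidate skips the PSL search.
function listCandidates(hostname: string): Candidate[] {
  const labels = hostname.split(".");
  return labels.map((_, i) => ({
    domain: labels.slice(i).join("."),
    searchInPSL: i < labels.length - 1,
  }));
}

for (const c of listCandidates("foo.bar.baz")) {
  console.log(c.domain, c.searchInPSL ? "yes" : "no");
}
```

The saving is one set lookup per hostname, which is consistent with the text's doubt about the practical benefit.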
@@ -378,21 +378,21 @@ namespace publicSuffix {
378378
// METHODS
379379

380380
// Determines if the given hostname is itself a known eTLD (i.e. in the PSL).
381-
export function isKnownPublicSuffix(
381+
export function isKnownSuffix(
382382
hostname: string,
383383
)
384384
: boolean;
385385

386386
// Gets the known eTLD, if any, of a given hostname.
387-
export function getKnownPublicSuffix(
387+
export function getKnownSuffix(
388388
hostname: string,
389389
)
390390
: string | null;
391391

392392
// Gets the registrable domain of a given hostname.
393-
export function getRegistrableDomain(
393+
export function getDomain(
394394
hostname: string,
395-
options?: RegistrableDomainOptions,
395+
options?: DomainOptions,
396396
)
397397
: string | null;
398398

@@ -403,17 +403,17 @@ namespace publicSuffix {
403403
// INTERFACES
404404

405405
// Options that may be passed to the API method to control its behaviour.
406-
interface RegistrableDomainOptions {
407-
// If true, the resulting registrable domain should be encoded as Unicode.
406+
interface DomainOptions {
407+
// If true, the returned domain should be encoded as Unicode.
408408
// Default = false (Punycode)
409409
unicode?: boolean,
410-
// If true, an IP address is a registrable domain.
410+
// If true, the returned domain may be an IP address.
411411
// Default = false
412412
allowIP?: boolean,
413-
// If true, a known eTLD is a registrable domain.
413+
// If true, the returned domain may be a known eTLD.
414414
// Default = false
415415
allowPlainSuffix?: boolean,
416-
// If true, a hostname that lacks a known eTLD is a registrable domain.
416+
// If true, the returned domain may lack a known eTLD.
417417
// Default = false
418418
allowUnknownSuffix?: boolean,
419419
}
@@ -489,13 +489,13 @@ whose effects are demonstrated using the following examples.
489489

490490
#### 2. API Methods
491491

492-
##### 2.1 Public Suffix
492+
##### 2.1 Known Suffix
493493

494-
Method `getKnownPublicSuffix()` returns the input hostname's known eTLD (i.e. in the PSL)
494+
Method `getKnownSuffix()` returns the input hostname's known eTLD (i.e. in the PSL)
495495
if it has one, otherwise `null`.
496496

497-
Method `isKnownPublicSuffix()` returns `true` if and only if the input hostname is itself
498-
a known eTLD. In other words, this method returns `true` if calling `getKnownPublicSuffix()`
497+
Method `isKnownSuffix()` returns `true` if and only if the input hostname is itself
498+
a known eTLD. In other words, this method returns `true` if calling `getKnownSuffix()`
499499
with the input hostname returns the input hostname itself.
500500

501501
These methods are included in the API because the PSL algorithm returns the longest eTLD,
@@ -506,17 +506,17 @@ whose public suffix is 'io'.
506506

507507
###### Examples
508508

509-
| Input hostname | Public Suffix |
509+
| Input hostname | Known Suffix |
510510
|----------------|--------------:|
511511
| github.io | github.io |
512512
| foo.github.io | github.io |
513513
| facebook.co.uk | co.uk |
514514
| 192.168.2.1 | null |
515515
| green.banana | null |
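The relationship between the two methods can be sketched with a toy suffix set. This is not the proposed implementation: `TOY_PSL`, `getKnownSuffixSketch` and `isKnownSuffixSketch` are hypothetical names, and the toy set ignores the wildcard and exception rules of the real PSL.

```typescript
// Toy stand-in for the PSL: a handful of plain rules only (assumption).
// The real list also has wildcard and exception rules, ignored here.
const TOY_PSL: ReadonlySet<string> = new Set([
  "io", "github.io", "uk", "co.uk", "com", "net",
]);

// Returns the longest suffix of the hostname found in the toy set, else null.
function getKnownSuffixSketch(hostname: string): string | null {
  const labels = hostname.toLowerCase().split(".");
  // Longest candidate first, so the longest known eTLD wins.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (TOY_PSL.has(candidate)) return candidate;
  }
  return null;
}

// True only when the hostname's known suffix is the hostname itself.
function isKnownSuffixSketch(hostname: string): boolean {
  return getKnownSuffixSketch(hostname) === hostname.toLowerCase();
}

console.log(getKnownSuffixSketch("foo.github.io")); // "github.io"
console.log(isKnownSuffixSketch("github.io"));      // true
console.log(getKnownSuffixSketch("green.banana"));  // null
```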
516516

517-
##### 2.2 Registrable Domain
517+
##### 2.2 Domain
518518

519-
Method `getRegistrableDomain()` returns the input hostname's registrable domain,
519+
Method `getDomain()` returns the input hostname's registrable domain,
520520
as determined by running the PSL algorithm, otherwise `null`.
521521

522522
By default, this method returns `null` if the input hostname:
@@ -525,35 +525,42 @@ By default, this method returns `null` if the input hostname:
525525
* is itself a known eTLD
526526
* is an IP address - IPv4 or IPv6
527527

528-
##### 2.2.1 Options: Registrable Domain
528+
##### 2.2.1 Options: Domain
529529

530530
In order to support different use cases including those that need to determine
531531
a hostname's "site", additional options are provided, allowing a more
532-
general-purpose interpretation of what constitutes a registrable domain
533-
that includes IP addresses and unknown eTLDs.
532+
general-purpose interpretation of a domain to include not only registrable domains
533+
but also IP addresses and domains with unknown (non-registrable) eTLDs.
534534

535535
Options `allowIP`, `allowPlainSuffix` and `allowUnknownSuffix` each target
536536
a specific kind of input hostname lacking a registrable domain
537537
in the strictest sense (i.e. having a known eTLD as stipulated by
538538
the PSL algorithm), as follows:
539539

540-
| Option | Kind of Input Hostname Targetted |
540+
| Option | Kind of Input Hostname Targeted |
541541
|--------------------|---------------------------------:|
542542
| allowIP | IP Address (IPv4 or IPv6) |
543543
| allowPlainSuffix | is itself a known eTLD |
544544
| allowUnknownSuffix | lacks a known eTLD |
545545

546546
The effect of each option when applied to an input hostname of the
547-
kind targetted by the option is to change the registrable domain
548-
from being `null` to being instead *the full input hostname itself*.
547+
kind targeted by the option is to change the returned domain
548+
from being `null` to being the following:
549+
550+
| Option | Returned Domain | Returned Domain Kind |
551+
|--------------------|:------------------------------------------------:|:--------------------:|
552+
| allowIP | input hostname | IP address |
553+
| allowPlainSuffix | input hostname | eTLD |
554+
| allowUnknownSuffix | last 2 labels, or input hostname if single label | eTLD+1 or eTLD |
549555

550556
###### Examples
551557

552-
| Input hostname | Option = true | Registrable domain |
558+
| Input hostname | Option = true | Returned domain |
553559
|-------------------|--------------------|-------------------:|
554560
| 192.168.2.1 | allowIP | 192.168.2.1 |
555561
| github.io | allowPlainSuffix | github.io |
556-
| apple.pear.banana | allowUnknownSuffix | apple.pear.banana |
562+
| apple.pear.banana | allowUnknownSuffix | pear.banana |
563+
| banana | allowUnknownSuffix | banana |
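The option behaviour in the tables above can be sketched as follows. Everything here is illustrative: `getDomainSketch`, `DomainOptionsSketch` and `SUFFIXES` are hypothetical names, the suffix set is a toy stand-in for the PSL, only dotted-quad IPv4 is detected, and checking for an IP address before anything else is an assumption of this sketch rather than something the proposal states.

```typescript
interface DomainOptionsSketch {
  allowIP?: boolean;
  allowPlainSuffix?: boolean;
  allowUnknownSuffix?: boolean;
}

// Toy stand-in for the PSL (assumption); no wildcard or exception rules.
const SUFFIXES = new Set(["io", "github.io", "uk", "co.uk", "com", "net"]);
// Simplified IPv4 check; IPv6 is omitted in this sketch.
const IPV4 = /^\d{1,3}(\.\d{1,3}){3}$/;

function longestKnownSuffix(labels: string[]): string | null {
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    if (SUFFIXES.has(candidate)) return candidate;
  }
  return null;
}

function getDomainSketch(
  hostname: string,
  options: DomainOptionsSketch = {},
): string | null {
  // allowIP: the returned domain is the input hostname.
  if (IPV4.test(hostname)) return options.allowIP ? hostname : null;

  const labels = hostname.split(".");
  const suffix = longestKnownSuffix(labels);

  if (suffix === null) {
    // allowUnknownSuffix: last 2 labels, or the hostname if single-label.
    return options.allowUnknownSuffix ? labels.slice(-2).join(".") : null;
  }
  if (suffix === hostname) {
    // allowPlainSuffix: the returned domain is the input hostname.
    return options.allowPlainSuffix ? hostname : null;
  }
  // Default case: known eTLD plus one more label (eTLD+1).
  return labels.slice(-(suffix.split(".").length + 1)).join(".");
}

console.log(getDomainSketch("apple.pear.banana", { allowUnknownSuffix: true })); // "pear.banana"
console.log(getDomainSketch("banana", { allowUnknownSuffix: true }));            // "banana"
```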
557564

558565
##### 2.2.2 Options: Justification
559566

@@ -562,7 +569,7 @@ not only domains on the internet having known eTLDs, but also
562569
intranet hostnames having non-public (i.e. unknown) suffixes, or no suffix.
563570

564571
Reviewers of this proposal note that if it were the case that non-domains
565-
were included by default, `getRegistrableDomain()` would effectively
572+
were included by default, `getDomain()` would effectively
566573
return a string for almost every input.
567574

568575
As a result of the inclusion of unknown suffixes, the API implementation must
@@ -577,7 +584,7 @@ which may be an IP address or a domain name.
577584
An example of such a use case is Firefox's [Search vs Navigate](#4-search-vs-navigate),
578585
which involves determining if an entry in the URL bar is a navigable site,
579586
or a search term. If this functionality was based purely on the return value
580-
of `getRegistrableDomain()`, i.e. navigate if nonnull or search if null,
587+
of `getDomain()`, i.e. navigate if nonnull or search if null,
581588
then IP addresses would incorrectly cause a search. By using the `allowIP` option,
582589
the return value for an input IP address would be the IP address itself instead of null,
583590
thereby causing the desired result of navigating instead of searching.
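A minimal sketch of this decision is given here. It is not Firefox's actual logic: `classifyUrlBarEntry` and `stubGetDomain` are hypothetical, the stub only recognises dotted-quad IPv4 and `.net` hostnames, and treating a thrown error (see "Invalid hostname") as a search is an assumption of this sketch.

```typescript
type GetDomainFn = (
  hostname: string,
  options?: { allowIP?: boolean },
) => string | null;

// Navigate if a domain is returned, otherwise search.
function classifyUrlBarEntry(
  entry: string,
  getDomain: GetDomainFn,
): "navigate" | "search" {
  try {
    return getDomain(entry, { allowIP: true }) !== null ? "navigate" : "search";
  } catch {
    // Assumption: an invalid hostname (e.g. containing whitespace) is a search term.
    return "search";
  }
}

// Stand-in for the proposed getDomain(), for illustration only.
const stubGetDomain: GetDomainFn = (hostname, options = {}) => {
  if (/\s/.test(hostname)) throw new Error("invalid hostname");
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(hostname)) {
    return options.allowIP ? hostname : null;
  }
  return /\.net$/.test(hostname)
    ? hostname.split(".").slice(-2).join(".")
    : null;
};

console.log(classifyUrlBarEntry("192.168.2.1", stubGetDomain));  // "navigate"
console.log(classifyUrlBarEntry("green banana", stubGetDomain)); // "search"
```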
@@ -586,31 +593,6 @@ Option `allowPlainSuffix` only exists because there are domains that do not have
586593
a registrable domain, due to themselves being PSL eTLDs, but can still be
587594
navigated to, such as github.io and blogspot.com.
588595

589-
##### 2.2.3 Options: Discussion
590-
591-
The effect of the options is that `getRegistrableDomain()` may return values
592-
that are not registrable domains in the strictest sense, e.g. they may
593-
be IP addresses.
594-
595-
The author of this proposal is of the view that:
596-
597-
1. Any method named `getXYZ()` should return a value of type `XYZ`. Therefore
598-
`getRegistrableDomain()` may not be the most suitable name, since it does
599-
not always return true registrable domains. Reviewers of this proposal
600-
feel this is not a significant enough issue to warrant alternative naming.
601-
602-
2. This API should provide a way not just to get a hostname's
603-
registrable-domain-like value, but also to know what kind of value that is,
604-
be it an IP address, a domain name, or an intranet hostname lacking a known eTLD.
605-
Reviewers of this proposal are of the view that no compelling use case has been
606-
identified to support the need for such additional functionality. However,
607-
reviewers have conceded that IP addresses have to be special-cased, because for
608-
most domain inputs, one could split at dots to try and get a different domain level,
609-
but that logic does not make sense for IP addresses. By not providing a way of
610-
knowing whether the return value of `getRegistrableDomain()` is an IP address
611-
or a domain name, it is more difficult for users of this API to implement
612-
the special-casing that the reviewers have identified.
613-
614596
#### 3. IDN
615597

616598
All API methods should accept hostnames passed as input parameters using either
@@ -625,15 +607,15 @@ using Unicode encoding.
625607

626608
`domain` = foo.bar.example.مليسيا
627609

628-
| Option | Registrable Domain |
610+
| Option | Returned Domain |
629611
|----------------------------|-----------------------:|
630612
| unicode == false (default) | example.xn--mgbx4cd0ab |
631613
| unicode == true | example.مليسيا |
632614

633615
#### 4. Invalid hostname
634616

635-
The promises returned by this API's methods should reject with an error if a hostname
636-
passed as an input parameter meets any of the following criteria:
617+
This API's methods should throw an error if a hostname passed as an input parameter
618+
meets any of the following criteria:
637619

638620
* Contains a character that is invalid in an Internationalized Domain Name (IDN) - e.g. symbols, whitespace
639621
* Is an empty string
@@ -642,10 +624,10 @@ passed as an input parameter meets any of the following criteria:
642624

643625
#### 5. Summary of behaviours
644626

645-
The following table sets out the eventual settled state of the promise returned by
646-
`getRegistrableDomain()` for different classes of input `hostname` parameter:
627+
The following table sets out the value returned by `getDomain()` for different
628+
classes of input `hostname` parameter:
647629

648-
| Input hostname | Description | Registrable domain |
630+
| Input hostname | Description | Returned domain |
649631
|:-------------------|:-------------------------------------------------|-----------------------:|
650632
| example.net | eTLD+1 | example.net |
651633
| www.example.net | eTLD+2 | example.net |
@@ -656,7 +638,7 @@ The following table sets out the eventual settled state of the promise returned
656638
| foobar | no matching eTLD in PSL, single-label | null |
657639
| foobar | as above, with `allowUnknownSuffix = true` | foobar |
658640
| my.net.foobar | no matching eTLD in PSL, multi-label | null |
659-
| my.net.foobar | as above, with `allowUnknownSuffix = true` | my.net.foobar |
641+
| my.net.foobar | as above, with `allowUnknownSuffix = true` | net.foobar |
660642
| foobar.net | has an eTLD in the ICANN section | foobar.net |
661643
| foobar.github.io | has an eTLD in the Private section | foobar.github.io |
662644
| 127.0.0.1 | IP address, IPv4 | null |
@@ -680,21 +662,21 @@ The following table sets out the eventual settled state of the promise returned
680662
#### 6. Sync vs Async
681663

682664
Browser extension APIs are most commonly async, with API methods returning Promises.
683-
Earlier versions of this proposal set out an async API, with `getRegistrableDomain()`
684-
returning a `Promise<String>`. However, some use cases require getting lists of
665+
Earlier versions of this proposal set out an async API, with `getDomain()`
666+
returning a `Promise<string>`. However, some use cases require getting lists of
685667
registrable domains all in one go. In theory, this could be achieved by simply calling
686-
`getRegistrableDomain()` multiple times.
668+
`getDomain()` multiple times.
687669

688670
The problem with this approach is that there is overhead associated with an extension
689671
calling an async function on the parent browser. For example, obtaining the registrable domains
690-
of a list of 50 domains would involve making 50 async calls to the parent browser.
672+
of a list of 50 hostnames would involve making 50 async calls to the parent browser.
691673
A batching method would allow the same result to be obtained with a single async call.
692674

693-
For this reason, batching method `getRegistrableDomains()` was added to this API.
675+
For this reason, batching method `getDomains()` was added to this API.
694676
The method accepted an array of hostnames as input and returned a promise resolving to
695677
an array of registrable domains. A quick mockup of the two approaches was built using
696678
a simplified implementation of this proposal's API in a modified Firefox, and the
697-
batching approach was about 2-3 times faster for 50 domains.
679+
batching approach was about 2-3 times faster for 50 hostnames.
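The difference between the two earlier async approaches can be sketched by counting calls to the parent browser. This only counts round trips and does not reproduce the timing measurement: `callBrowser`, `getDomainAsync` and `getDomainsAsync` are hypothetical stand-ins, and the "last two labels" logic is a placeholder for the real PSL lookup.

```typescript
let roundTrips = 0;

// Hypothetical stand-in for one async call into the parent browser.
async function callBrowser<T>(work: () => T): Promise<T> {
  roundTrips++;
  return work();
}

// Placeholder for the real PSL lookup (assumption).
const lastTwoLabels = (h: string): string => h.split(".").slice(-2).join(".");

// Earlier design: one async call per hostname.
async function getDomainAsync(hostname: string): Promise<string | null> {
  return callBrowser(() => lastTwoLabels(hostname));
}

// Earlier batching design: one async call for the whole list.
async function getDomainsAsync(hostnames: string[]): Promise<(string | null)[]> {
  return callBrowser(() => hostnames.map(lastTwoLabels));
}

async function compareRoundTrips(hostnames: string[]) {
  roundTrips = 0;
  const pending = hostnames.map(getDomainAsync);
  const perHostnameCalls = roundTrips; // counted before awaiting
  await Promise.all(pending);

  roundTrips = 0;
  const batch = getDomainsAsync(hostnames);
  const batchedCalls = roundTrips;
  await batch;

  return { perHostnameCalls, batchedCalls };
}

const hosts = Array.from({ length: 50 }, (_, i) => `host${i}.example.net`);
compareRoundTrips(hosts).then((r) => console.log(r));
// { perHostnameCalls: 50, batchedCalls: 1 }
```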
698680

699681
Unfortunately, while this offered a solution to the performance problem,
700682
it added additional complexity to the API. To resolve this issue, the API
@@ -763,14 +745,17 @@ done by the host browser.
763745

764746
### Open Web API
765747

766-
The purpose of this API is to eliminate the potential for inconsistency between
767-
the host browser and its hosted extensions. The simplest way of achieving this
768-
is for extensions to access this functionality via the host browser itself rather
769-
than via some external source, such as an Open Web API.
748+
Implementing this proposal as an open web API is not realistic at this time because:
770749

771-
It is then a determination for the host browser itself as to whether
772-
the functionality (used by both the host browser and its extensions)
773-
should ultimately be obtained by means of an Open Web API.
750+
* Compared to web extension APIs, there is a higher bar for introducing web APIs,
751+
and in the past there has not been sufficient interest in moving forward a proposal
752+
like this one. Therefore the preferred approach is to start with extensions,
753+
and it will always be possible to propose a web API later if this work proves
754+
useful and there is appetite.
755+
756+
* The PSL is not appropriate for use in all circumstances. Extensions have a
757+
very compelling set of use cases that match browser use cases, but there
758+
is not a universal agreement this is the case more generally.
774759

775760
## Implementation Notes
776761

@@ -818,3 +803,22 @@ is released, however this may not always be the case.
818803
It may be useful to implement a notification mechanism so that extensions can take
819804
appropriate action when the host browser's PSL dataset changes, to avoid having to
820805
poll the `getVersion()` function provided by this API.
806+
807+
### 3. Get Domain and Kind
808+
809+
While API method `getDomain()` by default returns registrable domains,
810+
with additional options this method may return other types of domain:
811+
IP addresses, intranet hostnames lacking known suffixes, and public suffixes themselves.
812+
There is currently no straightforward way for the method caller to determine
813+
which of these kinds of value was returned from an invocation such as:
814+
`getDomain(hostname, { allowIP, allowUnknownSuffix, allowPlainSuffix })`.
815+
816+
It may be beneficial to provide an additional API method that would
817+
return not only the domain value as returned by `getDomain()`,
818+
but also a designation of the kind of value returned:
819+
`RegistrableDomain`, `UnknownDomain`, `KnownSuffix`, `IPAddress`.
820+
821+
An example use case would be if extension developers wanted to prepend
822+
additional labels to the domain returned by `getDomain()`. This would
823+
not make sense for returned IP addresses, so developers would need a
824+
way of separating returned IP addresses from returned domain names.
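
One possible shape for such a result is sketched here. The kind names come from the list above, but the `DomainAndKind` type and the `prependLabel` helper are hypothetical and not part of this proposal; returning an IP address unchanged is a choice made for this sketch.

```typescript
type DomainKind =
  | "RegistrableDomain"
  | "UnknownDomain"
  | "KnownSuffix"
  | "IPAddress";

// Hypothetical result of a "get domain and kind" method.
interface DomainAndKind {
  domain: string;
  kind: DomainKind;
}

// Prepends a label to a returned domain, leaving IP addresses untouched.
function prependLabel(
  result: DomainAndKind | null,
  label: string,
): string | null {
  if (result === null) return null;
  // Prepending a label to an IP address would not make sense.
  if (result.kind === "IPAddress") return result.domain;
  return `${label}.${result.domain}`;
}

console.log(prependLabel({ domain: "example.net", kind: "RegistrableDomain" }, "www")); // "www.example.net"
console.log(prependLabel({ domain: "127.0.0.1", kind: "IPAddress" }, "www"));           // "127.0.0.1"
```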
