AT-LARGE ADVISORY COMMITTEE ADVICE TO GNSO WHOIS TASK FORCES
26 March 2004

I. WHOIS Task Force 1 Restricting Access to WHOIS Data For Marketing Purposes
II. WHOIS Task Force 2 Review of Data Collected and Displayed
III. WHOIS Task Force 3 Improving Accuracy of Collected Data


Note: Unless we specifically speak about registrars, our remarks apply to registrar and to thick registry WHOIS systems alike.

I. Restricting Access to WHOIS Data for Marketing Purposes

Policy proposal

We recommend a simple two-tiered system.

Tier 1 -- public access. Users who access a future WHOIS-like system anonymously get access to non-sensitive information concerning a domain name registration, to be defined in detail by task force 2.

Tier 2 -- authenticated access. Users who want to access a more complete data set (to be defined in detail by task force 2) need to reliably identify themselves, and indicate the purpose for which they want to access the data.

The identity of the data user and their purpose is recorded by registrars and registries, and made available to registrants when requested. This information could be withheld for a certain amount of time if the data user is (1) a law enforcement authority that is (2) accessing the data for law enforcement purposes.
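As an illustration only, the two tiers and the access record could be sketched as follows. This is a minimal sketch in Python, not a proposed implementation: the field names, the `TieredWhois` class, and the 90-day withholding period for law enforcement lookups are all assumptions made for the example, and `requester` is assumed to be an identity that has already been reliably verified by some other means.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical public field set; the real split is left to task force 2.
PUBLIC_FIELDS = {"domain", "registrar", "name_servers", "status"}


@dataclass
class AccessLogEntry:
    domain: str
    requester: str
    purpose: str
    law_enforcement: bool
    when: datetime


@dataclass
class TieredWhois:
    records: dict
    log: list = field(default_factory=list)
    # Assumed period during which law enforcement lookups are withheld.
    le_delay: timedelta = timedelta(days=90)

    def query(self, domain, requester=None, purpose=None,
              law_enforcement=False, now=None):
        record = self.records[domain]
        if requester is None:
            # Tier 1: anonymous users see only the public fields.
            return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
        if not purpose:
            raise ValueError("authenticated access requires a stated purpose")
        # Tier 2: record who asked and why, then return the full record.
        now = now or datetime.now(timezone.utc)
        self.log.append(
            AccessLogEntry(domain, requester, purpose, law_enforcement, now))
        return dict(record)

    def accesses_for_registrant(self, domain, now=None):
        """Log entries the registrant may see at time `now`."""
        now = now or datetime.now(timezone.utc)
        return [e for e in self.log
                if e.domain == domain
                and not (e.law_enforcement and now - e.when < self.le_delay)]
```

In this sketch the registrant's view of the log is computed on request, so a withheld law enforcement entry becomes visible automatically once the assumed delay has passed.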

Implementation remarks

We do not recommend any particular implementation of this proposal, but note that "reliable identification" could be provided by commercially available SSL certificates. In general, we would favor implementation of our proposal in a dedicated protocol (such as IRIS) over implementation through Web forms.

Rationale

The key aspect for deciding whether access to data gathered by registrars can be given to a third party is the purpose for which this data is going to be used. Obviously, registrars have no way to verify the purpose for which WHOIS data is being accessed.

The best heuristic we know of is to hold data users accountable for their activities, and to put enforcement of purpose limitations into the hands of registrants. This can be achieved by reliably identifying data users and putting their identity, contact information, and purpose indication in the hands of registrants.

At the same time, a tiered system -- if implemented reasonably -- could preserve the ability of data users to automatically access WHOIS data in reasonable quantities. Registrars, on the other hand, would be enabled to limit the amount of data any particular party can access in a given interval of time.
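One way a registrar might limit the amount of data per identified party is a sliding-window counter. The following is a minimal sketch under assumptions: the `QueryLimiter` name, the limit, and the window length are illustrative, and `party` stands for whatever verified identity the authenticated tier provides.

```python
from collections import defaultdict, deque


class QueryLimiter:
    """Sliding-window limit on queries per identified party."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # party -> timestamps of queries

    def allow(self, party, now):
        recent = self._history[party]
        # Forget queries that have left the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            return False
        recent.append(now)
        return True
```

Because the limit is keyed on identity rather than on whether a human is present, automated access in reasonable quantities remains possible.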

Identifying data users and their purposes would also enable registrars to comply with legal obligations to make this kind of information available to data subjects.

Discussion of other proposals

There have been suggestions that "automated access" could be used as a heuristic to determine illegitimate access. In this scheme, automated access is blocked by attempting to require human attention with all queries. One set of implementations of these kinds of tests is known as CAPTCHA.

There is evidence that automated access is also being used for legitimate purposes; on the other hand, there is publicly available information on how CAPTCHA-like tests are being circumvented in other contexts. The circumvention here is based on a fundamental design problem of CAPTCHAs. <http://boingboing.net/2004_01_01_archive.html#107525288693964966>

One particularly popular CAPTCHA was broken in academic research more than a year ago, but is still being used by registrars. <http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html>

Accessibility problems posed by CAPTCHA-like tests are not yet fully understood; we note, though, that purely visual tests are insufficient from an accessibility point of view. <http://www.w3.org/TR/turingtest/>

In conclusion, CAPTCHA tests address the wrong problem, and they address it badly. We strongly recommend against going down this path.

II. Review of Data Collected and Displayed

Policy proposal

We recommend that the mandatory collection and display of personal information about registrants be reduced as far as possible. What information is actually required for placing a domain name registration should be a matter of registrars' business models, and of applicable law, not of ICANN policy.

We consider the removal of the following data elements from registrars' and registries' WHOIS services (in a tiered model, from *all* tiers) a priority:

  • Registrant name, address, e-mail address, and phone number, unless registrant has requested that this information be made available.
  • Administrative contact name, address, e-mail address, and phone number, unless registrant (or admin-c) has requested that this information be made available.
  • Billing contact. These data are traditionally not published by registrars, but are included in many thick registries' public WHOIS services.

For the purposes of a tiered access system (see recommendations for task force 1), we would recommend that the following information be included in a public tier:

  • Registrar of record.
  • Name servers.
  • Status of domain name.
  • Contact data, if the data subject specifically requests that these data be included in the public tier.
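The public tier described above amounts to a filter over the stored record. The sketch below is illustrative only: the record layout, the `publish` opt-in flag, and the function name `public_view` are assumptions, not part of any existing WHOIS system.

```python
# Fields suggested above for the public tier.
ALWAYS_PUBLIC = ("registrar", "name_servers", "status")
# Billing contact data are never shown, regardless of any flag.
NEVER_PUBLISHED = {"billing"}


def public_view(record):
    """Return the part of `record` that an anonymous user would see."""
    view = {key: record[key] for key in ALWAYS_PUBLIC if key in record}
    for role, contact in record.get("contacts", {}).items():
        if role in NEVER_PUBLISHED:
            continue
        # Contact data appear only if the data subject opted in.
        if contact.get("publish"):
            view.setdefault("contacts", {})[role] = {
                k: v for k, v in contact.items() if k != "publish"}
    return view
```

The default in this sketch is non-disclosure: a contact with no `publish` flag is treated the same as one that declined.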

Implementation remarks

None.

Rationale

For personal registrations, the registrant, administrative contact, and billing contact data sets are most likely to concern sensitive information, such as the registrant's home address and phone number.

We recognize that domain name registrations by online merchants often imply fewer privacy concerns; it has been argued that online merchants must make privacy information public in many jurisdictions. We are confident that businesses will also follow these duties by requesting registrars to make contact information about them available publicly. Conversely, if bad actors decide not to make contact information publicly available, that could actually make bad actors more easily recognizable, and provide consumers with a "red flag."

Discussion of other proposals

At the WHOIS workshop in Rome, we heard several lawyers praise the usefulness of registrant and other telephone numbers in WHOIS services. That way, we were told, many cases could be settled by a single phone call. The easier the contact, we were told, the merrier.

This argument is troubling: What we were hearing there is a request to ICANN to enable lawyers to make off the record contact with other parties to a dispute that may not have a lawyer readily available, and to make this contact in a way which makes it hard for the registrant to get legal counsel involved in early negotiations arising out of the dispute.

Telephone numbers of registrant and administrative contacts should be *removed* from WHOIS services for precisely this reason: Forcing the non-registrant party to a dispute to open up that dispute by on-the-record means (e-mail, fax [not universally available], postal mail) ensures that registrants have an opportunity to retain legal counsel in these disputes, and to fully understand any claims made by the non-registrant party. It also helps to avoid legal bluff and plain bullying.

To summarize, it may be true that availability of phone numbers enables quick settlement. But availability of phone numbers also favors situations in which these settlements are achieved by dubious means, to the detriment of the registrant.

III. Improving Accuracy of Collected Data

Additional comments submitted by ALAC

Summary and recommendations

The At-Large Advisory Committee would like to express appreciation for the difficult and time-consuming work that the Task Force has been doing.

However, we stress that trying to get accurate information from people who are not willing to provide it is a waste of time and effort. No automated verification scheme can distinguish true data from merely plausible data; such schemes would therefore only increase the number of crimes such as identity theft and make reliable identification of actual fraudsters even more difficult. Generic TLDs are a global resource which should be impartially accessible to registrants from all parts of the world. Verification schemes usually do not cover all parts of the world with the same effectiveness, and information which may seem implausible to an American eye will often actually be true; so these schemes must not be used to unfairly restrict access to gTLDs depending on the registrant's country. Also, any communication with the registrant should happen in the registrant's own language, and the registrant should not be asked to bear the cost of verification activities, since these are not part of the service the registrant is asking for, but rather of services desired by some third-party data users.

The actual feasibility of a verification scheme that meets these requirements, even after the data gathering activity made by the task force, is still unproven. For these reasons, we recommend against taking any action in this field at this stage.

We thus suggest that the focus of the work on Whois accuracy be shifted from how to force unwilling people to provide their true information to how to effectively allow registrants who want to provide true information to do so. There are a number of practical hurdles for any registrant to keep his/her data up to date, and removing these hurdles would prove much more beneficial to the overall accuracy of the Whois databases than going after an impossible and worrying dream of a global centralized control system over registrants' identities.

Finally, we note that the Registrar Accreditation Agreement provisions about data collection, display and accuracy requirements and their enforcement are clearly illegal, and thus void, in a number of jurisdictions.

Thus we recommend that ICANN suspend any enforcement of those provisions until the RAA and the related policies are amended so as to comply with existing laws; as clearly and repeatedly explained in writing and in person by a number of relevant public authorities, any other choice is likely to bring ICANN and involved registrars into litigation with registrants and with the Privacy Authorities in European and other countries.

A deeper analysis on the problem of Whois accuracy

We think that, to be able to solve a problem, you should first investigate the reasons why it happens. In this case, you could roughly divide the registrants whose data are inaccurate into four categories:

1. Those who purposely provide inaccurate data for fraudulent reasons.
2. Those who purposely provide inaccurate data to protect their privacy.
3. Those who mistakenly provide inaccurate data.
4. Those who provide accurate data at registration, but then fail to keep them up to date so that the information becomes inaccurate.

Until now, the general discussion on accuracy has been almost completely focused on the first category and we think this is an error. The purpose of the Whois system is not to provide bullet-proof identification for those who register domains and operate services on top of them, but rather to provide quick contact information for those domain holders who want to be contacted. Turning the Whois system into a certified directory of domain name owners would go beyond its purpose and, as practice shows, is practically incompatible with its spirit and architecture.

Also, at the present state of technology and of operational practices, the costs of very secure authentication of registrants world-wide for all domain name registrations would be high and would possibly destroy the domain name market as we know it today. We think it might be more cost-effective (and also more respectful of people's basic civil rights) to pursue fraudulent registrants once they actually commit a fraud, rather than to presume that all registrants intend to commit fraud and so should be carefully screened in advance.

Finally, we point out that no verification system, other than requiring a person to physically show up and present a secure proof of identity such as a passport or national ID document, can distinguish true personal data from plausible but fake personal data. If we go down the path of imposing stricter and stricter checks on data as they are submitted by the registrant during the registration process, we might discover, after spending lots of time and money on them, that no benefit has arisen in terms of fraud prevention, but that the stricter checks have caused a huge increase in crimes like identity theft, which by the way are made easier by the very existence of the public and anonymously accessible Whois system.

That said, we think that increased accuracy in the Whois database, if limited to those registrants who actually agree to provide their data, would be highly desirable. This is why we think that future activities in the field of enhanced accuracy should not focus on the first category of the above list, but rather on the other three.

We will not discuss here the issue of privacy protection, which is the subject of another task force; we just stress that the overwhelming majority of those who purposely provide inaccurate data do so for privacy protection reasons, rather than with fraudulent intentions. Simply allowing these people to disclose their data only to the registrar, and not to the public, would avoid most cases of wilful inaccuracy.

The third category is, in our experience, rather small, partly because errors of this kind are clerical and can easily be fixed if there is an actual need to contact the owner. Once the registrant's desire to publish their data is ascertained, some simple automated verifications could be made by the registrar's system, to warn the registrant about possible errors.

However, creating an automatic verification algorithm for all countries and scripts of the world might prove very difficult and prone to errors for less common countries; the current practical examples only come from TLDs and environments with geographically limited registrants. On the other hand, systems which provide automatic verification only for residents of some countries could be acceptable only as long as they do not prevent residents of unverifiable countries from registering domains, or make it unreasonably harder for them to do so. This is why we think that the output of these automated verification algorithms should only be used as a warning to the registrant, but should not prevent the registrant from submitting data that might seem incorrect, as they could possibly be absolutely correct.
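The warn-but-accept behaviour described above could look roughly like the following. This is a minimal sketch: the three checks, the field names, and the functions `check_contact` and `submit` are illustrative assumptions, and the checks are deliberately crude heuristics rather than real verification.

```python
def check_contact(contact):
    """Return a list of warnings; an empty list means nothing looked odd."""
    warnings = []
    local, at, host = contact.get("email", "").partition("@")
    if not (local and at and host) or "@" in host:
        warnings.append("email: does not look like user@host")
    # str.isdigit() also accepts digits from non-Latin scripts.
    if sum(ch.isdigit() for ch in contact.get("phone", "")) < 5:
        warnings.append("phone: fewer than 5 digits")
    if not contact.get("postal_address", "").strip():
        warnings.append("postal_address: empty")
    return warnings


def submit(contact, database):
    """Store the contact unconditionally and hand back any warnings."""
    warnings = check_contact(contact)
    database.append(contact)  # never rejected on the basis of a heuristic
    return warnings
```

The point of the design is that `submit` has no rejection path: data that merely looks unusual is stored, and the registrant is only told what looked unusual.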

We also note that requiring Roman-script information from registrants in countries that do not use Roman characters would unduly discriminate against them in access to gTLDs. All registrants should be asked to provide their data only in their local language and script, and just as an option they could be asked whether they want to provide Romanized data as well. Requiring the ability to type in Roman script to register domains in global generic TLDs is unacceptable.

Finally, we think that much could be done to improve the situation of the fourth category: those registrants who would be happy to provide accurate information, but who fail to keep it up to date. In fact, experience shows that updating Whois data is a long and difficult process for registrants. In many cases, the registrant has to send faxes, make phone calls, and bear other costs while devoting a significant amount of time; in other cases, the authentication mechanism used by registries or registrars is based on the e-mail address (or on a username/password pair which, if forgotten, will be resent to the current e-mail address), so that a change in the e-mail address of the registrant will make him/her unable to manage the information, and will leave these domains orphaned. If you add to this the fact that keeping personal data up to date in a public Whois registry certainly cannot be the first worry of a registrant who is changing address, phone number or e-mail address, you realize that this is possibly the most common cause of inaccuracy in Whois databases.

Also, in many cases the registrant is only the last link in a long chain of interactions that starts with a registry, then goes through an ICANN-accredited registrar, a domain name reseller, a web hosting company, or even an Internet-savvy friend who does the job for the registrant. We think that this is an unavoidable consequence of the average registrant turning from a skilled engineer in a small Internet, as it was when Whois was designed, to a non-technical average person in a mass Internet. It is very difficult to create the awareness of the existence and purpose of the Whois database for non-technical persons on a mass scale, and we think this is another reason why we should never expect the Whois to be a terribly accurate list of all registrants.

However, for this category the problem possibly lies in the lack of simple online systems for the registrant to edit his/her data in the database at no cost. Thus we think that one of the two following solutions should be tried:

1. Requiring registries to directly deal with registrants' update requests, by supplying them a virtual certificate or account at registration, plus offline procedures to recover access if such account is lost;
2. Changing the architecture of the Whois database from centralized to distributed.

Since the first option would raise many concerns in terms of business models, customer ownership, and cost recovery, the second could possibly be more interesting. After all, the very reason for which the DNS was created, replacing the old centralized hosts table, was the impossibility of keeping this centralized table up to date. We should simply apply the same principle and move the data to the edge of the network, by embedding Whois servers into DNS server implementations. Whois queries could then be sent directly to the authoritative name servers for the domain, and the registry would be used as a fall-back only if no reply is received. This way, registrants would be able to keep their Whois information up to date as easily as they keep their zone files up to date, and even if this would not completely solve the problem, it would possibly cause a dramatic increase in the number of Whois records that are actually kept updated.
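The lookup order described above (authoritative name servers first, registry as fall-back) can be sketched as follows. This is an illustration under assumptions, not a description of any deployed system: it assumes the name servers answer the conventional WHOIS exchange on TCP port 43, that the caller already knows the domain's name servers (for example from its NS records), and that the domain is given in ASCII form. The function names are hypothetical.

```python
import socket


def whois_query(server, domain, timeout=5.0):
    """Send one WHOIS request to port 43 of `server` and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while data := conn.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def distributed_lookup(domain, name_servers, registry_server,
                       query=whois_query):
    """Ask the domain's own name servers first; fall back to the registry."""
    for server in name_servers:
        try:
            reply = query(server, domain)
        except OSError:
            continue  # unreachable, refused, or timed out: try the next one
        if reply.strip():
            return server, reply
    # No name server answered, so use the registry as the fall-back.
    return registry_server, query(registry_server, domain)
```

The `query` parameter exists only so that the lookup order can be exercised without network access; the return value includes the answering server so a caller can tell whether the data came from the edge or from the registry.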

We thus recommend a shift in the focus of accuracy-related discussions, so as to deal with those types of inaccuracy that can and should actually be solved, rather than dealing with world-wide verification and law enforcement systems that are not practically conceivable at the present social and political state of our planet, and that would anyway have to be discussed at other political levels.



Page Updated 03-Aug-2004
© 2004 The Internet Corporation for Assigned Names and Numbers. All rights reserved.