156 lines
6.8 KiB
Markdown
156 lines
6.8 KiB
Markdown
# MSC2197 – Search Filtering in Public Room Directory over Federation
|
||
|
||
This MSC proposes introducing the `POST` method to the `/publicRooms` Federation
|
||
API endpoint, including a `filter` argument which allows server-side filtering
|
||
of rooms.
|
||
|
||
We are motivated by the opportunity to make searching the public Room Directory
|
||
more efficient over Federation.
|
||
|
||
## Motivation
|
||
|
||
Although the Client-Server API includes the filtering capability in
|
||
`/publicRooms`, the Federation API currently does not.
|
||
|
||
This leads to a situation that is wasteful of effort and network traffic for
|
||
both homeservers; searching a remote server involves first downloading its
|
||
entire room list and then filtering afterwards.
|
||
|
||
## Proposal
|
||
|
||
Having a filtered `/publicRooms` API endpoint means that irrelevant or
|
||
uninteresting rooms can be excluded from a room directory query response.
|
||
In turn, this means that these room directory query responses can be generated
|
||
more quickly and then, due to their smaller size, transmitted over the network
|
||
more quickly.
|
||
|
||
These benefits have been utilised in the Client-Server API, which implements
|
||
search filtering using the `filter` JSON body parameter in the `POST` method on
|
||
the `/publicRooms` endpoint.
|
||
|
||
Ignoring the `server` parameter in the Client-Server API, the following specific
|
||
differences are noticed between the Client-Server and Federation API's
|
||
`/publicRooms` endpoints:
|
||
|
||
* the Federation API endpoint only accepts the `GET` method whereas the
|
||
Client-Server API accepts the `POST` method as well.
|
||
* the Federation API accepts `third_party_instance_id` and
|
||
`include_all_networks` parameters through the `GET` method, whereas the
|
||
Client-Server API only features these in the `POST` method.
|
||
|
||
This MSC proposes to introduce support for the `POST` method in the Federation
|
||
API's `/publicRooms` endpoint, with all but one of the parameters from that of
|
||
the Client-Server API.
|
||
|
||
The parameter that is intentionally omitted is the `server` query parameter, as
|
||
it does not make sense to include it – the requesting homeserver could make a
|
||
direct request instead of requesting that a request be relayed.
|
||
|
||
The parameters which are copied, however, shall have the same semantics as
|
||
they do in the Client-Server API.
|
||
|
||
In the interest of clarity, the proposed parameter set is listed below, along
|
||
with a repetition of the definitions of used substructures. The response format
|
||
has been omitted as it is the same as that of the current Client-Server and
|
||
Federation APIs, which do not differ in this respect.
|
||
|
||
### `POST /_matrix/federation/v1/publicRooms`
|
||
|
||
#### Query Parameters
|
||
|
||
There are no query parameters. Notably, we intentionally do not inherit the
|
||
`server` query parameter from the Client-Server API.
|
||
|
||
#### JSON Body Parameters
|
||
|
||
* `limit` (`integer`): Limit the number of search results returned.
|
||
* `since` (`string`): A pagination token from a previous request, allowing
|
||
clients to get the next (or previous) batch of rooms. The direction of
|
||
pagination is specified solely by which token is supplied, rather than via an
|
||
explicit flag.
|
||
* `filter` (`Filter`): Filter to apply to the results.
|
||
* `include_all_networks` (`boolean`): Whether or not to include all known
|
||
networks/protocols from application services on the homeserver.
|
||
Defaults to false.
|
||
* `third_party_instance_id` (`boolean`): The specific third party
|
||
network/protocol to request from the homeserver.
|
||
Can only be used if `include_all_networks` is false.
|
||
|
||
### `Filter` Parameters
|
||
|
||
* `generic_search_term` (`string`): A string to search for in the room metadata,
|
||
e.g. name, topic, canonical alias etc. (Optional).
|
||
|
||
## Tradeoffs
|
||
|
||
An alternative approach might be for implementations to carry on as they are but
|
||
also cache (and potentially index) remote homeservers' room directories.
|
||
This would not require a spec change.
|
||
|
||
However, this would be unsatisfactory because it would lead to outdated room
|
||
directory results and/or caches that provide no benefit (as room directory
|
||
searches are generally infrequent enough that a cache would be outdated before
|
||
being reused, on small – if not most – homeservers).
|
||
|
||
## Potential issues
|
||
|
||
### Backwards Compatibility
|
||
|
||
After this proposal is implemented, outdated homeservers will still exist which
|
||
do not support the room filtering functionality specified in this MSC. In this
|
||
case, homeservers will have to fall-back to downloading the entire room
|
||
directory and performing the filtering themselves, as currently happens.
|
||
This is not considered a problem since it will not lead to a situation that is
|
||
any worse than the current one, and it is expected that large homeservers
|
||
– which cause the most work with the current search implementations –
|
||
would be quick to upgrade to support this feature once it is available.
|
||
|
||
As the `POST` method was not previously accepted on the `/publicRooms` endpoint
|
||
over federation, then requesting servers should fall back to the old behaviour,
|
||
if one of the following errors is encountered:
|
||
|
||
- an HTTP `400` response with an `M_UNRECOGNIZED` standard error response
|
||
`errcode` (this is what would be typically expected in this situation)
|
||
- a `404` (Not Found) HTTP error response
|
||
- a `405` (Method Not Allowed) HTTP error response
|
||
|
||
## Security considerations
|
||
|
||
There are no known security considerations.
|
||
|
||
## Privacy considerations
|
||
|
||
At present, remote homeservers do not learn about what a user has searched for.
|
||
|
||
However, under this proposal, in the context of using the Federation API to
|
||
forward on queries from the Client-Server API, a client's homeserver would end
|
||
up sharing the client's search terms with a remote homeserver, which may not be
|
||
operated by the same party or even trusted. For example, users' search terms
|
||
could be logged.
|
||
|
||
The privacy implications of this proposal are not overly major, as the data
|
||
that's being shared is [\[1\]][1]:
|
||
|
||
- only covered by GDPR if:
|
||
- the search terms contain personal data, or
|
||
- the user's homeserver IP address or domain name is uniquely identifying
|
||
(because it's a single-person homeserver, perhaps)
|
||
- likely to be *expected* to be shared with the remote homeserver
|
||
|
||
[1]: https://github.com/matrix-org/matrix-doc/pull/2197#issuecomment-517641751
|
||
|
||
For the sake of clarity, clients are strongly encouraged to display a warning
|
||
that a remote search will take the user's data outside the jurisdiction of their
|
||
own homeserver, before using the `server` parameter of the Client-Server API
|
||
`/publicRooms`, as it can be assumed that this will lead to the server invoking
|
||
the Federation API's `/publicRooms` – on the specified remote server – with the
|
||
user's search terms.
|
||
|
||
## Conclusion
|
||
|
||
By allowing homeservers to pass on search filters, we enable remote homeservers'
|
||
room directories to be efficiently searched, because, realistically speaking,
|
||
only the remote homeserver is in a position to be able to perform search
|
||
efficiently, by taking advantage of indexing and other such optimisations.
|
||
|