return `mongodb://${dbLoginPrepend}${dbHost}:${dbPort}/${dbName}?replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false`
Use of the GET /cve endpoint by Secretariat clients has been designed to rely on the following:
- the time_modified.gt query parameter is used
- there is a series of uses of this endpoint, and each item in the series follows the pagination instructions (e.g., after the first call, retrieve page=2, page=3, etc. if they exist)
- the time_modified.gt value for each item in the series is equal to (or slightly before) the start time of the previous item in the series
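As a rough illustration of these conditions, one item in the series could look like the sketch below. This is not the actual Secretariat client code: `fetchPage` is a hypothetical stand-in for an HTTP call to GET /cve, the response field name `cveRecords` is an assumption, and `nextPage` is assumed to be null or absent on the last page.

```javascript
// One "item in the series": retrieve every page for one time_modified.gt value.
// fetchPage is a hypothetical function that performs a single GET /cve call
// and resolves to the parsed response body.
async function runSweep(fetchPage, timeModifiedGt) {
  // Recorded before the first call, so the next item in the series can use
  // this value (or something slightly earlier) as its time_modified.gt.
  // The ISO 8601 format is an assumption made for this sketch.
  const startedAt = new Date().toISOString();
  const records = [];
  let page = 1;
  while (page !== null && page !== undefined) {
    const res = await fetchPage({ 'time_modified.gt': timeModifiedGt, page });
    records.push(...res.cveRecords);
    page = res.nextPage; // follow the pagination instructions
  }
  return { startedAt, records };
}
```

The caller would pass the returned `startedAt` as the `timeModifiedGt` argument of the next sweep.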
Under these conditions, and with the old db.js without secondaryPreferred, it was not possible for any CVE Records to be skipped. This was because, when there are ongoing writes to the Cve collection during pagination, the result set for time_modified.gt can only grow (i.e., documents can enter the matching set but never leave it). If a new document enters at a sort position before the current page offset, every later document is shifted to the right by one position. There are no circumstances in which a document can be shifted to the left (i.e., become a member of a page that has already been retrieved).
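The "shift to the right" behavior can be checked with a small in-memory model. This is only a sketch under simplifying assumptions: records are short placeholder strings held in sort order, the page size of 2 is arbitrary, and `getPageFromSingleNode` is a hypothetical helper, not a cve-services function.

```javascript
// Offset pagination against a single node whose matching set only grows.
function getPageFromSingleNode(sortedIds, page, pageSize) {
  const start = (page - 1) * pageSize;
  return sortedIds.slice(start, start + pageSize);
}

const sizeOfPage = 2;
const matchedAtStart = ['A', 'C', 'D', 'E'];
let matching = [...matchedAtStart];

const firstPage = getPageFromSingleNode(matching, 1, sizeOfPage);

// A write lands between two calls: 'B' enters the matching set at a sort
// position before the offset of page 2, shifting 'C', 'D', and 'E' right.
matching = ['A', 'B', 'C', 'D', 'E'];

const secondPage = getPageFromSingleNode(matching, 2, sizeOfPage);
const thirdPage = getPageFromSingleNode(matching, 3, sizeOfPage);

const returned = new Set([...firstPage, ...secondPage, ...thirdPage]);
// Records that matched when the series item started but were never returned.
const skippedFromStart = matchedAtStart.filter((id) => !returned.has(id));
```

In this model 'C' is returned twice and nothing that matched at the start is skipped. 'B' is not returned here, but it was written after this item in the series started, so the next item's time_modified.gt value still covers it.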
However, with readPreference=secondaryPreferred, this is no longer true and use of the GET /cve endpoint has now become unreliable. A specific API call, such as the one for page=N, can now go to a secondary that has fewer documents than the database node that was used for the page=N-1 call. In other words, because of replication lag, it is now possible that, from the perspective of the API caller, documents leave the matching set. In this case, a CVE Record can be permanently skipped (it is not present on any page during one item in the series, and is also not picked up in a subsequent item that has a later time_modified.gt value).
A CVE Record can also be permanently skipped if the page=N-1 API call goes to a secondary that has fewer documents than the database node used for the page=N call. In other words, the server, when responding to the page=N API call, has no way of knowing that the response to the page=N-1 API call was missing anything.
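Both replication-lag cases can be reproduced with a small in-memory model. This is a sketch under simplifying assumptions, not a reproduction against a real replica set: each node is an array of placeholder ids in sort order, the lagging secondary is missing only 'B', and `pageFromNode` and `sweepAcrossNodes` are hypothetical helpers.

```javascript
function pageFromNode(sortedIds, page, pageSize) {
  const start = (page - 1) * pageSize;
  return sortedIds.slice(start, start + pageSize);
}

const primaryDocs = ['A', 'B', 'C', 'D', 'E', 'F'];
const laggingDocs = ['A', 'C', 'D', 'E', 'F']; // 'B' has not replicated yet

// chooseNode(page) models the driver routing each API call to some node.
function sweepAcrossNodes(chooseNode, pageCount, pageSize) {
  const received = [];
  for (let page = 1; page <= pageCount; page++) {
    received.push(...pageFromNode(chooseNode(page), page, pageSize));
  }
  return received;
}

const missingFrom = (received) => primaryDocs.filter((id) => !received.includes(id));

// Case 1: page=2 is served by the lagging secondary, all other pages by the primary.
const laterCallLags = sweepAcrossNodes((p) => (p === 2 ? laggingDocs : primaryDocs), 3, 2);
const skippedCase1 = missingFrom(laterCallLags);

// Case 2: page=1 is served by the lagging secondary, all other pages by the primary.
const earlierCallLags = sweepAcrossNodes((p) => (p === 1 ? laggingDocs : primaryDocs), 3, 2);
const skippedCase2 = missingFrom(earlierCallLags);
```

In the first case the skipped record is 'C', which is present on every node: it is lost only because the offset for page=2 was applied to a shorter list. In the second case the skipped record is 'B' itself.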
In addition, a CVE Record is permanently skipped if a secondary, because it has fewer documents, does not set the nextPage property in its last response.
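The nextPage case can be modeled the same way. The sketch below assumes that the server derives nextPage from the number of matching documents on whichever node answers the call, and that it reports null when there is no further page; `respondFromNode` and the `cveRecords` field name are hypothetical and used only for illustration.

```javascript
// Build a paginated response from one node's view of the matching set.
function respondFromNode(sortedIds, page, pageSize) {
  const start = (page - 1) * pageSize;
  const hasMore = start + pageSize < sortedIds.length;
  return {
    cveRecords: sortedIds.slice(start, start + pageSize),
    nextPage: hasMore ? page + 1 : null
  };
}

const upToDateNode = ['A', 'B', 'C', 'D', 'E'];
const behindNode = ['A', 'B', 'C', 'D']; // 'E' has not replicated yet

// page=1 is answered by the up-to-date node, page=2 by the lagging secondary.
const nodeForCall = { 1: upToDateNode, 2: behindNode };

const collected = [];
let currentPage = 1;
while (currentPage !== null) {
  const res = respondFromNode(nodeForCall[currentPage], currentPage, 2);
  collected.push(...res.cveRecords);
  currentPage = res.nextPage; // the client stops when nextPage is not set
}

const neverRetrieved = upToDateNode.filter((id) => !collected.includes(id));
```

The client follows the pagination instructions exactly, yet stops after page=2 and never requests the page that would have contained 'E'.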
Or, more generally, offset-based pagination only works correctly when there is a single source of truth. It does not work in an "eventually consistent" scenario.
(Even if GET /cve_cursor were used, there still needs to be a single source of truth. It has one fewer failure mode than GET /cve but is still, for example, affected by a case where the first API call goes to a secondary that has fewer documents, or the last API call goes to a secondary with fewer documents such that nextPage is not set.)
cve-services/src/utils/db.js
Line 29 in fcd5556