Commit c1499d7

ci/nightly: stabilize UrlChecker
The `urlcheck` linter is intended to verify that every URL embedded in the sources exists. Two different CI jobs use the linter: `Lint URLs (Bazel)` executes `TestNightlyLint`, whereas `URL Check (Bazel)` runs the linter directly (via main). Both jobs are part of the same `Nightlies` flow. While the latter checks _all_ URLs, the former checks only URLs found directly in SQL `helpMessages`.

Over the past year, a growing number of websites have moved behind WAFs, many of which deny even a HEAD request. The recent fix in [1] made the user-agent compliant with Wikipedia's WAF, which restored the `Lint URLs (Bazel)` CI job. Meanwhile, `URL Check (Bazel)` has been (silently) failing for several months.

This PR fixes `URL Check (Bazel)` by extending the linter to use the GitHub APIs for _all_ GitHub URLs [2]. We also added a simple cache of checked URLs, stored in a GCS bucket. The cache has a 7-day TTL and avoids thousands of redundant URL requests. (Upon expiry, _all_ URLs are rechecked.)

A number of broken as well as paywalled URLs have been replaced with the corresponding archive.org URL. A small number of paywalled URLs have been added to the ignore list.

[1] #152764
[2] https://github.com/orgs/community/discussions/159123

Resolves: #146319
Epic: none
Release note: None
1 parent 29896e5 commit c1499d7

49 files changed, 916 additions and 136 deletions

build/teamcity/cockroach/nightlies/url_check.sh

Lines changed: 1 addition & 1 deletion
@@ -14,5 +14,5 @@ source "$dir/teamcity-support.sh" # For $root
 source "$dir/teamcity-bazel-support.sh" # For run_bazel
 
 tc_start_block "Run url check"
-run_bazel build/teamcity/cockroach/nightlies/url_check_impl.sh
+BAZEL_SUPPORT_EXTRA_DOCKER_ARGS="-e GITHUB_API_TOKEN -e GOOGLE_EPHEMERAL_CREDENTIALS -e TC_BUILD_BRANCH" run_bazel build/teamcity/cockroach/nightlies/url_check_impl.sh
 tc_end_block "Run url check"
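
The `GITHUB_API_TOKEN` forwarded into the container here is presumably what lets the linter query the GitHub API instead of fetching github.com pages directly. As a rough illustration of that idea, the Go sketch below maps a few common github.com URL shapes to their GitHub REST API counterparts. The function name `githubAPIURL` and the set of handled shapes are assumptions for this example and may differ from what `urlcheck` actually does; no request is sent, and a real check would additionally attach the token in an `Authorization` header.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// githubAPIURL (hypothetical helper) maps a github.com web URL to a GitHub
// REST API URL for a few common shapes. It returns false for anything it
// does not recognize, so the caller can fall back to a plain HTTP check.
func githubAPIURL(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil || u.Host != "github.com" {
		return "", false
	}
	// EscapedPath keeps percent-encoding intact; the #fragment is dropped.
	parts := strings.Split(strings.Trim(u.EscapedPath(), "/"), "/")
	if len(parts) < 2 || parts[0] == "orgs" {
		return "", false // not an owner/repo URL
	}
	base := "https://api.github.com/repos/" + parts[0] + "/" + parts[1]
	switch {
	case len(parts) == 2:
		return base, true
	case len(parts) == 4 && parts[2] == "pull":
		return base + "/pulls/" + parts[3], true
	case len(parts) == 4 && parts[2] == "issues":
		return base + "/issues/" + parts[3], true
	case len(parts) == 4 && parts[2] == "commit":
		return base + "/commits/" + parts[3], true
	case len(parts) >= 5 && parts[2] == "blob":
		// Assumes the ref is a single path segment (a SHA or a branch
		// name without slashes); refs containing "/" are ambiguous here.
		path := strings.Join(parts[4:], "/")
		return base + "/contents/" + path + "?ref=" + url.QueryEscape(parts[3]), true
	}
	return "", false
}

func main() {
	api, ok := githubAPIURL("https://github.com/cockroachdb/cockroach/pull/14114")
	fmt.Println(api, ok)
	// prints: https://api.github.com/repos/cockroachdb/cockroach/pulls/14114 true
}
```

URLs that are not repository-scoped, such as the `orgs/community/discussions` link cited in the commit message, are deliberately left unmapped in this sketch.
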

build/teamcity/cockroach/nightlies/url_check_impl.sh

Lines changed: 3 additions & 0 deletions
@@ -8,4 +8,7 @@
 
 set -xeuo pipefail
 
+echo "$GOOGLE_EPHEMERAL_CREDENTIALS" > creds.json
+gcloud auth activate-service-account --key-file=creds.json
+
 bazel run //pkg/cmd/urlcheck --run_under="cd $PWD && "

docs/RFCS/20160331_index_hints.md

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@ More syntax and detailed information [here][1].
 PG does not provide support for hints. Instead they provide various knobs for
 tuning the optimizer to do the right thing. Details [here][2].
 
-[2]: http://blog.2ndquadrant.com/hinting_at_postgresql/
+[2]: https://web.archive.org/web/20221224144457/http://blog.2ndquadrant.com/hinting_at_postgresql
 
 ### Oracle
 

docs/RFCS/20160425_drain_modes.md

Lines changed: 1 addition & 1 deletion
@@ -239,6 +239,6 @@ being if that seems opportune.
 * Change the lease transfer mechanism so a transferrer can transfer its
 timestamp cache's high water mark which would act as the low water mark of the
 recipient's timestamp cache. This is conditioned on not [inserting reads in
-the command queue](https://forum.cockroachlabs.com/t/why-do-we-keep-read-commands-in-the-command-queue/360).
+the command queue](https://web.archive.org/web/20210419190146/https://forum.cockroachlabs.com/t/why-do-we-keep-read-commands-in-the-command-queue/360).
 
 # Unresolved questions

docs/RFCS/20170117_enterprise.md

Lines changed: 1 addition & 1 deletion
@@ -137,5 +137,5 @@ and/or in an email to the cluster administrator (address provided when they
 register for the license).
 
 [#14114]: https://github.com/cockroachdb/cockroach/pull/14114
-[business model]: https://www.cockroachlabs.com/blog/how-were-building-a-business-to-last/
+[business model]: https://web.archive.org/web/20240529205150/https://www.cockroachlabs.com/blog/how-were-building-a-business-to-last/
 [settings table]: https://github.com/cockroachdb/cockroach/pull/14230

docs/RFCS/20170517_algebraic_data_types.md

Lines changed: 1 addition & 1 deletion
@@ -604,4 +604,4 @@ const (
 
 [\#16240]: https://github.com/cockroachdb/cockroach/pull/16240
 [separate RFC]: https://github.com/cockroachdb/cockroach/pull/10055
-[Clang]: https://clang.llvm.org/docs/LibASTMatchers.html
+[Clang]: https://web.archive.org/web/20250902012257/https://clang.llvm.org/docs/LibASTMatchers.html

docs/RFCS/20170908_sql_optimizer_statistics.md

Lines changed: 1 addition & 1 deletion
@@ -538,7 +538,7 @@ efficient than retrieving each histogram separately).
 data retrieved by the query could also be used to update the
 statistics, though this complicates normal query execution. See also
 [self-tuning
-histograms](https://ashraf.aboulnaga.me/pubs/sigmod99sthist.pdf).
+histograms](https://web.archive.org/web/20240413134934/https://ashraf.aboulnaga.me/pubs/sigmod99sthist.pdf).
 
 * If the stats haven't been updated in a long time and we happen to be
 doing a full table-range scan for some query, we might as well

docs/RFCS/20171214_computed_columns.md

Lines changed: 1 addition & 1 deletion
@@ -168,4 +168,4 @@ contains the expression used to compute the column otherwise.
 # Unresolved Questions
 [Partitioning]: https://github.com/cockroachdb/cockroach/blob/aa61db043e9c54c0b83a405cd76ce0ec7cc6a35d/docs/RFCS/20170921_sql_partitioning.md
 [Query planning changes in the RFC]: https://github.com/cockroachdb/cockroach/blob/aa61db043e9c54c0b83a405cd76ce0ec7cc6a35d/docs/RFCS/20170921_sql_partitioning.md#query-planning-changes
-[MySQL's extension]: https://dev.mysql.com/doc/refman/5.7/en/columns-table.html
+[MySQL's extension]: https://dev.mysql.com/doc/refman/8.4/en/information-schema-columns-table.html

docs/RFCS/20180501_change_data_capture.md

Lines changed: 1 addition & 1 deletion
@@ -707,7 +707,7 @@ an alias if we build support for CockroachDB to CockroachDB replication.
 [kafka]: https://kafka.apache.org/intro
 [other cdc syntaxes]: #appendix-other-cdc-syntaxes
 [proposed and sent through raft]: https://github.com/cockroachdb/cockroach/blob/381e4dafa596c5f3621a48fcb5fce1f62b18c186/docs/RFCS/20160420_proposer_evaluated_kv.md
-[schema resolution]: http://avro.apache.org/docs/current/spec.html#Schema+Resolution
+[schema resolution]: https://avro.apache.org/docs/++version++/specification/#schema-resolution
 [sql table sink]: #sql-table-sink
 [system jobs]: https://github.com/cockroachdb/cockroach/blob/381e4dafa596c5f3621a48fcb5fce1f62b18c186/docs/RFCS/20170215_system_jobs.md
 [transaction grouped changes]: #no-transaction-grouped-changes

docs/RFCS/20181204_copysets.md

Lines changed: 1 addition & 1 deletion
@@ -375,7 +375,7 @@ function. This design in this RFC is something simple and the respective
 algorithms can be tweaked independently later.
 
 ### Chainsets
-[Chainsets](http://hackingdistributed.com/2014/02/14/chainsets/) is one way
+[Chainsets](https://web.archive.org/web/20240714222832/https://hackingdistributed.com/2014/02/14/chainsets/) is one way
 to make incremental changes to copysets, but again potentially at the cost
 of reduced locality diversity. The length of the chain used in chainsets
 could be considered equivalent to replication factor in cockroach.
