Web
The Web agent was used early in the development of Event Data but is no longer active.
Agent Source token | d9c55bad-73db-4860-be18-520d3891b01f |
Consumes Artifacts | domain-list |
Subject coverage | Any webpage. |
Object coverage | All DOIs, all Article Landing Pages |
Data contributor | Various |
Data origin | Authors of webpages |
Freshness | Infrequent |
Identifies | Linked DOIs, unlinked DOIs, landing page URLs |
License | Creative Commons CC0 1.0 Universal (CC0 1.0) |
Looks in | Text of webpages. |
Name | Web |
Operated by | Crossref |
Produces Evidence Records | Yes |
Produces relation types | mentions |
Source ID | web |
Updates or deletions | None expected |
What it is
The Web source is a catch-all name we give to Events collected from the Web when we follow links that fall outside any other source. As with all other sources, we don't visit webpages that belong to Crossref members.
Many Agents such as Reddit Links, Newsfeed, Wikipedia follow links.
What it is
Events from any non-member web page we think might be relevant. We monitor a list of URLs that we think might have links to Items via their DOIs or landing pages, and then follow them to see if we can find any Items. We curate this list, as best we can, to ensure that we never follow a link when we believe it belongs to a Crossref member or when directed not to by robots.txt.
The list of URLs can come from a range of sources, including those submitted by users. If you have such a list, feel free to contact us.
What it does
A list of URLs is maintained. The Agent submits every URL to the Percolator. The Percolator looks for linked or unlinked DOIs, or linked Article Landing Pages in the HTML of each page.
Where data comes from
- A list of URLs that we compile internally, and that are submitted by users.
- The content of each web page on the list.
Example Event
{
"obj_id": "https://doi.org/10.1017/s0963180100005168",
"source_token": "d9c55bad-73db-4860-be18-520d3891b01f",
"occurred_at": "2017-03-13T10:10:38Z",
"subj_id": "http://philpapers.org/rec/ANNAAS",
"id": "00003c22-1571-4bd3-924b-0438f6f7ff54",
"evidence_record": "https://0-evidence-eventdata-crossref-org.library.alliant.edu/evidence/20170313e86bef03-4556-4ecc-8401-0e71af4d0bb6",
"terms": "https://doi.org/10.13003/CED-terms-of-use",
"action": "add",
"subj": {
"pid": "http://philpapers.org/rec/ANNAAS",
"work-type": "webpage",
"url": "http://philpapers.org/rec/ANNAAS"
},
"source_id": "web",
"obj": {
"pid": "https://doi.org/10.1017/s0963180100005168",
"url": "https://doi.org/10.1017/s0963180100005168"
},
"timestamp": "2017-03-13T10:11:19Z",
"relation_type_id": "mentions"
}
Evidence Record
The Evidence Record contains observations of type content-url
which correspond to every URL visited.
Edits / deletion
We may mark Events as deleted if we subsequently find that the subj_id
doesn't conform to the Event Data aims (e.g. if it belongs to a member).
Quirks
- The selection of URLs doesn't follow any particular pattern.
Failure modes
- Publisher sites may block the Event Data Bot collecting landing pages.
Further information
None.