I demo'd the program for @jcowey this morning (he was very impressed), and we identified two features that will improve its functionality. I'll present them in separate issue tickets.
The first involves cases where the BP file has seg[@subtype="cr"] but the corresponding PN file does not (or does not have the same information).
Where PN has no entry for C.R. (or an incomplete entry for C.R.) and the user indicates that BP is correct for the line number corresponding to C.R., we need to do two things:
- Update the corresponding PN entry with the new data (which the program is already designed to facilitate)
- Create a new XML file for each review detailed in seg[@subtype="cr"], unless such a review already appears in PN. This would be a new feature: reviews do get their own individual XML files in PN Biblio.
So, for example, in 2010-0043, one finds a number of reviews, which are tokenized by -:
Regarding 1., the PN version of this file (20984) does not have seg[@subtype="cr"]. The program as currently designed will remedy that.
But 2. is trickier, since we need to do a few things:
- determine whether the change in BP indicates the publication of new review(s) (i.e., whether the better information in BP is cosmetic or substantive)
- check whether each review already exists in PN, as its own XML file
- where a review doesn't already exist, create a new XML file for it
Only the user can determine whether BP's seg[@subtype="cr"] indicates the publication of a new review: if it does, that will trigger a new workflow.
Checking for an existing review should be straightforward: reviews are flagged in XML by relatedItem[@type="reviews"], where the item reviewed is identified via bibl/ptr/@target (see snippet for 97327, below, which points to 20984):
<?xml version="1.0" encoding="UTF-8"?>
<bibl xmlns="http://www.tei-c.org/ns/1.0" xml:id="b97327" type="review">
<author>
<forename>Peter</forename>
<surname>Van Minnen</surname>
</author>
<date>2013</date>
<biblScope type="pp" from="323" to="326">323-326</biblScope>
<biblScope type="no">Nos 171-172</biblScope>
<relatedItem type="appearsIn">
<bibl>
<ptr target="https://papyri.info/biblio/110"/>
<!--ignore - start, i.e. SoSOL users may not edit this-->
<!--ignore - stop-->
</bibl>
</relatedItem>
<biblScope type="issue">50</biblScope>
<relatedItem type="reviews" n="1">
<bibl>
<ptr target="https://papyri.info/biblio/20984"/>
<!--ignore - start, i.e. SoSOL users may not edit this-->
<!--ignore - stop-->
</bibl>
</relatedItem>
<idno type="pi">97327</idno>
<seg type="original" subtype="cr" resp="#BP">Peter van Minnen, BASP 50 (2013) pp. 323-326.</seg>
</bibl>
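As a rough illustration of that check, here is a minimal Python sketch using only the standard library. The function names (`reviewed_targets`, `is_review_of`) are hypothetical, and the sketch assumes each review file has the same shape as the 97327 record above; it is not how the program currently does anything.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed-down copy of the 97327 record, kept only to exercise the functions.
SAMPLE = """<bibl xmlns="http://www.tei-c.org/ns/1.0" xml:id="b97327" type="review">
  <relatedItem type="appearsIn">
    <bibl><ptr target="https://papyri.info/biblio/110"/></bibl>
  </relatedItem>
  <relatedItem type="reviews" n="1">
    <bibl><ptr target="https://papyri.info/biblio/20984"/></bibl>
  </relatedItem>
  <idno type="pi">97327</idno>
</bibl>"""


def reviewed_targets(xml_text):
    """Return the PN Biblio ids named in relatedItem[@type='reviews']/bibl/ptr/@target."""
    root = ET.fromstring(xml_text)
    ptrs = root.findall("tei:relatedItem[@type='reviews']/tei:bibl/tei:ptr", TEI)
    # Keep only the last path segment of the URL, e.g. ".../biblio/20984" -> "20984".
    return [p.get("target", "").rstrip("/").rsplit("/", 1)[-1] for p in ptrs]


def is_review_of(xml_text, target_id):
    """True if the record is flagged as a review of the given PN Biblio id."""
    return str(target_id) in reviewed_targets(xml_text)


if __name__ == "__main__":
    print(reviewed_targets(SAMPLE))
    print(is_review_of(SAMPLE, 20984))
```

Running `is_review_of` over every review file in PN Biblio would give the candidate list for a given target; the appearsIn pointer is deliberately ignored here.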
Simply looking for a review that points to the item in question will get us pretty far: a quick check reveals that both the Straus (82888) and van Minnen (97327) reviews already exist in PN (but that there are three others in the BP fiche that do not).
- If the answer is 'there are no matches/reviews of this work in PN Biblio', then we will need to create the file(s).
- If the answer is 'there are one or more matches', then we will need to determine whether one is correct, or, where none is correct, to create the file.
Where one is correct, the user can identify it and the program can move on (since nothing needs to change). The easiest way to identify the correct file is probably via the page range, which takes the form \d+-\d+ and comes just before the token-delimiting ". - ". The user could confirm that a review file pointing at the desired target, and whose page range matches that of the BP fiche, is correct (or maybe the computer could simply move on if the page range is a match?). And, again, if there is no match, then we'd have to create the file.
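The page-range comparison could look something like the following Python sketch. The names `page_range` and `same_pages` are made up for illustration, and it assumes that the range is always the last thing in a token and that biblScope's from/to attributes hold plain integers.

```python
import re

# A page range such as "323-326", optionally followed by the token's final period.
PAGE_RANGE = re.compile(r"(\d+)-(\d+)\.?\s*$")


def page_range(token):
    """Return (first, last) pages from the end of a C.R. token, or None."""
    m = PAGE_RANGE.search(token.strip())
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def same_pages(token, from_attr, to_attr):
    """Compare a BP token with biblScope[@type='pp']/@from and @to of a PN review."""
    rng = page_range(token)
    return rng is not None and rng == (int(from_attr), int(to_attr))


if __name__ == "__main__":
    token = "Peter van Minnen, BASP 50 (2013) pp. 323-326."
    print(page_range(token))
    print(same_pages(token, "323", "326"))
```

Whether a match should be accepted silently or still shown to the user for confirmation is the open question from the paragraph above; the sketch only supplies the comparison.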
For things that need to be created/added, I would certainly like a log file generated, in which the PN Biblio target for the review is indicated first, followed by the contents of seg[@subtype="cr"] (tokenized by -). Each token would appear on its own line, and we would only need the reviews that do not yet appear in PN Biblio as their own files. So, for 20984, the log file would contain the following output:
20984: Thomas Schmidt, MusHelv 68 (2011) pp. 232-233.
20984: Lajos Berkes, Gnomon 85 (2013) pp. 464-466.
20984: Rudolf Stefec, Gymnasium 119 (2012) pp. 302-304.
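For what it's worth, producing those lines might be sketched in Python as below. Everything here is an assumption made for illustration: the function names, the idea of passing in the page ranges of reviews already found in PN, and the reading that tokens are separated by a hyphen with whitespace on both sides. The input string in the example is assembled from entries quoted in this ticket and is not the actual BP fiche.

```python
import re


def tokenize_cr(seg_text):
    """Split C.R. text into one entry per review (hyphen surrounded by spaces)."""
    parts = re.split(r"\s+-\s+", seg_text.strip())
    return [p.strip() for p in parts if p.strip()]


def log_lines(target_id, seg_text, existing_page_ranges):
    """Build 'target: token' lines for reviews not yet present in PN Biblio.

    existing_page_ranges: page ranges such as "323-326" taken from review
    files that already point at target_id.
    """
    lines = []
    for token in tokenize_cr(seg_text):
        m = re.search(r"(\d+-\d+)\.?$", token)
        if m and m.group(1) in existing_page_ranges:
            continue  # already has its own file in PN, nothing to log
        lines.append(f"{target_id}: {token}")
    return lines


if __name__ == "__main__":
    seg = (
        "Peter van Minnen, BASP 50 (2013) pp. 323-326. - "
        "Thomas Schmidt, MusHelv 68 (2011) pp. 232-233. - "
        "Lajos Berkes, Gnomon 85 (2013) pp. 464-466. - "
        "Rudolf Stefec, Gymnasium 119 (2012) pp. 302-304."
    )
    for line in log_lines("20984", seg, {"323-326"}):
        print(line)
```

Splitting on a hyphen that has whitespace on both sides keeps page ranges like 232-233 intact, since those have no surrounding spaces.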
Via XSLT, it will be possible to parse the log and to populate the new XML with pretty much everything we need (as per the **** in the following template). But if you want to try doing that using non-XSLT tools, I can provide a list of journal abbreviations used by BP and the corresponding PN Biblio idno that is stated in relatedItem[@type="appearsIn"].
<?xml version="1.0" encoding="UTF-8"?>
<bibl xmlns="http://www.tei-c.org/ns/1.0" xml:id="b****" type="review">
<author>
<forename>****</forename>
<surname>****</surname>
</author>
<date>****</date>
<biblScope type="pp" from="****" to="****">****</biblScope>
<relatedItem type="appearsIn">
<bibl>
<ptr target="https://papyri.info/biblio/****"/>
<!--ignore - start, i.e. SoSOL users may not edit this-->
<!--ignore - stop-->
</bibl>
</relatedItem>
<biblScope type="issue">****</biblScope>
<relatedItem type="reviews" n="1">
<bibl>
<ptr target="https://papyri.info/biblio/****"/>
<!--ignore - start, i.e. SoSOL users may not edit this-->
<!--ignore - stop-->
</bibl>
</relatedItem>
<idno type="pi">****</idno>
<seg type="original" subtype="cr" resp="#BP">****</seg>
</bibl>
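If the non-XSLT route is taken, filling the template from a log line might look roughly like this Python sketch. All names are hypothetical. The journal lookup contains only BASP = 110, which is what the 97327 record above shows; every other abbreviation, and the idno of the new file, stays as **** until the real list is supplied. Splitting the author at the first space is a naive guess and would need checking by hand.

```python
import re
from xml.sax.saxutils import escape

# Abbreviation -> PN Biblio idno. Only BASP is known from the 97327 record.
JOURNAL_IDS = {"BASP": "110"}

ENTRY = re.compile(
    r"^(?P<author>[^,]+),\s+(?P<journal>.+?)\s+(?P<issue>\d+)\s+"
    r"\((?P<year>\d{4})\)\s+pp\.\s+(?P<first>\d+)-(?P<last>\d+)\.?$"
)

TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<bibl xmlns="http://www.tei-c.org/ns/1.0" xml:id="b{idno}" type="review">
<author>
<forename>{forename}</forename>
<surname>{surname}</surname>
</author>
<date>{year}</date>
<biblScope type="pp" from="{first}" to="{last}">{first}-{last}</biblScope>
<relatedItem type="appearsIn">
<bibl>
<ptr target="https://papyri.info/biblio/{journal_id}"/>
<!--ignore - start, i.e. SoSOL users may not edit this-->
<!--ignore - stop-->
</bibl>
</relatedItem>
<biblScope type="issue">{issue}</biblScope>
<relatedItem type="reviews" n="1">
<bibl>
<ptr target="https://papyri.info/biblio/{target}"/>
<!--ignore - start, i.e. SoSOL users may not edit this-->
<!--ignore - stop-->
</bibl>
</relatedItem>
<idno type="pi">{idno}</idno>
<seg type="original" subtype="cr" resp="#BP">{seg}</seg>
</bibl>
"""


def parse_log_line(line):
    """Turn '20984: Author, Journal 68 (2011) pp. 232-233.' into template fields."""
    target, sep, token = line.strip().partition(": ")
    m = ENTRY.match(token)
    if not sep or m is None:
        return None  # leave unparseable lines for manual handling
    forename, _, surname = m["author"].strip().partition(" ")
    return {
        "target": target, "seg": token,
        "forename": forename, "surname": surname,
        "journal_id": JOURNAL_IDS.get(m["journal"], "****"),
        "issue": m["issue"], "year": m["year"],
        "first": m["first"], "last": m["last"],
    }


def render(fields, idno="****"):
    """Fill the template; idno stays '****' until a PN Biblio id is assigned."""
    safe = {key: escape(value) for key, value in fields.items()}
    return TEMPLATE.format(idno=idno, **safe)


if __name__ == "__main__":
    print(render(parse_log_line("20984: Thomas Schmidt, MusHelv 68 (2011) pp. 232-233.")))
```

Lines that do not fit the expected pattern come back as `None` rather than being guessed at, so they can be collected for manual entry.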