diff --git a/.github/ISSUE_TEMPLATE/---bug-report.md b/.github/ISSUE_TEMPLATE/---bug-report.md
index 2bc44b7ec..e0afd0b03 100644
--- a/.github/ISSUE_TEMPLATE/---bug-report.md
+++ b/.github/ISSUE_TEMPLATE/---bug-report.md
@@ -26,7 +26,7 @@ Steps to reproduce the behavior:
  - Slideflow Version (e.g., 1.0):
  - OS (e.g., Ubuntu):
- - How you installed PyTorch (`pip`, source):
+ - How you installed Slideflow (`pip`, source):
  - Python version:
  - CUDA/cuDNN version:
  - GPU models and configuration:
diff --git a/.github/workflows/publish-to-pypi.yml b/.github/workflows/publish-to-pypi.yml
index 0ecda0efb..99e597539 100644
--- a/.github/workflows/publish-to-pypi.yml
+++ b/.github/workflows/publish-to-pypi.yml
@@ -20,6 +20,12 @@ jobs:
         python -m pip install -r requirements.txt --user
+    - name: Initialize submodule
+      run: >-
+        git submodule init
+    - name: Update submodule
+      run: >-
+        git submodule update --remote --recursive
     - name: Build a binary wheel
       run: >-
         python
diff --git a/.github/workflows/publish-to-test-pypi.yml b/.github/workflows/publish-to-test-pypi.yml
index ba27967a0..07045cbc9 100644
--- a/.github/workflows/publish-to-test-pypi.yml
+++ b/.github/workflows/publish-to-test-pypi.yml
@@ -20,6 +20,12 @@ jobs:
         python -m pip install -r requirements.txt --user
+    - name: Initialize submodule
+      run: >-
+        git submodule init
+    - name: Update submodule
+      run: >-
+        git submodule update --remote --recursive
     - name: Build a binary wheel
       run: >-
         python
diff --git a/.gitignore b/.gitignore
index 3406f71b4..745e123d5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,8 +1,17 @@
 docs/build/*
 *.pyc
 *.egg-info
+*.ipynb_checkpoints
+*.pdf
+.vscode/*
 .neptune/*
 docs-source/build/*
+docs-source/pytorch_sphinx_theme/yarn.lock.backup
 build/*
 dist/*
-slideflow_test/*
\ No newline at end of file
+tutorials/*
+slideflow_test/*
+torch_test/*
+warn_report*.txt
+slideflow.log
+.DS_Store
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 000000000..9ddcb959c
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "slideflow/simclr/simclr"]
+	path = slideflow/simclr/simclr
+	url = https://github.com/jamesdolezal/simclr.git
diff --git a/LICENSE b/LICENSE
index f288702d2..261eeb9e9 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,674 +1,201 @@
-                    GNU GENERAL PUBLIC LICENSE
-                       Version 3, 29 June 2007
-
- Copyright (C) 2007 Free Software Foundation, Inc.
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
-                            Preamble
-
-  The GNU General Public License is a free, copyleft license for
-software and other kinds of works.
-
-  The licenses for most software and other practical works are designed
-to take away your freedom to share and change the works. By contrast,
-the GNU General Public License is intended to guarantee your freedom to
-share and change all versions of a program--to make sure it remains free
-software for all its users. We, the Free Software Foundation, use the
-GNU General Public License for most of our software; it applies also to
-any other work released this way by its authors. You can apply it to
-your programs, too.
-
-  When we speak of free software, we are referring to freedom, not
-price. Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-them if you wish), that you receive source code or can get it if you
-want it, that you can change the software or use pieces of it in new
-free programs, and that you know you can do these things.
-
-  To protect your rights, we need to prevent others from denying you
-these rights or asking you to surrender the rights. Therefore, you have
-certain responsibilities if you distribute copies of the software, or if
-you modify it: responsibilities to respect the freedom of others.
- - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must pass on to the recipients the same -freedoms that you received. You must make sure that they, too, receive -or can get the source code. And you must show them these terms so they -know their rights. - - Developers that use the GNU GPL protect your rights with two steps: -(1) assert copyright on the software, and (2) offer you this License -giving you legal permission to copy, distribute and/or modify it. - - For the developers' and authors' protection, the GPL clearly explains -that there is no warranty for this free software. For both users' and -authors' sake, the GPL requires that modified versions be marked as -changed, so that their problems will not be attributed erroneously to -authors of previous versions. - - Some devices are designed to deny users access to install or run -modified versions of the software inside them, although the manufacturer -can do so. This is fundamentally incompatible with the aim of -protecting users' freedom to change the software. The systematic -pattern of such abuse occurs in the area of products for individuals to -use, which is precisely where it is most unacceptable. Therefore, we -have designed this version of the GPL to prohibit the practice for those -products. If such problems arise substantially in other domains, we -stand ready to extend this provision to those domains in future versions -of the GPL, as needed to protect the freedom of users. - - Finally, every program is threatened constantly by software patents. -States should not allow patents to restrict development and use of -software on general-purpose computers, but in those that do, we wish to -avoid the special danger that patents applied to a free program could -make it effectively proprietary. To prevent this, the GPL assures that -patents cannot be used to render the program non-free. 
- - The precise terms and conditions for copying, distribution and -modification follow. - - TERMS AND CONDITIONS - - 0. Definitions. - - "This License" refers to version 3 of the GNU General Public License. - - "Copyright" also means copyright-like laws that apply to other kinds of -works, such as semiconductor masks. - - "The Program" refers to any copyrightable work licensed under this -License. Each licensee is addressed as "you". "Licensees" and -"recipients" may be individuals or organizations. - - To "modify" a work means to copy from or adapt all or part of the work -in a fashion requiring copyright permission, other than the making of an -exact copy. The resulting work is called a "modified version" of the -earlier work or a work "based on" the earlier work. - - A "covered work" means either the unmodified Program or a work based -on the Program. - - To "propagate" a work means to do anything with it that, without -permission, would make you directly or secondarily liable for -infringement under applicable copyright law, except executing it on a -computer or modifying a private copy. Propagation includes copying, -distribution (with or without modification), making available to the -public, and in some countries other activities as well. - - To "convey" a work means any kind of propagation that enables other -parties to make or receive copies. Mere interaction with a user through -a computer network, with no transfer of a copy, is not conveying. - - An interactive user interface displays "Appropriate Legal Notices" -to the extent that it includes a convenient and prominently visible -feature that (1) displays an appropriate copyright notice, and (2) -tells the user that there is no warranty for the work (except to the -extent that warranties are provided), that licensees may convey the -work under this License, and how to view a copy of this License. 
If -the interface presents a list of user commands or options, such as a -menu, a prominent item in the list meets this criterion. - - 1. Source Code. - - The "source code" for a work means the preferred form of the work -for making modifications to it. "Object code" means any non-source -form of a work. - - A "Standard Interface" means an interface that either is an official -standard defined by a recognized standards body, or, in the case of -interfaces specified for a particular programming language, one that -is widely used among developers working in that language. - - The "System Libraries" of an executable work include anything, other -than the work as a whole, that (a) is included in the normal form of -packaging a Major Component, but which is not part of that Major -Component, and (b) serves only to enable use of the work with that -Major Component, or to implement a Standard Interface for which an -implementation is available to the public in source code form. A -"Major Component", in this context, means a major essential component -(kernel, window system, and so on) of the specific operating system -(if any) on which the executable work runs, or a compiler used to -produce the work, or an object code interpreter used to run it. - - The "Corresponding Source" for a work in object code form means all -the source code needed to generate, install, and (for an executable -work) run the object code and to modify the work, including scripts to -control those activities. However, it does not include the work's -System Libraries, or general-purpose tools or generally available free -programs which are used unmodified in performing those activities but -which are not part of the work. 
For example, Corresponding Source -includes interface definition files associated with source files for -the work, and the source code for shared libraries and dynamically -linked subprograms that the work is specifically designed to require, -such as by intimate data communication or control flow between those -subprograms and other parts of the work. - - The Corresponding Source need not include anything that users -can regenerate automatically from other parts of the Corresponding -Source. - - The Corresponding Source for a work in source code form is that -same work. - - 2. Basic Permissions. - - All rights granted under this License are granted for the term of -copyright on the Program, and are irrevocable provided the stated -conditions are met. This License explicitly affirms your unlimited -permission to run the unmodified Program. The output from running a -covered work is covered by this License only if the output, given its -content, constitutes a covered work. This License acknowledges your -rights of fair use or other equivalent, as provided by copyright law. - - You may make, run and propagate covered works that you do not -convey, without conditions so long as your license otherwise remains -in force. You may convey covered works to others for the sole purpose -of having them make modifications exclusively for you, or provide you -with facilities for running those works, provided that you comply with -the terms of this License in conveying all material for which you do -not control copyright. Those thus making or running the covered works -for you must do so exclusively on your behalf, under your direction -and control, on terms that prohibit them from making any copies of -your copyrighted material outside their relationship with you. - - Conveying under any other circumstances is permitted solely under -the conditions stated below. Sublicensing is not allowed; section 10 -makes it unnecessary. - - 3. 
Protecting Users' Legal Rights From Anti-Circumvention Law. - - No covered work shall be deemed part of an effective technological -measure under any applicable law fulfilling obligations under article -11 of the WIPO copyright treaty adopted on 20 December 1996, or -similar laws prohibiting or restricting circumvention of such -measures. - - When you convey a covered work, you waive any legal power to forbid -circumvention of technological measures to the extent such circumvention -is effected by exercising rights under this License with respect to -the covered work, and you disclaim any intention to limit operation or -modification of the work as a means of enforcing, against the work's -users, your or third parties' legal rights to forbid circumvention of -technological measures. - - 4. Conveying Verbatim Copies. - - You may convey verbatim copies of the Program's source code as you -receive it, in any medium, provided that you conspicuously and -appropriately publish on each copy an appropriate copyright notice; -keep intact all notices stating that this License and any -non-permissive terms added in accord with section 7 apply to the code; -keep intact all notices of the absence of any warranty; and give all -recipients a copy of this License along with the Program. - - You may charge any price or no price for each copy that you convey, -and you may offer support or warranty protection for a fee. - - 5. Conveying Modified Source Versions. - - You may convey a work based on the Program, or the modifications to -produce it from the Program, in the form of source code under the -terms of section 4, provided that you also meet all of these conditions: - - a) The work must carry prominent notices stating that you modified - it, and giving a relevant date. - - b) The work must carry prominent notices stating that it is - released under this License and any conditions added under section - 7. 
This requirement modifies the requirement in section 4 to - "keep intact all notices". - - c) You must license the entire work, as a whole, under this - License to anyone who comes into possession of a copy. This - License will therefore apply, along with any applicable section 7 - additional terms, to the whole of the work, and all its parts, - regardless of how they are packaged. This License gives no - permission to license the work in any other way, but it does not - invalidate such permission if you have separately received it. - - d) If the work has interactive user interfaces, each must display - Appropriate Legal Notices; however, if the Program has interactive - interfaces that do not display Appropriate Legal Notices, your - work need not make them do so. - - A compilation of a covered work with other separate and independent -works, which are not by their nature extensions of the covered work, -and which are not combined with it such as to form a larger program, -in or on a volume of a storage or distribution medium, is called an -"aggregate" if the compilation and its resulting copyright are not -used to limit the access or legal rights of the compilation's users -beyond what the individual works permit. Inclusion of a covered work -in an aggregate does not cause this License to apply to the other -parts of the aggregate. - - 6. Conveying Non-Source Forms. - - You may convey a covered work in object code form under the terms -of sections 4 and 5, provided that you also convey the -machine-readable Corresponding Source under the terms of this License, -in one of these ways: - - a) Convey the object code in, or embodied in, a physical product - (including a physical distribution medium), accompanied by the - Corresponding Source fixed on a durable physical medium - customarily used for software interchange. 
- - b) Convey the object code in, or embodied in, a physical product - (including a physical distribution medium), accompanied by a - written offer, valid for at least three years and valid for as - long as you offer spare parts or customer support for that product - model, to give anyone who possesses the object code either (1) a - copy of the Corresponding Source for all the software in the - product that is covered by this License, on a durable physical - medium customarily used for software interchange, for a price no - more than your reasonable cost of physically performing this - conveying of source, or (2) access to copy the - Corresponding Source from a network server at no charge. - - c) Convey individual copies of the object code with a copy of the - written offer to provide the Corresponding Source. This - alternative is allowed only occasionally and noncommercially, and - only if you received the object code with such an offer, in accord - with subsection 6b. - - d) Convey the object code by offering access from a designated - place (gratis or for a charge), and offer equivalent access to the - Corresponding Source in the same way through the same place at no - further charge. You need not require recipients to copy the - Corresponding Source along with the object code. If the place to - copy the object code is a network server, the Corresponding Source - may be on a different server (operated by you or a third party) - that supports equivalent copying facilities, provided you maintain - clear directions next to the object code saying where to find the - Corresponding Source. Regardless of what server hosts the - Corresponding Source, you remain obligated to ensure that it is - available for as long as needed to satisfy these requirements. 
- - e) Convey the object code using peer-to-peer transmission, provided - you inform other peers where the object code and Corresponding - Source of the work are being offered to the general public at no - charge under subsection 6d. - - A separable portion of the object code, whose source code is excluded -from the Corresponding Source as a System Library, need not be -included in conveying the object code work. - - A "User Product" is either (1) a "consumer product", which means any -tangible personal property which is normally used for personal, family, -or household purposes, or (2) anything designed or sold for incorporation -into a dwelling. In determining whether a product is a consumer product, -doubtful cases shall be resolved in favor of coverage. For a particular -product received by a particular user, "normally used" refers to a -typical or common use of that class of product, regardless of the status -of the particular user or of the way in which the particular user -actually uses, or expects or is expected to use, the product. A product -is a consumer product regardless of whether the product has substantial -commercial, industrial or non-consumer uses, unless such uses represent -the only significant mode of use of the product. - - "Installation Information" for a User Product means any methods, -procedures, authorization keys, or other information required to install -and execute modified versions of a covered work in that User Product from -a modified version of its Corresponding Source. The information must -suffice to ensure that the continued functioning of the modified object -code is in no case prevented or interfered with solely because -modification has been made. 
- - If you convey an object code work under this section in, or with, or -specifically for use in, a User Product, and the conveying occurs as -part of a transaction in which the right of possession and use of the -User Product is transferred to the recipient in perpetuity or for a -fixed term (regardless of how the transaction is characterized), the -Corresponding Source conveyed under this section must be accompanied -by the Installation Information. But this requirement does not apply -if neither you nor any third party retains the ability to install -modified object code on the User Product (for example, the work has -been installed in ROM). - - The requirement to provide Installation Information does not include a -requirement to continue to provide support service, warranty, or updates -for a work that has been modified or installed by the recipient, or for -the User Product in which it has been modified or installed. Access to a -network may be denied when the modification itself materially and -adversely affects the operation of the network or violates the rules and -protocols for communication across the network. - - Corresponding Source conveyed, and Installation Information provided, -in accord with this section must be in a format that is publicly -documented (and with an implementation available to the public in -source code form), and must require no special password or key for -unpacking, reading or copying. - - 7. Additional Terms. - - "Additional permissions" are terms that supplement the terms of this -License by making exceptions from one or more of its conditions. -Additional permissions that are applicable to the entire Program shall -be treated as though they were included in this License, to the extent -that they are valid under applicable law. 
If additional permissions -apply only to part of the Program, that part may be used separately -under those permissions, but the entire Program remains governed by -this License without regard to the additional permissions. - - When you convey a copy of a covered work, you may at your option -remove any additional permissions from that copy, or from any part of -it. (Additional permissions may be written to require their own -removal in certain cases when you modify the work.) You may place -additional permissions on material, added by you to a covered work, -for which you have or can give appropriate copyright permission. - - Notwithstanding any other provision of this License, for material you -add to a covered work, you may (if authorized by the copyright holders of -that material) supplement the terms of this License with terms: - - a) Disclaiming warranty or limiting liability differently from the - terms of sections 15 and 16 of this License; or - - b) Requiring preservation of specified reasonable legal notices or - author attributions in that material or in the Appropriate Legal - Notices displayed by works containing it; or - - c) Prohibiting misrepresentation of the origin of that material, or - requiring that modified versions of such material be marked in - reasonable ways as different from the original version; or - - d) Limiting the use for publicity purposes of names of licensors or - authors of the material; or - - e) Declining to grant rights under trademark law for use of some - trade names, trademarks, or service marks; or - - f) Requiring indemnification of licensors and authors of that - material by anyone who conveys the material (or modified versions of - it) with contractual assumptions of liability to the recipient, for - any liability that these contractual assumptions directly impose on - those licensors and authors. - - All other non-permissive additional terms are considered "further -restrictions" within the meaning of section 10. 
If the Program as you -received it, or any part of it, contains a notice stating that it is -governed by this License along with a term that is a further -restriction, you may remove that term. If a license document contains -a further restriction but permits relicensing or conveying under this -License, you may add to a covered work material governed by the terms -of that license document, provided that the further restriction does -not survive such relicensing or conveying. - - If you add terms to a covered work in accord with this section, you -must place, in the relevant source files, a statement of the -additional terms that apply to those files, or a notice indicating -where to find the applicable terms. - - Additional terms, permissive or non-permissive, may be stated in the -form of a separately written license, or stated as exceptions; -the above requirements apply either way. - - 8. Termination. - - You may not propagate or modify a covered work except as expressly -provided under this License. Any attempt otherwise to propagate or -modify it is void, and will automatically terminate your rights under -this License (including any patent licenses granted under the third -paragraph of section 11). - - However, if you cease all violation of this License, then your -license from a particular copyright holder is reinstated (a) -provisionally, unless and until the copyright holder explicitly and -finally terminates your license, and (b) permanently, if the copyright -holder fails to notify you of the violation by some reasonable means -prior to 60 days after the cessation. - - Moreover, your license from a particular copyright holder is -reinstated permanently if the copyright holder notifies you of the -violation by some reasonable means, this is the first time you have -received notice of violation of this License (for any work) from that -copyright holder, and you cure the violation prior to 30 days after -your receipt of the notice. 
- - Termination of your rights under this section does not terminate the -licenses of parties who have received copies or rights from you under -this License. If your rights have been terminated and not permanently -reinstated, you do not qualify to receive new licenses for the same -material under section 10. - - 9. Acceptance Not Required for Having Copies. - - You are not required to accept this License in order to receive or -run a copy of the Program. Ancillary propagation of a covered work -occurring solely as a consequence of using peer-to-peer transmission -to receive a copy likewise does not require acceptance. However, -nothing other than this License grants you permission to propagate or -modify any covered work. These actions infringe copyright if you do -not accept this License. Therefore, by modifying or propagating a -covered work, you indicate your acceptance of this License to do so. - - 10. Automatic Licensing of Downstream Recipients. - - Each time you convey a covered work, the recipient automatically -receives a license from the original licensors, to run, modify and -propagate that work, subject to this License. You are not responsible -for enforcing compliance by third parties with this License. - - An "entity transaction" is a transaction transferring control of an -organization, or substantially all assets of one, or subdividing an -organization, or merging organizations. If propagation of a covered -work results from an entity transaction, each party to that -transaction who receives a copy of the work also receives whatever -licenses to the work the party's predecessor in interest had or could -give under the previous paragraph, plus a right to possession of the -Corresponding Source of the work from the predecessor in interest, if -the predecessor has it or can get it with reasonable efforts. - - You may not impose any further restrictions on the exercise of the -rights granted or affirmed under this License. 
For example, you may -not impose a license fee, royalty, or other charge for exercise of -rights granted under this License, and you may not initiate litigation -(including a cross-claim or counterclaim in a lawsuit) alleging that -any patent claim is infringed by making, using, selling, offering for -sale, or importing the Program or any portion of it. - - 11. Patents. - - A "contributor" is a copyright holder who authorizes use under this -License of the Program or a work on which the Program is based. The -work thus licensed is called the contributor's "contributor version". - - A contributor's "essential patent claims" are all patent claims -owned or controlled by the contributor, whether already acquired or -hereafter acquired, that would be infringed by some manner, permitted -by this License, of making, using, or selling its contributor version, -but do not include claims that would be infringed only as a -consequence of further modification of the contributor version. For -purposes of this definition, "control" includes the right to grant -patent sublicenses in a manner consistent with the requirements of -this License. - - Each contributor grants you a non-exclusive, worldwide, royalty-free -patent license under the contributor's essential patent claims, to -make, use, sell, offer for sale, import and otherwise run, modify and -propagate the contents of its contributor version. - - In the following three paragraphs, a "patent license" is any express -agreement or commitment, however denominated, not to enforce a patent -(such as an express permission to practice a patent or covenant not to -sue for patent infringement). To "grant" such a patent license to a -party means to make such an agreement or commitment not to enforce a -patent against the party. 
- - If you convey a covered work, knowingly relying on a patent license, -and the Corresponding Source of the work is not available for anyone -to copy, free of charge and under the terms of this License, through a -publicly available network server or other readily accessible means, -then you must either (1) cause the Corresponding Source to be so -available, or (2) arrange to deprive yourself of the benefit of the -patent license for this particular work, or (3) arrange, in a manner -consistent with the requirements of this License, to extend the patent -license to downstream recipients. "Knowingly relying" means you have -actual knowledge that, but for the patent license, your conveying the -covered work in a country, or your recipient's use of the covered work -in a country, would infringe one or more identifiable patents in that -country that you have reason to believe are valid. - - If, pursuant to or in connection with a single transaction or -arrangement, you convey, or propagate by procuring conveyance of, a -covered work, and grant a patent license to some of the parties -receiving the covered work authorizing them to use, propagate, modify -or convey a specific copy of the covered work, then the patent license -you grant is automatically extended to all recipients of the covered -work and works based on it. - - A patent license is "discriminatory" if it does not include within -the scope of its coverage, prohibits the exercise of, or is -conditioned on the non-exercise of one or more of the rights that are -specifically granted under this License. 
You may not convey a covered -work if you are a party to an arrangement with a third party that is -in the business of distributing software, under which you make payment -to the third party based on the extent of your activity of conveying -the work, and under which the third party grants, to any of the -parties who would receive the covered work from you, a discriminatory -patent license (a) in connection with copies of the covered work -conveyed by you (or copies made from those copies), or (b) primarily -for and in connection with specific products or compilations that -contain the covered work, unless you entered into that arrangement, -or that patent license was granted, prior to 28 March 2007. - - Nothing in this License shall be construed as excluding or limiting -any implied license or other defenses to infringement that may -otherwise be available to you under applicable patent law. - - 12. No Surrender of Others' Freedom. - - If conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot convey a -covered work so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you may -not convey it at all. For example, if you agree to terms that obligate you -to collect a royalty for further conveying from those to whom you convey -the Program, the only way you could satisfy both those terms and this -License would be to refrain entirely from conveying the Program. - - 13. Use with the GNU Affero General Public License. - - Notwithstanding any other provision of this License, you have -permission to link or combine any covered work with a work licensed -under version 3 of the GNU Affero General Public License into a single -combined work, and to convey the resulting work. 
The terms of this -License will continue to apply to the part which is the covered work, -but the special requirements of the GNU Affero General Public License, -section 13, concerning interaction through a network will apply to the -combination as such. - - 14. Revised Versions of this License. - - The Free Software Foundation may publish revised and/or new versions of -the GNU General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - - Each version is given a distinguishing version number. If the -Program specifies that a certain numbered version of the GNU General -Public License "or any later version" applies to it, you have the -option of following the terms and conditions either of that numbered -version or of any later version published by the Free Software -Foundation. If the Program does not specify a version number of the -GNU General Public License, you may choose any version ever published -by the Free Software Foundation. - - If the Program specifies that a proxy can decide which future -versions of the GNU General Public License can be used, that proxy's -public statement of acceptance of a version permanently authorizes you -to choose that version for the Program. - - Later license versions may give you additional or different -permissions. However, no additional obligations are imposed on any -author or copyright holder as a result of your choosing to follow a -later version. - - 15. Disclaimer of Warranty. - - THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY -APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT -HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY -OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, -THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -PURPOSE. 
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM -IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF -ALL NECESSARY SERVICING, REPAIR OR CORRECTION. - - 16. Limitation of Liability. - - IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS -THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY -GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE -USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF -DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD -PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), -EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF -SUCH DAMAGES. - - 17. Interpretation of Sections 15 and 16. - - If the disclaimer of warranty and limitation of liability provided -above cannot be given local legal effect according to their terms, -reviewing courts shall apply local law that most closely approximates -an absolute waiver of all civil liability in connection with the -Program, unless a warranty or assumption of liability accompanies a -copy of the Program in return for a fee. - - END OF TERMS AND CONDITIONS - - How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -state the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. 
- - - Copyright (C) - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see . - -Also add information on how to contact you by electronic and paper mail. - - If the program does terminal interaction, make it output a short -notice like this when it starts in an interactive mode: - - Copyright (C) - This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - -The hypothetical commands `show w' and `show c' should show the appropriate -parts of the General Public License. Of course, your program's commands -might be different; for a GUI interface, you would use an "about box". - - You should also get your employer (if you work as a programmer) or school, -if any, to sign a "copyright disclaimer" for the program, if necessary. -For more information on this, and how to apply and follow the GNU GPL, see -. - - The GNU General Public License does not permit incorporating your program -into proprietary programs. If your program is a subroutine library, you -may consider it more useful to permit linking proprietary applications with -the library. If this is what you want to do, use the GNU Lesser General -Public License instead of this License. But first, please read -. 
+ Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. 
For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of 
the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md index 59e427076..e25f5e89b 100755 --- a/README.md +++ b/README.md @@ -1,40 +1,65 @@ -![slideflow logo](https://github.com/jamesdolezal/slideflow/raw/master/docs-source/pytorch_sphinx_theme/images/slideflow-banner.png) -[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5703792.svg)](https://doi.org/10.5281/zenodo.5703792) -[![Python application](https://github.com/jamesdolezal/slideflow/actions/workflows/python-app.yml/badge.svg?branch=master)](https://github.com/jamesdolezal/slideflow/actions/workflows/python-app.yml) -[![PyPI version](https://badge.fury.io/py/slideflow.svg)](https://badge.fury.io/py/slideflow) +
+ slideflow logo -Slideflow provides a unified API for building and testing deep learning models for digital pathology, supporting both Tensorflow/Keras and PyTorch. + [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5703792.svg)](https://doi.org/10.5281/zenodo.5703792) + [![Python application](https://github.com/slideflow/slideflow/actions/workflows/python-app.yml/badge.svg?branch=master)](https://github.com/slideflow/slideflow/actions/workflows/python-app.yml) + [![PyPI version](https://badge.fury.io/py/slideflow.svg)](https://badge.fury.io/py/slideflow) -Slideflow includes tools for **whole-slide image processing** and segmentation, **customizable deep learning model training** with dozens of supported architectures, **explainability tools** including heatmaps and mosaic maps, **analysis of activations** from model layers, **uncertainty quantification**, and more. A variety of fast, optimized whole-slide image processing tools are included, including background filtering, blur/artifact detection, stain normalization, and efficient storage in `*.tfrecords` format. Model training is easy and highly configurable, with an easy drop-in API for training custom architectures. For external training loops, Slideflow can be used as an image processing backend, serving an optimized `tf.data.Dataset` or `torch.utils.data.DataLoader` to read and process slide images and perform real-time stain normalization. 
+ [ArXiv](https://arxiv.org/abs/2304.04142) | [Docs](https://slideflow.dev) | [Slideflow Studio](https://slideflow.dev/studio/) | [Cite](#reference) | [✨ What's New in 3.0 ✨](https://github.com/slideflow/slideflow/releases/tag/3.0.0) -Slideflow has been used by: + ______________________________________________________________________ -- [Dolezal et al](https://www.nature.com/articles/s41379-020-00724-3), _Modern Pathology_, 2020 -- [Rosenberg et al](https://ascopubs.org/doi/10.1200/JCO.2020.38.15_suppl.e23529), _Journal of Clinical Oncology_ [abstract], 2020 -- [Howard et al](https://www.nature.com/articles/s41467-021-24698-1), _Nature Communications_, 2021 -- [Dolezal et al](https://arxiv.org/abs/2204.04516) [arXiv], 2022 -- [Storozuk et al](https://www.nature.com/articles/s41379-022-01039-1.pdf), _Modern Pathology_ [abstract], 2022 -- [Partin et al](https://arxiv.org/abs/2204.11678) [arXiv], 2022 -- [Dolezal et al](https://meetings.asco.org/abstracts-presentations/212459) [ASCO abstract], 2022 + ![Slideflow Studio: a visualization tool for interacting with models and whole-slide images.](https://github.com/slideflow/slideflow/assets/48372806/7f43d8cb-dc80-427d-84c4-3e5a35fa1472) + +
+ +**Slideflow is a deep learning library for digital pathology, offering a user-friendly interface for model development.** + +Designed for both medical researchers and AI enthusiasts, the goal of Slideflow is to provide an accessible, easy-to-use interface for developing state-of-the-art pathology models. Slideflow has been built with the future in mind, offering a scalable platform for digital biomarker development that bridges the gap between ever-evolving, sophisticated methods and the needs of a clinical researcher. For developers, Slideflow provides multiple endpoints for integration with other packages and external training paradigms, allowing you to leverage highly optimized, pathology-specific processes with the latest ML methodologies. + + + +## 🚀 Features +- Easy-to-use, highly customizable training pipelines +- Robust **[slide processing](https://slideflow.dev/slide_processing) and [stain normalization](https://slideflow.dev/norm)** toolkit +- Support for training with **[weakly-supervised](https://slideflow.dev/training) or [strongly-supervised](https://slideflow.dev/tile_labels)** labels +- Built-in, state-of-the-art **[foundation models](https://slideflow.dev/features)** +- **[Multiple-instance learning (MIL)](https://slideflow.dev/mil)** +- **[Self-supervised learning (SSL)](https://slideflow.dev/ssl)** +- **[Generative adversarial networks (GANs)](https://slideflow.dev/training)** +- **Explainability tools**: [Heatmaps](https://slideflow.dev/evaluation/#heatmaps), [mosaic maps](https://slideflow.dev/posthoc/#mosaic-maps), [saliency maps](https://slideflow.dev/saliency/), [synthetic histology](https://slideflow.dev/stylegan) +- Robust **[layer activation analysis](https://slideflow.dev/posthoc)** tools +- **[Uncertainty quantification](https://slideflow.dev/uq)** +- **[Interactive user interface](https://slideflow.dev/studio)** for model deployment +- ... and more! 
Full documentation with example tutorials can be found at [slideflow.dev](https://www.slideflow.dev/). ## Requirements -- Python >= 3.7 -- [Libvips](https://libvips.github.io/libvips/) >= 8.9. -- [OpenSlide](https://openslide.org/download/) -- [Tensorflow](https://www.tensorflow.org/) >= 2.5 _or_ [PyTorch](https://pytorch.org/) >= 1.9 -- [QuPath](https://qupath.github.io/) [_optional_] - Used for pathologist ROIs -- [CPLEX](https://www.ibm.com/docs/en/icos/12.10.0?topic=v12100-installing-cplex-optimization-studio) 20.1.0 with [Python API](https://www.ibm.com/docs/en/icos/12.10.0?topic=cplex-setting-up-python-api) [_optional_] - Used for preserved-site cross-validation +- Python >= 3.7 (<3.10 if using [cuCIM](https://docs.rapids.ai/api/cucim/stable/)) +- [PyTorch](https://pytorch.org/) >= 1.9 _or_ [Tensorflow](https://www.tensorflow.org/) 2.5-2.11 + +### Optional +- [Libvips](https://libvips.github.io/libvips/) >= 8.9 (alternative slide reader, adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files). +- Linear solver (for preserved-site cross-validation) + - [CPLEX](https://www.ibm.com/docs/en/icos/12.10.0?topic=v12100-installing-cplex-optimization-studio) 20.1.0 with [Python API](https://www.ibm.com/docs/en/icos/12.10.0?topic=cplex-setting-up-python-api) + - _or_ [Pyomo](http://www.pyomo.org/installation) with [Bonmin](https://anaconda.org/conda-forge/coinbonmin) solver + + +## 📥 Installation +Slideflow can be installed with PyPI, as a Docker container, or run from source. -## Installation -Slideflow can be installed either with PyPI or as a Docker container. To install via pip: +### Method 1: Install via pip ``` pip3 install --upgrade setuptools pip wheel -pip3 install slideflow +pip3 install slideflow[cucim] cupy-cuda11x ``` +The `cupy` package name depends on the installed CUDA version; [see here](https://docs.cupy.dev/en/stable/install.html#installing-cupy) for installation instructions. `cupy` is not required if using Libvips. 
+ +### Method 2: Docker image + Alternatively, pre-configured [docker images](https://hub.docker.com/repository/docker/jamesdolezal/slideflow) are available with OpenSlide/Libvips and the latest version of either Tensorflow and PyTorch. To install with the Tensorflow backend: ``` @@ -49,34 +74,78 @@ docker pull jamesdolezal/slideflow:latest-torch docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch ``` +### Method 3: From source + +To run from source, clone this repository, install the conda development environment, and build a wheel: + +``` +git clone https://github.com/slideflow/slideflow +conda env create -f slideflow/environment.yml +conda activate slideflow +pip install -e slideflow/ cupy-cuda11x +``` + +### Non-Commercial Add-ons + +To add additional tools and pretrained models available under a non-commercial license, install `slideflow-gpl` and `slideflow-noncommercial`: + +``` +pip install slideflow-gpl slideflow-noncommercial +``` + +This will provide integrated access to 6 additional pretrained foundation models ([UNI](https://www.nature.com/articles/s41591-024-02857-3), [HistoSSL](https://www.medrxiv.org/content/10.1101/2023.07.21.23292757v2.full.pdf), [GigaPath](https://aka.ms/gigapath), [PLIP](https://www.nature.com/articles/s41591-023-02504-3), [RetCCL](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002730), and [CTransPath](https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043)), the MIL architecture [CLAM](https://www.nature.com/articles/s41551-020-00682-w), the UQ algorithm [BISCUIT](https://www.nature.com/articles/s41467-022-34025-x), and the GAN framework [StyleGAN3](https://nvlabs-fi-cdn.nvidia.com/stylegan3/stylegan3-paper.pdf). + +## ⚙️ Configuration + +### Deep learning (PyTorch vs. Tensorflow) + +Slideflow supports both PyTorch and Tensorflow, defaulting to PyTorch if both are available. You can specify the backend to use with the environmental variable `SF_BACKEND`. 
For example: + +``` +export SF_BACKEND=tensorflow +``` + +### Slide reading (cuCIM vs. Libvips) + +By default, Slideflow reads whole-slide images using [cuCIM](https://docs.rapids.ai/api/cucim/stable/). Although much faster than other openslide-based frameworks, it supports fewer slide scanner formats. Slideflow also includes a [Libvips](https://libvips.github.io/libvips/) backend, which adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files. You can set the active slide backend with the environmental variable `SF_SLIDE_BACKEND`: + +``` +export SF_SLIDE_BACKEND=libvips +``` + + ## Getting started -Slideflow experiments are organized into [Projects](https://slideflow.dev/project_setup.html), which supervise storage of whole-slide images, extracted tiles, and patient-level annotations. To create a new project, create an instance of the `slideflow.Project` class, supplying a pre-configured set of patient-level annotations in CSV format: +Slideflow experiments are organized into [Projects](https://slideflow.dev/project_setup), which supervise storage of whole-slide images, extracted tiles, and patient-level annotations. The fastest way to get started is to use one of our preconfigured projects, which will automatically download slides from the Genomic Data Commons: ```python import slideflow as sf -P = sf.Project( - '/project/path', - annotations="/patient/annotations.csv" + +P = sf.create_project( + root='/project/destination', + cfg=sf.project.LungAdenoSquam(), + download=True ) ``` -Once the project is created, add a new dataset source with paths to whole-slide images, tumor Region of Interest (ROI) files [if applicable], and paths to where extracted tiles/tfrecords should be stored. This will only need to be done once. +After the slides have been downloaded and verified, you can skip to [Extract tiles from slides](#extract-tiles-from-slides). 
+ +Alternatively, to create a new custom project, supply the location of patient-level annotations (CSV), slides, and a destination for TFRecords to be saved: ```python -P.add_source( - name="TCGA", +import slideflow as sf +P = sf.create_project( + '/project/path', + annotations="/patient/annotations.csv", slides="/slides/directory", - roi="/roi/directory", - tiles="/tiles/directory", tfrecords="/tfrecords/directory" ) ``` -This step should attempt to automatically associate slide names with the patient identifiers in your annotations file. After this step has completed, double check that the annotations file has a `slide` column for each annotation entry with the filename (without extension) of the corresponding slide. +Ensure that the annotations file has a `slide` column for each annotation entry with the filename (without extension) of the corresponding slide. -## Extract tiles from slides +### Extract tiles from slides -Next, whole-slide images are segmented into smaller image tiles and saved in `*.tfrecords` format. [Extract tiles](https://slideflow.dev/extract_tiles.html) from slides at a given magnification (width in microns size) and resolution (width in pixels) using `sf.Project.extract_tiles()`: +Next, whole-slide images are segmented into smaller image tiles and saved in `*.tfrecords` format. [Extract tiles](https://slideflow.dev/slide_processing) from slides at a given magnification (width in microns size) and resolution (width in pixels) using `sf.Project.extract_tiles()`: ```python P.extract_tiles( @@ -94,9 +163,9 @@ P.extract_tiles( ) ``` -## Training models +### Training models -Once tiles are extracted, models can be [trained](https://slideflow.dev/training.html). Start by configuring a set of [hyperparameters](https://slideflow.dev/model.html#modelparams): +Once tiles are extracted, models can be [trained](https://slideflow.dev/training). 
Start by configuring a set of [hyperparameters](https://slideflow.dev/model#modelparams): ```python params = sf.ModelParams( @@ -109,7 +178,7 @@ params = sf.ModelParams( ) ``` -Models can then be trained using these parameters. Models can be trained to categorical, multi-categorical, continuous, or time-series outcomes, and the training process is [highly configurable](https://slideflow.dev/training.html). For example, to train models in cross-validation to predict the outcome `'category1'` as stored in the project annotations file: +Models can then be trained using these parameters. Models can be trained to categorical, multi-categorical, continuous, or time-series outcomes, and the training process is [highly configurable](https://slideflow.dev/training). For example, to train models in cross-validation to predict the outcome `'category1'` as stored in the project annotations file: ```python P.train( @@ -120,30 +189,47 @@ P.train( ) ``` -## Evaluation, heatmaps, mosaic maps, and more +### Evaluation, heatmaps, mosaic maps, and more + +Slideflow includes a host of additional tools, including model [evaluation and prediction](https://slideflow.dev/evaluation), [heatmaps](https://slideflow.dev/evaluation#heatmaps), analysis of [layer activations](https://slideflow.dev/posthoc), [mosaic maps](https://slideflow.dev/posthoc#mosaic-maps), and more. See our [full documentation](https://slideflow.dev) for more details and tutorials. + +## 📚 Publications -Slideflow includes a host of additional tools, including model [evaluation](https://slideflow.dev/evaluation.html) and [prediction](https://slideflow.dev/project.html#slideflow.Project.predict), [heatmaps](https://slideflow.dev/project.html#slideflow.Project.generate_heatmaps), [mosaic maps](https://slideflow.dev/project.html#slideflow.Project.generate_mosaic), analysis of [layer activations](https://slideflow.dev/layer_activations.html), and more. 
See our [full documentation](https://slideflow.dev) for more details and tutorials. +Slideflow has been used by: + +- [Dolezal et al](https://www.nature.com/articles/s41379-020-00724-3), _Modern Pathology_, 2020 +- [Rosenberg et al](https://ascopubs.org/doi/10.1200/JCO.2020.38.15_suppl.e23529), _Journal of Clinical Oncology_ [abstract], 2020 +- [Howard et al](https://www.nature.com/articles/s41467-021-24698-1), _Nature Communications_, 2021 +- [Dolezal et al](https://www.nature.com/articles/s41467-022-34025-x) _Nature Communications_, 2022 +- [Storozuk et al](https://www.nature.com/articles/s41379-022-01039-1.pdf), _Modern Pathology_ [abstract], 2022 +- [Partin et al](https://doi.org/10.3389/fmed.2023.1058919) _Front Med_, 2022 +- [Dolezal et al](https://ascopubs.org/doi/abs/10.1200/JCO.2022.40.16_suppl.8549) _Journal of Clinical Oncology_ [abstract], 2022 +- [Dolezal et al](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9792820/) _Mediastinum_ [abstract], 2022 +- [Howard et al](https://www.nature.com/articles/s41523-023-00530-5) _npj Breast Cancer_, 2023 +- [Dolezal et al](https://www.nature.com/articles/s41698-023-00399-4) _npj Precision Oncology_, 2023 +- [Hieromnimon et al](https://doi.org/10.1101/2023.03.22.533810) [bioRxiv], 2023 +- [Carrillo-Perez et al](https://doi.org/10.1186/s40644-023-00586-3) _Cancer Imaging_, 2023 -## License -This code is made available under the GPLv3 License and is available for non-commercial academic purposes. +## 🔓 License +This code is made available under the Apache-2.0 license. -## Reference -The manuscript describing this protocol is in press. In the meantime, if you find our work useful for your research, or if you use parts of this code, please consider citing as follows: +## 🔗 Reference +If you find our work useful for your research, or if you use parts of this code, please consider citing as follows: -James Dolezal, Sara Kochanny, & Frederick Howard. (2022). 
Slideflow: A Unified Deep Learning Pipeline for Digital Histology (1.1.0). Zenodo. https://doi.org/10.5281/zenodo.5703792 +Dolezal, J.M., Kochanny, S., Dyer, E. et al. Slideflow: deep learning for digital histopathology with real-time whole-slide visualization. BMC Bioinformatics 25, 134 (2024). https://doi.org/10.1186/s12859-024-05758-x ``` -@software{james_dolezal_2022_5703792, - author = {James Dolezal and - Sara Kochanny and - Frederick Howard}, - title = {{Slideflow: A Unified Deep Learning Pipeline for - Digital Histology}}, - month = apr, - year = 2022, - publisher = {Zenodo}, - version = {1.1.0}, - doi = {10.5281/zenodo.5703792}, - url = {https://doi.org/10.5281/zenodo.5703792} +@Article{Dolezal2024, + author={Dolezal, James M. and Kochanny, Sara and Dyer, Emma and Ramesh, Siddhi and Srisuwananukorn, Andrew and Sacco, Matteo and Howard, Frederick M. and Li, Anran and Mohan, Prajval and Pearson, Alexander T.}, + title={Slideflow: deep learning for digital histopathology with real-time whole-slide visualization}, + journal={BMC Bioinformatics}, + year={2024}, + month={Mar}, + day={27}, + volume={25}, + number={1}, + pages={134}, + doi={10.1186/s12859-024-05758-x}, + url={https://doi.org/10.1186/s12859-024-05758-x} } ``` diff --git a/datasets/breast_er/breast_er.json b/datasets/breast_er/breast_er.json new file mode 100644 index 000000000..79b0f72fe --- /dev/null +++ b/datasets/breast_er/breast_er.json @@ -0,0 +1,4 @@ +{ + "name": "TCGA_BRCA", + "annotations": "./breast_labels.csv" +} \ No newline at end of file diff --git a/datasets/breast_er/breast_labels.csv b/datasets/breast_er/breast_labels.csv new file mode 100644 index 000000000..918e7053d --- /dev/null +++ b/datasets/breast_er/breast_labels.csv @@ -0,0 +1,1049 @@ +patient,slide,site,ER_Status_By_IHC +TCGA-A1-A0SH,TCGA-A1-A0SH-01Z-00-DX1.90E71B08-E1D9-4FC2-85AC-062E56DDF17C,A1,negative +TCGA-A1-A0SK,TCGA-A1-A0SK-01Z-00-DX1.A44D70FA-4D96-43F4-9DD7-A61535786297,A1,negative 
+TCGA-A1-A0SP,TCGA-A1-A0SP-01Z-00-DX1.20D689C6-EFA5-4694-BE76-24475A89ACC0,A1,negative +TCGA-A2-A04P,TCGA-A2-A04P-01Z-00-DX1.5B481E02-D269-4732-8FDD-6494E6EE2B71,A2,negative +TCGA-A2-A04Q,TCGA-A2-A04Q-01Z-00-DX1.DF7ED6B6-7701-486D-9007-F26B6F0682C4,A2,negative +TCGA-A2-A04T,TCGA-A2-A04T-01Z-00-DX1.71444266-BD56-4183-9603-C7AC20C9DA1E,A2,negative +TCGA-A2-A04U,TCGA-A2-A04U-01Z-00-DX1.06D17357-46A8-4DC3-A22B-2F4EB6EE3F79,A2,negative +TCGA-A2-A04W,TCGA-A2-A04W-01Z-00-DX1.F7E7B945-2ADC-4741-8FCE-ACEA657DB9C7,A2,negative +TCGA-A2-A0CM,TCGA-A2-A0CM-01Z-00-DX1.AC4901DE-4B6D-4185-BB9F-156033839828,A2,negative +TCGA-A2-A0D0,TCGA-A2-A0D0-01Z-00-DX1.4FF6B8E5-703B-400F-920A-104F56E0F874,A2,negative +TCGA-A2-A0EQ,TCGA-A2-A0EQ-01Z-00-DX1.5FCD8890-4594-4BE2-93B1-B8F698BFD2A0,A2,negative +TCGA-A2-A0ST,TCGA-A2-A0ST-01Z-00-DX1.AE05A5DB-4861-40DE-B0F5-7955FC903A96,A2,negative +TCGA-A2-A0SX,TCGA-A2-A0SX-01Z-00-DX1.219A994C-8974-4458-98FA-FB1F14868E04,A2,negative +TCGA-A2-A0T0,TCGA-A2-A0T0-01Z-00-DX1.51F904DA-A4B5-4451-8AEF-58E7EF7651DB,A2,negative +TCGA-A2-A0T1,TCGA-A2-A0T1-01Z-00-DX1.CD1E9C46-18A3-466B-AA0B-24F3055FA851,A2,negative +TCGA-A2-A0T2,TCGA-A2-A0T2-01Z-00-DX1.29A5C4C8-6AE8-44EE-98C2-ACBCBFBE9D60,A2,negative +TCGA-A2-A0YE,TCGA-A2-A0YE-01Z-00-DX1.8A2E3094-5755-42BC-969D-7F0A2ECA0F39,A2,negative +TCGA-A2-A0YM,TCGA-A2-A0YM-01Z-00-DX1.A48B4C96-2CC5-464C-98B7-F0F92AE56533,A2,negative +TCGA-A2-A1G1,TCGA-A2-A1G1-01Z-00-DX1.E52A474F-DE13-41CF-94D4-D00BAC46ECF4,A2,negative +TCGA-A2-A1G6,TCGA-A2-A1G6-01Z-00-DX1.BCC3EC46-CEBF-4B37-BCF0-266D18170B54,A2,negative +TCGA-A2-A25F,TCGA-A2-A25F-01Z-00-DX1.87499CD4-E41C-46B1-AAF3-5CE881F2B8BC,A2,negative +TCGA-A2-A3XS,TCGA-A2-A3XS-01Z-00-DX1.867925C0-91D8-40A0-9FEA-25A635AC31E7,A2,negative +TCGA-A2-A3XT,TCGA-A2-A3XT-01Z-00-DX1.336D6C78-576A-481B-8C83-F3A0FC4B182C,A2,negative +TCGA-A2-A3XU,TCGA-A2-A3XU-01Z-00-DX1.174A92D4-50B2-4A59-AD31-D5EC5BBF2F65,A2,negative 
+TCGA-A2-A3XX,TCGA-A2-A3XX-01Z-00-DX1.03C6D70F-2833-4E97-91B7-DE70CD083D92,A2,negative +TCGA-A2-A3XY,TCGA-A2-A3XY-01Z-00-DX1.E57FC9BF-411E-4028-AC10-8BCA5D0C8472,A2,negative +TCGA-A2-A3XZ,TCGA-A2-A3XZ-01Z-00-DX1.B7A1344E-7015-4A21-B3D4-EB760161BE7C,A2,negative +TCGA-A7-A0CE,TCGA-A7-A0CE-01Z-00-DX1.E67322FB-ED25-4B85-B3B0-2B8BD277BB4A,A7,negative +TCGA-A7-A0CE,TCGA-A7-A0CE-01Z-00-DX2.5AD1DB65-10E7-4996-AB5E-13D7851EC5FA,A7,negative +TCGA-A7-A0DA,TCGA-A7-A0DA-01Z-00-DX1.5F087009-16E9-4A07-BA24-62340E108B17,A7,negative +TCGA-A7-A0DA,TCGA-A7-A0DA-01Z-00-DX2.90C93176-C3C6-41B3-B34B-B16F1A1779E6,A7,negative +TCGA-A7-A13D,TCGA-A7-A13D-01Z-00-DX1.D206783C-FA6A-4B6A-B3AA-4132A2C9626B,A7,negative +TCGA-A7-A13D,TCGA-A7-A13D-01Z-00-DX2.8165A362-6A08-4BFD-9428-7EEB6C59D7F1,A7,negative +TCGA-A7-A26F,TCGA-A7-A26F-01Z-00-DX1.7ADE881D-325D-48ED-BA2D-30F589CA8EF2,A7,negative +TCGA-A7-A26G,TCGA-A7-A26G-01Z-00-DX1.29A27673-A16D-4044-BCCF-14BF6236D7D3,A7,negative +TCGA-A7-A26I,TCGA-A7-A26I-01Z-00-DX1.0077D012-BC14-4E96-84F7-A1A6A3A778DF,A7,negative +TCGA-A7-A4SD,TCGA-A7-A4SD-01Z-00-DX1.677152B3-17B2-42DB-AEEC-C9509ACCE831,A7,negative +TCGA-A7-A4SE,TCGA-A7-A4SE-01Z-00-DX1.16BC8401-E40E-4A1A-9BD9-12735C9AE3F6,A7,negative +TCGA-A7-A5ZV,TCGA-A7-A5ZV-01Z-00-DX1.21F2EA4A-4F31-43D6-A036-E20E326AF37E,A7,negative +TCGA-A7-A6VV,TCGA-A7-A6VV-01Z-00-DX1.07AE0E16-A883-4C86-BC74-4E13081175F2,A7,negative +TCGA-A7-A6VW,TCGA-A7-A6VW-01Z-00-DX1.1BC4790C-DB45-4A3D-9C97-92C92C03FF60,A7,negative +TCGA-A7-A6VY,TCGA-A7-A6VY-01Z-00-DX1.38D4EBD7-40B0-4EE3-960A-1F00E8F83ADB,A7,negative +TCGA-A8-A07C,TCGA-A8-A07C-01Z-00-DX1.1F069BCA-D2B3-49CF-81FD-9EBA49A3439F,A8,negative +TCGA-A8-A07O,TCGA-A8-A07O-01Z-00-DX1.3D657129-F2A7-4BC1-A910-805FBDCE2212,A8,negative +TCGA-A8-A07R,TCGA-A8-A07R-01Z-00-DX1.D716752E-86AF-468B-A905-A7894B978F22,A8,negative +TCGA-A8-A07U,TCGA-A8-A07U-01Z-00-DX1.69D356C3-C7FC-47E9-B753-BE421263343F,A8,negative 
+TCGA-A8-A08R,TCGA-A8-A08R-01Z-00-DX1.72FBD9A3-5EB9-4855-89EB-697883C2EA9B,A8,negative +TCGA-A8-A08X,TCGA-A8-A08X-01Z-00-DX1.01FB49CC-6B8E-4317-8C42-6B6D81187A40,A8,negative +TCGA-A8-A09X,TCGA-A8-A09X-01Z-00-DX1.17B5BE28-944F-427B-8F90-44885C3EDD36,A8,negative +TCGA-A8-A0A7,TCGA-A8-A0A7-01Z-00-DX1.A0FD4EEF-B536-47C1-BDA1-7AC71BA3E978,A8,negative +TCGA-AC-A2BK,TCGA-AC-A2BK-01Z-00-DX1.A3A1F275-9CDC-48BA-BB81-BCA6F02ACA94,AC,negative +TCGA-AC-A2QH,TCGA-AC-A2QH-01Z-00-DX1.00B8BFFF-F1E2-4F99-A969-8DD7EE4F8E0B,AC,negative +TCGA-AC-A2QJ,TCGA-AC-A2QJ-01Z-00-DX1.48C303BB-5A23-4037-BD28-77629A8CD9DA,AC,negative +TCGA-AC-A6IW,TCGA-AC-A6IW-01Z-00-DX1.C4514189-E64F-4603-8970-230FA2BB77FC,AC,negative +TCGA-AC-A7VC,TCGA-AC-A7VC-01Z-00-DX1.EB247544-EADE-419C-B7A7-BB9CC978336F,AC,negative +TCGA-AC-A8OQ,TCGA-AC-A8OQ-01Z-00-DX1.15946A44-711C-4765-AB46-5CFB7BD11E42,AC,negative +TCGA-AN-A0AL,TCGA-AN-A0AL-01Z-00-DX1.D9E446A3-175F-4242-AFC1-78FFE3FC9AC4,AN,negative +TCGA-AN-A0AR,TCGA-AN-A0AR-01Z-00-DX1.0CF1267E-C61B-4928-875E-59032F838F07,AN,negative +TCGA-AN-A0AT,TCGA-AN-A0AT-01Z-00-DX1.DFD68CD2-C25E-47BE-BC06-8CE3C657B9FD,AN,negative +TCGA-AN-A0FL,TCGA-AN-A0FL-01Z-00-DX1.20A041C6-A306-4599-A7D1-65032A252AA9,AN,negative +TCGA-AN-A0FV,TCGA-AN-A0FV-01Z-00-DX1.C0D02946-FCDA-472D-895D-7ACF5C96B264,AN,negative +TCGA-AN-A0FX,TCGA-AN-A0FX-01Z-00-DX1.C9656600-F823-4044-972B-7059B39FC539,AN,negative +TCGA-AN-A0G0,TCGA-AN-A0G0-01Z-00-DX1.BE0BB5DF-DEDA-48D8-B5D8-2735C767F28F,AN,negative +TCGA-AN-A0XN,TCGA-AN-A0XN-01Z-00-DX1.A8EB9F62-236E-4C59-A206-DF0C7654AC30,AN,negative +TCGA-AN-A0XS,TCGA-AN-A0XS-01Z-00-DX1.B4D4B73E-2912-4B2E-B18A-FBC35F8160D3,AN,negative +TCGA-AN-A0XU,TCGA-AN-A0XU-01Z-00-DX1.6B0DD0FF-A20D-4BA2-8D48-FC357BA5313F,AN,negative +TCGA-AO-A03U,TCGA-AO-A03U-01Z-00-DX1.AE2B55F3-8BA1-4546-82B7-4D2292BE1C78,AO,negative +TCGA-AO-A0J2,TCGA-AO-A0J2-01Z-00-DX1.7C9FEC7B-6040-4C58-9563-D10C0D7AC72E,AO,negative 
+TCGA-AO-A0J4,TCGA-AO-A0J4-01Z-00-DX1.2EFC978E-8DF5-4254-9598-C0910B17C5C8,AO,negative +TCGA-AO-A0J6,TCGA-AO-A0J6-01Z-00-DX1.D0C003CE-E112-4375-953D-78404C9D62DA,AO,negative +TCGA-AO-A0JE,TCGA-AO-A0JE-01Z-00-DX1.82D33053-E305-47D8-9B02-B55511EBB06D,AO,negative +TCGA-AO-A0JL,TCGA-AO-A0JL-01Z-00-DX1.473CC200-221A-4777-B19E-ADC41D10FC94,AO,negative +TCGA-AO-A124,TCGA-AO-A124-01Z-00-DX1.E3C7B017-6154-4630-9BDE-0CAC946D0209,AO,negative +TCGA-AO-A128,TCGA-AO-A128-01Z-00-DX1.4E6BFFBC-87AD-4ED4-959D-FEB5545400BE,AO,negative +TCGA-AO-A129,TCGA-AO-A129-01Z-00-DX1.BF485416-BF57-4F39-866B-4B1E201876FA,AO,negative +TCGA-AO-A12D,TCGA-AO-A12D-01Z-00-DX1.BA006BAD-5C6E-4099-BC99-3888E69F506E,AO,negative +TCGA-AO-A12F,TCGA-AO-A12F-01Z-00-DX1.847C8E4F-3F37-4B1D-8E6E-ACD14391AD89,AO,negative +TCGA-AO-A1KR,TCGA-AO-A1KR-01Z-00-DX1.BFB2E69B-E23C-4542-9CBF-EDD040B985AC,AO,negative +TCGA-AQ-A04J,TCGA-AQ-A04J-01Z-00-DX1.6A4E16D4-2696-4E5F-BCA8-708556FC3A8C,AQ,negative +TCGA-AQ-A54N,TCGA-AQ-A54N-01Z-00-DX1.E53271E9-AFD2-4071-9B1E-CD05F63B8136,AQ,negative +TCGA-AR-A0TS,TCGA-AR-A0TS-01Z-00-DX1.2DB81A2E-16C1-4EE0-8175-65FA46768277,AR,negative +TCGA-AR-A0TU,TCGA-AR-A0TU-01Z-00-DX1.2CBBDDAB-C1DD-4205-A555-431542F9C069,AR,negative +TCGA-AR-A0U0,TCGA-AR-A0U0-01Z-00-DX1.C8BECD9F-68D9-47B1-B31E-C1C6CAE456B6,AR,negative +TCGA-AR-A0U1,TCGA-AR-A0U1-01Z-00-DX1.276433E7-E841-42D2-AF21-762F2FEA3B9B,AR,negative +TCGA-AR-A0U4,TCGA-AR-A0U4-01Z-00-DX1.DE722DC5-859D-4866-ADCC-ED98EDBFB588,AR,negative +TCGA-AR-A1AI,TCGA-AR-A1AI-01Z-00-DX1.5EF2A589-4284-45CF-BF0C-169E3A85530C,AR,negative +TCGA-AR-A1AQ,TCGA-AR-A1AQ-01Z-00-DX1.09D5D7FC-0FA8-4176-94B2-995F44D8ED4C,AR,negative +TCGA-AR-A1AR,TCGA-AR-A1AR-01Z-00-DX1.E7B7F6F0-9CC0-4D4F-8C9F-443A74D2BE40,AR,negative +TCGA-AR-A1AY,TCGA-AR-A1AY-01Z-00-DX1.6AC0BE3B-FFC5-4EDA-9E40-B18CAAC52B81,AR,negative +TCGA-AR-A24U,TCGA-AR-A24U-01Z-00-DX1.220FD0D0-0DB6-4E3C-92A0-66CD6509F0AD,AR,negative 
+TCGA-AR-A256,TCGA-AR-A256-01Z-00-DX1.950D4546-4BF4-4380-9877-51D86A93D755,AR,negative +TCGA-AR-A2LH,TCGA-AR-A2LH-01Z-00-DX1.40D4FD67-FB13-46D9-A202-56A5791EA11C,AR,negative +TCGA-AR-A2LR,TCGA-AR-A2LR-01Z-00-DX1.C686A7D6-0361-49EE-B6CA-672B3086243C,AR,negative +TCGA-AR-A5QQ,TCGA-AR-A5QQ-01Z-00-DX1.4185D6D2-498E-4E71-8CB3-EDB4739C6229,AR,negative +TCGA-B6-A0I1,TCGA-B6-A0I1-01Z-00-DX1.86BEC5E4-A2D1-4039-8D08-0598FA8BCC2B,B6,negative +TCGA-B6-A0I6,TCGA-B6-A0I6-01Z-00-DX1.D597D207-BEB2-4CE5-B315-FE7AD6E9C30B,B6,negative +TCGA-B6-A0IE,TCGA-B6-A0IE-01Z-00-DX1.E6C4E5C0-338C-481D-B691-0385E8D090B1,B6,negative +TCGA-B6-A0IK,TCGA-B6-A0IK-01Z-00-DX1.1640DB44-4AC0-4A34-9E21-4673C4289A99,B6,negative +TCGA-B6-A0IQ,TCGA-B6-A0IQ-01Z-00-DX1.662EA039-825E-41FF-91D6-021EB0E099BD,B6,negative +TCGA-B6-A0RE,TCGA-B6-A0RE-01Z-00-DX1.8933BDB4-EF8D-41FA-9A50-05BB3AB976E5,B6,negative +TCGA-B6-A0RG,TCGA-B6-A0RG-01Z-00-DX1.C83D21A0-EC61-459C-9776-EA3D91469E72,B6,negative +TCGA-B6-A0RN,TCGA-B6-A0RN-01Z-00-DX1.0D02A3FB-D694-4A5B-80C1-CF1469E29BFD,B6,negative +TCGA-B6-A0RS,TCGA-B6-A0RS-01Z-00-DX1.F72B766D-FC49-4876-A93D-D88B05702267,B6,negative +TCGA-B6-A0RT,TCGA-B6-A0RT-01Z-00-DX1.6C2BA663-D61B-404B-B62C-B81747B6A3AC,B6,negative +TCGA-B6-A0RU,TCGA-B6-A0RU-01Z-00-DX1.4977531C-6003-4DA6-9925-30E2DA62C076,B6,negative +TCGA-B6-A0WX,TCGA-B6-A0WX-01Z-00-DX1.1C4F7BCC-8423-4941-9490-0F35038E1878,B6,negative +TCGA-B6-A0X1,TCGA-B6-A0X1-01Z-00-DX1.2A41630E-A8BF-4966-99F5-D5D8036B1759,B6,negative +TCGA-B6-A1KF,TCGA-B6-A1KF-01Z-00-DX1.2E08E830-4216-4CD0-9646-1F489300E11D,B6,negative +TCGA-B6-A1KN,TCGA-B6-A1KN-01Z-00-DX1.E769FD65-E5CE-4CA4-8BBE-6FC54A2ED870,B6,negative +TCGA-BH-A0AV,TCGA-BH-A0AV-01Z-00-DX1.5A686DA4-A29D-4BA7-9BCA-B707F8755E83,BH,negative +TCGA-BH-A0B3,TCGA-BH-A0B3-01Z-00-DX1.90CB0ED5-FBB7-4ABF-93A0-DD88D60D3D55,BH,negative +TCGA-BH-A0B9,TCGA-BH-A0B9-01Z-00-DX1.C23ADB4C-52D4-4DFD-B8E3-156E43F0E645,BH,negative 
+TCGA-BH-A0BG,TCGA-BH-A0BG-01Z-00-DX1.0838FB7F-8C85-4687-9F70-D136A1063383,BH,negative +TCGA-BH-A0BL,TCGA-BH-A0BL-01Z-00-DX1.D5A413B0-2141-4A6F-A671-6C5EA8641D25,BH,negative +TCGA-BH-A0BW,TCGA-BH-A0BW-01Z-00-DX1.39B0C363-46A2-4FFB-B1C5-C054A86F4A25,BH,negative +TCGA-BH-A0E0,TCGA-BH-A0E0-01Z-00-DX1.01E88024-83DF-4282-ABAF-F4E35807DA06,BH,negative +TCGA-BH-A0E6,TCGA-BH-A0E6-01Z-00-DX1.44DFEFB4-FA05-4246-8D6B-55BA8C3275DB,BH,negative +TCGA-BH-A0EE,TCGA-BH-A0EE-01Z-00-DX1.872EA586-25C8-4835-A138-152DFA1EBF30,BH,negative +TCGA-BH-A0RX,TCGA-BH-A0RX-01Z-00-DX1.B4ABAFB9-6696-4F22-9468-DA56E1AD32D3,BH,negative +TCGA-BH-A0WA,TCGA-BH-A0WA-01Z-00-DX1.C0EB7B73-5529-49EC-B660-3F849B041963,BH,negative +TCGA-BH-A18G,TCGA-BH-A18G-01Z-00-DX1.DB2B5819-CE83-4E07-BD03-2CD9CF2E246C,BH,negative +TCGA-BH-A18Q,TCGA-BH-A18Q-01Z-00-DX1.E89E49C7-D62A-4408-A3D9-19E79FCB249E,BH,negative +TCGA-BH-A18T,TCGA-BH-A18T-01Z-00-DX1.90123A74-F631-4538-AB2E-4076486ADF88,BH,negative +TCGA-BH-A18V,TCGA-BH-A18V-01Z-00-DX1.7797760A-05CD-436E-97DA-31D6201F943B,BH,negative +TCGA-BH-A1EN,TCGA-BH-A1EN-01Z-00-DX1.F535657F-E2FA-4283-87AE-7DAB663B196B,BH,negative +TCGA-BH-A1EW,TCGA-BH-A1EW-01Z-00-DX1.A6A5F9C5-FB97-4DA3-A1EA-1F84BC006681,BH,negative +TCGA-BH-A1F0,TCGA-BH-A1F0-01Z-00-DX1.E1557A67-230A-4337-AEC8-158258A917FF,BH,negative +TCGA-BH-A1F6,TCGA-BH-A1F6-01Z-00-DX1.E83F0DC0-EA2C-4641-81B0-8702B9C5D579,BH,negative +TCGA-BH-A1FC,TCGA-BH-A1FC-01Z-00-DX1.B816A21B-1559-42F5-902A-A1532E59C015,BH,negative +TCGA-BH-A1FJ,TCGA-BH-A1FJ-01Z-00-DX1.BCED4CEF-2B33-45BA-ABC7-29BEE818F4A3,BH,negative +TCGA-BH-A1FU,TCGA-BH-A1FU-01Z-00-DX1.3D195750-8C9E-48C6-92EC-44AC6AB56267,BH,negative +TCGA-BH-A42U,TCGA-BH-A42U-01Z-00-DX1.130D060A-3DD8-407E-B8B0-EBBF743491AE,BH,negative +TCGA-C8-A12L,TCGA-C8-A12L-01Z-00-DX1.01863BE4-F4AB-45DE-894A-46544777C519,C8,negative +TCGA-C8-A12P,TCGA-C8-A12P-01Z-00-DX1.670B5DE8-07B0-4E4C-93FA-FA3DFFCCE50D,C8,negative 
+TCGA-C8-A12Q,TCGA-C8-A12Q-01Z-00-DX1.CE74E5B7-FD30-4CBE-8716-ECCF2213AAC3,C8,negative +TCGA-C8-A12V,TCGA-C8-A12V-01Z-00-DX1.84B29360-B87B-4648-A697-B6610336C2BB,C8,negative +TCGA-C8-A12Z,TCGA-C8-A12Z-01Z-00-DX1.22616420-7C80-42DE-9C44-0F00327681C6,C8,negative +TCGA-C8-A131,TCGA-C8-A131-01Z-00-DX1.5CB27A29-9951-40B9-B4DB-26A4D2EA89B8,C8,negative +TCGA-C8-A134,TCGA-C8-A134-01Z-00-DX1.78D10BCC-98B3-4587-897D-3E27DA28D2EB,C8,negative +TCGA-C8-A135,TCGA-C8-A135-01Z-00-DX1.B69A605E-B577-4F43-9471-2C95312B05D9,C8,negative +TCGA-C8-A137,TCGA-C8-A137-01Z-00-DX1.87F3775D-A401-4D5E-843F-8FB1D4BE97F8,C8,negative +TCGA-C8-A1HF,TCGA-C8-A1HF-01Z-00-DX1.81EF9A57-54E1-4750-9816-5571FD18A297,C8,negative +TCGA-C8-A1HJ,TCGA-C8-A1HJ-01Z-00-DX1.745159F1-85E7-4CC0-B088-E1CA91916FB8,C8,negative +TCGA-C8-A1HK,TCGA-C8-A1HK-01Z-00-DX1.A15F452B-1260-4106-86D1-05F00082F0CE,C8,negative +TCGA-C8-A26X,TCGA-C8-A26X-01Z-00-DX1.FBD86F91-BAD7-4484-8D50-202340FBF242,C8,negative +TCGA-C8-A26Y,TCGA-C8-A26Y-01Z-00-DX1.166EE604-4EF3-401D-99E9-3A9711316CC4,C8,negative +TCGA-C8-A278,TCGA-C8-A278-01Z-00-DX1.188B3FE0-7B20-401A-A6B7-8F1798018162,C8,negative +TCGA-C8-A27B,TCGA-C8-A27B-01Z-00-DX1.5A8A14E8-6430-4147-9C71-805024E098CB,C8,negative +TCGA-C8-A3M7,TCGA-C8-A3M7-01Z-00-DX1.846C75F1-2E7E-44F7-B21F-C246141558FA,C8,negative +TCGA-C8-A8HP,TCGA-C8-A8HP-01Z-00-DX1.B6EE8271-2E51-4184-9B0C-2ADA69B4A714,C8,negative +TCGA-D8-A13Z,TCGA-D8-A13Z-01Z-00-DX1.517140BB-D34C-42F8-952C-340EF16D382F,D8,negative +TCGA-D8-A142,TCGA-D8-A142-01Z-00-DX1.F6D58989-9120-40C9-918D-6B1650C1A8E8,D8,negative +TCGA-D8-A143,TCGA-D8-A143-01Z-00-DX1.4697FB2F-91D5-4506-AF23-7DE304D44A3F,D8,negative +TCGA-D8-A147,TCGA-D8-A147-01Z-00-DX1.159094F2-BB78-4910-B7AE-3D7CAAB1DAD9,D8,negative +TCGA-D8-A1JA,TCGA-D8-A1JA-01Z-00-DX1.BD43F94A-D5A8-490E-AED4-5F3AB24080FA,D8,negative +TCGA-D8-A1JF,TCGA-D8-A1JF-01Z-00-DX1.224EDA43-F822-4A88-814A-BA7D4C60F8CC,D8,negative 
+TCGA-D8-A1JG,TCGA-D8-A1JG-01Z-00-DX1.BA6D5CC7-3A9B-4D17-A86A-B159D345A216,D8,negative +TCGA-D8-A1JK,TCGA-D8-A1JK-01Z-00-DX1.3190C919-A403-460D-9F6C-D2AB5FD3FD05,D8,negative +TCGA-D8-A1JL,TCGA-D8-A1JL-01Z-00-DX1.FE3F0C6B-F98A-4036-BF9A-25A8CC66B1FD,D8,negative +TCGA-D8-A1XK,TCGA-D8-A1XK-01Z-00-DX1.41EB1BBC-F230-4E3F-8C4E-CE331CAF1935,D8,negative +TCGA-D8-A1XQ,TCGA-D8-A1XQ-01Z-00-DX1.1A17A5C7-F14B-4AD2-AD5F-D3400D86A366,D8,negative +TCGA-D8-A1XT,TCGA-D8-A1XT-01Z-00-DX1.9E4DEEEB-AB0A-45F6-A69B-65B3E6F22D46,D8,negative +TCGA-D8-A1XW,TCGA-D8-A1XW-01Z-00-DX1.10187A1F-086B-4CD9-AC01-ADA2E435CD34,D8,negative +TCGA-D8-A1XW,TCGA-D8-A1XW-01Z-00-DX2.9849E503-BE3E-417C-ABE8-93A39583DDE0,D8,negative +TCGA-D8-A27F,TCGA-D8-A27F-01Z-00-DX1.C1A87F27-49F1-46BA-B2E2-A89293F7FD0C,D8,negative +TCGA-D8-A27H,TCGA-D8-A27H-01Z-00-DX1.BE9DFDD4-97C5-4327-B1DA-4B34A6F267C5,D8,negative +TCGA-D8-A27M,TCGA-D8-A27M-01Z-00-DX1.3020D223-2400-4A2D-8BFE-08A5B78FE13B,D8,negative +TCGA-E2-A14N,TCGA-E2-A14N-01Z-00-DX1.15F5644F-CA9F-4688-B56E-BCC00CA4769B,E2,negative +TCGA-E2-A14P,TCGA-E2-A14P-01Z-00-DX1.663B02FF-C64B-41A6-8685-FD61CD76F9C6,E2,negative +TCGA-E2-A14R,TCGA-E2-A14R-01Z-00-DX1.DDE62ED5-1FC0-4B3B-A874-95C08B33AB20,E2,negative +TCGA-E2-A14X,TCGA-E2-A14X-01Z-00-DX1.24ADDA43-F127-4A6B-9AAD-2FAD982A853D,E2,negative +TCGA-E2-A150,TCGA-E2-A150-01Z-00-DX1.55643D23-83E6-4058-8EA5-18AAA8547839,E2,negative +TCGA-E2-A158,TCGA-E2-A158-01Z-00-DX1.994C60FE-E651-4224-95E7-4669834F2338,E2,negative +TCGA-E2-A159,TCGA-E2-A159-01Z-00-DX1.C8DC3356-CAE9-4E7A-AEFF-ABB4BE38192B,E2,negative +TCGA-E2-A1AZ,TCGA-E2-A1AZ-01Z-00-DX1.FE769F5F-385B-4DCF-819A-735C98D84FA8,E2,negative +TCGA-E2-A1B0,TCGA-E2-A1B0-01Z-00-DX1.4C50EC91-A62A-450E-88C0-7DCAB8E9724C,E2,negative +TCGA-E2-A1B6,TCGA-E2-A1B6-01Z-00-DX1.8CD458BE-C4F9-4AF3-A927-18C042E9B4B7,E2,negative +TCGA-E2-A1II,TCGA-E2-A1II-01Z-00-DX1.7F782477-0F92-4C57-8735-A1E3F95A6B94,E2,negative 
+TCGA-E2-A1L7,TCGA-E2-A1L7-01Z-00-DX1.BE796CD2-2E81-44E8-8CA2-85B4D2A31B64,E2,negative +TCGA-E2-A1LB,TCGA-E2-A1LB-01Z-00-DX1.B4AFDB08-BA09-4F6E-806E-4054B790AB8E,E2,negative +TCGA-E2-A1LE,TCGA-E2-A1LE-01Z-00-DX1.22856B2A-FBAA-4530-AEEC-E8F77BDA7F7F,E2,negative +TCGA-E2-A1LH,TCGA-E2-A1LH-01Z-00-DX1.F85384B7-1EBF-4F57-A45B-4A668B68E535,E2,negative +TCGA-E2-A1LI,TCGA-E2-A1LI-01Z-00-DX1.503dd2fa-23ef-4b11-8aab-301e069eaa88,E2,negative +TCGA-E2-A1LK,TCGA-E2-A1LK-01Z-00-DX1.5EBAA1F4-F1B4-4938-A51F-0246621BB0ED,E2,negative +TCGA-E2-A1LL,TCGA-E2-A1LL-01Z-00-DX1.D1AA1613-902C-4579-9612-48A76042B0AE,E2,negative +TCGA-E2-A1LS,TCGA-E2-A1LS-01Z-00-DX1.9a43a58a-df40-4375-8456-ef0071bc9eb9,E2,negative +TCGA-E2-A573,TCGA-E2-A573-01Z-00-DX1.D6633BC2-524D-4153-952F-6B1D8D067370,E2,negative +TCGA-E2-A574,TCGA-E2-A574-01Z-00-DX1.60341091-B118-4F20-9ADB-FB2886790B0E,E2,negative +TCGA-E9-A1N8,TCGA-E9-A1N8-01Z-00-DX1.1243AB1C-75A3-4A5E-9674-53992E6DD377,E9,negative +TCGA-E9-A1N9,TCGA-E9-A1N9-01Z-00-DX1.982C83A6-0E06-47B3-A7C7-1773A2CD1771,E9,negative +TCGA-E9-A1NC,TCGA-E9-A1NC-01Z-00-DX1.20edf036-8ba6-4187-a74c-124fc39f5aa1,E9,negative +TCGA-E9-A1ND,TCGA-E9-A1ND-01Z-00-DX1.5C2F5250-A3B2-4B97-9E11-56CF3C7CA2DB,E9,negative +TCGA-E9-A22G,TCGA-E9-A22G-01Z-00-DX1.2b47e321-7910-40ec-ad1c-c4e08ed26334,E9,negative +TCGA-E9-A5FL,TCGA-E9-A5FL-01Z-00-DX1.FB810D6A-303E-45DF-BEF1-D9CE83B4417E,E9,negative +TCGA-EW-A1OV,TCGA-EW-A1OV-01Z-00-DX1.93698123-5B34-4163-848B-2D75A5F7B001,EW,negative +TCGA-EW-A1OW,TCGA-EW-A1OW-01Z-00-DX1.97888686-EBB6-4B13-AB5D-452F475E865B,EW,negative +TCGA-EW-A1P1,TCGA-EW-A1P1-01Z-00-DX1.4B670029-4B3B-4D76-8EA4-F4F29EEF9E37,EW,negative +TCGA-EW-A1P4,TCGA-EW-A1P4-01Z-00-DX1.3E9AE553-83D4-4B09-AB7F-D096BCE3BC4D,EW,negative +TCGA-EW-A1P7,TCGA-EW-A1P7-01Z-00-DX1.97575C9F-C318-45A5-A4B7-1A902B93FA3F,EW,negative +TCGA-EW-A1P8,TCGA-EW-A1P8-01Z-00-DX1.E9852193-8CDD-49EF-B49B-DA6931198F0D,EW,negative 
+TCGA-EW-A1PB,TCGA-EW-A1PB-01Z-00-DX1.B2E9B88F-C371-4821-9B7B-8A63BB120D95,EW,negative +TCGA-EW-A1PH,TCGA-EW-A1PH-01Z-00-DX1.77e0f907-fe59-4dc2-b582-6f10330a9c01,EW,negative +TCGA-EW-A2FR,TCGA-EW-A2FR-01Z-00-DX1.FF521BBF-F5BA-4B03-9CA1-101A337F4B45,EW,negative +TCGA-EW-A3U0,TCGA-EW-A3U0-01Z-00-DX1.526F60F9-D1EB-4613-B428-18A1FC0BAAE2,EW,negative +TCGA-EW-A6SB,TCGA-EW-A6SB-01Z-00-DX1.D56E1922-01A9-4AEE-AB95-D69447DD13EE,EW,negative +TCGA-EW-A6SD,TCGA-EW-A6SD-01Z-00-DX1.32D8240E-2076-492B-BB95-300A9FCA96E7,EW,negative +TCGA-GM-A2DB,TCGA-GM-A2DB-01Z-00-DX1.9EE36AA6-2594-44C7-B05C-91A0AEC7E511,GM,negative +TCGA-GM-A2DD,TCGA-GM-A2DD-01Z-00-DX1.01332FF3-7488-41B7-A515-7841715DE92B,GM,negative +TCGA-GM-A2DF,TCGA-GM-A2DF-01Z-00-DX1.CD0BE6D7-2DB3-4193-84CC-F9BE7BF18CC2,GM,negative +TCGA-GM-A2DH,TCGA-GM-A2DH-01Z-00-DX1.5790F15F-2A0F-4929-8FE4-18A557007989,GM,negative +TCGA-GM-A2DI,TCGA-GM-A2DI-01Z-00-DX1.5E9715EA-9D2B-49D0-8F1D-972CF24B1960,GM,negative +TCGA-GM-A3NW,TCGA-GM-A3NW-01Z-00-DX1.3B0324F0-4007-4390-AF26-3789A50684E5,GM,negative +TCGA-GM-A3XL,TCGA-GM-A3XL-01Z-00-DX1.CCE8AA1D-9194-4E49-9546-DBF25A35847C,GM,negative +TCGA-HN-A2NL,TCGA-HN-A2NL-01Z-00-DX1.C2EAF378-4B37-4C1C-BB0F-18FAC62EEC13,HN,negative +TCGA-LD-A9QF,TCGA-LD-A9QF-01Z-00-DX1.092108DF-1A60-459E-ACE6-5A71826A98D1,LD,negative +TCGA-LL-A441,TCGA-LL-A441-01Z-00-DX1.33B13078-BB14-4C42-88A6-7460AD2BCD00,LL,negative +TCGA-LL-A5YO,TCGA-LL-A5YO-01Z-00-DX1.B5B6DFDB-1020-41FF-AA50-9C633E17DE5F,LL,negative +TCGA-LL-A6FR,TCGA-LL-A6FR-01Z-00-DX1.107010FD-8EDD-406D-B4DE-DEC9E85576FF,LL,negative +TCGA-LL-A73Y,TCGA-LL-A73Y-01Z-00-DX1.50C20931-3AA9-40B4-8A73-56B1976423A8,LL,negative +TCGA-LL-A740,TCGA-LL-A740-01Z-00-DX1.757D94A5-EF0F-4A0E-99A9-8809B66438DA,LL,negative +TCGA-OL-A5D6,TCGA-OL-A5D6-01Z-00-DX1.6B11331B-4A0D-4E13-B054-A5C7A6FC3AAC,OL,negative +TCGA-OL-A5D7,TCGA-OL-A5D7-01Z-00-DX1.A4A45393-9AE1-4370-8B92-CB85CDB04934,OL,negative 
+TCGA-OL-A66I,TCGA-OL-A66I-01Z-00-DX1.8CE9DCAB-98D3-4163-94AC-1557D86C1E25,OL,negative +TCGA-OL-A66P,TCGA-OL-A66P-01Z-00-DX1.5ADD0D6D-37C6-4BC9-8C2B-64DB18BE99B3,OL,negative +TCGA-OL-A6VO,TCGA-OL-A6VO-01Z-00-DX1.291D54D6-EBAF-4622-BD42-97AA5997F014,OL,negative +TCGA-S3-AA10,TCGA-S3-AA10-01Z-00-DX1.C0468882-0DD8-4FC5-8C2F-E18BE8000F69,S3,negative +TCGA-S3-AA15,TCGA-S3-AA15-01Z-00-DX1.A2456A4A-E6E8-4429-8F09-B997AA497BB0,S3,negative +TCGA-UU-A93S,TCGA-UU-A93S-01Z-00-DX1.C4809779-DF5F-4F5D-A78C-B7F95F2D050F,UU,negative +TCGA-3C-AALI,TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291,3C,positive +TCGA-3C-AALI,TCGA-3C-AALI-01Z-00-DX2.CF4496E0-AB52-4F3E-BDF5-C34833B91B7C,3C,positive +TCGA-3C-AALJ,TCGA-3C-AALJ-01Z-00-DX1.777C0957-255A-42F0-9EEB-A3606BCF0C96,3C,positive +TCGA-3C-AALJ,TCGA-3C-AALJ-01Z-00-DX2.62DFE56B-B84C-40F9-9625-FCB55767B70D,3C,positive +TCGA-3C-AALK,TCGA-3C-AALK-01Z-00-DX1.4E6EB156-BB19-410F-878F-FC0EA7BD0B53,3C,positive +TCGA-4H-AAAK,TCGA-4H-AAAK-01Z-00-DX1.ABF1B042-1970-4E28-8671-43AAD393D2F9,4H,positive +TCGA-5L-AAT1,TCGA-5L-AAT1-01Z-00-DX1.F3449A5B-2AC4-4ED7-BF44-4C8946CDB47D,5L,positive +TCGA-5T-A9QA,TCGA-5T-A9QA-01Z-00-DX1.B4212117-E0A7-4EF2-B324-8396042ACEC1,5T,positive +TCGA-A1-A0SB,TCGA-A1-A0SB-01Z-00-DX1.B34C267B-CAAA-4AB6-AD5C-276C26F997A1,A1,positive +TCGA-A1-A0SD,TCGA-A1-A0SD-01Z-00-DX1.DB17BFA9-D951-42A8-91D2-F4C2EBC6EB9F,A1,positive +TCGA-A1-A0SE,TCGA-A1-A0SE-01Z-00-DX1.04B09232-C6C4-46EF-AA2C-41D078D0A80A,A1,positive +TCGA-A1-A0SF,TCGA-A1-A0SF-01Z-00-DX1.7F252D89-EA78-419F-A969-1B7313D77499,A1,positive +TCGA-A1-A0SI,TCGA-A1-A0SI-01Z-00-DX1.AB717348-F964-4F29-BBE2-972B7C640432,A1,positive +TCGA-A1-A0SJ,TCGA-A1-A0SJ-01Z-00-DX1.C196FA66-C376-4016-86BC-3D8E745B3A51,A1,positive +TCGA-A1-A0SM,TCGA-A1-A0SM-01Z-00-DX1.AD503DBD-4D93-4476-B467-F091254FDF78,A1,positive +TCGA-A1-A0SN,TCGA-A1-A0SN-01Z-00-DX1.5E9B85AE-AFB7-41DC-8A1B-BD6DA39B6540,A1,positive 
+TCGA-A1-A0SQ,TCGA-A1-A0SQ-01Z-00-DX1.36071264-3407-4224-BCBB-2ED00294569A,A1,positive +TCGA-A2-A04N,TCGA-A2-A04N-01Z-00-DX1.9E9B7DB0-1CF1-4631-8292-A9DBCA0BD37C,A2,positive +TCGA-A2-A04R,TCGA-A2-A04R-01Z-00-DX1.BE3661A3-A2A0-4248-B7D5-D8986E529B6C,A2,positive +TCGA-A2-A04V,TCGA-A2-A04V-01Z-00-DX1.EB430DDB-8FE9-4302-8829-53DBAD09E79F,A2,positive +TCGA-A2-A04X,TCGA-A2-A04X-01Z-00-DX1.E01A4522-67B3-4FEF-BD6B-99DFED9E7C85,A2,positive +TCGA-A2-A04Y,TCGA-A2-A04Y-01Z-00-DX1.4DC97AD6-4806-4A3A-A998-FD36F93590A4,A2,positive +TCGA-A2-A0CK,TCGA-A2-A0CK-01Z-00-DX1.C3226BD1-11B0-4034-946A-BAB42958CE85,A2,positive +TCGA-A2-A0CL,TCGA-A2-A0CL-01Z-00-DX1.5342E971-DCD2-42C4-B4FF-E6942A95829E,A2,positive +TCGA-A2-A0CO,TCGA-A2-A0CO-01Z-00-DX1.B191DADE-7DEA-4575-AED7-1C52BE15B8D5,A2,positive +TCGA-A2-A0CP,TCGA-A2-A0CP-01Z-00-DX1.ECFD263C-BB17-4ADA-8F2C-654C2AA4C45F,A2,positive +TCGA-A2-A0CQ,TCGA-A2-A0CQ-01Z-00-DX1.4E5FB4E5-A08C-4C87-A3BE-0640A95AE649,A2,positive +TCGA-A2-A0CR,TCGA-A2-A0CR-01Z-00-DX1.F7D36CA5-A4C6-443A-AEED-5B23CD924FA3,A2,positive +TCGA-A2-A0CS,TCGA-A2-A0CS-01Z-00-DX1.3986B545-63E8-4727-BCC1-701DE947D1FB,A2,positive +TCGA-A2-A0CT,TCGA-A2-A0CT-01Z-00-DX1.A8564130-49CF-4F5B-B5AB-F4D1A10479FF,A2,positive +TCGA-A2-A0CU,TCGA-A2-A0CU-01Z-00-DX1.5B77D21C-A497-478B-9752-3730322AD9ED,A2,positive +TCGA-A2-A0CV,TCGA-A2-A0CV-01Z-00-DX1.B02D017A-61CD-45FE-BB31-705CD127DDAB,A2,positive +TCGA-A2-A0CW,TCGA-A2-A0CW-01Z-00-DX1.8E313A22-B0E8-44CF-ADEA-8BF29BA23FFE,A2,positive +TCGA-A2-A0CX,TCGA-A2-A0CX-01Z-00-DX1.F07C75AB-E568-45CB-B497-37C712490393,A2,positive +TCGA-A2-A0CY,TCGA-A2-A0CY-01Z-00-DX1.8815F011-317B-4BB7-A48F-64EABD0E558B,A2,positive +TCGA-A2-A0CZ,TCGA-A2-A0CZ-01Z-00-DX1.A433A414-4F1B-4F99-8FD9-E64803F5E042,A2,positive +TCGA-A2-A0D3,TCGA-A2-A0D3-01Z-00-DX1.D18CE1C0-291D-41C1-8F71-2A0551F8C661,A2,positive +TCGA-A2-A0D4,TCGA-A2-A0D4-01Z-00-DX1.35E43827-DABC-4685-A62A-333672923349,A2,positive 
+TCGA-A2-A0EM,TCGA-A2-A0EM-01Z-00-DX1.305CF011-7451-4880-9A5D-AB4432CF53A5,A2,positive +TCGA-A2-A0EN,TCGA-A2-A0EN-01Z-00-DX1.F43A5D17-9267-4963-993A-93AC91BDC031,A2,positive +TCGA-A2-A0EO,TCGA-A2-A0EO-01Z-00-DX1.D7A09E23-2CAB-4B3B-814C-BF0BA75B7589,A2,positive +TCGA-A2-A0EP,TCGA-A2-A0EP-01Z-00-DX1.1180C406-5C18-4373-8621-1B7B70875113,A2,positive +TCGA-A2-A0ER,TCGA-A2-A0ER-01Z-00-DX1.18123C1B-A0CC-4957-9373-5FFAB985C2E8,A2,positive +TCGA-A2-A0ES,TCGA-A2-A0ES-01Z-00-DX1.5652EE30-48C1-4EC1-AD77-8C72DFCB9A97,A2,positive +TCGA-A2-A0ET,TCGA-A2-A0ET-01Z-00-DX1.41E86BB9-EA52-4615-94E7-0F09DDB094F4,A2,positive +TCGA-A2-A0EU,TCGA-A2-A0EU-01Z-00-DX1.13335BEB-63D4-469B-A389-8A079A096129,A2,positive +TCGA-A2-A0EV,TCGA-A2-A0EV-01Z-00-DX1.EA8C5594-BA4F-47A8-949B-D536E00E62C9,A2,positive +TCGA-A2-A0EW,TCGA-A2-A0EW-01Z-00-DX1.F24495CB-63D8-483F-9834-F761E3F16BF0,A2,positive +TCGA-A2-A0EX,TCGA-A2-A0EX-01Z-00-DX1.F4C293E9-9BB2-48A1-8080-5F3082E219C5,A2,positive +TCGA-A2-A0EY,TCGA-A2-A0EY-01Z-00-DX1.2F2428B3-0767-48E0-AC22-443C244CBD16,A2,positive +TCGA-A2-A0SU,TCGA-A2-A0SU-01Z-00-DX1.22420EE2-4FEB-42F3-9547-4739F0F73D50,A2,positive +TCGA-A2-A0SV,TCGA-A2-A0SV-01Z-00-DX1.F5645E47-3540-4753-AF5D-F5709BD8DFC1,A2,positive +TCGA-A2-A0SW,TCGA-A2-A0SW-01Z-00-DX1.E1EA0407-B831-4D75-826E-80B82B821797,A2,positive +TCGA-A2-A0SY,TCGA-A2-A0SY-01Z-00-DX1.279A5479-E183-4F79-AF40-50BF1834BA4A,A2,positive +TCGA-A2-A0T3,TCGA-A2-A0T3-01Z-00-DX1.5E96BC87-F4FB-4ABA-8D81-FAB7F4A80661,A2,positive +TCGA-A2-A0T4,TCGA-A2-A0T4-01Z-00-DX1.AC61ED08-2E42-4D4A-BB81-746D3FEBB653,A2,positive +TCGA-A2-A0T5,TCGA-A2-A0T5-01Z-00-DX1.128C288B-B357-439B-A8D4-8E7DEBF73E4E,A2,positive +TCGA-A2-A0T6,TCGA-A2-A0T6-01Z-00-DX1.207A3883-5B41-40EB-95A1-D64C9610650B,A2,positive +TCGA-A2-A0T7,TCGA-A2-A0T7-01Z-00-DX1.EA4DF9B5-8D04-4BCC-9ECB-CB8CB8ACBE1C,A2,positive +TCGA-A2-A0YC,TCGA-A2-A0YC-01Z-00-DX1.F01F2268-17D8-45FD-8917-FD89454D0709,A2,positive 
+TCGA-A2-A0YD,TCGA-A2-A0YD-01Z-00-DX1.B81FF541-F154-4C49-950B-B6CB723369E2,A2,positive +TCGA-A2-A0YF,TCGA-A2-A0YF-01Z-00-DX1.6166E995-0669-43D6-B9C7-FE39BCE529CF,A2,positive +TCGA-A2-A0YG,TCGA-A2-A0YG-01Z-00-DX1.89A39319-F880-47DE-B311-DA1F6A64B6F3,A2,positive +TCGA-A2-A0YH,TCGA-A2-A0YH-01Z-00-DX1.FBA5E711-8275-45B3-BC20-5A786EE23548,A2,positive +TCGA-A2-A0YI,TCGA-A2-A0YI-01Z-00-DX1.1CF2EC2D-C722-467F-8832-409B823E8D8F,A2,positive +TCGA-A2-A0YJ,TCGA-A2-A0YJ-01Z-00-DX1.8135C74E-DAA8-4C8E-AF14-A4B5B57695BE,A2,positive +TCGA-A2-A0YK,TCGA-A2-A0YK-01Z-00-DX1.FB23FC30-B3B2-452A-A6B0-94129B333207,A2,positive +TCGA-A2-A0YL,TCGA-A2-A0YL-01Z-00-DX1.69A438C7-B1E0-4990-B1E6-586C551DC79C,A2,positive +TCGA-A2-A0YT,TCGA-A2-A0YT-01Z-00-DX1.63CA3E36-B3E4-4D3F-9F80-2703970DBE5B,A2,positive +TCGA-A2-A1FV,TCGA-A2-A1FV-01Z-00-DX1.F70ACD9A-7819-4296-BD28-4D24BBA8AFF4,A2,positive +TCGA-A2-A1FW,TCGA-A2-A1FW-01Z-00-DX1.DC2BC5D7-E57B-444E-833F-9A1B6E11DE6D,A2,positive +TCGA-A2-A1FX,TCGA-A2-A1FX-01Z-00-DX1.35B93F3A-9959-4D72-A702-08B27563D24F,A2,positive +TCGA-A2-A1FZ,TCGA-A2-A1FZ-01Z-00-DX1.0BAAEF41-DCA4-4677-9A27-09E990033FA6,A2,positive +TCGA-A2-A1G0,TCGA-A2-A1G0-01Z-00-DX1.9ECB0B8A-EF4E-45A9-82AC-EF36375DEF65,A2,positive +TCGA-A2-A1G4,TCGA-A2-A1G4-01Z-00-DX1.0274DF5B-5920-419A-B778-2F04D9C602F2,A2,positive +TCGA-A2-A259,TCGA-A2-A259-01Z-00-DX1.7289CD72-CB74-41D4-B4AC-4EA5FDFEC666,A2,positive +TCGA-A2-A25A,TCGA-A2-A25A-01Z-00-DX1.6D311E39-9F96-4775-B9DA-8D034D1AD1DC,A2,positive +TCGA-A2-A25B,TCGA-A2-A25B-01Z-00-DX1.58D7BEDE-5558-4A9E-A95E-DDF24C9267EF,A2,positive +TCGA-A2-A25C,TCGA-A2-A25C-01Z-00-DX1.F8E6044A-435E-42D5-94FF-C0F572F1ED99,A2,positive +TCGA-A2-A25D,TCGA-A2-A25D-01Z-00-DX1.41DADDB8-3E3F-4F8F-8BE7-C43F8FBCFD2B,A2,positive +TCGA-A2-A25E,TCGA-A2-A25E-01Z-00-DX1.46D1950B-0BAF-4A44-B58F-CD738E74CB27,A2,positive +TCGA-A2-A3KC,TCGA-A2-A3KC-01Z-00-DX1.2532878B-49E2-48D5-82D5-00730C90EEF8,A2,positive 
+TCGA-A2-A3KD,TCGA-A2-A3KD-01Z-00-DX1.E400D529-A0D1-4C78-AD5C-33EB0508B128,A2,positive +TCGA-A2-A3XV,TCGA-A2-A3XV-01Z-00-DX1.4B58B7CD-B3F6-4C31-863F-924D291EF17C,A2,positive +TCGA-A2-A3XW,TCGA-A2-A3XW-01Z-00-DX1.45F5F36F-5503-4A38-AF37-E526915A8DBE,A2,positive +TCGA-A2-A3Y0,TCGA-A2-A3Y0-01Z-00-DX1.A1EDDF82-FFAA-44A4-A749-7FB6F8133CC0,A2,positive +TCGA-A2-A4RW,TCGA-A2-A4RW-01Z-00-DX1.4AF8656E-2478-4895-8C59-B0C969F3E474,A2,positive +TCGA-A2-A4RX,TCGA-A2-A4RX-01Z-00-DX1.E2C96C9D-5023-4F41-8CC8-24C6E015B252,A2,positive +TCGA-A2-A4RY,TCGA-A2-A4RY-01Z-00-DX1.143E32FC-E10F-4590-9C72-32094B24EF8C,A2,positive +TCGA-A2-A4S0,TCGA-A2-A4S0-01Z-00-DX1.5878187D-325F-4274-BE41-86BB2BA5F841,A2,positive +TCGA-A2-A4S1,TCGA-A2-A4S1-01Z-00-DX1.C69B14F6-5021-4FD8-8F2D-70227A5DA2B1,A2,positive +TCGA-A2-A4S2,TCGA-A2-A4S2-01Z-00-DX1.D779B63B-8817-43F1-A8F4-236C82DB78F8,A2,positive +TCGA-A2-A4S3,TCGA-A2-A4S3-01Z-00-DX1.50699E65-E536-4C9B-93A7-4C9894624A29,A2,positive +TCGA-A7-A0CD,TCGA-A7-A0CD-01Z-00-DX1.F045B9C8-049C-41BF-8432-EF89F236D34D,A7,positive +TCGA-A7-A0CG,TCGA-A7-A0CG-01Z-00-DX1.D77019C2-96B1-4EF5-A61E-5F2D5B8D9852,A7,positive +TCGA-A7-A0CG,TCGA-A7-A0CG-01Z-00-DX2.81138E90-373A-4C12-B9CA-6EEF5F84A5B5,A7,positive +TCGA-A7-A0CJ,TCGA-A7-A0CJ-01Z-00-DX1.E26F2F62-D688-4373-BB7B-790A06734E49,A7,positive +TCGA-A7-A0CJ,TCGA-A7-A0CJ-01Z-00-DX2.4B591117-5FC9-4B43-8A45-444CCCABC666,A7,positive +TCGA-A7-A0D9,TCGA-A7-A0D9-01Z-00-DX1.FBC3B90F-C58B-4476-8354-0AF9248324E3,A7,positive +TCGA-A7-A0D9,TCGA-A7-A0D9-01Z-00-DX2.66CD9ED8-223B-4AC8-AA1A-2481FB0C47B3,A7,positive +TCGA-A7-A0DB,TCGA-A7-A0DB-01Z-00-DX1.9CE855BC-0C37-43FB-8806-6625E176BE2E,A7,positive +TCGA-A7-A0DB,TCGA-A7-A0DB-01Z-00-DX2.6C6A5F9C-294F-4A86-A0F1-B68D4729B535,A7,positive +TCGA-A7-A13E,TCGA-A7-A13E-01Z-00-DX1.891954FF-316A-4562-AA14-429631944F22,A7,positive +TCGA-A7-A13E,TCGA-A7-A13E-01Z-00-DX2.1E1262AE-A32D-4814-94A5-D951CA8BA35D,A7,positive 
+TCGA-A7-A13F,TCGA-A7-A13F-01Z-00-DX1.AB4ADDD3-8E1C-4AB9-813E-D137A2CFB950,A7,positive +TCGA-A7-A13F,TCGA-A7-A13F-01Z-00-DX2.8CC23BAD-B3CC-4BBC-832A-A3879D6EF62D,A7,positive +TCGA-A7-A13G,TCGA-A7-A13G-01Z-00-DX1.C258C545-8C1F-41D4-846F-962A746CBDFB,A7,positive +TCGA-A7-A13G,TCGA-A7-A13G-01Z-00-DX2.72EF429E-75A7-4D1B-AFFC-8767CB213CDA,A7,positive +TCGA-A7-A13H,TCGA-A7-A13H-01Z-00-DX1.FCB33A3C-6209-474C-83C1-02551F242937,A7,positive +TCGA-A7-A13H,TCGA-A7-A13H-01Z-00-DX2.6C937651-4D3E-475D-99F6-4A9FD25CD48B,A7,positive +TCGA-A7-A26E,TCGA-A7-A26E-01Z-00-DX1.BA4A7E28-0563-4C23-82D0-AB34A2E79AE3,A7,positive +TCGA-A7-A26H,TCGA-A7-A26H-01Z-00-DX1.3344CFD3-5A19-4B01-BEB2-AB89F83FD53A,A7,positive +TCGA-A7-A26J,TCGA-A7-A26J-01Z-00-DX1.86A92FBD-F346-4206-A1BE-0CBA1596135F,A7,positive +TCGA-A7-A2KD,TCGA-A7-A2KD-01Z-00-DX1.BFF2DFC8-1D95-4DC2-98FC-5BD7D7C6CDF5,A7,positive +TCGA-A7-A3IY,TCGA-A7-A3IY-01Z-00-DX1.B4E75265-3B2A-456C-996A-F95B5A645BF8,A7,positive +TCGA-A7-A3IZ,TCGA-A7-A3IZ-01Z-00-DX1.7437130B-DF49-40C8-ABCD-FF9364B9B1B0,A7,positive +TCGA-A7-A3J0,TCGA-A7-A3J0-01Z-00-DX1.59DB32BC-F59C-43B3-8DDD-4A728EBA8AB8,A7,positive +TCGA-A7-A3J1,TCGA-A7-A3J1-01Z-00-DX1.92895B92-5F95-4EA3-9237-4CAF63CDE3FF,A7,positive +TCGA-A7-A3RF,TCGA-A7-A3RF-01Z-00-DX1.E60647BC-229D-4409-A564-EE0945F912EE,A7,positive +TCGA-A7-A426,TCGA-A7-A426-01Z-00-DX1.3058E873-9442-4872-80DA-E8A5B35054D2,A7,positive +TCGA-A7-A4SA,TCGA-A7-A4SA-01Z-00-DX1.B48FFD3E-9B45-4CEA-ACEE-5B08058CCF5F,A7,positive +TCGA-A7-A4SB,TCGA-A7-A4SB-01Z-00-DX1.E8925605-5189-441E-A719-92CFC1118B0B,A7,positive +TCGA-A7-A4SC,TCGA-A7-A4SC-01Z-00-DX1.171E89D2-2D64-4C3F-A42C-B1167654858B,A7,positive +TCGA-A7-A4SF,TCGA-A7-A4SF-01Z-00-DX1.CDCFD4BC-4363-4CF2-95F5-4922E04C3B9D,A7,positive +TCGA-A7-A56D,TCGA-A7-A56D-01Z-00-DX1.87D9CCF4-04EE-46D2-B7FC-A83561D7752C,A7,positive +TCGA-A7-A5ZW,TCGA-A7-A5ZW-01Z-00-DX1.5F571EE3-A1A3-4604-BE89-1A46E0EC508B,A7,positive 
+TCGA-A7-A5ZX,TCGA-A7-A5ZX-01Z-00-DX1.A4C651F0-DB8E-4534-84A2-AD875F62209F,A7,positive +TCGA-A7-A6VX,TCGA-A7-A6VX-01Z-00-DX1.F74DA243-C65A-4997-BCA0-F1C89675978C,A7,positive +TCGA-A8-A06N,TCGA-A8-A06N-01Z-00-DX1.E25E65F8-DFE0-47CF-9FAB-B82EE8E321F7,A8,positive +TCGA-A8-A06O,TCGA-A8-A06O-01Z-00-DX1.FA4495B2-5B13-4448-ADCB-EF5316E0955B,A8,positive +TCGA-A8-A06P,TCGA-A8-A06P-01Z-00-DX1.37660D0F-1595-43C5-9D30-58D6CB93B52C,A8,positive +TCGA-A8-A06Q,TCGA-A8-A06Q-01Z-00-DX1.622AC6E4-DB1A-4C8D-8185-52D4FD1F30B4,A8,positive +TCGA-A8-A06R,TCGA-A8-A06R-01Z-00-DX1.41476D0D-BA72-4FB8-B143-9EB679F26D28,A8,positive +TCGA-A8-A06T,TCGA-A8-A06T-01Z-00-DX1.BA8B8FC3-6169-48C1-BE1F-37F140CB4D3B,A8,positive +TCGA-A8-A06U,TCGA-A8-A06U-01Z-00-DX1.56070AB9-A73E-44B8-8051-D93E8F881BBE,A8,positive +TCGA-A8-A06X,TCGA-A8-A06X-01Z-00-DX1.21F19EDD-ABCB-4398-B210-D795BB7A34E3,A8,positive +TCGA-A8-A06Y,TCGA-A8-A06Y-01Z-00-DX1.3BC58446-524E-43F3-AAE6-FBC4AA543E66,A8,positive +TCGA-A8-A06Z,TCGA-A8-A06Z-01Z-00-DX1.2BC8B71D-E01B-4CC1-B5E7-09BB54A5E1F8,A8,positive +TCGA-A8-A075,TCGA-A8-A075-01Z-00-DX1.8E06AF51-951F-48E8-934E-42A455F65E5F,A8,positive +TCGA-A8-A076,TCGA-A8-A076-01Z-00-DX1.BAAC4F9C-0C04-42FB-95F4-D765305089B8,A8,positive +TCGA-A8-A079,TCGA-A8-A079-01Z-00-DX1.DC32BC20-A3BC-43B3-B69F-1E71E289EA34,A8,positive +TCGA-A8-A07B,TCGA-A8-A07B-01Z-00-DX1.950B5E4E-C5F0-4445-9F88-E8C32CDFE6DA,A8,positive +TCGA-A8-A07E,TCGA-A8-A07E-01Z-00-DX1.AC684481-979A-46F9-91D7-C56CE85992F2,A8,positive +TCGA-A8-A07F,TCGA-A8-A07F-01Z-00-DX1.E6FE5B3A-A412-4E7B-A710-7DFD3596F4E3,A8,positive +TCGA-A8-A07G,TCGA-A8-A07G-01Z-00-DX1.37E8A762-8141-4BE6-935A-B3DCB712BB4A,A8,positive +TCGA-A8-A07I,TCGA-A8-A07I-01Z-00-DX1.37E7BB2E-8210-4216-B75F-0FD06D6C9AE3,A8,positive +TCGA-A8-A07J,TCGA-A8-A07J-01Z-00-DX1.FE16248C-D890-4750-A9FC-5F72BE8D8B85,A8,positive +TCGA-A8-A07L,TCGA-A8-A07L-01Z-00-DX1.D9E1051F-724E-4E88-940A-C97005DC5BEB,A8,positive 
+TCGA-A8-A07P,TCGA-A8-A07P-01Z-00-DX1.2C7C75EF-EEE2-4A42-994C-A1A40850C87A,A8,positive +TCGA-A8-A07S,TCGA-A8-A07S-01Z-00-DX1.34AFF327-E7FD-49E1-B918-0C09129581FB,A8,positive +TCGA-A8-A07W,TCGA-A8-A07W-01Z-00-DX1.5970CF33-D675-4AF4-800F-7182AA1A44A6,A8,positive +TCGA-A8-A07Z,TCGA-A8-A07Z-01Z-00-DX1.713945D9-6855-456D-87FB-EF8AF8980F51,A8,positive +TCGA-A8-A081,TCGA-A8-A081-01Z-00-DX1.06D727F7-4B41-419B-A251-3D804235ACE5,A8,positive +TCGA-A8-A082,TCGA-A8-A082-01Z-00-DX1.5F82E38D-0DFD-4C2C-BC05-8F0EFEFC507C,A8,positive +TCGA-A8-A083,TCGA-A8-A083-01Z-00-DX1.AD121BA9-B5FA-4A03-869B-8FE2C5014CE0,A8,positive +TCGA-A8-A084,TCGA-A8-A084-01Z-00-DX1.2B52D1B8-5AD4-4BD6-ADF7-9D65B8EE2621,A8,positive +TCGA-A8-A085,TCGA-A8-A085-01Z-00-DX1.0DA29016-AEA4-4E5F-B679-83095A79F5B3,A8,positive +TCGA-A8-A086,TCGA-A8-A086-01Z-00-DX1.E5FD8C53-2068-4018-B3C0-08EED253ACD2,A8,positive +TCGA-A8-A08A,TCGA-A8-A08A-01Z-00-DX1.3BB777A0-8652-4483-8DD6-8EB88AB0A8EF,A8,positive +TCGA-A8-A08B,TCGA-A8-A08B-01Z-00-DX1.416316FD-E8BF-402C-939A-08920F08A181,A8,positive +TCGA-A8-A08C,TCGA-A8-A08C-01Z-00-DX1.0602211C-1098-4711-AA41-8F56B11DB36F,A8,positive +TCGA-A8-A08F,TCGA-A8-A08F-01Z-00-DX1.9D2222F6-1425-46FD-BC94-728EF835FAC2,A8,positive +TCGA-A8-A08G,TCGA-A8-A08G-01Z-00-DX1.C3BB7FEC-91B7-4B45-8DFC-36CABCC0FD57,A8,positive +TCGA-A8-A08H,TCGA-A8-A08H-01Z-00-DX1.6AA6A530-3BD3-4582-9D6B-8AE3B4A73948,A8,positive +TCGA-A8-A08I,TCGA-A8-A08I-01Z-00-DX1.96A0065E-5430-40F6-835E-1A7238A79DDA,A8,positive +TCGA-A8-A08J,TCGA-A8-A08J-01Z-00-DX1.4607286A-033B-4C44-8306-50DB9C596166,A8,positive +TCGA-A8-A08L,TCGA-A8-A08L-01Z-00-DX1.0FC652B1-336D-4198-93FE-58D2214866D3,A8,positive +TCGA-A8-A08O,TCGA-A8-A08O-01Z-00-DX1.BC87C01D-F081-41CA-939A-875C61588E88,A8,positive +TCGA-A8-A08P,TCGA-A8-A08P-01Z-00-DX1.431A1290-9CF4-4CE7-A9F0-13F9ADA46ADA,A8,positive +TCGA-A8-A08S,TCGA-A8-A08S-01Z-00-DX1.C0E044C2-FC3F-4E3D-A779-A725FF375F21,A8,positive 
+TCGA-A8-A08T,TCGA-A8-A08T-01Z-00-DX1.ABA1ABF2-DDB1-4E31-89DC-0F7821311D11,A8,positive +TCGA-A8-A08Z,TCGA-A8-A08Z-01Z-00-DX1.D0AADCA6-C27C-4FB8-89C1-94B5901146E7,A8,positive +TCGA-A8-A090,TCGA-A8-A090-01Z-00-DX1.01574070-D65E-486F-B69D-0F8E3816D057,A8,positive +TCGA-A8-A091,TCGA-A8-A091-01Z-00-DX1.1F38CADB-A093-416E-A8A8-31E61CD730BA,A8,positive +TCGA-A8-A092,TCGA-A8-A092-01Z-00-DX1.2A55EA0C-47CD-426B-9697-F2B761730585,A8,positive +TCGA-A8-A093,TCGA-A8-A093-01Z-00-DX1.1C8056D1-11CD-482D-9A23-3A9D1B4E63F0,A8,positive +TCGA-A8-A094,TCGA-A8-A094-01Z-00-DX1.6750B72A-FEC7-49FA-8520-FCF101CA59AC,A8,positive +TCGA-A8-A095,TCGA-A8-A095-01Z-00-DX1.97072E53-7323-4836-9D58-22B7C346A7CE,A8,positive +TCGA-A8-A096,TCGA-A8-A096-01Z-00-DX1.FCECFB0D-4E02-4C69-A0FE-7D6C7F9FEB07,A8,positive +TCGA-A8-A097,TCGA-A8-A097-01Z-00-DX1.4D06990E-E759-40F0-9106-59408CF350E5,A8,positive +TCGA-A8-A099,TCGA-A8-A099-01Z-00-DX1.B19C28B5-FEBC-49B4-A60E-E6B85BB00DD7,A8,positive +TCGA-A8-A09A,TCGA-A8-A09A-01Z-00-DX1.78773A6C-BA8A-46BE-AC2A-9C6FDAB45622,A8,positive +TCGA-A8-A09B,TCGA-A8-A09B-01Z-00-DX1.97CCDA89-AB73-4F05-A900-668160227B78,A8,positive +TCGA-A8-A09C,TCGA-A8-A09C-01Z-00-DX1.19E0C0E9-0ED6-4E3A-95FE-A5778F5DDBCD,A8,positive +TCGA-A8-A09D,TCGA-A8-A09D-01Z-00-DX1.66312A8A-88BA-4B58-96DF-1A7AC39F9E4A,A8,positive +TCGA-A8-A09E,TCGA-A8-A09E-01Z-00-DX1.239A8E03-49EA-4C29-AF86-FC1BD8A9DB20,A8,positive +TCGA-A8-A09G,TCGA-A8-A09G-01Z-00-DX1.B834728C-DAC5-4463-961D-7E3B220905C1,A8,positive +TCGA-A8-A09I,TCGA-A8-A09I-01Z-00-DX1.B738E479-8766-4766-B012-B32AC2F27533,A8,positive +TCGA-A8-A09K,TCGA-A8-A09K-01Z-00-DX1.41B2DF5F-C0E1-43BB-BAA5-2946A9EC4650,A8,positive +TCGA-A8-A09M,TCGA-A8-A09M-01Z-00-DX1.E2CEA8B5-C86D-4873-BE67-359FBEECCA09,A8,positive +TCGA-A8-A09N,TCGA-A8-A09N-01Z-00-DX1.A0E4EF10-5BFF-46A0-94C7-578EA3A88599,A8,positive +TCGA-A8-A09Q,TCGA-A8-A09Q-01Z-00-DX1.C7240AFB-5971-4577-812C-A255C249001A,A8,positive 
+TCGA-A8-A09R,TCGA-A8-A09R-01Z-00-DX1.392580F3-0CE5-4EDB-91CF-814AAD0DB649,A8,positive +TCGA-A8-A09T,TCGA-A8-A09T-01Z-00-DX1.4183BBA4-AF11-4602-9EC2-35083C34A393,A8,positive +TCGA-A8-A09V,TCGA-A8-A09V-01Z-00-DX1.2D6731CA-5712-4EB8-8602-E53301BF4238,A8,positive +TCGA-A8-A09W,TCGA-A8-A09W-01Z-00-DX1.BF33B0F9-5786-4682-B59E-DCBD33D7BB03,A8,positive +TCGA-A8-A09Z,TCGA-A8-A09Z-01Z-00-DX1.D56497BE-5099-4537-9600-60F3213F7BF5,A8,positive +TCGA-A8-A0A1,TCGA-A8-A0A1-01Z-00-DX1.CA64E221-E2A8-42B5-B611-25B9FE5FB0B0,A8,positive +TCGA-A8-A0A2,TCGA-A8-A0A2-01Z-00-DX1.47BCDF13-1065-466A-B7CB-F91F5EAD7D91,A8,positive +TCGA-A8-A0A4,TCGA-A8-A0A4-01Z-00-DX1.06C38F98-F34B-49F0-AFB5-7F0086188346,A8,positive +TCGA-A8-A0A9,TCGA-A8-A0A9-01Z-00-DX1.16524DDD-539D-4E47-9A8F-1B910D4FEC2D,A8,positive +TCGA-A8-A0AB,TCGA-A8-A0AB-01Z-00-DX1.103ED338-A0F9-403B-A10C-49840BD60EB8,A8,positive +TCGA-A8-A0AD,TCGA-A8-A0AD-01Z-00-DX1.A831FB53-7605-4138-9341-6BBCA284441C,A8,positive +TCGA-AC-A23C,TCGA-AC-A23C-01Z-00-DX1.0E67C785-83D3-49AF-B600-FB5B909AE6ED,AC,positive +TCGA-AC-A23E,TCGA-AC-A23E-01Z-00-DX1.F12A5A87-72CF-42F8-A6EC-8E7FAD80B1F7,AC,positive +TCGA-AC-A23G,TCGA-AC-A23G-01Z-00-DX1.2F0326F7-6B77-4B3F-B4FA-59ADB785AA07,AC,positive +TCGA-AC-A23H,TCGA-AC-A23H-01Z-00-DX1.8E0AE339-1047-4CA5-BFC5-37A3B10FD8B5,AC,positive +TCGA-AC-A2B8,TCGA-AC-A2B8-01Z-00-DX1.14F3A9B2-1382-436A-B53C-92FC8495AD48,AC,positive +TCGA-AC-A2BM,TCGA-AC-A2BM-01Z-00-DX1.162A9FA9-6E61-48BF-B0A5-C6C7640ABD9B,AC,positive +TCGA-AC-A2FB,TCGA-AC-A2FB-01Z-00-DX1.A4D93E32-BBD7-45E4-8ACF-3724B059ECBC,AC,positive +TCGA-AC-A2FE,TCGA-AC-A2FE-01Z-00-DX1.C036AE58-3C0D-4DBF-B658-296D03FD6701,AC,positive +TCGA-AC-A2FF,TCGA-AC-A2FF-01Z-00-DX1.012DAC86-E05C-44FA-81C2-FE4C754F409C,AC,positive +TCGA-AC-A2FG,TCGA-AC-A2FG-01Z-00-DX1.0F13DE40-9F7A-42C6-AB3D-02E8521B690A,AC,positive +TCGA-AC-A2FK,TCGA-AC-A2FK-01Z-00-DX1.033F3C27-9860-4EF3-9330-37DE5EC45724,AC,positive 
+TCGA-AC-A2FO,TCGA-AC-A2FO-01Z-00-DX1.46929316-D7ED-4D69-B068-409BFFBCEC4D,AC,positive +TCGA-AC-A2QI,TCGA-AC-A2QI-01Z-00-DX1.977549F4-D866-4465-85FB-5C916282A488,AC,positive +TCGA-AC-A3BB,TCGA-AC-A3BB-01Z-00-DX1.CE889249-2A5E-44DA-B04E-746BE82CD805,AC,positive +TCGA-AC-A3EH,TCGA-AC-A3EH-01Z-00-DX1.30160436-4DE9-4F99-87E2-214E3CA47289,AC,positive +TCGA-AC-A3HN,TCGA-AC-A3HN-01Z-00-DX1.1AA71B9B-1B61-41A9-91CF-B7649DEC189E,AC,positive +TCGA-AC-A3OD,TCGA-AC-A3OD-01Z-00-DX1.E19B311B-BAB1-4CE8-9421-B49D5BE4537E,AC,positive +TCGA-AC-A3QP,TCGA-AC-A3QP-01Z-00-DX1.4B97E53E-1069-40C5-A062-C988663B37DD,AC,positive +TCGA-AC-A3QQ,TCGA-AC-A3QQ-01Z-00-DX1.86463263-AB12-49FB-8967-D5FD1F7D221A,AC,positive +TCGA-AC-A3TM,TCGA-AC-A3TM-01Z-00-DX1.4421EE37-A2CC-4638-AC44-9C26CB09FDC1,AC,positive +TCGA-AC-A3TN,TCGA-AC-A3TN-01Z-00-DX1.F00E81C7-0CA2-49E8-A30F-DE423538E93C,AC,positive +TCGA-AC-A3W5,TCGA-AC-A3W5-01Z-00-DX1.522F702C-776E-45F1-84D3-1648DF04137C,AC,positive +TCGA-AC-A3W6,TCGA-AC-A3W6-01Z-00-DX1.88CC534C-F032-4E5D-9CC4-4BB50AA46880,AC,positive +TCGA-AC-A3W7,TCGA-AC-A3W7-01Z-00-DX1.189BB448-ECBA-4042-8A35-4B787271E8FE,AC,positive +TCGA-AC-A3YI,TCGA-AC-A3YI-01Z-00-DX1.321C0A32-ABF9-48ED-8071-C6B1774E9F7B,AC,positive +TCGA-AC-A3YJ,TCGA-AC-A3YJ-01Z-00-DX1.8E665F69-FD8C-419A-871F-3AEE2E5A3A60,AC,positive +TCGA-AC-A4ZE,TCGA-AC-A4ZE-01Z-00-DX1.A9BB4DBC-4314-4A77-AA1C-0882DBC370F1,AC,positive +TCGA-AC-A5EH,TCGA-AC-A5EH-01Z-00-DX1.DF59B26B-379F-4047-B7AE-4493810D30A0,AC,positive +TCGA-AC-A5XS,TCGA-AC-A5XS-01Z-00-DX1.E5F2481E-9314-4B20-84B7-370EF82AF4BB,AC,positive +TCGA-AC-A5XU,TCGA-AC-A5XU-01Z-00-DX1.1AEEFAAF-0906-4086-8022-13B689FAB9F5,AC,positive +TCGA-AC-A62V,TCGA-AC-A62V-01Z-00-DX1.2D8994FD-58B8-43C1-B99D-AA964E7DFD60,AC,positive +TCGA-AC-A62X,TCGA-AC-A62X-01Z-00-DX1.36FDC9AA-4EBE-40E6-92C2-8471049171A4,AC,positive +TCGA-AC-A62Y,TCGA-AC-A62Y-01Z-00-DX1.5075F4DA-488C-40AD-BD73-6DE8953E1864,AC,positive 
+TCGA-AC-A6IV,TCGA-AC-A6IV-01Z-00-DX1.DA4F4B9F-9070-4B3A-996F-311E0A989FFB,AC,positive +TCGA-AC-A6IX,TCGA-AC-A6IX-01Z-00-DX1.45B63C4F-9DC3-4CED-9DF9-0F86E647BEA3,AC,positive +TCGA-AC-A6NO,TCGA-AC-A6NO-01Z-00-DX1.61B7F48C-6D6E-4C1B-B236-DD130ECBDA9D,AC,positive +TCGA-AC-A7VB,TCGA-AC-A7VB-01Z-00-DX1.62D7FDB3-280D-43B2-92EA-AB810384B0B7,AC,positive +TCGA-AC-A8OP,TCGA-AC-A8OP-01Z-00-DX1.A2AF501D-7DCD-4BE7-93A4-EBCAF0C0D54E,AC,positive +TCGA-AC-A8OR,TCGA-AC-A8OR-01Z-00-DX1.C9A35592-1E73-40D4-834A-46E3471D84A3,AC,positive +TCGA-AC-A8OS,TCGA-AC-A8OS-01Z-00-DX1.3FD44846-8BD2-4A7A-9A87-8D3D29C25F60,AC,positive +TCGA-AN-A03Y,TCGA-AN-A03Y-01Z-00-DX1.074C48E5-ED69-40AE-8709-444898122BF4,AN,positive +TCGA-AN-A041,TCGA-AN-A041-01Z-00-DX1.96D3628B-FB6E-4A41-A5E8-8896B0657F1C,AN,positive +TCGA-AN-A046,TCGA-AN-A046-01Z-00-DX1.C529B94F-AFE3-4701-BC98-5D6EDF7B82C0,AN,positive +TCGA-AN-A0AJ,TCGA-AN-A0AJ-01Z-00-DX1.74EE47B0-FE27-44CF-9355-C738DE1BD017,AN,positive +TCGA-AN-A0AK,TCGA-AN-A0AK-01Z-00-DX1.4B410152-8588-4000-870A-B31B21161015,AN,positive +TCGA-AN-A0AM,TCGA-AN-A0AM-01Z-00-DX1.169CE39A-DD54-46D8-8D03-60B69A473CDB,AN,positive +TCGA-AN-A0AS,TCGA-AN-A0AS-01Z-00-DX1.51E551E9-E5F4-4C94-9B25-71DA41109E92,AN,positive +TCGA-AN-A0FD,TCGA-AN-A0FD-01Z-00-DX1.8970DBBA-7D43-4C2E-B14A-6D21F1EF3425,AN,positive +TCGA-AN-A0FF,TCGA-AN-A0FF-01Z-00-DX1.A23982C3-E0EB-4DB2-84EE-26E0005E3F66,AN,positive +TCGA-AN-A0FJ,TCGA-AN-A0FJ-01Z-00-DX1.97B60767-916E-4938-9D0B-E6C0FE1CB3FC,AN,positive +TCGA-AN-A0FK,TCGA-AN-A0FK-01Z-00-DX1.8966A1D5-CE3A-4B08-A1F6-E613BEB1ABD1,AN,positive +TCGA-AN-A0FN,TCGA-AN-A0FN-01Z-00-DX1.CAA3C2D0-7E74-48E5-ACB7-487434C7AAD2,AN,positive +TCGA-AN-A0FS,TCGA-AN-A0FS-01Z-00-DX1.104368C2-28DA-4E43-8B27-78336AD5ABC8,AN,positive +TCGA-AN-A0FT,TCGA-AN-A0FT-01Z-00-DX1.6F263AB3-FB1D-4056-AEC2-4F163017324F,AN,positive +TCGA-AN-A0FW,TCGA-AN-A0FW-01Z-00-DX1.5EEDFC71-6ABE-4189-8C45-2127403A8F04,AN,positive 
+TCGA-AN-A0FY,TCGA-AN-A0FY-01Z-00-DX1.25F5E2A1-F92C-4FE1-BD90-0CDDE50DC066,AN,positive +TCGA-AN-A0FZ,TCGA-AN-A0FZ-01Z-00-DX1.9555AF11-3A0D-4FE3-AE91-09DA77B175CA,AN,positive +TCGA-AN-A0XL,TCGA-AN-A0XL-01Z-00-DX1.E90AA056-51DC-4A6B-96EB-A0B707496912,AN,positive +TCGA-AN-A0XO,TCGA-AN-A0XO-01Z-00-DX1.204E554E-B1A8-41AD-8E39-62484DF4E3CD,AN,positive +TCGA-AN-A0XP,TCGA-AN-A0XP-01Z-00-DX1.A4EE3970-5C1F-482E-9AED-F1E3C52A776F,AN,positive +TCGA-AN-A0XT,TCGA-AN-A0XT-01Z-00-DX1.882AF4CE-62A1-4808-9B4B-780066E5E602,AN,positive +TCGA-AO-A03L,TCGA-AO-A03L-01Z-00-DX1.DE6561A9-E6B8-4BF0-AAA8-F91C87A66037,AO,positive +TCGA-AO-A03M,TCGA-AO-A03M-01Z-00-DX1.9998A9A0-D0A6-48FC-80FB-AE597CB9E8AA,AO,positive +TCGA-AO-A03N,TCGA-AO-A03N-01Z-00-DX1.79A8816D-961F-4D79-90B4-341919297A90,AO,positive +TCGA-AO-A03O,TCGA-AO-A03O-01Z-00-DX1.054BBC89-69F4-4311-814D-E343472CD2C0,AO,positive +TCGA-AO-A03P,TCGA-AO-A03P-01Z-00-DX1.D34DA321-D8E3-4D68-BCE2-4A8B72B3D0AE,AO,positive +TCGA-AO-A03R,TCGA-AO-A03R-01Z-00-DX1.DC317201-7F22-4C4A-86A5-ED82C37757C8,AO,positive +TCGA-AO-A03T,TCGA-AO-A03T-01Z-00-DX1.8B75E203-6DC8-4AC9-A256-6FCB818DA0DD,AO,positive +TCGA-AO-A03V,TCGA-AO-A03V-01Z-00-DX1.52EBCB72-0C65-4E67-B9BB-DA15494327DE,AO,positive +TCGA-AO-A0J3,TCGA-AO-A0J3-01Z-00-DX1.8F28BFC8-D37E-425F-A9E3-4C23F083237C,AO,positive +TCGA-AO-A0J5,TCGA-AO-A0J5-01Z-00-DX1.20C14D0C-1A74-4FE9-A5E6-BDDCB8DE7714,AO,positive +TCGA-AO-A0J7,TCGA-AO-A0J7-01Z-00-DX1.DBA7BA24-CEEF-4636-87BB-D12D8DF016A3,AO,positive +TCGA-AO-A0J8,TCGA-AO-A0J8-01Z-00-DX1.9BDD4BDE-2A07-4E0C-B146-87365CA9DE3A,AO,positive +TCGA-AO-A0J9,TCGA-AO-A0J9-01Z-00-DX1.E37C342B-9B6A-4BEE-B064-A82C3579CC66,AO,positive +TCGA-AO-A0JA,TCGA-AO-A0JA-01Z-00-DX1.E3DDBAAF-6FFA-40BE-83FF-942857CFED5D,AO,positive +TCGA-AO-A0JB,TCGA-AO-A0JB-01Z-00-DX1.250FE098-345B-4981-9236-0519E1C9058E,AO,positive +TCGA-AO-A0JC,TCGA-AO-A0JC-01Z-00-DX1.C8DD421B-9799-4FE7-9224-5EAC6ED1028E,AO,positive 
+TCGA-AO-A0JD,TCGA-AO-A0JD-01Z-00-DX1.52E3DCE8-32E3-45C0-AB54-7FA0A2F3F722,AO,positive +TCGA-AO-A0JF,TCGA-AO-A0JF-01Z-00-DX1.3277D13C-F257-4D4C-AB5C-BEBA2E1A63D8,AO,positive +TCGA-AO-A0JG,TCGA-AO-A0JG-01Z-00-DX1.FA785D56-B4FF-4CC3-A631-5C748E8E0558,AO,positive +TCGA-AO-A0JI,TCGA-AO-A0JI-01Z-00-DX1.4DCDA545-B7E3-49DF-AB10-963E84BEB105,AO,positive +TCGA-AO-A0JJ,TCGA-AO-A0JJ-01Z-00-DX1.D5B636F5-1B47-4033-9938-9DC8CD48CEE9,AO,positive +TCGA-AO-A0JM,TCGA-AO-A0JM-01Z-00-DX1.94E75EFD-E5F5-4DF8-93A0-ED94CB4D203A,AO,positive +TCGA-AO-A125,TCGA-AO-A125-01Z-00-DX1.DFC45CE1-5D81-4C2B-9281-60E2F146DDB6,AO,positive +TCGA-AO-A126,TCGA-AO-A126-01Z-00-DX1.D9D6AA15-32F0-44BD-AF30-36191784FFA2,AO,positive +TCGA-AO-A12A,TCGA-AO-A12A-01Z-00-DX1.4E9609A7-9AAD-40A8-8344-8369DF998006,AO,positive +TCGA-AO-A12B,TCGA-AO-A12B-01Z-00-DX1.B215230B-5FF7-4B0A-9C1E-5F1658534B11,AO,positive +TCGA-AO-A12C,TCGA-AO-A12C-01Z-00-DX1.F4105027-FA10-48AF-BA8F-C7A0181C4A1D,AO,positive +TCGA-AO-A12G,TCGA-AO-A12G-01Z-00-DX1.1C37E1FE-CF36-4570-A864-3813B8ADBA36,AO,positive +TCGA-AO-A12H,TCGA-AO-A12H-01Z-00-DX1.C6CCBD9D-FB41-4F6C-84FB-A43E0A77E696,AO,positive +TCGA-AO-A1KO,TCGA-AO-A1KO-01Z-00-DX1.EEB5E0A0-92B2-42CD-9F7A-00E9250B561F,AO,positive +TCGA-AO-A1KP,TCGA-AO-A1KP-01Z-00-DX1.612442AB-24E2-489B-9838-7CEE82BCD605,AO,positive +TCGA-AO-A1KQ,TCGA-AO-A1KQ-01Z-00-DX1.CAB7D9A5-7030-4A33-BE51-9B04D67A7676,AO,positive +TCGA-AO-A1KS,TCGA-AO-A1KS-01Z-00-DX1.349F9BCA-C9F0-43B4-BA6A-D20039E1C720,AO,positive +TCGA-AO-A1KT,TCGA-AO-A1KT-01Z-00-DX1.47878910-4359-4795-B0BB-FAC147C8D932,AO,positive +TCGA-AQ-A04H,TCGA-AQ-A04H-01Z-00-DX1.5AC1E459-EF27-401D-98FD-0AC16559AF17,AQ,positive +TCGA-AQ-A04L,TCGA-AQ-A04L-01Z-00-DX1.F35BF4DD-896A-43C2-8DA6-C67945C003BE,AQ,positive +TCGA-AQ-A0Y5,TCGA-AQ-A0Y5-01Z-00-DX1.f68f5b49-30fa-4fb6-bec6-5da9f6809d02,AQ,positive +TCGA-AQ-A54O,TCGA-AQ-A54O-01Z-00-DX1.85AB0AC4-47BB-4CD8-B7FB-5928108864AC,AQ,positive 
+TCGA-AQ-A7U7,TCGA-AQ-A7U7-01Z-00-DX1.AB5629A1-399E-402B-AD3D-DAF9F97D8300,AQ,positive +TCGA-AR-A0TP,TCGA-AR-A0TP-01Z-00-DX1.58E6A3A6-D1D7-4AE5-87A5-E823F04CBB2F,AR,positive +TCGA-AR-A0TQ,TCGA-AR-A0TQ-01Z-00-DX1.2BEA298C-6B3D-4133-ADBC-769E62CEFFA0,AR,positive +TCGA-AR-A0TR,TCGA-AR-A0TR-01Z-00-DX1.BBCE653F-7DD0-4830-BAD3-C06207A93853,AR,positive +TCGA-AR-A0TT,TCGA-AR-A0TT-01Z-00-DX1.127C2A4F-62AC-4E83-99DB-13519BD2949D,AR,positive +TCGA-AR-A0TV,TCGA-AR-A0TV-01Z-00-DX1.29AE9996-E73A-41F8-BCC9-23DEDCA5A7AC,AR,positive +TCGA-AR-A0TW,TCGA-AR-A0TW-01Z-00-DX1.0C1EAE07-6BA6-4908-89E1-0133F3640174,AR,positive +TCGA-AR-A0TX,TCGA-AR-A0TX-01Z-00-DX1.5BEA4E65-8CEC-49B2-9F29-DBE53E8ED46E,AR,positive +TCGA-AR-A0TY,TCGA-AR-A0TY-01Z-00-DX1.089EB6B4-9921-441D-84F3-C64C97AFC3B4,AR,positive +TCGA-AR-A0TZ,TCGA-AR-A0TZ-01Z-00-DX1.2D58BE38-03F6-4310-8E06-F1A523FB0904,AR,positive +TCGA-AR-A0U2,TCGA-AR-A0U2-01Z-00-DX1.03E2ADD9-F20F-44DA-93D1-D10BBD3844A5,AR,positive +TCGA-AR-A0U3,TCGA-AR-A0U3-01Z-00-DX1.9BB87F9D-459F-4A18-B591-29822EA5AE18,AR,positive +TCGA-AR-A1AH,TCGA-AR-A1AH-01Z-00-DX1.5D5D7774-77BB-4EA0-9E8E-A8D28F70A575,AR,positive +TCGA-AR-A1AJ,TCGA-AR-A1AJ-01Z-00-DX1.34B9FCF0-74D8-4328-9B5A-698AD57EDA85,AR,positive +TCGA-AR-A1AK,TCGA-AR-A1AK-01Z-00-DX1.0AFFA0B5-D1A6-43E9-892A-7CB16A79E5F9,AR,positive +TCGA-AR-A1AL,TCGA-AR-A1AL-01Z-00-DX1.0970370F-6D4B-449A-AF33-76355DD99F91,AR,positive +TCGA-AR-A1AM,TCGA-AR-A1AM-01Z-00-DX1.B3F006D9-9386-41E5-B0B1-B0832EE104A0,AR,positive +TCGA-AR-A1AN,TCGA-AR-A1AN-01Z-00-DX1.1118F9FE-6DF2-4496-B102-D3A10D332EC0,AR,positive +TCGA-AR-A1AP,TCGA-AR-A1AP-01Z-00-DX1.F163EED5-0385-41AA-BA35-54E4910CD99E,AR,positive +TCGA-AR-A1AS,TCGA-AR-A1AS-01Z-00-DX1.72979519-0A90-42AD-9F23-1991F75C8CE7,AR,positive +TCGA-AR-A1AT,TCGA-AR-A1AT-01Z-00-DX1.6ED69002-2CC1-4AD4-9573-B245D7C2D060,AR,positive +TCGA-AR-A1AU,TCGA-AR-A1AU-01Z-00-DX1.F06647D1-E5A3-46FE-8537-01A10A08A4EC,AR,positive 
+TCGA-AR-A1AV,TCGA-AR-A1AV-01Z-00-DX1.93698893-7C5C-44C1-A488-ED358D523693,AR,positive +TCGA-AR-A1AW,TCGA-AR-A1AW-01Z-00-DX1.E527CA46-D83F-4055-8C7E-AEFEF13C1E29,AR,positive +TCGA-AR-A1AX,TCGA-AR-A1AX-01Z-00-DX1.2389D54F-545E-499E-B392-DD731834460A,AR,positive +TCGA-AR-A24H,TCGA-AR-A24H-01Z-00-DX1.5CFC7E16-3F38-4531-968C-A4E4C9D00659,AR,positive +TCGA-AR-A24K,TCGA-AR-A24K-01Z-00-DX1.3A56BAEC-484E-4F9B-BCB4-360ABF6DDB4B,AR,positive +TCGA-AR-A24L,TCGA-AR-A24L-01Z-00-DX1.218A0AE1-D070-4A16-A277-31185F10724D,AR,positive +TCGA-AR-A24M,TCGA-AR-A24M-01Z-00-DX1.D81F1BDF-00DD-42A3-AFFE-55655A877213,AR,positive +TCGA-AR-A24N,TCGA-AR-A24N-01Z-00-DX1.563E9160-9E09-4E3F-8CE9-FFED0AA52927,AR,positive +TCGA-AR-A24O,TCGA-AR-A24O-01Z-00-DX1.04EFBAC8-7A4A-4005-890C-5CC3E1C67DBD,AR,positive +TCGA-AR-A24P,TCGA-AR-A24P-01Z-00-DX1.08BE8830-C55E-4E2B-9378-371A670C75FD,AR,positive +TCGA-AR-A24Q,TCGA-AR-A24Q-01Z-00-DX1.69DC7E35-DC0D-4F20-88DA-04C25F28628C,AR,positive +TCGA-AR-A24R,TCGA-AR-A24R-01Z-00-DX1.47D79205-63E7-43E6-A513-2CD10DFA53C9,AR,positive +TCGA-AR-A24S,TCGA-AR-A24S-01Z-00-DX1.317C3B19-0D0F-4315-AC2D-C741F09DD775,AR,positive +TCGA-AR-A24T,TCGA-AR-A24T-01Z-00-DX1.3B325D54-38EC-4E16-BDE1-628894CC15AA,AR,positive +TCGA-AR-A24V,TCGA-AR-A24V-01Z-00-DX1.E749C2FE-E26B-4306-80B4-8269EEB23270,AR,positive +TCGA-AR-A24W,TCGA-AR-A24W-01Z-00-DX1.0663B4A3-6475-4981-91F0-20D2C79CBF89,AR,positive +TCGA-AR-A24X,TCGA-AR-A24X-01Z-00-DX1.304BF142-B01A-4AE9-B98C-FB3DE3BEAE57,AR,positive +TCGA-AR-A24Z,TCGA-AR-A24Z-01Z-00-DX1.C88AFF18-A8E4-426A-BEA3-70566DE39C46,AR,positive +TCGA-AR-A250,TCGA-AR-A250-01Z-00-DX1.572A2F73-6BF5-4932-BFEF-4B8904755ACF,AR,positive +TCGA-AR-A251,TCGA-AR-A251-01Z-00-DX1.E91E357A-3AD4-4F79-A698-05F0B21A37EC,AR,positive +TCGA-AR-A252,TCGA-AR-A252-01Z-00-DX1.2E7EE069-A0B8-489B-A2F0-3175FB58F468,AR,positive +TCGA-AR-A254,TCGA-AR-A254-01Z-00-DX1.EA5CD008-1106-41EC-998A-04EF08EEEC9D,AR,positive 
+TCGA-AR-A255,TCGA-AR-A255-01Z-00-DX1.E67C081D-50C9-451C-814B-F097B2671300,AR,positive +TCGA-AR-A2LE,TCGA-AR-A2LE-01Z-00-DX1.E8E1FC5C-9A4A-463F-81D7-E9A8AF43EAD9,AR,positive +TCGA-AR-A2LJ,TCGA-AR-A2LJ-01Z-00-DX1.9D6257FA-2767-4640-9252-9BE18BD9D158,AR,positive +TCGA-AR-A2LK,TCGA-AR-A2LK-01Z-00-DX1.FBD59C38-CD4E-4C22-BC74-A57C192A9BBC,AR,positive +TCGA-AR-A2LL,TCGA-AR-A2LL-01Z-00-DX1.F0AF890C-2B34-4758-A929-58AAFF593EC5,AR,positive +TCGA-AR-A2LM,TCGA-AR-A2LM-01Z-00-DX1.CB64A062-2E35-41F9-97C7-07BB60754799,AR,positive +TCGA-AR-A2LN,TCGA-AR-A2LN-01Z-00-DX1.3EF43524-28B8-4326-9356-6BCBC63A1DA8,AR,positive +TCGA-AR-A2LO,TCGA-AR-A2LO-01Z-00-DX1.A8C81E1E-B224-4CF4-83D4-FFB2F51A7002,AR,positive +TCGA-AR-A2LQ,TCGA-AR-A2LQ-01Z-00-DX1.2F579E62-69A8-40F4-A516-542C7C808628,AR,positive +TCGA-AR-A5QM,TCGA-AR-A5QM-01Z-00-DX1.A9017CED-FF57-4090-95B9-94DC91854347,AR,positive +TCGA-AR-A5QN,TCGA-AR-A5QN-01Z-00-DX1.BF92AA24-3D44-4FFA-BCC0-CDB934EC3BBC,AR,positive +TCGA-AR-A5QP,TCGA-AR-A5QP-01Z-00-DX1.256FDB13-1F81-42DA-AF6E-8A94835550C1,AR,positive +TCGA-B6-A0I5,TCGA-B6-A0I5-01Z-00-DX1.47A30509-4BA0-4BA5-AD9B-F2C2DB701159,B6,positive +TCGA-B6-A0IA,TCGA-B6-A0IA-01Z-00-DX1.8A868F0A-6CE2-46A7-AF0C-C5FC5D657D86,B6,positive +TCGA-B6-A0IB,TCGA-B6-A0IB-01Z-00-DX1.BAA1D655-1B80-49E2-B1EB-2ECC83DED989,B6,positive +TCGA-B6-A0IC,TCGA-B6-A0IC-01Z-00-DX1.60ED7BBF-A448-4E4E-80F9-91527A697588,B6,positive +TCGA-B6-A0IG,TCGA-B6-A0IG-01Z-00-DX1.4238CB6E-0561-49FD-9C49-9B8AEAFC4618,B6,positive +TCGA-B6-A0IH,TCGA-B6-A0IH-01Z-00-DX1.3463B12E-D1B0-4FB2-B257-BFD63BA2B6BF,B6,positive +TCGA-B6-A0IJ,TCGA-B6-A0IJ-01Z-00-DX1.BF2E062F-06DA-4CA8-86C4-36674C035CAA,B6,positive +TCGA-B6-A0IM,TCGA-B6-A0IM-01Z-00-DX1.E2CBD42F-D933-4446-BE7C-C2F8B5AFB976,B6,positive +TCGA-B6-A0IN,TCGA-B6-A0IN-01Z-00-DX1.5C1CBEF7-0C4A-4B22-A85B-6F6FD9435573,B6,positive +TCGA-B6-A0IO,TCGA-B6-A0IO-01Z-00-DX1.D8898C30-4016-4983-9359-5C1507C01715,B6,positive 
+TCGA-B6-A0IP,TCGA-B6-A0IP-01Z-00-DX1.3723A250-AA4B-4AFE-A692-C1311E9BA268,B6,positive +TCGA-B6-A0RH,TCGA-B6-A0RH-01Z-00-DX1.33DB5CF9-AC87-435B-A8AF-C4F84BEF15F1,B6,positive +TCGA-B6-A0RI,TCGA-B6-A0RI-01Z-00-DX1.E39951A4-AC6A-4B07-851B-F35CB86D79AA,B6,positive +TCGA-B6-A0RL,TCGA-B6-A0RL-01Z-00-DX1.B5B6709B-5F38-4561-A0E7-C78F97B33F22,B6,positive +TCGA-B6-A0RM,TCGA-B6-A0RM-01Z-00-DX1.31DE3E32-7996-4D30-BB5F-72C94409C940,B6,positive +TCGA-B6-A0RO,TCGA-B6-A0RO-01Z-00-DX1.3ADBFF05-92CE-41B5-BD49-CB3CE5B74CC9,B6,positive +TCGA-B6-A0RP,TCGA-B6-A0RP-01Z-00-DX1.02A55E9D-2AA9-497A-B481-85724EA813AD,B6,positive +TCGA-B6-A0RQ,TCGA-B6-A0RQ-01Z-00-DX1.68B4F49D-9F81-4501-BC66-0349012077C8,B6,positive +TCGA-B6-A0RV,TCGA-B6-A0RV-01Z-00-DX1.C46C2937-868D-47B7-B9B9-E51F5433BE97,B6,positive +TCGA-B6-A0WS,TCGA-B6-A0WS-01Z-00-DX1.020ED7BF-9497-4F13-AA07-FE0E839F9A06,B6,positive +TCGA-B6-A0WT,TCGA-B6-A0WT-01Z-00-DX1.7ED21DA2-A3D8-4DCA-9742-963879A658D8,B6,positive +TCGA-B6-A0WV,TCGA-B6-A0WV-01Z-00-DX1.A8B9E114-A8CF-4389-B47C-2E1B842F7FF9,B6,positive +TCGA-B6-A0WW,TCGA-B6-A0WW-01Z-00-DX1.64048633-1D1A-4074-8C50-7641159355DB,B6,positive +TCGA-B6-A0WY,TCGA-B6-A0WY-01Z-00-DX1.FA103E02-3BDC-47DC-BA6E-59FCC58866D1,B6,positive +TCGA-B6-A0WZ,TCGA-B6-A0WZ-01Z-00-DX1.6CFB236E-36F5-43D6-8DE3-C4ECBD3C14C6,B6,positive +TCGA-B6-A0X0,TCGA-B6-A0X0-01Z-00-DX1.5D67D9D5-C2F0-40BE-A661-E25B6A6287A2,B6,positive +TCGA-B6-A0X4,TCGA-B6-A0X4-01Z-00-DX1.4B7914DE-3436-4D69-9BC6-F6419D3CC518,B6,positive +TCGA-B6-A0X5,TCGA-B6-A0X5-01Z-00-DX1.02A9FF1E-EA20-4F2D-A2BB-427A287DE3FD,B6,positive +TCGA-B6-A0X7,TCGA-B6-A0X7-01Z-00-DX1.A2AA8FE9-CF43-434A-AE72-7277D124BD0A,B6,positive +TCGA-B6-A1KC,TCGA-B6-A1KC-01Z-00-DX1.4DD3E48B-F434-499F-9FF1-0FFD2883A375,B6,positive +TCGA-B6-A1KI,TCGA-B6-A1KI-01Z-00-DX1.EFAA08A8-02EC-4A9B-AC6B-79BF2DC5A2AA,B6,positive +TCGA-B6-A2IU,TCGA-B6-A2IU-01Z-00-DX1.3C7DF701-4ED9-4989-AA63-A6B7B105F6EE,B6,positive 
+TCGA-BH-A0AU,TCGA-BH-A0AU-01Z-00-DX1.E852F218-B15C-4CF8-8FD0-F284043834C4,BH,positive +TCGA-BH-A0AW,TCGA-BH-A0AW-01Z-00-DX1.9D50A0D2-B103-411C-831E-8520C3D50173,BH,positive +TCGA-BH-A0AY,TCGA-BH-A0AY-01Z-00-DX1.A1153465-D9B3-4D83-BE85-F2524B071EE5,BH,positive +TCGA-BH-A0AZ,TCGA-BH-A0AZ-01Z-00-DX1.E20051E8-DEEF-48E5-B519-40777DDBC96B,BH,positive +TCGA-BH-A0B0,TCGA-BH-A0B0-01Z-00-DX1.316D35DB-7F13-4AE5-82A7-5716D2519669,BH,positive +TCGA-BH-A0B1,TCGA-BH-A0B1-01Z-00-DX1.CC60D284-04FB-4B73-90AA-FCE9185DADB6,BH,positive +TCGA-BH-A0B4,TCGA-BH-A0B4-01Z-00-DX1.A2CB0BF4-32F9-48FD-9B96-A744013BADDB,BH,positive +TCGA-BH-A0B5,TCGA-BH-A0B5-01Z-00-DX1.742DB0E8-8EB1-47C8-B698-8E2438FB6299,BH,positive +TCGA-BH-A0B6,TCGA-BH-A0B6-01Z-00-DX1.4D982935-F600-4F37-ABB5-13AF74F4FDC4,BH,positive +TCGA-BH-A0B7,TCGA-BH-A0B7-01Z-00-DX1.6950CDDF-8A81-4B10-BFFF-BE0E33A2C6CC,BH,positive +TCGA-BH-A0B8,TCGA-BH-A0B8-01Z-00-DX1.380ABAE9-EA83-4E9B-BFB2-6476A86C1ADD,BH,positive +TCGA-BH-A0BA,TCGA-BH-A0BA-01Z-00-DX1.579E11C9-437C-4D49-8811-DD94D8454712,BH,positive +TCGA-BH-A0BC,TCGA-BH-A0BC-01Z-00-DX1.7A91831D-9625-49FC-B783-C633F80F898C,BH,positive +TCGA-BH-A0BD,TCGA-BH-A0BD-01Z-00-DX1.CD4A6FC2-BA8C-4E30-972A-E6CD1BEAD8AD,BH,positive +TCGA-BH-A0BF,TCGA-BH-A0BF-01Z-00-DX1.934DF984-9054-4B20-B85B-9CF94B8DC3D4,BH,positive +TCGA-BH-A0BJ,TCGA-BH-A0BJ-01Z-00-DX1.77BE9A20-3D23-4650-843A-0483BF9DD07B,BH,positive +TCGA-BH-A0BM,TCGA-BH-A0BM-01Z-00-DX1.8BD324F2-60F5-4F59-AF2D-533A5FB230BF,BH,positive +TCGA-BH-A0BO,TCGA-BH-A0BO-01Z-00-DX1.1A704471-FEB3-40F9-9838-3E347A18285F,BH,positive +TCGA-BH-A0BP,TCGA-BH-A0BP-01Z-00-DX1.63A87C1D-87FA-494D-9836-74290B5DC30D,BH,positive +TCGA-BH-A0BQ,TCGA-BH-A0BQ-01Z-00-DX1.F5C6386B-BB2C-49DF-B3B8-6C4B80D060D4,BH,positive +TCGA-BH-A0BR,TCGA-BH-A0BR-01Z-00-DX1.F7912887-8AE9-4391-9B83-9AC7B4FF38EF,BH,positive +TCGA-BH-A0BS,TCGA-BH-A0BS-01Z-00-DX1.FEE32127-4D0B-4560-A39C-4EAA5B189B70,BH,positive 
+TCGA-BH-A0BT,TCGA-BH-A0BT-01Z-00-DX1.9087B9E7-C0CD-4179-AF57-AD9255785169,BH,positive +TCGA-BH-A0BV,TCGA-BH-A0BV-01Z-00-DX1.3C9CD4F7-A001-430D-AA8D-80CAD61C344E,BH,positive +TCGA-BH-A0BZ,TCGA-BH-A0BZ-01Z-00-DX1.45EB3E93-A871-49C6-9EAE-90D98AE01913,BH,positive +TCGA-BH-A0C0,TCGA-BH-A0C0-01Z-00-DX1.2D32D35A-EB7E-4D0E-BE5F-E56F7B930463,BH,positive +TCGA-BH-A0C1,TCGA-BH-A0C1-01Z-00-DX1.21FE357E-B182-4397-BFEF-7E96E994236A,BH,positive +TCGA-BH-A0C3,TCGA-BH-A0C3-01Z-00-DX1.14D3210D-0CBE-4DCA-A986-A26AE5382502,BH,positive +TCGA-BH-A0C7,TCGA-BH-A0C7-01Z-00-DX1.C70D358E-C48F-4F69-86CE-3218E9C95837,BH,positive +TCGA-BH-A0DD,TCGA-BH-A0DD-01Z-00-DX1.2242DD1F-0B7C-4B43-A7A9-8130D2BDEE78,BH,positive +TCGA-BH-A0DE,TCGA-BH-A0DE-01Z-00-DX1.64A0340A-8146-48E8-AAF7-4035988B9152,BH,positive +TCGA-BH-A0DG,TCGA-BH-A0DG-01Z-00-DX1.D3C4F57F-608F-46AA-A978-B558117C9DFB,BH,positive +TCGA-BH-A0DH,TCGA-BH-A0DH-01Z-00-DX1.968AA1FD-068C-4B90-8C95-85BF2EFC3E3F,BH,positive +TCGA-BH-A0DI,TCGA-BH-A0DI-01Z-00-DX1.6A42D535-8842-4C36-8299-A40E9E56759D,BH,positive +TCGA-BH-A0DK,TCGA-BH-A0DK-01Z-00-DX1.0CFED53C-BAD9-4E35-B12F-57E64F3FEF1C,BH,positive +TCGA-BH-A0DL,TCGA-BH-A0DL-01Z-00-DX1.553F0021-1FC2-4675-810D-BBD9785D065E,BH,positive +TCGA-BH-A0DO,TCGA-BH-A0DO-01Z-00-DX1.1684557E-A5D4-4828-B2D5-FA899993A019,BH,positive +TCGA-BH-A0DP,TCGA-BH-A0DP-01Z-00-DX1.5550557D-4E5B-4B57-8C1E-9290A7AA32A9,BH,positive +TCGA-BH-A0DQ,TCGA-BH-A0DQ-01Z-00-DX1.5A98747C-52D9-45AD-BC84-934A96F069EB,BH,positive +TCGA-BH-A0DS,TCGA-BH-A0DS-01Z-00-DX1.38E82A1F-21B9-4B97-A304-15B886EA68A0,BH,positive +TCGA-BH-A0DT,TCGA-BH-A0DT-01Z-00-DX1.73AFCEBB-06B1-4870-ADA2-881511B1BE2D,BH,positive +TCGA-BH-A0DV,TCGA-BH-A0DV-01Z-00-DX1.2F0B5FB3-40F0-4D27-BFAC-390FB9A42B39,BH,positive +TCGA-BH-A0DX,TCGA-BH-A0DX-01Z-00-DX1.45C27E71-9A0A-400E-93A9-5CE7780F3C5E,BH,positive +TCGA-BH-A0DZ,TCGA-BH-A0DZ-01Z-00-DX1.138BAEC2-589E-4960-B94E-DF48DDAA5490,BH,positive 
+TCGA-BH-A0E1,TCGA-BH-A0E1-01Z-00-DX1.257EB36A-1008-4890-927B-204EF05DBE90,BH,positive +TCGA-BH-A0E2,TCGA-BH-A0E2-01Z-00-DX1.5F6FA19C-D59F-45BE-80CE-F738CAB1EF0B,BH,positive +TCGA-BH-A0E7,TCGA-BH-A0E7-01Z-00-DX1.7FEE54C4-3795-403C-85B9-FF932BE56788,BH,positive +TCGA-BH-A0E9,TCGA-BH-A0E9-01Z-00-DX1.5FC4A5E1-3984-4869-9FFB-C0C7F9EAF5FE,BH,positive +TCGA-BH-A0EA,TCGA-BH-A0EA-01Z-00-DX1.85FF2B48-2AF7-4C15-A7E6-FCA68CAB76C7,BH,positive +TCGA-BH-A0EB,TCGA-BH-A0EB-01Z-00-DX1.70D7BBA0-214D-4D1A-933C-7CFCDA5416A3,BH,positive +TCGA-BH-A0EI,TCGA-BH-A0EI-01Z-00-DX1.929E126A-C9F8-4240-BF00-B6C4A57B7FF6,BH,positive +TCGA-BH-A0GY,TCGA-BH-A0GY-01Z-00-DX1.E1AEAA36-94FA-41ED-851F-7BB11DD3C06D,BH,positive +TCGA-BH-A0GZ,TCGA-BH-A0GZ-01Z-00-DX1.9AD4F493-A0A9-499D-B667-B33333FC1A51,BH,positive +TCGA-BH-A0H0,TCGA-BH-A0H0-01Z-00-DX1.B3B394DD-1F4E-42CA-BB51-C002E423B3D0,BH,positive +TCGA-BH-A0H3,TCGA-BH-A0H3-01Z-00-DX1.C6D4DFB9-A4FA-40B2-90EF-75294CBC4523,BH,positive +TCGA-BH-A0H5,TCGA-BH-A0H5-01Z-00-DX1.28F24D4D-EE80-4EDA-BC30-A194E22FD61C,BH,positive +TCGA-BH-A0H6,TCGA-BH-A0H6-01Z-00-DX1.0CF7253A-B42C-4778-B1C6-17EC78A039ED,BH,positive +TCGA-BH-A0H7,TCGA-BH-A0H7-01Z-00-DX1.21502FEA-DB37-4630-BA42-778DDF3A6D17,BH,positive +TCGA-BH-A0H9,TCGA-BH-A0H9-01Z-00-DX1.8AE869C6-5C78-4D52-AC8B-5B6FD5FD91AA,BH,positive +TCGA-BH-A0HA,TCGA-BH-A0HA-01Z-00-DX1.5A2F5C86-E1E7-445D-A4E3-29980FA37708,BH,positive +TCGA-BH-A0HB,TCGA-BH-A0HB-01Z-00-DX1.F90F7139-B804-4548-9FAF-9B475BF225EB,BH,positive +TCGA-BH-A0HF,TCGA-BH-A0HF-01Z-00-DX1.1EB2F7CE-54F4-4E41-A3CF-78209770B7CE,BH,positive +TCGA-BH-A0HI,TCGA-BH-A0HI-01Z-00-DX1.89DB9F79-0DCC-4C65-831E-96AE026C2C4E,BH,positive +TCGA-BH-A0HK,TCGA-BH-A0HK-01Z-00-DX1.019036F5-647A-4EC4-8479-F2C100291AEF,BH,positive +TCGA-BH-A0HL,TCGA-BH-A0HL-01Z-00-DX1.972DE96D-51F4-4A2D-B074-8D965023DC01,BH,positive +TCGA-BH-A0HN,TCGA-BH-A0HN-01Z-00-DX1.4997F641-C593-41F6-82ED-6A0E4EFA55F3,BH,positive 
+TCGA-BH-A0HO,TCGA-BH-A0HO-01Z-00-DX1.D3D66547-F5D4-40F5-B737-2FECEEB35ACB,BH,positive +TCGA-BH-A0HP,TCGA-BH-A0HP-01Z-00-DX1.F1FF6E88-B2D1-40AA-8373-1714D82E666F,BH,positive +TCGA-BH-A0HQ,TCGA-BH-A0HQ-01Z-00-DX1.0921FCEF-20A2-4D4B-A198-91AF9F6C814C,BH,positive +TCGA-BH-A0HU,TCGA-BH-A0HU-01Z-00-DX1.73B38904-E4F8-4F45-BD75-A27EC833B6DE,BH,positive +TCGA-BH-A0HW,TCGA-BH-A0HW-01Z-00-DX1.44DCCE00-133F-4469-A4DA-5057C011B4AC,BH,positive +TCGA-BH-A0HX,TCGA-BH-A0HX-01Z-00-DX1.44EBA39E-383D-4820-915E-4E5FDA1D43DA,BH,positive +TCGA-BH-A0HY,TCGA-BH-A0HY-01Z-00-DX1.D7A1BF42-F268-4220-B71B-354CE2AF32D8,BH,positive +TCGA-BH-A0W3,TCGA-BH-A0W3-01Z-00-DX1.38DCA2B7-1A54-4FFA-BABA-29F80974BB17,BH,positive +TCGA-BH-A0W4,TCGA-BH-A0W4-01Z-00-DX1.23E017DC-15CB-4838-AC91-CFB11B932AB0,BH,positive +TCGA-BH-A0W5,TCGA-BH-A0W5-01Z-00-DX1.84EA3913-667B-4694-A0F0-E629CAC615C8,BH,positive +TCGA-BH-A0W7,TCGA-BH-A0W7-01Z-00-DX1.93F11232-A1FB-4682-B744-3531D4DF11AF,BH,positive +TCGA-BH-A18F,TCGA-BH-A18F-01Z-00-DX1.81A81589-2D77-4CD6-BD34-76E292AA031D,BH,positive +TCGA-BH-A18H,TCGA-BH-A18H-01Z-00-DX1.4EC9108F-04C2-4B28-BD74-97A414C9A536,BH,positive +TCGA-BH-A18I,TCGA-BH-A18I-01Z-00-DX1.EE699DE3-EBF4-4AB4-8319-9911F027FA18,BH,positive +TCGA-BH-A18J,TCGA-BH-A18J-01Z-00-DX1.AE20ADB6-050C-405D-BF17-1EE4A1A4979A,BH,positive +TCGA-BH-A18L,TCGA-BH-A18L-01Z-00-DX1.01D6E3EF-6857-4CC6-BEA1-0BBC1A0B728F,BH,positive +TCGA-BH-A18M,TCGA-BH-A18M-01Z-00-DX1.56E6935A-62CB-4CBE-BDCA-60DA56422CE0,BH,positive +TCGA-BH-A18N,TCGA-BH-A18N-01Z-00-DX1.FE0E24A1-8AE9-4021-B9C7-B408EE0DE329,BH,positive +TCGA-BH-A18P,TCGA-BH-A18P-01Z-00-DX1.C66642D3-BE44-4D65-B4AE-C1C3D959D22C,BH,positive +TCGA-BH-A18S,TCGA-BH-A18S-01Z-00-DX1.B36FC684-0010-4ABB-9E25-303C8DF1C4E2,BH,positive +TCGA-BH-A18U,TCGA-BH-A18U-01Z-00-DX1.63940315-59BC-462B-87A8-2BBB33C9503E,BH,positive +TCGA-BH-A1EO,TCGA-BH-A1EO-01Z-00-DX1.0E624888-D7E4-48DF-B51A-9AD8F75A66B7,BH,positive 
+TCGA-BH-A1ES,TCGA-BH-A1ES-01Z-00-DX1.C54C809F-748F-4BB0-B018-A8A83A4134C0,BH,positive +TCGA-BH-A1ET,TCGA-BH-A1ET-01Z-00-DX1.05C126CD-CC10-44BF-9A68-6CDDE97272B2,BH,positive +TCGA-BH-A1EU,TCGA-BH-A1EU-01Z-00-DX1.5A0956EF-0100-47FD-9026-1994CF22D0F1,BH,positive +TCGA-BH-A1EV,TCGA-BH-A1EV-01Z-00-DX1.106CF220-1D7D-40DD-88B2-A7F00B758F8F,BH,positive +TCGA-BH-A1EX,TCGA-BH-A1EX-01Z-00-DX1.16B6A817-6729-446E-9FCF-A4A333C5295D,BH,positive +TCGA-BH-A1EY,TCGA-BH-A1EY-01Z-00-DX1.25C3DE1F-C702-4959-8DFA-69EF78AD9307,BH,positive +TCGA-BH-A1F2,TCGA-BH-A1F2-01Z-00-DX1.17E2FD6F-0DCF-425B-864B-21ADCDAE734B,BH,positive +TCGA-BH-A1F5,TCGA-BH-A1F5-01Z-00-DX1.9A1AA3BB-80B1-473D-B075-8D65A94C4E0B,BH,positive +TCGA-BH-A1F8,TCGA-BH-A1F8-01Z-00-DX1.8BB026F7-35CB-483F-B665-4C3A3EF47E1B,BH,positive +TCGA-BH-A1FB,TCGA-BH-A1FB-01Z-00-DX1.9D778D1A-07F6-450D-B7AA-0E17B4D0A88C,BH,positive +TCGA-BH-A1FE,TCGA-BH-A1FE-01Z-00-DX1.8FB57ECF-350B-44E4-8612-63E8374D3C4B,BH,positive +TCGA-BH-A1FG,TCGA-BH-A1FG-01Z-00-DX1.DA665FB6-9DD9-4B18-A925-2EFD9BC4C43B,BH,positive +TCGA-BH-A1FH,TCGA-BH-A1FH-01Z-00-DX1.F90A691F-B6DB-4C4A-9975-9A2CB01F29E2,BH,positive +TCGA-BH-A1FL,TCGA-BH-A1FL-01Z-00-DX1.D3E9F46E-9A28-4AAE-8C87-D6F1851227A3,BH,positive +TCGA-BH-A1FM,TCGA-BH-A1FM-01Z-00-DX1.860C5169-FE2B-43A6-A13D-E736374303DD,BH,positive +TCGA-BH-A1FN,TCGA-BH-A1FN-01Z-00-DX1.CEE3C59B-6CF0-4D41-8334-6067BB5A8BF7,BH,positive +TCGA-BH-A1FR,TCGA-BH-A1FR-01Z-00-DX1.03E51FCD-DDF2-4924-827D-85A02280C9C7,BH,positive +TCGA-BH-A201,TCGA-BH-A201-01Z-00-DX1.6D6E3224-50A0-45A2-B231-EEF27CA7EFD2,BH,positive +TCGA-BH-A202,TCGA-BH-A202-01Z-00-DX1.8CECDB74-5E6F-4CE8-B52C-A89E574F38FB,BH,positive +TCGA-BH-A209,TCGA-BH-A209-01Z-00-DX1.88B11AEB-11FA-4D23-BC21-378EF177E04B,BH,positive +TCGA-BH-A28O,TCGA-BH-A28O-01Z-00-DX1.B37603E3-BC80-4DDA-B44D-B5F794E3398E,BH,positive +TCGA-BH-A28Q,TCGA-BH-A28Q-01Z-00-DX1.C6D1CA93-6B2A-4722-9270-5D56592D833D,BH,positive 
+TCGA-BH-A2L8,TCGA-BH-A2L8-01Z-00-DX1.ACA51CA9-3C38-48A6-B4A9-C12FFAB9AB56,BH,positive +TCGA-BH-A42T,TCGA-BH-A42T-01Z-00-DX1.6E4609B9-BE7E-4CA9-A1C0-A77A7DEFBB2B,BH,positive +TCGA-BH-A42V,TCGA-BH-A42V-01Z-00-DX1.B6D78256-05D1-45FA-826A-1B8554D60B7D,BH,positive +TCGA-BH-A5IZ,TCGA-BH-A5IZ-01Z-00-DX1.6C871030-82E1-463E-A67B-976A5F3DCDB2,BH,positive +TCGA-BH-A5J0,TCGA-BH-A5J0-01Z-00-DX1.73CFBC85-8449-48FF-86F8-D4D49A90498B,BH,positive +TCGA-C8-A12M,TCGA-C8-A12M-01Z-00-DX1.0C5F2DED-873F-4C86-97A2-2AA76A6236DF,C8,positive +TCGA-C8-A12N,TCGA-C8-A12N-01Z-00-DX1.8E50110E-A6C0-496F-B44E-7190096C113E,C8,positive +TCGA-C8-A12O,TCGA-C8-A12O-01Z-00-DX1.A77E552B-7B69-45B7-B341-A03571F2A06C,C8,positive +TCGA-C8-A12T,TCGA-C8-A12T-01Z-00-DX1.EF628C26-C570-4D15-A0C0-58B2CC40ABFC,C8,positive +TCGA-C8-A12U,TCGA-C8-A12U-01Z-00-DX1.8E90FC9F-630E-4611-A5D2-D6F11DBE81EF,C8,positive +TCGA-C8-A12W,TCGA-C8-A12W-01Z-00-DX1.3727E7BB-831E-42DB-B5CB-F2D7C649290F,C8,positive +TCGA-C8-A12X,TCGA-C8-A12X-01Z-00-DX1.2967579D-40AE-4425-BE81-5109713C16B4,C8,positive +TCGA-C8-A130,TCGA-C8-A130-01Z-00-DX1.F6810204-C35A-4461-A18A-0A117739B988,C8,positive +TCGA-C8-A132,TCGA-C8-A132-01Z-00-DX1.6CCE1FE0-BB4B-4046-BAF0-43AA110B2EBE,C8,positive +TCGA-C8-A133,TCGA-C8-A133-01Z-00-DX1.AFEE2057-2793-49E4-A477-E0DE3F6C67F9,C8,positive +TCGA-C8-A138,TCGA-C8-A138-01Z-00-DX1.845A0680-23A4-4A58-A9C4-2EE17BDBD371,C8,positive +TCGA-C8-A1HE,TCGA-C8-A1HE-01Z-00-DX1.83A5D816-E3A6-46B8-B1B8-B31486EB686E,C8,positive +TCGA-C8-A1HG,TCGA-C8-A1HG-01Z-00-DX1.F2AB6D0F-4D6B-491F-915B-92E838E79D67,C8,positive +TCGA-C8-A1HI,TCGA-C8-A1HI-01Z-00-DX1.C6D0F8B8-55ED-477F-BAF7-AA05D0449CC8,C8,positive +TCGA-C8-A1HL,TCGA-C8-A1HL-01Z-00-DX1.CC21CAE9-DE48-4ADE-A959-5A99010AAFAC,C8,positive +TCGA-C8-A1HM,TCGA-C8-A1HM-01Z-00-DX1.F93C3F56-3AA6-4D38-B51E-4694CBEA3830,C8,positive +TCGA-C8-A1HN,TCGA-C8-A1HN-01Z-00-DX1.7EBBEAB1-EBB7-456F-8848-AFE2263242B7,C8,positive 
+TCGA-C8-A1HO,TCGA-C8-A1HO-01Z-00-DX1.0C702CEE-C373-4D00-A706-32206D41AC17,C8,positive +TCGA-C8-A26V,TCGA-C8-A26V-01Z-00-DX1.6E86DF74-D575-4969-8142-963D9DF5208F,C8,positive +TCGA-C8-A26W,TCGA-C8-A26W-01Z-00-DX1.CFF07941-6CD1-4CF9-BE5F-387DA67B66CC,C8,positive +TCGA-C8-A26Z,TCGA-C8-A26Z-01Z-00-DX1.1A15D951-F3BD-4024-91DD-E7633C09A837,C8,positive +TCGA-C8-A273,TCGA-C8-A273-01Z-00-DX1.6E5D581F-DF80-478D-82D1-6D6525EED1B5,C8,positive +TCGA-C8-A274,TCGA-C8-A274-01Z-00-DX1.B667DAB1-C944-4285-A312-5E4DDAE0EE78,C8,positive +TCGA-C8-A27A,TCGA-C8-A27A-01Z-00-DX1.0E26C46D-CD65-40F3-8976-EB4415582934,C8,positive +TCGA-C8-A3M8,TCGA-C8-A3M8-01Z-00-DX1.97AC85E0-6281-4F2E-8034-24AD5F52D5A6,C8,positive +TCGA-C8-A8HQ,TCGA-C8-A8HQ-01Z-00-DX1.A08190B5-F17E-4057-9DBD-B2F34CAEAFDC,C8,positive +TCGA-D8-A13Y,TCGA-D8-A13Y-01Z-00-DX1.02321E77-A11E-41A5-95FE-BB897EA5CE58,D8,positive +TCGA-D8-A13Y,TCGA-D8-A13Y-01Z-00-DX2.0043D2BB-B04D-4A44-AAB1-1CAAD97AC246,D8,positive +TCGA-D8-A140,TCGA-D8-A140-01Z-00-DX1.5B9382C0-332C-4FBF-82CD-B9453D02B815,D8,positive +TCGA-D8-A140,TCGA-D8-A140-01Z-00-DX2.0C0A62BB-1FB8-47D8-8FAF-112D221F18BE,D8,positive +TCGA-D8-A141,TCGA-D8-A141-01Z-00-DX1.10F6EEB2-C920-43FF-B5A7-B5A651CBDBE3,D8,positive +TCGA-D8-A141,TCGA-D8-A141-01Z-00-DX2.DBD0D81E-28FC-4466-BDE3-94753BD6CBEB,D8,positive +TCGA-D8-A145,TCGA-D8-A145-01Z-00-DX1.34D4EC27-7FD7-4BE2-95A4-CA521CB5590E,D8,positive +TCGA-D8-A145,TCGA-D8-A145-01Z-00-DX2.B834BF47-1CD6-45EA-BB88-D8ECE1FDDC6A,D8,positive +TCGA-D8-A146,TCGA-D8-A146-01Z-00-DX1.D93FB315-E5D7-49BA-8380-38B16B70361B,D8,positive +TCGA-D8-A1J8,TCGA-D8-A1J8-01Z-00-DX1.EADAB43A-87C6-47FA-9120-25B69E23366D,D8,positive +TCGA-D8-A1J8,TCGA-D8-A1J8-01Z-00-DX2.5DCB3447-548D-442C-86B1-CEF79B8689DF,D8,positive +TCGA-D8-A1J9,TCGA-D8-A1J9-01Z-00-DX1.F81FA9EF-8129-4E17-A9AD-2B850782CC18,D8,positive +TCGA-D8-A1J9,TCGA-D8-A1J9-01Z-00-DX2.E1C59487-9563-4501-845F-2067A0C5C59B,D8,positive 
+TCGA-D8-A1JB,TCGA-D8-A1JB-01Z-00-DX1.6CF48257-066C-4B8C-91AA-30A6B4AD4307,D8,positive +TCGA-D8-A1JC,TCGA-D8-A1JC-01Z-00-DX1.16D32FE8-4D76-4EEC-9739-D394087605BC,D8,positive +TCGA-D8-A1JC,TCGA-D8-A1JC-01Z-00-DX2.854ABF5D-40F1-48AE-802F-97D75497F1FD,D8,positive +TCGA-D8-A1JD,TCGA-D8-A1JD-01Z-00-DX1.6D215B14-DD90-4635-8645-AF06EBD9BA3F,D8,positive +TCGA-D8-A1JD,TCGA-D8-A1JD-01Z-00-DX2.C6A7667E-25C9-48BD-A4E7-F4BB740FE589,D8,positive +TCGA-D8-A1JE,TCGA-D8-A1JE-01Z-00-DX1.714805A1-E337-46DA-88D9-6CE4B4E3C2D0,D8,positive +TCGA-D8-A1JE,TCGA-D8-A1JE-01Z-00-DX2.CCF3DFEF-E851-425A-BCD0-0F7B377A00BC,D8,positive +TCGA-D8-A1JH,TCGA-D8-A1JH-01Z-00-DX1.4A4F2502-612C-421D-9F64-444BF2C85620,D8,positive +TCGA-D8-A1JI,TCGA-D8-A1JI-01Z-00-DX1.9BDB647F-EEAB-4235-BE44-A3815A48CCE0,D8,positive +TCGA-D8-A1JI,TCGA-D8-A1JI-01Z-00-DX2.E688E270-26C9-43EC-BC26-86BF8B74A31D,D8,positive +TCGA-D8-A1JJ,TCGA-D8-A1JJ-01Z-00-DX1.a986b48f-b295-4d7a-b778-ce829cdf9c38,D8,positive +TCGA-D8-A1JJ,TCGA-D8-A1JJ-01Z-00-DX2.7D20F308-7DC6-4367-9459-3AC4C654E7F7,D8,positive +TCGA-D8-A1JM,TCGA-D8-A1JM-01Z-00-DX1.3CD15BC8-08B3-4BEE-9FD7-A8BEED33268E,D8,positive +TCGA-D8-A1JN,TCGA-D8-A1JN-01Z-00-DX1.3B02989C-CEA0-4EDD-A6DC-21C3A957C640,D8,positive +TCGA-D8-A1JP,TCGA-D8-A1JP-01Z-00-DX1.B02959C4-6AAD-4ACA-9838-61D224D838E4,D8,positive +TCGA-D8-A1JS,TCGA-D8-A1JS-01Z-00-DX1.76C592B0-DC6D-401A-9533-C9FE5B6CA08D,D8,positive +TCGA-D8-A1JT,TCGA-D8-A1JT-01Z-00-DX1.F278C419-E405-4BDA-BA50-BFBA08801168,D8,positive +TCGA-D8-A1JT,TCGA-D8-A1JT-01Z-00-DX2.8DB9BB5B-17F3-4D80-835D-872BA275FD3B,D8,positive +TCGA-D8-A1JU,TCGA-D8-A1JU-01Z-00-DX1.355D93B4-E69E-417C-B3D1-3E1AAF1E02FE,D8,positive +TCGA-D8-A1X5,TCGA-D8-A1X5-01Z-00-DX1.81B10B43-0D99-44D0-A245-D652041B8FEE,D8,positive +TCGA-D8-A1X5,TCGA-D8-A1X5-01Z-00-DX2.E4D28EE2-9C87-4613-BD2E-89FBEF960DEE,D8,positive +TCGA-D8-A1X6,TCGA-D8-A1X6-01Z-00-DX1.ABF237D4-708C-46A5-AEF8-58712E5DCC04,D8,positive 
+TCGA-D8-A1X6,TCGA-D8-A1X6-01Z-00-DX2.F36D68F2-70C3-4E0E-ABE6-6969170BA6DC,D8,positive +TCGA-D8-A1X7,TCGA-D8-A1X7-01Z-00-DX1.3529AEED-2827-4D94-9A8B-32F116EE49D5,D8,positive +TCGA-D8-A1X7,TCGA-D8-A1X7-01Z-00-DX2.F0631B8C-EB75-4995-8ED7-1A8972BE8997,D8,positive +TCGA-D8-A1X8,TCGA-D8-A1X8-01Z-00-DX1.DDA3866E-C7CC-4650-BE2B-5C9A98A2D531,D8,positive +TCGA-D8-A1X9,TCGA-D8-A1X9-01Z-00-DX1.28CE7849-EEC5-4ABB-A319-A977A1FD3CD1,D8,positive +TCGA-D8-A1XA,TCGA-D8-A1XA-01Z-00-DX1.36A807F0-2D35-4CE3-8A27-CC59181B1A3D,D8,positive +TCGA-D8-A1XB,TCGA-D8-A1XB-01Z-00-DX1.DA8E2FA4-DBBA-4157-8052-B90FB3BB58F1,D8,positive +TCGA-D8-A1XB,TCGA-D8-A1XB-01Z-00-DX2.B262C269-F22B-4306-8195-079C0874EF8E,D8,positive +TCGA-D8-A1XC,TCGA-D8-A1XC-01Z-00-DX1.E7494388-838B-4D35-9CA1-77017CBE77F2,D8,positive +TCGA-D8-A1XC,TCGA-D8-A1XC-01Z-00-DX2.10BA8536-D6AF-460E-9796-EB77FFE3EF5B,D8,positive +TCGA-D8-A1XD,TCGA-D8-A1XD-01Z-00-DX1.E500D561-1F49-4F08-99AE-E8345F21B406,D8,positive +TCGA-D8-A1XF,TCGA-D8-A1XF-01Z-00-DX1.8B9AB4A1-DEC5-4587-B022-3077CCD220F9,D8,positive +TCGA-D8-A1XF,TCGA-D8-A1XF-01Z-00-DX2.1460E522-5B87-4690-8B6B-3183C5D282D6,D8,positive +TCGA-D8-A1XG,TCGA-D8-A1XG-01Z-00-DX1.0A58EF43-B97E-4111-8DC3-2F8D9BDBE7D6,D8,positive +TCGA-D8-A1XJ,TCGA-D8-A1XJ-01Z-00-DX1.660DDF8D-6816-4EBE-AF37-6F7A374F4E9E,D8,positive +TCGA-D8-A1XL,TCGA-D8-A1XL-01Z-00-DX1.FDF07020-8F40-4C00-9023-E5F40E0D8A7C,D8,positive +TCGA-D8-A1XL,TCGA-D8-A1XL-01Z-00-DX2.FDE2C80D-5DC4-4743-A180-C5AFC5BB0BE2,D8,positive +TCGA-D8-A1XM,TCGA-D8-A1XM-01Z-00-DX1.5F6A140E-5D20-45F6-B76F-0F98AB27AD08,D8,positive +TCGA-D8-A1XO,TCGA-D8-A1XO-01Z-00-DX1.A9EF6AFF-62B0-4E2D-ABC7-DC06FF473890,D8,positive +TCGA-D8-A1XR,TCGA-D8-A1XR-01Z-00-DX1.7F443346-C564-47B6-9736-6944230CAF46,D8,positive +TCGA-D8-A1XR,TCGA-D8-A1XR-01Z-00-DX2.A103FB8B-4397-4DD4-8587-90A736407484,D8,positive +TCGA-D8-A1XS,TCGA-D8-A1XS-01Z-00-DX1.E5C78E5A-947C-499D-9E95-619FEEC63E69,D8,positive 
+TCGA-D8-A1XS,TCGA-D8-A1XS-01Z-00-DX2.ED8BBDB4-CEA6-4E47-8214-4666F3CC6E44,D8,positive +TCGA-D8-A1XU,TCGA-D8-A1XU-01Z-00-DX1.769DBC38-6F0D-425B-A904-79EBA1C119EF,D8,positive +TCGA-D8-A1XV,TCGA-D8-A1XV-01Z-00-DX1.D0D78B02-2A0D-4107-8083-D2FE1D3F1207,D8,positive +TCGA-D8-A1XV,TCGA-D8-A1XV-01Z-00-DX2.D1988E01-5FDC-4F50-8183-F0CCE74D0CB2,D8,positive +TCGA-D8-A1XY,TCGA-D8-A1XY-01Z-00-DX1.AC051FB4-1D51-449B-BF2D-9DDB4382414C,D8,positive +TCGA-D8-A1XY,TCGA-D8-A1XY-01Z-00-DX2.33D96E5C-5291-4864-B282-8BACA2043586,D8,positive +TCGA-D8-A1XZ,TCGA-D8-A1XZ-01Z-00-DX1.8E51A61D-B01C-4A52-8F5D-44D2ABCA46FC,D8,positive +TCGA-D8-A1XZ,TCGA-D8-A1XZ-01Z-00-DX2.73D59546-C003-4DED-80D5-866E7055EC79,D8,positive +TCGA-D8-A1Y0,TCGA-D8-A1Y0-01Z-00-DX1.10F40197-4174-43CC-AAD3-8CB85154FB2D,D8,positive +TCGA-D8-A1Y0,TCGA-D8-A1Y0-01Z-00-DX2.7C25E941-12A6-46CE-B1A5-488D08E35684,D8,positive +TCGA-D8-A1Y1,TCGA-D8-A1Y1-01Z-00-DX1.477390C8-2141-4ADD-813E-25220D2A71FC,D8,positive +TCGA-D8-A1Y1,TCGA-D8-A1Y1-01Z-00-DX2.B58DC955-F864-4E78-8B1A-8156E2F7D554,D8,positive +TCGA-D8-A1Y2,TCGA-D8-A1Y2-01Z-00-DX1.77451A0F-00FD-44A9-A0D0-CB7A39CE74CC,D8,positive +TCGA-D8-A1Y2,TCGA-D8-A1Y2-01Z-00-DX2.A563ABFE-18DE-4D78-BB66-9CD18D3CBE3A,D8,positive +TCGA-D8-A1Y3,TCGA-D8-A1Y3-01Z-00-DX1.8AA5F695-A06C-4DEA-AD71-16254A48B218,D8,positive +TCGA-D8-A1Y3,TCGA-D8-A1Y3-01Z-00-DX2.B9BD14AF-52A6-4241-B3B3-03F8F5DFC8DA,D8,positive +TCGA-D8-A27E,TCGA-D8-A27E-01Z-00-DX1.7950998F-7E23-4C70-A926-1ADAA5A20BED,D8,positive +TCGA-D8-A27G,TCGA-D8-A27G-01Z-00-DX1.04FCF306-67AA-4F56-AE90-44A4F5DD56D4,D8,positive +TCGA-D8-A27I,TCGA-D8-A27I-01Z-00-DX1.ACAE0048-7AF0-41C9-992B-6ACDABF80B1C,D8,positive +TCGA-D8-A27K,TCGA-D8-A27K-01Z-00-DX1.25AF7928-046B-45BF-BAD6-F34156203F5A,D8,positive +TCGA-D8-A27L,TCGA-D8-A27L-01Z-00-DX1.6572593C-B015-4F9A-8C9B-B634CAA0D3B4,D8,positive +TCGA-D8-A27N,TCGA-D8-A27N-01Z-00-DX1.85613A4F-1FFB-4091-B925-8E8A7C7A6D95,D8,positive 
+TCGA-D8-A27N,TCGA-D8-A27N-01Z-00-DX2.EB803DEC-438B-43A9-B906-FD7C3B9A0138,D8,positive +TCGA-D8-A27P,TCGA-D8-A27P-01Z-00-DX1.EF426996-D0DC-4418-BEC4-F7CE3C82C869,D8,positive +TCGA-D8-A27R,TCGA-D8-A27R-01Z-00-DX1.F6E2FD1C-0666-4788-8D95-A76D15907270,D8,positive +TCGA-D8-A27R,TCGA-D8-A27R-01Z-00-DX2.31F47D8F-DFD7-42AE-BBBA-7DBBA12FA97D,D8,positive +TCGA-D8-A27T,TCGA-D8-A27T-01Z-00-DX1.1E3A4D57-9CF2-4EBF-B74D-ADD7BD8CBFA5,D8,positive +TCGA-D8-A27T,TCGA-D8-A27T-01Z-00-DX2.7CCFE3AA-955B-4D77-9EDB-7B36D5A01168,D8,positive +TCGA-D8-A27V,TCGA-D8-A27V-01Z-00-DX1.F937C53B-0B55-4271-843E-2C28F72CF28E,D8,positive +TCGA-D8-A27V,TCGA-D8-A27V-01Z-00-DX2.81E4FFDF-7123-4AC3-BE24-A82564A6C34C,D8,positive +TCGA-D8-A27W,TCGA-D8-A27W-01Z-00-DX1.CE2D4FEA-284C-44AF-A70F-7956D91CCB29,D8,positive +TCGA-D8-A27W,TCGA-D8-A27W-01Z-00-DX2.4E0D2E4E-5662-4305-8FC2-C651CDDA4998,D8,positive +TCGA-D8-A3Z5,TCGA-D8-A3Z5-01Z-00-DX1.4DBF937F-623C-4212-8493-ECE3C8F555E3,D8,positive +TCGA-D8-A3Z5,TCGA-D8-A3Z5-01Z-00-DX2.6196B9FF-F09E-4D44-873D-E53BFB2BF4E1,D8,positive +TCGA-D8-A3Z5,TCGA-D8-A3Z5-01Z-00-DX3.BB83C7D4-F795-47E7-9A5C-7DBF0EB7FDAA,D8,positive +TCGA-D8-A3Z6,TCGA-D8-A3Z6-01Z-00-DX1.4076A770-5901-4325-85C3-AF7B192272F5,D8,positive +TCGA-D8-A3Z6,TCGA-D8-A3Z6-01Z-00-DX2.19C6AA07-5D58-46CD-91D0-90DD5CC84022,D8,positive +TCGA-D8-A3Z6,TCGA-D8-A3Z6-01Z-00-DX3.66566DB7-661D-4002-90C1-B15BF3425903,D8,positive +TCGA-D8-A4Z1,TCGA-D8-A4Z1-01Z-00-DX1.D39D38B5-FC9F-4298-8720-016407DC6591,D8,positive +TCGA-D8-A73U,TCGA-D8-A73U-01Z-00-DX1.6ECEF7C0-00CC-4AC2-87BD-DBFB6E0DC042,D8,positive +TCGA-D8-A73U,TCGA-D8-A73U-01Z-00-DX2.91F571C8-303C-463D-952C-263FAA5097DF,D8,positive +TCGA-D8-A73W,TCGA-D8-A73W-01Z-00-DX1.2A4B8A37-BE62-42C8-A109-800A7970FF0F,D8,positive +TCGA-D8-A73W,TCGA-D8-A73W-01Z-00-DX2.EBBBBA5A-AB7C-49EC-9F7F-14248C928F3A,D8,positive +TCGA-D8-A73X,TCGA-D8-A73X-01Z-00-DX1.5F0DF75C-594C-42DF-BE3F-E00E5E01DCD6,D8,positive 
+TCGA-E2-A105,TCGA-E2-A105-01Z-00-DX1.192813F9-2D7B-40C7-88D9-BFA939D43FEE,E2,positive +TCGA-E2-A106,TCGA-E2-A106-01Z-00-DX1.A8D49C06-3C93-48A6-87F8-86D646BEA28C,E2,positive +TCGA-E2-A107,TCGA-E2-A107-01Z-00-DX1.3BFCCC24-7286-488D-A9E9-98E498AD0767,E2,positive +TCGA-E2-A108,TCGA-E2-A108-01Z-00-DX1.B110ED43-08A4-476A-A658-1CA75F7C0DDE,E2,positive +TCGA-E2-A109,TCGA-E2-A109-01Z-00-DX1.FCF5E9FC-F9FE-4F5F-96DD-5628E2609BEF,E2,positive +TCGA-E2-A10A,TCGA-E2-A10A-01Z-00-DX1.98B19EF1-0DAE-4DC6-8B0E-963CFABC6724,E2,positive +TCGA-E2-A10B,TCGA-E2-A10B-01Z-00-DX1.148CFC4B-EE65-4A7E-918F-10C72F37CB0F,E2,positive +TCGA-E2-A10C,TCGA-E2-A10C-01Z-00-DX1.29A2B20B-29C6-4491-ADAB-0584C772EF25,E2,positive +TCGA-E2-A10E,TCGA-E2-A10E-01Z-00-DX1.C45030A9-CC1A-4BA7-8F62-872619C5AD5E,E2,positive +TCGA-E2-A10F,TCGA-E2-A10F-01Z-00-DX1.18F9324C-A38F-478E-95DF-B8E172D0DD07,E2,positive +TCGA-E2-A14O,TCGA-E2-A14O-01Z-00-DX1.432DB252-35D4-4B02-B0A7-39BF85A38F89,E2,positive +TCGA-E2-A14Q,TCGA-E2-A14Q-01Z-00-DX1.C19BA7FD-D986-4E3B-9A79-F3531A78F05D,E2,positive +TCGA-E2-A14S,TCGA-E2-A14S-01Z-00-DX1.5C332E99-7296-473A-9917-7EFC38A8251D,E2,positive +TCGA-E2-A14T,TCGA-E2-A14T-01Z-00-DX1.61B4C988-6D75-447B-A5F4-9DE92CEACC9F,E2,positive +TCGA-E2-A14U,TCGA-E2-A14U-01Z-00-DX1.F788A244-6C96-4CDC-AD9A-157DE951A723,E2,positive +TCGA-E2-A14V,TCGA-E2-A14V-01Z-00-DX1.D6274B5E-644C-478C-817F-B8BB9262D6F4,E2,positive +TCGA-E2-A14W,TCGA-E2-A14W-01Z-00-DX1.2AF665A9-F582-4C41-B2B8-1982648CDEC7,E2,positive +TCGA-E2-A14Y,TCGA-E2-A14Y-01Z-00-DX1.804A22A3-FD8D-4C8A-A766-48D28434DE22,E2,positive +TCGA-E2-A14Z,TCGA-E2-A14Z-01Z-00-DX1.A1344A78-1842-4578-8CDE-921E50656891,E2,positive +TCGA-E2-A152,TCGA-E2-A152-01Z-00-DX1.B0860DEB-D34B-4C5D-97F5-C1C646437424,E2,positive +TCGA-E2-A153,TCGA-E2-A153-01Z-00-DX1.CA994467-E541-4131-A9FC-DCD9944F29C4,E2,positive +TCGA-E2-A154,TCGA-E2-A154-01Z-00-DX1.01FC9B1A-8ECD-4467-9EDD-0B02E4AEEF72,E2,positive 
+TCGA-E2-A155,TCGA-E2-A155-01Z-00-DX1.A5AF232A-61BB-4FDC-ABF9-2FE9BA461BDC,E2,positive +TCGA-E2-A156,TCGA-E2-A156-01Z-00-DX1.CEF32B4D-A676-4230-992D-45A046E4043A,E2,positive +TCGA-E2-A15A,TCGA-E2-A15A-01Z-00-DX1.B9D22735-FE9F-46D9-84E6-0C71A1BF84D6,E2,positive +TCGA-E2-A15C,TCGA-E2-A15C-01Z-00-DX1.26E13415-1D37-43C7-9EBB-4411BE7FCE10,E2,positive +TCGA-E2-A15D,TCGA-E2-A15D-01Z-00-DX1.AA5AF847-3635-4BAF-AAC3-BADB4A1B2CB1,E2,positive +TCGA-E2-A15E,TCGA-E2-A15E-06Z-00-DX1.D394A007-91A2-4445-9B4A-1629EE684A51,E2,positive +TCGA-E2-A15F,TCGA-E2-A15F-01Z-00-DX1.F022214A-3F0C-4DA4-A0EE-735D27480A45,E2,positive +TCGA-E2-A15G,TCGA-E2-A15G-01Z-00-DX1.BAA5235E-1022-463B-A852-586ED7CA75FF,E2,positive +TCGA-E2-A15H,TCGA-E2-A15H-01Z-00-DX1.E3A9DFDC-204D-4F03-98D9-97BBBB74E840,E2,positive +TCGA-E2-A15J,TCGA-E2-A15J-01Z-00-DX1.BF7901D1-30B1-4A76-B0A5-E9B8B36EF4C9,E2,positive +TCGA-E2-A15K,TCGA-E2-A15K-01Z-00-DX1.9F424BE2-9BFE-4DFF-8CC9-10D2DADBBEA7,E2,positive +TCGA-E2-A15L,TCGA-E2-A15L-01Z-00-DX1.626032DC-D396-48E7-B888-DFEBCF7102FF,E2,positive +TCGA-E2-A15O,TCGA-E2-A15O-01Z-00-DX1.4D0F3975-93EA-4DC2-AD0C-A76A24C3AE0C,E2,positive +TCGA-E2-A15P,TCGA-E2-A15P-01Z-00-DX1.4C7F95DC-B319-4A78-8F34-8CB9166EA6A8,E2,positive +TCGA-E2-A15R,TCGA-E2-A15R-01Z-00-DX1.B58D2D9D-05EE-4D66-8DC0-9E5C6AD2E3D3,E2,positive +TCGA-E2-A15S,TCGA-E2-A15S-01Z-00-DX1.77CFA95F-F1AE-41BD-BAF8-BCA6F0DFA3A3,E2,positive +TCGA-E2-A15T,TCGA-E2-A15T-01Z-00-DX1.23913E37-53A8-4E11-9B64-CDD016F6D87A,E2,positive +TCGA-E2-A1B1,TCGA-E2-A1B1-01Z-00-DX1.7C8DF153-B09B-44C7-87B8-14591E319354,E2,positive +TCGA-E2-A1B4,TCGA-E2-A1B4-01Z-00-DX1.E585C4FB-0D3E-4160-8192-53A329648F5C,E2,positive +TCGA-E2-A1B5,TCGA-E2-A1B5-01Z-00-DX1.AB19FF3D-5C42-4D49-ABEA-D2315709B6EA,E2,positive +TCGA-E2-A1BC,TCGA-E2-A1BC-01Z-00-DX1.FD19F2F8-497F-4D7D-97C6-271DC6B75173,E2,positive +TCGA-E2-A1BD,TCGA-E2-A1BD-01Z-00-DX1.A2AFF7AD-ED47-43E4-87FE-62882BAEB8DA,E2,positive 
+TCGA-E2-A1IE,TCGA-E2-A1IE-01Z-00-DX1.D6BF29A7-4FBD-4F7D-B093-991FA0F49FE8,E2,positive +TCGA-E2-A1IF,TCGA-E2-A1IF-01Z-00-DX1.51F5B0AC-E4D6-439D-8168-34934471FA06,E2,positive +TCGA-E2-A1IG,TCGA-E2-A1IG-01Z-00-DX1.C894EEA1-708A-4043-8C60-3BCA98AA751E,E2,positive +TCGA-E2-A1IH,TCGA-E2-A1IH-01Z-00-DX1.816F0C54-D87E-4E23-891B-98169F5B439B,E2,positive +TCGA-E2-A1IJ,TCGA-E2-A1IJ-01Z-00-DX1.00093877-7B01-42D3-857C-987BC1F604A3,E2,positive +TCGA-E2-A1IK,TCGA-E2-A1IK-01Z-00-DX1.25C554BB-AA90-4FF4-9D68-EEC899B8A27D,E2,positive +TCGA-E2-A1IL,TCGA-E2-A1IL-01Z-00-DX1.46B6AA99-C7CE-4573-B15D-2C56A708B082,E2,positive +TCGA-E2-A1IN,TCGA-E2-A1IN-01Z-00-DX1.F63F004F-847D-41B0-BAEF-3189D4965838,E2,positive +TCGA-E2-A1IO,TCGA-E2-A1IO-01Z-00-DX1.C812D02F-3B40-496B-9EF5-7F68ADE64962,E2,positive +TCGA-E2-A1IU,TCGA-E2-A1IU-01Z-00-DX1.E2F24814-24BA-4158-8841-F27A8E100589,E2,positive +TCGA-E2-A1L6,TCGA-E2-A1L6-01Z-00-DX1.AFE87067-2BFD-42C2-9334-9DDE8AB61B49,E2,positive +TCGA-E2-A1L8,TCGA-E2-A1L8-01Z-00-DX1.842BF73D-2D64-4ABA-880D-DBE8F4C2AA19,E2,positive +TCGA-E2-A1L9,TCGA-E2-A1L9-01Z-00-DX1.F2CC1036-8EEE-4664-962E-541B8ACB10DE,E2,positive +TCGA-E2-A1LA,TCGA-E2-A1LA-01Z-00-DX1.AE49E943-B830-4ED9-B09E-E499441BC2EC,E2,positive +TCGA-E2-A2P5,TCGA-E2-A2P5-01Z-00-DX1.375B5E5D-D8D0-425C-9163-6DF26E31BBE1,E2,positive +TCGA-E2-A2P6,TCGA-E2-A2P6-01Z-00-DX1.9D8060AA-881F-49FF-AEF5-B40A3625A44A,E2,positive +TCGA-E2-A3DX,TCGA-E2-A3DX-01Z-00-DX1.75094CA2-CE51-4886-A564-BD10043A6E4C,E2,positive +TCGA-E2-A56Z,TCGA-E2-A56Z-01Z-00-DX1.EC28E279-B869-4485-A19F-C732D4CCD374,E2,positive +TCGA-E2-A570,TCGA-E2-A570-01Z-00-DX1.64E5F2CA-9396-40DC-A300-04A54657BDE6,E2,positive +TCGA-E2-A572,TCGA-E2-A572-01Z-00-DX1.7DDF1C25-58E2-4AF9-B831-598222A4E85D,E2,positive +TCGA-E2-A576,TCGA-E2-A576-01Z-00-DX1.2E045763-7EB0-48F5-8C47-E4B84287D739,E2,positive +TCGA-E2-A9RU,TCGA-E2-A9RU-01Z-00-DX1.A06BE284-B9DC-4B45-A202-A9D027AEEDD9,E2,positive 
+TCGA-E9-A1N4,TCGA-E9-A1N4-01Z-00-DX1.71c8d4a5-ec99-4012-9fe2-ddb3349ad5bc,E9,positive +TCGA-E9-A1N5,TCGA-E9-A1N5-01Z-00-DX1.94DE7B4B-3798-4CC2-8A64-77796A58821A,E9,positive +TCGA-E9-A1N6,TCGA-E9-A1N6-01Z-00-DX1.C0E8FFDC-2614-4DBF-B51E-4646A3919911,E9,positive +TCGA-E9-A1NA,TCGA-E9-A1NA-01Z-00-DX1.2A2FE8BB-621E-43A2-8449-F8FE730C0487,E9,positive +TCGA-E9-A1NE,TCGA-E9-A1NE-01Z-00-DX1.f332124a-cdab-4ab5-82d1-4b8b7f3c9821,E9,positive +TCGA-E9-A1NF,TCGA-E9-A1NF-01Z-00-DX1.08779060-F710-45CB-A016-56D3C5B745E0,E9,positive +TCGA-E9-A1NG,TCGA-E9-A1NG-01Z-00-DX1.30d2a611-21ad-4ba0-a03c-31f36ecb83d5,E9,positive +TCGA-E9-A1NH,TCGA-E9-A1NH-01Z-00-DX1.20FF11B7-DAF0-41C2-B7C5-449BA5268EC0,E9,positive +TCGA-E9-A1NI,TCGA-E9-A1NI-01Z-00-DX1.AF90ACEF-EDED-40E6-955B-80ACC49778F9,E9,positive +TCGA-E9-A1R2,TCGA-E9-A1R2-01Z-00-DX1.5529C856-4F4F-4F17-94A2-5CEF94A940EB,E9,positive +TCGA-E9-A227,TCGA-E9-A227-01Z-00-DX1.823062BF-3444-489B-AF91-AAD4ECAA1DC7,E9,positive +TCGA-E9-A22A,TCGA-E9-A22A-01Z-00-DX1.d986c9eb-2c54-4663-a54b-04c0756db6db,E9,positive +TCGA-E9-A22B,TCGA-E9-A22B-01Z-00-DX1.01692f98-bd7c-4b02-990c-53560276baa0,E9,positive +TCGA-E9-A22D,TCGA-E9-A22D-01Z-00-DX1.b2867437-0add-4b7d-8002-fb09ed961942,E9,positive +TCGA-E9-A22E,TCGA-E9-A22E-01Z-00-DX1.3d5b1ba6-466f-4852-b1fc-bc29ef3be9a2,E9,positive +TCGA-E9-A22H,TCGA-E9-A22H-01Z-00-DX1.0ee39fd1-b077-40ea-bd0d-8a3db474fdd5,E9,positive +TCGA-E9-A295,TCGA-E9-A295-01Z-00-DX1.ED6CC024-56E7-45A9-B41F-F84550A899F7,E9,positive +TCGA-E9-A3Q9,TCGA-E9-A3Q9-01Z-00-DX1.2D0CA5D4-CCE5-4647-BFB3-D5359F53DA83,E9,positive +TCGA-E9-A3X8,TCGA-E9-A3X8-01Z-00-DX1.D1532F80-F5F8-47AF-8A1B-B59E345F5D70,E9,positive +TCGA-E9-A54X,TCGA-E9-A54X-01Z-00-DX1.9BF3DE15-FA5A-459D-9B2E-8EB92F34228D,E9,positive +TCGA-E9-A54Y,TCGA-E9-A54Y-01Z-00-DX1.5FD70369-6514-4E5E-8BEF-65DAC5671E1A,E9,positive +TCGA-E9-A5FK,TCGA-E9-A5FK-01Z-00-DX1.3F002C99-B4CA-4068-A816-EAC370A2E8C4,E9,positive 
+TCGA-E9-A6HE,TCGA-E9-A6HE-01Z-00-DX1.97514DBA-9BC0-4554-9A43-99C48C6B953A,E9,positive +TCGA-EW-A1IW,TCGA-EW-A1IW-01Z-00-DX1.0DE87057-951F-4887-A2D5-485736D66E4E,EW,positive +TCGA-EW-A1IX,TCGA-EW-A1IX-01Z-00-DX1.72AE0E0F-B213-454F-8275-76A7C43FAB86,EW,positive +TCGA-EW-A1IY,TCGA-EW-A1IY-01Z-00-DX1.3FD00274-C2CD-491A-915B-23FD535A4723,EW,positive +TCGA-EW-A1IZ,TCGA-EW-A1IZ-01Z-00-DX1.81D5544E-0A7D-4A92-A3B0-4F22CFB8BBB2,EW,positive +TCGA-EW-A1J1,TCGA-EW-A1J1-01Z-00-DX1.FFC2D5DB-E6D0-4143-A1E7-0A2027846C8C,EW,positive +TCGA-EW-A1J2,TCGA-EW-A1J2-01Z-00-DX1.F1D8E593-2DF4-44C3-873D-FD3C910011E4,EW,positive +TCGA-EW-A1J3,TCGA-EW-A1J3-01Z-00-DX1.F736F6F4-0859-439A-A417-98B520C8D65A,EW,positive +TCGA-EW-A1J6,TCGA-EW-A1J6-01Z-00-DX1.F668E978-13B6-4284-9522-37C65C57A21A,EW,positive +TCGA-EW-A1OX,TCGA-EW-A1OX-01Z-00-DX1.9B19CBEE-E592-4A4D-839F-CECDCA20F4B1,EW,positive +TCGA-EW-A1OY,TCGA-EW-A1OY-01Z-00-DX1.42AF5C9A-A90F-4A58-B5D4-B615F8CD4333,EW,positive +TCGA-EW-A1OZ,TCGA-EW-A1OZ-01Z-00-DX1.9639BB06-FF02-475F-A89D-773BE12721CC,EW,positive +TCGA-EW-A1P0,TCGA-EW-A1P0-01Z-00-DX1.6C75EEBD-CE05-4DE2-8C43-8B6435F1379F,EW,positive +TCGA-EW-A1P3,TCGA-EW-A1P3-01Z-00-DX1.EF69B964-57D6-4F8E-A183-A3022AF6E835,EW,positive +TCGA-EW-A1P5,TCGA-EW-A1P5-01Z-00-DX1.B0D59A56-116E-48B3-BC90-237EA4B3F95B,EW,positive +TCGA-EW-A1P6,TCGA-EW-A1P6-01Z-00-DX1.A8024C26-6336-4858-88FD-5679795899BA,EW,positive +TCGA-EW-A1PA,TCGA-EW-A1PA-01Z-00-DX1.03B033F8-62C0-49E1-BDEA-C5217AB3460A,EW,positive +TCGA-EW-A1PC,TCGA-EW-A1PC-01Z-00-DX1.45369705-AD49-419F-A0F0-EC51473BBAB9,EW,positive +TCGA-EW-A1PD,TCGA-EW-A1PD-01Z-00-DX1.6F6A0122-A50B-4D00-9A3F-7A2502D44E38,EW,positive +TCGA-EW-A1PE,TCGA-EW-A1PE-01Z-00-DX1.8EF56824-0B37-4AD1-AF3E-7988FCBEF773,EW,positive +TCGA-EW-A1PF,TCGA-EW-A1PF-01Z-00-DX1.9420D058-65CE-4DF8-815F-EB407003096E,EW,positive +TCGA-EW-A1PG,TCGA-EW-A1PG-01Z-00-DX1.A31B2048-D04D-467C-A9E4-6A63942A58C6,EW,positive 
+TCGA-EW-A2FS,TCGA-EW-A2FS-01Z-00-DX1.A01C9183-2AC8-456A-B5A6-85C5BB0361D8,EW,positive +TCGA-EW-A2FV,TCGA-EW-A2FV-01Z-00-DX1.0C59AC06-DD9B-413A-8A03-B71774F662FA,EW,positive +TCGA-EW-A2FW,TCGA-EW-A2FW-01Z-00-DX1.4F948681-81F3-46C5-A4E8-FDE8A4116F7A,EW,positive +TCGA-EW-A3E8,TCGA-EW-A3E8-01Z-00-DX1.E185B2E4-3E16-4C66-90D7-FDB9A14BA00D,EW,positive +TCGA-EW-A423,TCGA-EW-A423-01Z-00-DX1.31502D62-9C83-4A9C-9682-05A185486EAA,EW,positive +TCGA-EW-A423,TCGA-EW-A423-01Z-00-DX2.5EF1CF39-600A-4ED3-A5A2-AF4435A5F8B5,EW,positive +TCGA-EW-A424,TCGA-EW-A424-01Z-00-DX1.662894F0-31FE-494B-9A9E-7C369AE23D27,EW,positive +TCGA-EW-A6S9,TCGA-EW-A6S9-01Z-00-DX1.4A8CD2D4-D528-465C-82DC-9CAE35DD424B,EW,positive +TCGA-EW-A6SA,TCGA-EW-A6SA-01Z-00-DX1.DE032DD7-169D-4AA2-A4E0-1EFC158431FA,EW,positive +TCGA-EW-A6SC,TCGA-EW-A6SC-01Z-00-DX1.C2D50E7C-3AD0-4038-9A1C-CAE54A3FBE2F,EW,positive +TCGA-GI-A2C8,TCGA-GI-A2C8-01Z-00-DX1.09BD8AC9-645A-4C8B-9B36-77D833BDBA09,GI,positive +TCGA-GM-A2D9,TCGA-GM-A2D9-01Z-00-DX1.AF4BF2DD-05FB-400B-A1BC-6E7C9B9DDF05,GM,positive +TCGA-GM-A2DA,TCGA-GM-A2DA-01Z-00-DX1.6E409F88-F654-48C8-A753-BB054037BE16,GM,positive +TCGA-GM-A2DC,TCGA-GM-A2DC-01Z-00-DX1.39F956BB-52D0-4385-9923-4D7D36D78862,GM,positive +TCGA-GM-A2DK,TCGA-GM-A2DK-01Z-00-DX1.4F81A585-9549-454C-8E15-2E4545795460,GM,positive +TCGA-GM-A2DL,TCGA-GM-A2DL-01Z-00-DX1.1CE6992E-B1CE-45AB-B2EB-75338B6FEE9D,GM,positive +TCGA-GM-A2DM,TCGA-GM-A2DM-01Z-00-DX1.652038F4-C370-40EB-A545-51062783C74C,GM,positive +TCGA-GM-A2DN,TCGA-GM-A2DN-01Z-00-DX1.593003B0-BA87-486B-B472-D6B85867D54D,GM,positive +TCGA-GM-A2DO,TCGA-GM-A2DO-01Z-00-DX1.60817A51-93B7-483D-ABC9-8ED6341C6660,GM,positive +TCGA-GM-A3NY,TCGA-GM-A3NY-01Z-00-DX1.BEED8B8F-5A1B-4CE2-BB5D-A7ED40551AE4,GM,positive +TCGA-GM-A3XG,TCGA-GM-A3XG-01Z-00-DX1.68FFB600-8573-451F-8100-D11DB091F457,GM,positive +TCGA-GM-A3XN,TCGA-GM-A3XN-01Z-00-DX1.49F5C8AC-0328-4BC2-9FE2-E7150B7A42E2,GM,positive 
+TCGA-GM-A4E0,TCGA-GM-A4E0-01Z-00-DX1.55A5D765-AA35-4052-9185-6872B580F0F6,GM,positive +TCGA-HN-A2OB,TCGA-HN-A2OB-01Z-00-DX1.14F1FBFB-4540-43CE-9D79-5BC628640424,HN,positive +TCGA-JL-A3YW,TCGA-JL-A3YW-01Z-00-DX1.827C5C53-9C30-4307-802A-5A7896828A7F,JL,positive +TCGA-LD-A66U,TCGA-LD-A66U-01Z-00-DX1.AEC25C62-0519-47ED-A85B-2A964BA2BA87,LD,positive +TCGA-LD-A74U,TCGA-LD-A74U-01Z-00-DX1.F3C1EBBB-4AED-49A9-A8D2-B6145E162BE4,LD,positive +TCGA-LD-A7W5,TCGA-LD-A7W5-01Z-00-DX1.C9920081-C2FE-461E-AEF8-625D4A95BE2F,LD,positive +TCGA-LD-A7W6,TCGA-LD-A7W6-01Z-00-DX1.3E125146-B447-4973-AE09-6D374970B46C,LD,positive +TCGA-LL-A440,TCGA-LL-A440-01Z-00-DX1.6E031FD6-236C-49FA-B920-4CB120C59037,LL,positive +TCGA-LL-A442,TCGA-LL-A442-01Z-00-DX1.9275EDBD-1C89-4AF3-B02B-19F613A4E083,LL,positive +TCGA-LL-A50Y,TCGA-LL-A50Y-01Z-00-DX1.7F08413F-2C59-4322-9296-84F8CD3DF619,LL,positive +TCGA-LL-A5YL,TCGA-LL-A5YL-01Z-00-DX1.164C74AB-87D5-46CC-9D96-33FE1F88DC80,LL,positive +TCGA-LL-A5YM,TCGA-LL-A5YM-01Z-00-DX1.2B8904A9-6A45-4FFD-AF15-EB81773E5B79,LL,positive +TCGA-LL-A5YN,TCGA-LL-A5YN-01Z-00-DX1.F221939B-3680-4B95-93A9-BE8599550E87,LL,positive +TCGA-LL-A5YP,TCGA-LL-A5YP-01Z-00-DX1.ADA978DF-6625-4C4D-AE0C-F589B2BA4897,LL,positive +TCGA-LL-A6FP,TCGA-LL-A6FP-01Z-00-DX1.6261398A-7288-4924-BBE2-FC1949256E40,LL,positive +TCGA-LL-A6FQ,TCGA-LL-A6FQ-01Z-00-DX1.D4FEEC16-FC67-4C4A-A4B1-F8BA34ECBCBD,LL,positive +TCGA-LL-A73Z,TCGA-LL-A73Z-01Z-00-DX1.C010142E-29C0-411D-9E0E-4B7D8A4C09BF,LL,positive +TCGA-LL-A7SZ,TCGA-LL-A7SZ-01Z-00-DX1.4DAF6421-6A1D-41C8-BFD6-859FE10CB8CC,LL,positive +TCGA-LL-A7T0,TCGA-LL-A7T0-01Z-00-DX1.B03BBA63-ACF4-4BCA-9F2B-F631F0C6A25C,LL,positive +TCGA-LL-A8F5,TCGA-LL-A8F5-01Z-00-DX1.7579425E-1425-4160-A9DD-3D50F4C5428D,LL,positive +TCGA-LL-A9Q3,TCGA-LL-A9Q3-01Z-00-DX1.94FA76F5-008A-401B-BFDC-817804BAE5F6,LL,positive +TCGA-LQ-A4E4,TCGA-LQ-A4E4-01Z-00-DX1.CF422C39-614F-4F14-813B-95BB7B171977,LQ,positive 
+TCGA-MS-A51U,TCGA-MS-A51U-01Z-00-DX1.490DE85A-ECE5-4E2A-9657-841BE6FFCCA0,MS,positive +TCGA-OK-A5Q2,TCGA-OK-A5Q2-01Z-00-DX1.0D169898-37C6-44CA-AC87-27887123AA6F,OK,positive +TCGA-OK-A5Q2,TCGA-OK-A5Q2-01Z-00-DX2.C828A160-87DF-4625-A8C5-2057F61D54F4,OK,positive +TCGA-OK-A5Q2,TCGA-OK-A5Q2-01Z-00-DX3.5F9215C3-E407-46F8-968E-503D7D14605C,OK,positive +TCGA-OK-A5Q2,TCGA-OK-A5Q2-01Z-00-DX4.83B45D6C-E350-4436-812F-4155D9F7D331,OK,positive +TCGA-OL-A5D8,TCGA-OL-A5D8-01Z-00-DX1.C0A75731-1DDC-4FAF-A2C5-2E2ECB23DC13,OL,positive +TCGA-OL-A5DA,TCGA-OL-A5DA-01Z-00-DX1.1B1E9CC4-7B42-43BB-A1D7-A26F1D1F8557,OL,positive +TCGA-OL-A66H,TCGA-OL-A66H-01Z-00-DX1.E54AF3FA-E59E-404C-BB83-A6FC6FC9B312,OL,positive +TCGA-OL-A66J,TCGA-OL-A66J-01Z-00-DX1.661F7F70-E4D4-4875-B8C4-556F7927F3BA,OL,positive +TCGA-OL-A66K,TCGA-OL-A66K-01Z-00-DX1.C1DC85F1-4FAE-4411-9886-11DCB5E70CC3,OL,positive +TCGA-OL-A66L,TCGA-OL-A66L-01Z-00-DX1.E01BA275-57A5-49DF-9376-1AD0BDFFF7E2,OL,positive +TCGA-OL-A66N,TCGA-OL-A66N-01Z-00-DX1.ABDA3014-7B3A-4D48-A415-CB9608491ECB,OL,positive +TCGA-OL-A66O,TCGA-OL-A66O-01Z-00-DX1.5F1E4C60-5CE8-41B4-A94D-4AA80D9253F9,OL,positive +TCGA-OL-A6VQ,TCGA-OL-A6VQ-01Z-00-DX1.E3B81163-0239-47C3-B53F-064405B58685,OL,positive +TCGA-PE-A5DC,TCGA-PE-A5DC-01Z-00-DX1.8E6953AD-C14A-43F5-B062-E31AB13C7BDF,PE,positive +TCGA-PE-A5DD,TCGA-PE-A5DD-01Z-00-DX1.E1EF11EC-B87B-49DE-A544-2E162F9AE789,PE,positive +TCGA-PE-A5DE,TCGA-PE-A5DE-01Z-00-DX1.88B354F2-5485-44AA-A057-89E40B988F69,PE,positive +TCGA-S3-A6ZF,TCGA-S3-A6ZF-01Z-00-DX1.E50205A0-63CA-49DE-8831-2A8916DEF403,S3,positive +TCGA-S3-A6ZG,TCGA-S3-A6ZG-01Z-00-DX1.659A0BFB-D99B-474F-B4C2-6590C9161BD1,S3,positive +TCGA-S3-A6ZH,TCGA-S3-A6ZH-01Z-00-DX1.E728FF79-4921-436F-BFF5-5B782C0EEB9A,S3,positive +TCGA-S3-AA0Z,TCGA-S3-AA0Z-01Z-00-DX1.4D96156A-4067-41F2-90B8-41EF7ED794CF,S3,positive +TCGA-S3-AA11,TCGA-S3-AA11-01Z-00-DX1.36B83B37-2928-4DB7-A04A-8D511F1183FD,S3,positive 
+TCGA-S3-AA12,TCGA-S3-AA12-01Z-00-DX1.2A991B9F-E7E6-410B-B0BF-635E3CC40C7E,S3,positive +TCGA-S3-AA12,TCGA-S3-AA12-01Z-00-DX2.4F0A4F18-41C7-4497-A7B8-5DCE610E08AD,S3,positive +TCGA-S3-AA14,TCGA-S3-AA14-01Z-00-DX1.000A865F-19E6-4018-9352-BFA54EF0CE31,S3,positive +TCGA-S3-AA17,TCGA-S3-AA17-01Z-00-DX1.9B7DC02E-ECB2-4403-A894-F418AD452E49,S3,positive +TCGA-UL-AAZ6,TCGA-UL-AAZ6-01Z-00-DX1.0488628B-C06B-4A1D-9198-EC29D0BACF6E,UL,positive +TCGA-W8-A86G,TCGA-W8-A86G-01Z-00-DX2.DEC2CC1C-3662-43C6-A5CB-EC94B343FA3D,W8,positive +TCGA-WT-AB41,TCGA-WT-AB41-01Z-00-DX1.75BDFDF2-CD87-46D1-B32C-725741CB02BE,WT,positive +TCGA-WT-AB44,TCGA-WT-AB44-01Z-00-DX1.B6ECEA7C-DA26-4B34-88CE-6834631DFA35,WT,positive +TCGA-XX-A899,TCGA-XX-A899-01Z-00-DX1.08FE27B7-73B8-4CE3-ACF4-0689C81C140B,XX,positive +TCGA-XX-A89A,TCGA-XX-A89A-01Z-00-DX1.671E2AD6-4D1A-4579-88C1-5B0B15818126,XX,positive +TCGA-Z7-A8R5,TCGA-Z7-A8R5-01Z-00-DX1.3BDB407F-514C-4131-B058-FA1E69154276,Z7,positive +TCGA-Z7-A8R6,TCGA-Z7-A8R6-01Z-00-DX1.CE4ED818-D762-4324-9DEA-2ACB38B9B0B9,Z7,positive diff --git a/datasets/gdc_manifest.tar.xz b/datasets/gdc_manifest.tar.xz new file mode 100644 index 000000000..b4ae3c1b9 Binary files /dev/null and b/datasets/gdc_manifest.tar.xz differ diff --git a/datasets/lung_adeno_squam/lung_adeno_squam.json b/datasets/lung_adeno_squam/lung_adeno_squam.json new file mode 100644 index 000000000..bf4e72043 --- /dev/null +++ b/datasets/lung_adeno_squam/lung_adeno_squam.json @@ -0,0 +1,4 @@ +{ + "name": "TCGA_LUNG", + "annotations": "./lung_labels.csv" +} \ No newline at end of file diff --git a/datasets/lung_adeno_squam/lung_labels.csv b/datasets/lung_adeno_squam/lung_labels.csv new file mode 100644 index 000000000..61109945e --- /dev/null +++ b/datasets/lung_adeno_squam/lung_labels.csv @@ -0,0 +1,942 @@ +patient,subtype,site,slide +TCGA-83-5908,adenocarcinoma,Site-28,TCGA-83-5908-01Z-00-DX1.381c8f82-61a0-4e9d-982d-1ad0af7bead9 
+TCGA-62-A46V,adenocarcinoma,Site-124,TCGA-62-A46V-01Z-00-DX1.631E54D0-9E57-4932-B4EF-81820E56A95B +TCGA-44-2655,adenocarcinoma,Site-29,TCGA-44-2655-01Z-00-DX1.ee255271-780c-461c-ab23-5cd3504b5e4a +TCGA-05-4418,adenocarcinoma,Site-61,TCGA-05-4418-01Z-00-DX1.f3863ea5-564f-482f-9878-cc104cf69401 +TCGA-49-4487,adenocarcinoma,Site-69,TCGA-49-4487-01Z-00-DX1.3a3a0720-463c-430e-849b-e2f8991bdfa5 +TCGA-38-4631,adenocarcinoma,Site-130,TCGA-38-4631-01Z-00-DX1.5e0c873a-9c4c-4e0b-bf2e-e3cd8b760761 +TCGA-55-1594,adenocarcinoma,Site-67,TCGA-55-1594-01Z-00-DX1.bd90c500-7c0b-4c45-a3f7-2d9177384b1d +TCGA-75-6207,adenocarcinoma,Site-93,TCGA-75-6207-01Z-00-DX1.837B7B0F-424C-423B-9045-A905E7C1C54C +TCGA-MP-A4TD,adenocarcinoma,Site-180,TCGA-MP-A4TD-01Z-00-DX1.937DEBC9-F5D5-4682-AA9A-13D8226EE06C +TCGA-78-7537,adenocarcinoma,Site-96,TCGA-78-7537-01Z-00-DX1.e5597e41-ebba-4d6f-8a1f-15cd81d8f026 +TCGA-J2-A4AE,adenocarcinoma,Site-2,TCGA-J2-A4AE-01Z-00-DX1.42C5DE4A-7787-4E59-8969-D12503262C96 +TCGA-55-7994,adenocarcinoma,Site-67,TCGA-55-7994-01Z-00-DX1.a0858080-c471-4337-bc57-7af57e9d92d8 +TCGA-55-6978,adenocarcinoma,Site-67,TCGA-55-6978-01Z-00-DX1.2ef6d6aa-2a19-4bce-b845-cc83ac81a1c3 +TCGA-50-5066,adenocarcinoma,Site-157,TCGA-50-5066-02Z-00-DX1.A2CA045A-7956-4E04-9EC9-5856FC789110 +TCGA-55-8206,adenocarcinoma,Site-67,TCGA-55-8206-01Z-00-DX1.1216f330-c4e9-41b5-a2b7-9cafd960fa11 +TCGA-J2-8192,adenocarcinoma,Site-2,TCGA-J2-8192-01Z-00-DX1.A784F381-7906-480F-99A1-0B88005953A0 +TCGA-55-8621,adenocarcinoma,Site-67,TCGA-55-8621-01Z-00-DX1.7C519007-D59D-432A-BF4D-23D14A1C8BB6 +TCGA-73-4662,adenocarcinoma,Site-103,TCGA-73-4662-01Z-00-DX1.76e23a15-0917-44ad-9181-50100d318003 +TCGA-49-6761,adenocarcinoma,Site-69,TCGA-49-6761-01Z-00-DX1.10c577a2-f65c-4517-8a25-83d44e380f8f +TCGA-78-7160,adenocarcinoma,Site-96,TCGA-78-7160-01Z-00-DX1.f40f6c04-74b1-4cef-aabc-cbafadbdf39f +TCGA-50-6594,adenocarcinoma,Site-157,TCGA-50-6594-01Z-00-DX1.43b2005a-4245-4025-ad85-4a957f308a5c 
+TCGA-49-4494,adenocarcinoma,Site-69,TCGA-49-4494-01Z-00-DX2.cac5ed0a-98c3-4d37-a4f4-9596a061836a +TCGA-86-7711,adenocarcinoma,Site-9,TCGA-86-7711-01Z-00-DX1.f64dd9d8-b9ca-4d1f-9783-d1042979132d +TCGA-50-5931,adenocarcinoma,Site-157,TCGA-50-5931-01Z-00-DX1.34261ED6-7815-487C-A50C-2DAD587187B9 +TCGA-55-A57B,adenocarcinoma,Site-67,TCGA-55-A57B-01Z-00-DX1.8D3C7063-3AF6-4839-A656-F844EC0AF9DB +TCGA-NJ-A7XG,adenocarcinoma,Site-182,TCGA-NJ-A7XG-01Z-00-DX1.4A876254-653C-410B-A36C-55FC41C1DD93 +TCGA-95-7039,adenocarcinoma,Site-179,TCGA-95-7039-01Z-00-DX1.C191A46D-A667-4BF6-8E6B-3E5CBF9DF43F +TCGA-38-6178,adenocarcinoma,Site-130,TCGA-38-6178-01Z-00-DX1.5932337D-BA2B-455D-9E34-22961EFA2170 +TCGA-86-A456,adenocarcinoma,Site-9,TCGA-86-A456-01Z-00-DX1.5C7CBF9B-0AE3-4776-9434-296AA0C605CC +TCGA-64-5775,adenocarcinoma,Site-40,TCGA-64-5775-01Z-00-DX1.9A338C1A-CDB6-41AF-B6D5-EDE608B5E3E1 +TCGA-62-A46O,adenocarcinoma,Site-124,TCGA-62-A46O-01Z-00-DX1.39B361AD-ED1D-4C0C-8611-0E8CCBDBD252 +TCGA-78-7542,adenocarcinoma,Site-96,TCGA-78-7542-01Z-00-DX1.2ee042a6-11ee-4668-b254-44144fca4234 +TCGA-99-7458,adenocarcinoma,Site-175,TCGA-99-7458-01Z-00-DX1.10ea0b2c-c763-40d1-83a4-3d4ae957fdb0 +TCGA-55-A4DG,adenocarcinoma,Site-67,TCGA-55-A4DG-01Z-00-DX1.9CE9B7BE-48EF-44F1-9C25-F15700A3E5DE +TCGA-05-4433,adenocarcinoma,Site-61,TCGA-05-4433-01Z-00-DX1.5843D3CC-CB24-4A0A-BB65-78A5DAA11D84 +TCGA-49-4507,adenocarcinoma,Site-69,TCGA-49-4507-01Z-00-DX1.1df4e286-ea89-4c38-ad53-a26e5b7b0402 +TCGA-J2-A4AG,adenocarcinoma,Site-2,TCGA-J2-A4AG-01Z-00-DX1.487780B8-18E7-494A-8054-5637239870EB +TCGA-86-7953,adenocarcinoma,Site-9,TCGA-86-7953-01Z-00-DX1.847865ce-df59-4677-bac2-ee88a258fe4e +TCGA-97-A4M1,adenocarcinoma,Site-181,TCGA-97-A4M1-01Z-00-DX1.1C4B8247-BA3F-4841-BC9D-554F03F7894C +TCGA-MP-A4T8,adenocarcinoma,Site-180,TCGA-MP-A4T8-01Z-00-DX1.3ECB340F-66C7-4F64-8E00-463BF82AAD83 +TCGA-64-5779,adenocarcinoma,Site-40,TCGA-64-5779-01Z-00-DX1.B3268447-DCE0-4927-B5A9-EC4D61E9B17D 
+TCGA-55-8299,adenocarcinoma,Site-67,TCGA-55-8299-01Z-00-DX1.6F652EC5-EBB8-499C-A66D-E0951FBFCAA2 +TCGA-55-6987,adenocarcinoma,Site-67,TCGA-55-6987-01Z-00-DX1.0c52b721-2209-4818-af8f-b22d37e6e81e +TCGA-44-6778,adenocarcinoma,Site-29,TCGA-44-6778-01Z-00-DX1.5e932267-cc68-4a8e-a660-fe2cc0c59ff4 +TCGA-49-AARO,adenocarcinoma,Site-69,TCGA-49-AARO-01Z-00-DX1.FB8C82DC-F823-43CF-A8EA-1208C767AF54 +TCGA-55-6985,adenocarcinoma,Site-67,TCGA-55-6985-01Z-00-DX1.428ad69e-c0ea-445e-8d38-fa2dab6eb81f +TCGA-75-5146,adenocarcinoma,Site-93,TCGA-75-5146-01Z-00-DX1.4958A631-7E6F-4FBB-A1C3-B8F8368D46C5 +TCGA-69-7764,adenocarcinoma,Site-178,TCGA-69-7764-01Z-00-DX1.be9fec48-68eb-4c27-a028-5315d7e368e0 +TCGA-69-7761,adenocarcinoma,Site-178,TCGA-69-7761-01Z-00-DX1.aa7ef5ef-8162-4961-a923-3a29646efba3 +TCGA-75-7031,adenocarcinoma,Site-93,TCGA-75-7031-01Z-00-DX1.B058D81B-E869-4AEA-8CAC-4B6264FB86A8 +TCGA-05-4424,adenocarcinoma,Site-61,TCGA-05-4424-01Z-00-DX1.4892016C-1A96-4A72-ACB8-FFD282FC4E69 +TCGA-49-AARN,adenocarcinoma,Site-69,TCGA-49-AARN-01Z-00-DX1.E13AA041-978E-427D-AFA5-D3180224A03A +TCGA-L9-A50W,adenocarcinoma,Site-23,TCGA-L9-A50W-01Z-00-DX1.21FAAB58-58EF-4A6A-8F0A-16D9AB9B1056 +TCGA-55-8092,adenocarcinoma,Site-67,TCGA-55-8092-01Z-00-DX1.04e44341-c4bb-4b0b-965c-7ec83f3877a6 +TCGA-67-3771,adenocarcinoma,Site-108,TCGA-67-3771-01Z-00-DX1.7B54D132-6AB7-4F2D-8BFF-33AFE83CF204 +TCGA-05-4430,adenocarcinoma,Site-61,TCGA-05-4430-01Z-00-DX1.95659bbb-3091-4370-bc1d-6c6c1baa7b3d +TCGA-44-6775,adenocarcinoma,Site-29,TCGA-44-6775-01Z-00-DX1.38d3bc6d-8bb5-4126-8d61-d39e6a1d7bcf +TCGA-93-A4JQ,adenocarcinoma,Site-183,TCGA-93-A4JQ-01Z-00-DX1.24573112-917C-43E9-B0FD-763C290AA88A +TCGA-78-7146,adenocarcinoma,Site-96,TCGA-78-7146-01Z-00-DX1.830209b2-94c5-4277-a7b6-b2f2634de0dc +TCGA-86-A4D0,adenocarcinoma,Site-9,TCGA-86-A4D0-01Z-00-DX1.165461AD-A8DA-4B7B-87CE-F5DBDBD2C0A7 +TCGA-NJ-A4YI,adenocarcinoma,Site-182,TCGA-NJ-A4YI-01Z-00-DX1.C4111D01-27BF-486F-8E0E-C8053DB16133 
+TCGA-44-A4SU,adenocarcinoma,Site-29,TCGA-44-A4SU-01Z-00-DX1.AE2607C1-B208-4557-ADB4-6AAF03C6D3A9 +TCGA-86-7713,adenocarcinoma,Site-9,TCGA-86-7713-01Z-00-DX1.23f9e213-b566-47bb-beb8-7d12b2f0508b +TCGA-55-A48Z,adenocarcinoma,Site-67,TCGA-55-A48Z-01Z-00-DX1.0867DC6A-2A51-4CF1-AE3F-0526CE2DD740 +TCGA-86-7955,adenocarcinoma,Site-9,TCGA-86-7955-01Z-00-DX1.ef4f4d94-5efb-4a07-97cf-b0ed69085827 +TCGA-05-4415,adenocarcinoma,Site-61,TCGA-05-4415-01Z-00-DX1.55E0C429-B308-4962-8DA9-41D7D3F7764E +TCGA-86-6851,adenocarcinoma,Site-9,TCGA-86-6851-01Z-00-DX1.0b13e600-fd7b-44a2-9ec2-26e9938fb7bc +TCGA-75-6206,adenocarcinoma,Site-93,TCGA-75-6206-01Z-00-DX1.63230D94-7015-4247-936D-E7ACD6692AB3 +TCGA-49-4514,adenocarcinoma,Site-69,TCGA-49-4514-01Z-00-DX1.9c304807-2a0c-44f7-97cc-fbd30f2c740f +TCGA-97-A4M0,adenocarcinoma,Site-181,TCGA-97-A4M0-01Z-00-DX1.BDDFB04C-2475-485A-9A85-C4B9EB15155E +TCGA-L9-A7SV,adenocarcinoma,Site-23,TCGA-L9-A7SV-01Z-00-DX1.153B8E2D-54CE-4747-A22E-7A6ADCA03DB5 +TCGA-05-4249,adenocarcinoma,Site-61,TCGA-05-4249-01Z-00-DX1.9fce0297-cc19-4c04-872c-4466ee4024db +TCGA-55-A490,adenocarcinoma,Site-67,TCGA-55-A490-01Z-00-DX1.07D77502-7216-4C23-9BB8-8DE99AECC920 +TCGA-67-3772,adenocarcinoma,Site-108,TCGA-67-3772-01Z-00-DX1.CB05C5CC-2801-4BB9-94A0-83E9005BCA2E +TCGA-78-8648,adenocarcinoma,Site-96,TCGA-78-8648-01Z-00-DX1.D3EAEF8E-E739-490F-BF48-A4796B0C0E7A +TCGA-49-AAQV,adenocarcinoma,Site-69,TCGA-49-AAQV-01Z-00-DX1.4CC37621-C952-4684-96E3-9B97CDAD8056 +TCGA-99-8033,adenocarcinoma,Site-175,TCGA-99-8033-01Z-00-DX1.06f8ca81-c0a7-4ab5-b2fe-d5fc6671e29d +TCGA-MP-A4T4,adenocarcinoma,Site-180,TCGA-MP-A4T4-01Z-00-DX1.5F72FF0B-59C0-4E95-A8C1-BF22A948F7AC +TCGA-53-7813,adenocarcinoma,Site-149,TCGA-53-7813-01Z-00-DX1.97c5eff3-ceeb-499b-b2e1-bcd3cd60c5e7 +TCGA-05-4402,adenocarcinoma,Site-61,TCGA-05-4402-01Z-00-DX1.c653ddc2-88c1-45ac-88e7-4e512b8e8d53 +TCGA-49-6745,adenocarcinoma,Site-69,TCGA-49-6745-01Z-00-DX2.3b8e947b-885e-486f-9edc-f2b247bdf95b 
+TCGA-78-7633,adenocarcinoma,Site-96,TCGA-78-7633-01Z-00-DX1.f0cd4825-eb92-4098-954a-fbb1e4851660 +TCGA-O1-A52J,adenocarcinoma,Site-176,TCGA-O1-A52J-01Z-00-DX1.26F6ECCA-D614-4950-98E6-4D76E82F71B4 +TCGA-73-A9RS,adenocarcinoma,Site-103,TCGA-73-A9RS-01Z-00-DX1.EDCEFE41-61E2-48C9-B8D5-28B55372E0CA +TCGA-55-A493,adenocarcinoma,Site-67,TCGA-55-A493-01Z-00-DX1.1CF94B85-7A64-4213-9B04-16504512DE2E +TCGA-44-5644,adenocarcinoma,Site-29,TCGA-44-5644-01Z-00-DX1.7e4edaa7-25c6-49f4-81bc-f50d6c5fac38 +TCGA-69-8255,adenocarcinoma,Site-178,TCGA-69-8255-01Z-00-DX1.75d1666c-c088-405c-80da-52d84b990361 +TCGA-55-A48Y,adenocarcinoma,Site-67,TCGA-55-A48Y-01Z-00-DX1.D8735347-E060-439F-B5A1-1E47A958A392 +TCGA-55-5899,adenocarcinoma,Site-67,TCGA-55-5899-01Z-00-DX1.faa65b08-150c-4c74-95aa-1e8743f0152c +TCGA-05-4422,adenocarcinoma,Site-61,TCGA-05-4422-01Z-00-DX1.4802093f-71ea-43d5-bd92-b4bd7fc8c5bf +TCGA-J2-8194,adenocarcinoma,Site-2,TCGA-J2-8194-01Z-00-DX1.7700924D-B6AF-46A7-A7D7-B5C17A66C5F7 +TCGA-97-A4M7,adenocarcinoma,Site-181,TCGA-97-A4M7-01Z-00-DX2.9D0A8A7E-5A84-439B-B3D1-DFA22BC6ED87 +TCGA-44-7672,adenocarcinoma,Site-29,TCGA-44-7672-01Z-00-DX1.71f4a0c6-a9cc-481c-ace8-c8c314952aa1 +TCGA-44-6147,adenocarcinoma,Site-29,TCGA-44-6147-01Z-00-DX1.a34a9f49-e86b-43ee-9e2d-7ef35da0aecf +TCGA-73-4675,adenocarcinoma,Site-103,TCGA-73-4675-01Z-00-DX1.1027587f-8ec4-4b8c-8768-ff7b669683e0 +TCGA-55-6986,adenocarcinoma,Site-67,TCGA-55-6986-01Z-00-DX1.ae52d41e-3cd6-4d36-b04b-2ea7d29bd8da +TCGA-55-7907,adenocarcinoma,Site-67,TCGA-55-7907-01Z-00-DX1.f5a264f5-3926-4d09-82f3-cd38d554f3db +TCGA-44-7670,adenocarcinoma,Site-29,TCGA-44-7670-01Z-00-DX1.2a981dd6-0063-438e-b20e-b180b16e2b23 +TCGA-78-7147,adenocarcinoma,Site-96,TCGA-78-7147-01Z-00-DX1.a9fd2c5d-b8f3-4524-bd64-6c3362ce373b +TCGA-78-7159,adenocarcinoma,Site-96,TCGA-78-7159-01Z-00-DX1.399d865f-f4f9-4602-b15b-908be0afe459 +TCGA-J2-A4AD,adenocarcinoma,Site-2,TCGA-J2-A4AD-01Z-00-DX1.C3C3B0C6-0666-4CCF-8422-345560B237DA 
+TCGA-55-8619,adenocarcinoma,Site-67,TCGA-55-8619-01Z-00-DX1.46C7678A-6032-4C95-B535-85C87B92B6ED +TCGA-44-6779,adenocarcinoma,Site-29,TCGA-44-6779-01Z-00-DX1.40f10ad8-55f3-4043-a89b-37d33a26dd8c +TCGA-L9-A443,adenocarcinoma,Site-23,TCGA-L9-A443-01Z-00-DX1.9F68A13A-2865-41BF-88A7-B0DF75961C6F +TCGA-86-7714,adenocarcinoma,Site-9,TCGA-86-7714-01Z-00-DX1.8ee03c3b-013e-4dd5-aa0e-9c1a771cbdc5 +TCGA-69-7763,adenocarcinoma,Site-178,TCGA-69-7763-01Z-00-DX1.33203e8a-43e9-4e3e-98ba-1ac08b80a44f +TCGA-97-7553,adenocarcinoma,Site-181,TCGA-97-7553-01Z-00-DX1.e31eec5e-0053-407a-b057-9b87555b122d +TCGA-97-A4M5,adenocarcinoma,Site-181,TCGA-97-A4M5-01Z-00-DX1.283AFBA6-A349-425F-A18E-5CA186084C23 +TCGA-86-8668,adenocarcinoma,Site-9,TCGA-86-8668-01Z-00-DX1.d720d486-02c7-4f98-8feb-e0e50a12c158 +TCGA-67-6215,adenocarcinoma,Site-108,TCGA-67-6215-01Z-00-DX1.c014ffaf-6db8-42ef-a50c-9de98d6a5345 +TCGA-73-4668,adenocarcinoma,Site-103,TCGA-73-4668-01Z-00-DX1.3470b27e-210f-4ff3-9dde-69aa85faa9ee +TCGA-55-6642,adenocarcinoma,Site-67,TCGA-55-6642-01Z-00-DX1.0487cfc6-8216-4da3-aae1-a35452e9e4f6 +TCGA-55-8512,adenocarcinoma,Site-67,TCGA-55-8512-01Z-00-DX1.FD4B9CF5-1955-420E-A955-6D284772D2BE +TCGA-49-6743,adenocarcinoma,Site-69,TCGA-49-6743-01Z-00-DX2.f6b71e89-19ff-4d9b-a3f1-3a52949f1dc7 +TCGA-44-5643,adenocarcinoma,Site-29,TCGA-44-5643-01Z-00-DX1.83fb7631-0ecb-4473-bfb2-c8daa2f87d35 +TCGA-55-7816,adenocarcinoma,Site-67,TCGA-55-7816-01Z-00-DX1.A58C6196-E1A0-48B3-9B77-2EDCD7A43CBA +TCGA-44-4112,adenocarcinoma,Site-29,TCGA-44-4112-01Z-00-DX1.5380b3b4-0e24-4c6c-aa8c-298c5fbe068f +TCGA-78-7145,adenocarcinoma,Site-96,TCGA-78-7145-01Z-00-DX1.ecde66bf-2de8-4e31-b3ab-4fed4f055dda +TCGA-55-7284,adenocarcinoma,Site-67,TCGA-55-7284-01Z-00-DX1.68b95b9b-1aab-4f03-aad3-1132467b7499 +TCGA-55-1596,adenocarcinoma,Site-67,TCGA-55-1596-01Z-00-DX1.de65522d-92ab-408e-9b7e-24210d386b97 +TCGA-95-7947,adenocarcinoma,Site-179,TCGA-95-7947-01Z-00-DX1.44291078-2D8E-4E79-8FAB-439E52BAAD60 
+TCGA-95-7948,adenocarcinoma,Site-179,TCGA-95-7948-01Z-00-DX1.FF1AEF38-19E5-4475-87C8-B318204FEF1D +TCGA-93-A4JP,adenocarcinoma,Site-183,TCGA-93-A4JP-01Z-00-DX1.994E229B-5251-4F61-8C41-863047F3A136 +TCGA-44-3917,adenocarcinoma,Site-29,TCGA-44-3917-01Z-00-DX1.cf5b5b49-de5e-4f2e-90b4-b138f55560a9 +TCGA-86-8669,adenocarcinoma,Site-9,TCGA-86-8669-01Z-00-DX1.845b8b4e-6445-4da1-8275-42b765c024cc +TCGA-75-7025,adenocarcinoma,Site-93,TCGA-75-7025-01Z-00-DX1.284477D7-4F47-4084-A15C-0A7E9571E3DF +TCGA-NJ-A55R,adenocarcinoma,Site-182,TCGA-NJ-A55R-01Z-00-DX1.2E2B3642-4E1C-47DB-AF7B-988D586C0986 +TCGA-86-8281,adenocarcinoma,Site-9,TCGA-86-8281-01Z-00-DX1.02237263-003b-4de1-8a2f-0f3ad14768c7 +TCGA-44-2659,adenocarcinoma,Site-29,TCGA-44-2659-01Z-00-DX1.b33f40e6-bc18-48b0-8dad-bdb0a6a200b5 +TCGA-97-7554,adenocarcinoma,Site-181,TCGA-97-7554-01Z-00-DX1.4107ca63-4642-42b9-b6f2-13778ecaf9a7 +TCGA-55-8513,adenocarcinoma,Site-67,TCGA-55-8513-01Z-00-DX1.E21776FE-DF49-48D1-90D6-6F71D0764492 +TCGA-86-8075,adenocarcinoma,Site-9,TCGA-86-8075-01Z-00-DX1.171bd8bd-af24-4770-b24b-732e675efd75 +TCGA-50-6673,adenocarcinoma,Site-157,TCGA-50-6673-01Z-00-DX1.e24fb432-3fff-4f00-a316-16d05275c1e7 +TCGA-55-7573,adenocarcinoma,Site-67,TCGA-55-7573-01Z-00-DX1.43a4bbd2-6a3b-4910-9356-2d750a736817 +TCGA-64-5781,adenocarcinoma,Site-40,TCGA-64-5781-01Z-00-DX1.81474D4D-6B48-473F-BAA9-ABD75C9223C3 +TCGA-55-8204,adenocarcinoma,Site-67,TCGA-55-8204-01Z-00-DX1.30ba69f3-53f1-41cc-826c-20dce3cfe86b +TCGA-78-7156,adenocarcinoma,Site-96,TCGA-78-7156-01Z-00-DX1.9f7c8eb9-73de-49be-a487-e1fd4411b7b5 +TCGA-86-8674,adenocarcinoma,Site-9,TCGA-86-8674-01Z-00-DX1.9fcaae4e-4062-45ab-a49d-7dba43040919 +TCGA-67-3770,adenocarcinoma,Site-108,TCGA-67-3770-01Z-00-DX1.4C14BE72-905C-4AAE-B07A-6437D98A23FA +TCGA-97-8177,adenocarcinoma,Site-181,TCGA-97-8177-01Z-00-DX1.4c2ddf48-40d3-4fa0-8b5f-0485e8270ac7 +TCGA-50-5055,adenocarcinoma,Site-157,TCGA-50-5055-01Z-00-DX2.446EC3BB-2ED7-4253-8A39-AC68331F08E7 
+TCGA-97-8552,adenocarcinoma,Site-181,TCGA-97-8552-01Z-00-DX1.69631717-0E05-49C1-B112-15C2E36BC2F8 +TCGA-95-A4VP,adenocarcinoma,Site-179,TCGA-95-A4VP-01Z-00-DX1.1B145AC0-2471-4974-BBB6-C88BDF0BE8BD +TCGA-55-8091,adenocarcinoma,Site-67,TCGA-55-8091-01Z-00-DX1.0996c58a-6e93-4092-8cb8-014d548fe60c +TCGA-55-7910,adenocarcinoma,Site-67,TCGA-55-7910-01Z-00-DX1.041e15d5-4852-49f6-85d9-74b510497651 +TCGA-55-8301,adenocarcinoma,Site-67,TCGA-55-8301-01Z-00-DX1.2A66E0FC-84F2-4DF8-B06E-D7CDD7AA82C7 +TCGA-55-6983,adenocarcinoma,Site-67,TCGA-55-6983-01Z-00-DX1.8f940a64-1f1b-4e6e-99ea-418175be2b3f +TCGA-86-8358,adenocarcinoma,Site-9,TCGA-86-8358-01Z-00-DX1.C50100F8-9414-4A06-BEDF-93E7B01B24D6 +TCGA-44-6146,adenocarcinoma,Site-29,TCGA-44-6146-01Z-00-DX1.8db8c567-690e-4df3-98a9-f241b6a8d811 +TCGA-38-A44F,adenocarcinoma,Site-130,TCGA-38-A44F-01Z-00-DX1.B9958072-D8BB-40DD-B28E-2E83CE2B550B +TCGA-55-7283,adenocarcinoma,Site-67,TCGA-55-7283-01Z-00-DX1.0f716216-0a46-404e-a4d4-5bb6d7e3f0d1 +TCGA-05-4432,adenocarcinoma,Site-61,TCGA-05-4432-01Z-00-DX1.90d00d62-402b-404a-85e2-0edf869193ab +TCGA-44-A479,adenocarcinoma,Site-29,TCGA-44-A479-01Z-00-DX1.CA5654C6-A623-452E-A8AC-DB6279CA97B1 +TCGA-55-7728,adenocarcinoma,Site-67,TCGA-55-7728-01Z-00-DX1.1d47a5fe-cab5-4d5a-a62b-16f345334d25 +TCGA-62-A46S,adenocarcinoma,Site-124,TCGA-62-A46S-01Z-00-DX1.7A8A6F38-76EA-4F43-BC03-1C42E6787E33 +TCGA-44-7671,adenocarcinoma,Site-29,TCGA-44-7671-01Z-00-DX1.00297baa-00e7-4894-a7b8-c9ac2a9d42d8 +TCGA-97-A4LX,adenocarcinoma,Site-181,TCGA-97-A4LX-01Z-00-DX1.B2334672-A626-4E74-B4D1-A79F70F65909 +TCGA-44-6777,adenocarcinoma,Site-29,TCGA-44-6777-01Z-00-DX1.aca7b23d-a601-4476-8bfd-eb46f0b6a96e +TCGA-78-7152,adenocarcinoma,Site-96,TCGA-78-7152-01Z-00-DX1.a04f9d84-3304-402c-9c0c-f64545c14781 +TCGA-69-7974,adenocarcinoma,Site-178,TCGA-69-7974-01Z-00-DX1.e4282067-fc3d-4fee-bae7-fe99000979d4 +TCGA-55-6980,adenocarcinoma,Site-67,TCGA-55-6980-01Z-00-DX1.2f1d1858-42b9-4e3b-966d-2f3f429570a6 
+TCGA-55-6975,adenocarcinoma,Site-67,TCGA-55-6975-01Z-00-DX1.2762a872-b1c3-47fd-8ed5-b715a5b20163 +TCGA-69-7979,adenocarcinoma,Site-178,TCGA-69-7979-01Z-00-DX1.c9bc265d-4889-4333-9852-b8b535887f1e +TCGA-69-8453,adenocarcinoma,Site-178,TCGA-69-8453-01Z-00-DX1.C472096B-95BA-42CD-ADEB-326C38F9DC95 +TCGA-44-6776,adenocarcinoma,Site-29,TCGA-44-6776-01Z-00-DX1.551e200f-5f40-4b95-8b2f-7c6000a21be0 +TCGA-NJ-A4YQ,adenocarcinoma,Site-182,TCGA-NJ-A4YQ-01Z-00-DX1.56C29975-77E9-4822-A9E1-83AD70D6C5F7 +TCGA-67-6217,adenocarcinoma,Site-108,TCGA-67-6217-01Z-00-DX1.3fa5c69b-60d1-41fb-a86f-48f77d81285f +TCGA-38-4627,adenocarcinoma,Site-130,TCGA-38-4627-01Z-00-DX1.fe406eb9-b38b-410d-aa1a-84ab8ac091c7 +TCGA-44-7667,adenocarcinoma,Site-29,TCGA-44-7667-01Z-00-DX1.2be2e379-7368-4fa7-a619-ace84febecb3 +TCGA-44-6145,adenocarcinoma,Site-29,TCGA-44-6145-01Z-00-DX1.175ad655-b054-407d-bb22-3a4f85849488 +TCGA-55-6712,adenocarcinoma,Site-67,TCGA-55-6712-01Z-00-DX1.8cd4b879-294e-4da0-8636-704f61b510dc +TCGA-L9-A8F4,adenocarcinoma,Site-23,TCGA-L9-A8F4-01Z-00-DX1.E2BBB8DE-94E2-4781-9B55-8A4CFBF8A69D +TCGA-78-7155,adenocarcinoma,Site-96,TCGA-78-7155-01Z-00-DX1.32258467-7550-4a1b-8df9-a869fb83e559 +TCGA-55-8203,adenocarcinoma,Site-67,TCGA-55-8203-01Z-00-DX1.f530d261-be19-4ff0-98d0-27b789dffb23 +TCGA-55-6982,adenocarcinoma,Site-67,TCGA-55-6982-01Z-00-DX1.055aeec1-2fc1-4796-b5df-c159335e0300 +TCGA-64-1678,adenocarcinoma,Site-40,TCGA-64-1678-01Z-00-DX1.92C851BD-37AC-4C4C-9920-DF394411CC2C +TCGA-95-8494,adenocarcinoma,Site-179,TCGA-95-8494-01Z-00-DX1.716299EF-71BB-4095-8F4D-F0C2252CE594 +TCGA-38-4630,adenocarcinoma,Site-130,TCGA-38-4630-01Z-00-DX1.d08d0193-c5ed-4db8-b236-1a4c3c3177b8 +TCGA-55-7281,adenocarcinoma,Site-67,TCGA-55-7281-01Z-00-DX1.a92dafa4-3a65-4f5c-97f0-83aff231274a +TCGA-55-8507,adenocarcinoma,Site-67,TCGA-55-8507-01Z-00-DX1.3C6198CF-3E13-4C4C-B851-5BDFF8552621 +TCGA-55-8510,adenocarcinoma,Site-67,TCGA-55-8510-01Z-00-DX1.BB1EAC72-6215-400B-BCBF-E3D51A60182D 
+TCGA-55-7903,adenocarcinoma,Site-67,TCGA-55-7903-01Z-00-DX1.40fc5dcb-b5bb-46b4-90bd-2eeb3ea880f5 +TCGA-86-8671,adenocarcinoma,Site-9,TCGA-86-8671-01Z-00-DX1.1fd0019c-df9d-48cc-9055-8de57759f273 +TCGA-44-3398,adenocarcinoma,Site-29,TCGA-44-3398-01Z-00-DX1.74757c91-a0c6-4e7f-b2db-8748c68ffa44 +TCGA-55-6970,adenocarcinoma,Site-67,TCGA-55-6970-01Z-00-DX1.be042a9b-3eab-4b6c-9cd7-7486ffab037f +TCGA-73-7498,adenocarcinoma,Site-103,TCGA-73-7498-01Z-00-DX1.5313bba9-7f6a-49b4-8d6e-a0f46a04f189 +TCGA-64-1680,adenocarcinoma,Site-40,TCGA-64-1680-01Z-00-DX1.9CB14F20-E7B0-49A1-A327-404B3160FD65 +TCGA-93-A4JO,adenocarcinoma,Site-183,TCGA-93-A4JO-01Z-00-DX1.314C53B6-603C-402B-8CF7-E3CE851A0A0F +TCGA-55-8302,adenocarcinoma,Site-67,TCGA-55-8302-01Z-00-DX1.E0C2B152-BBA9-4AA7-9E6B-BD57B68163CB +TCGA-78-7166,adenocarcinoma,Site-96,TCGA-78-7166-01Z-00-DX1.d19ad2d9-b006-4a13-ba54-2fec234c2373 +TCGA-44-3919,adenocarcinoma,Site-29,TCGA-44-3919-01Z-00-DX1.41144972-46a1-4b25-9faa-ed1b9d1996d2 +TCGA-49-4506,adenocarcinoma,Site-69,TCGA-49-4506-01Z-00-DX1.b13b8ef2-fd06-4309-a767-696a9afeaedd +TCGA-93-A4JN,adenocarcinoma,Site-183,TCGA-93-A4JN-01Z-00-DX1.ED4C9365-6CCF-4AEE-B4C9-3CC5EC57286C +TCGA-MP-A4T2,adenocarcinoma,Site-180,TCGA-MP-A4T2-01Z-00-DX1.ADE8B45C-26F3-465E-ADB6-2C36E6326974 +TCGA-05-4427,adenocarcinoma,Site-61,TCGA-05-4427-01Z-00-DX1.36CF4E93-3FBB-4346-A26B-85F0D9D0575D +TCGA-97-8175,adenocarcinoma,Site-181,TCGA-97-8175-01Z-00-DX1.f7148cec-5e4b-46ac-b167-d41b36eb3ee5 +TCGA-MN-A4N1,adenocarcinoma,Site-14,TCGA-MN-A4N1-01Z-00-DX2.9B0852C4-16BF-4962-B86F-E2570E48A89E +TCGA-93-7348,adenocarcinoma,Site-183,TCGA-93-7348-01Z-00-DX1.d657dc43-ca6c-404a-8ebe-6e6b78a8d1d1 +TCGA-L9-A5IP,adenocarcinoma,Site-23,TCGA-L9-A5IP-01Z-00-DX1.E699D989-8BB8-41DF-9B95-A880858653E6 +TCGA-44-2664,adenocarcinoma,Site-29,TCGA-44-2664-01Z-00-DX1.35c75ae5-eee0-4e69-a13a-31b5fad9869a +TCGA-55-6968,adenocarcinoma,Site-67,TCGA-55-6968-01Z-00-DX1.810e0fee-99ab-465b-8ae9-fb416e4dca3c 
+TCGA-78-7535,adenocarcinoma,Site-96,TCGA-78-7535-01Z-00-DX1.c4ca06f3-22d1-4e39-85c8-98d1fa2b0e60 +TCGA-NJ-A4YF,adenocarcinoma,Site-182,TCGA-NJ-A4YF-01Z-00-DX1.4A3DE76A-9F1E-4766-A013-5E7E3A665071 +TCGA-86-8074,adenocarcinoma,Site-9,TCGA-86-8074-01Z-00-DX1.0c34b434-8701-4060-a4ea-08a72371ee1e +TCGA-MP-A4SY,adenocarcinoma,Site-180,TCGA-MP-A4SY-01Z-00-DX1.9535274F-C850-409E-80A1-B933B2F1A110 +TCGA-86-8280,adenocarcinoma,Site-9,TCGA-86-8280-01Z-00-DX1.52627b97-1cfa-4382-819b-949b58c0f995 +TCGA-86-8073,adenocarcinoma,Site-9,TCGA-86-8073-01Z-00-DX1.33c016fc-5c9e-4ad6-8de2-a7f8521d205c +TCGA-44-3396,adenocarcinoma,Site-29,TCGA-44-3396-01Z-00-DX1.187b0a68-ad51-4fcd-819c-b74a8599dbe9 +TCGA-95-A4VK,adenocarcinoma,Site-179,TCGA-95-A4VK-01Z-00-DX1.D09778E0-285E-4593-84C8-B6009DDF4E41 +TCGA-69-A59K,adenocarcinoma,Site-178,TCGA-69-A59K-01Z-00-DX1.01EAF520-9AC1-4ECC-8EF3-B9122924A1E3 +TCGA-05-4403,adenocarcinoma,Site-61,TCGA-05-4403-01Z-00-DX1.fee8e988-956c-42a2-a6c5-06b6d6736295 +TCGA-53-7624,adenocarcinoma,Site-149,TCGA-53-7624-01Z-00-DX1.51b4de19-3531-4bcb-b822-fa966480f2ad +TCGA-05-4244,adenocarcinoma,Site-61,TCGA-05-4244-01Z-00-DX1.d4ff32cd-38cf-40ea-8213-45c2b100ac01 +TCGA-64-5815,adenocarcinoma,Site-40,TCGA-64-5815-01Z-00-DX1.1E3600B5-7E83-47BE-AC70-E0C2D623740B +TCGA-50-8457,adenocarcinoma,Site-157,TCGA-50-8457-01Z-00-DX1.F4B6ECCE-F22B-43FC-9A08-4366A24A67A4 +TCGA-MP-A4TF,adenocarcinoma,Site-180,TCGA-MP-A4TF-01Z-00-DX1.F98B0438-6D32-4EA3-ABA8-E624457EA658 +TCGA-38-4632,adenocarcinoma,Site-130,TCGA-38-4632-01Z-00-DX1.5E05F85C-53BC-4280-8E7F-8B9DECEF849F +TCGA-44-7662,adenocarcinoma,Site-29,TCGA-44-7662-01Z-00-DX1.2f0b6cea-795a-40ad-93a9-319858e6fb3b +TCGA-49-AARQ,adenocarcinoma,Site-69,TCGA-49-AARQ-01Z-00-DX1.3FA36419-1658-4602-9170-84A622787C3E +TCGA-MP-A4TC,adenocarcinoma,Site-180,TCGA-MP-A4TC-01Z-00-DX1.C7DB4A7F-F6B8-4FA0-84C1-9B87D41C7749 +TCGA-55-6981,adenocarcinoma,Site-67,TCGA-55-6981-01Z-00-DX1.5fecd134-42a0-40a2-be75-1e8bd14c5b30 
+TCGA-86-A4JF,adenocarcinoma,Site-9,TCGA-86-A4JF-01Z-00-DX1.22828B15-D68E-4300-A3A8-FBC11B319BC5 +TCGA-49-AAR4,adenocarcinoma,Site-69,TCGA-49-AAR4-01Z-00-DX1.EDB32358-AF23-4F81-A99F-15574A2DE28E +TCGA-38-4628,adenocarcinoma,Site-130,TCGA-38-4628-01Z-00-DX1.eee4f100-142f-4d9c-8e38-b16aab123413 +TCGA-49-4512,adenocarcinoma,Site-69,TCGA-49-4512-01Z-00-DX3.2f6ec7bc-0dac-4be0-95fe-4071a93a856f +TCGA-4B-A93V,adenocarcinoma,Site-71,TCGA-4B-A93V-01Z-00-DX1.C263DC1C-298D-47ED-AAF8-128043828530 +TCGA-55-8514,adenocarcinoma,Site-67,TCGA-55-8514-01Z-00-DX1.499CF563-B404-4061-9D1A-57954BBE2CE7 +TCGA-49-4488,adenocarcinoma,Site-69,TCGA-49-4488-01Z-00-DX2.ce896a98-8b45-4606-8849-3a377b81e3de +TCGA-55-8508,adenocarcinoma,Site-67,TCGA-55-8508-01Z-00-DX1.96B173E2-A298-4017-9AEC-113E547A3272 +TCGA-49-4490,adenocarcinoma,Site-69,TCGA-49-4490-01Z-00-DX1.bde106d9-1494-4775-b77d-11c86bb09b72 +TCGA-75-7027,adenocarcinoma,Site-93,TCGA-75-7027-01Z-00-DX1.AC91B34F-DF67-4F95-B145-079A71A54E83 +TCGA-55-8094,adenocarcinoma,Site-67,TCGA-55-8094-01Z-00-DX1.8dc29615-e124-4f17-81a1-c0b20c38d12c +TCGA-49-AAR0,adenocarcinoma,Site-69,TCGA-49-AAR0-01Z-00-DX1.1635383C-0FB3-4CF9-BABC-08AC19CCE501 +TCGA-50-8459,adenocarcinoma,Site-157,TCGA-50-8459-01Z-00-DX1.24D876E7-5265-4C5D-A215-0BAE5450EEF3 +TCGA-55-7227,adenocarcinoma,Site-67,TCGA-55-7227-01Z-00-DX1.e1a9e09c-4a50-4ee5-b1e4-2c7ae34dc913 +TCGA-64-1681,adenocarcinoma,Site-40,TCGA-64-1681-01Z-00-DX1.fb0244f6-51a8-4da2-b45c-38d8e02edab7 +TCGA-73-4666,adenocarcinoma,Site-103,TCGA-73-4666-01Z-00-DX1.dd9c8922-ddd1-4674-8124-9e334532e47e +TCGA-75-5125,adenocarcinoma,Site-93,TCGA-75-5125-01Z-00-DX1.2E419CAF-E248-4959-951E-498D8054433E +TCGA-67-6216,adenocarcinoma,Site-108,TCGA-67-6216-01Z-00-DX1.04da6a64-bd3a-48ed-bc99-d651e00a1aa4 +TCGA-44-2668,adenocarcinoma,Site-29,TCGA-44-2668-01Z-00-DX1.f66fa464-dad1-4a9f-8486-3d7579ca42b6 +TCGA-49-AARE,adenocarcinoma,Site-69,TCGA-49-AARE-01Z-00-DX1.3559B3A9-6699-472F-B731-1BD9F93EE82A 
+TCGA-78-7161,adenocarcinoma,Site-96,TCGA-78-7161-01Z-00-DX1.646cf15a-5961-4728-b9fe-fe3fff11831f +TCGA-80-5611,adenocarcinoma,Site-93,TCGA-80-5611-01Z-00-DX1.31920706-8AF4-46D2-B1D4-7DD5AF4AAC77 +TCGA-44-7659,adenocarcinoma,Site-29,TCGA-44-7659-01Z-00-DX1.818bf5b3-b4c1-4d23-a6ae-98d5015eb8c1 +TCGA-86-7954,adenocarcinoma,Site-9,TCGA-86-7954-01Z-00-DX1.0a063ee6-18b7-4b66-9726-04fb452156cb +TCGA-38-7271,adenocarcinoma,Site-130,TCGA-38-7271-01Z-00-DX1.EF258BEB-8DEA-483E-823A-1E0B90AAA392 +TCGA-78-7149,adenocarcinoma,Site-96,TCGA-78-7149-01Z-00-DX1.1b83dbcb-0c9f-4e50-bb98-bf9a157052cf +TCGA-73-4677,adenocarcinoma,Site-103,TCGA-73-4677-01Z-00-DX1.b557785c-9ccc-42a7-bb54-7bb47638ddfc +TCGA-44-2662,adenocarcinoma,Site-29,TCGA-44-2662-01Z-00-DX1.398f3f84-fd48-4454-b789-a3e90cd36d73 +TCGA-44-A47F,adenocarcinoma,Site-29,TCGA-44-A47F-01Z-00-DX1.C1EC3F2D-33C8-483B-9DDA-2F01B5FED618 +TCGA-97-7547,adenocarcinoma,Site-181,TCGA-97-7547-01Z-00-DX1.3563966f-21ee-4405-9106-9e0bf8df9f1a +TCGA-99-8032,adenocarcinoma,Site-175,TCGA-99-8032-01Z-00-DX1.7380b78f-ea25-43e0-ac90-194b5c6b1432 +TCGA-75-5126,adenocarcinoma,Site-93,TCGA-75-5126-01Z-00-DX1.164887F4-9986-415F-A94A-E40E92D80DA3 +TCGA-05-4398,adenocarcinoma,Site-61,TCGA-05-4398-01Z-00-DX1.269bc75f-492e-48b1-87ee-85924aa80e74 +TCGA-64-5774,adenocarcinoma,Site-40,TCGA-64-5774-01Z-00-DX1.5E3B95EB-BF45-4687-BCF7-AA2A41AA806C +TCGA-55-7914,adenocarcinoma,Site-67,TCGA-55-7914-01Z-00-DX1.875bffe1-8c56-4c29-ab11-6840ee3a643c +TCGA-86-A4P8,adenocarcinoma,Site-9,TCGA-86-A4P8-01Z-00-DX1.D65E8855-F7A1-4584-A501-BBFBC61C2DEB +TCGA-50-5946,adenocarcinoma,Site-157,TCGA-50-5946-01Z-00-DX1.5076CACA-E3C0-4114-BA69-5B4896FF0C4B +TCGA-55-8207,adenocarcinoma,Site-67,TCGA-55-8207-01Z-00-DX1.2dafc442-f927-4b0d-b197-cc8c5f86d0fc +TCGA-44-6148,adenocarcinoma,Site-29,TCGA-44-6148-01Z-00-DX1.94c29c83-1085-458f-a0c9-3ffa8592cb98 +TCGA-86-7701,adenocarcinoma,Site-9,TCGA-86-7701-01Z-00-DX1.a8a6e71e-9fa9-42c6-a186-0ac7526e9960 
+TCGA-50-6597,adenocarcinoma,Site-157,TCGA-50-6597-01Z-00-DX1.ec7fc0b2-78a1-4384-bddd-e89f02ee5eb6 +TCGA-95-7043,adenocarcinoma,Site-179,TCGA-95-7043-01Z-00-DX1.AE0FD8AA-9B88-45FE-B247-402BED1285EF +TCGA-75-6211,adenocarcinoma,Site-93,TCGA-75-6211-01Z-00-DX1.3E7B5948-521E-48EE-9204-80F5713ECD25 +TCGA-44-A47G,adenocarcinoma,Site-29,TCGA-44-A47G-01Z-00-DX1.810F67EC-3A0C-4056-B736-3331A11412CC +TCGA-L9-A444,adenocarcinoma,Site-23,TCGA-L9-A444-01Z-00-DX1.88CF6F01-0C1F-4572-81E3-1A5790692861 +TCGA-86-8672,adenocarcinoma,Site-9,TCGA-86-8672-01Z-00-DX1.94ca1954-921d-4ef6-a078-4e0fcb09157f +TCGA-38-4625,adenocarcinoma,Site-130,TCGA-38-4625-01Z-00-DX1.ffdbfc1d-7a21-4e67-a54f-89597bd14a7f +TCGA-78-8662,adenocarcinoma,Site-96,TCGA-78-8662-01Z-00-DX1.754A4C57-B5FA-4077-99E4-F6EFA6459410 +TCGA-MP-A4TH,adenocarcinoma,Site-180,TCGA-MP-A4TH-01Z-00-DX1.E89D2C19-F9B2-4BF2-AA5F-6104CBC076D1 +TCGA-71-6725,adenocarcinoma,Site-59,TCGA-71-6725-01Z-00-DX1.b9e1d4aa-439d-481f-b24b-134c4e470f28 +TCGA-55-7726,adenocarcinoma,Site-67,TCGA-55-7726-01Z-00-DX1.37df197b-d4fb-41c4-ac6a-72dbaa31027b +TCGA-55-6984,adenocarcinoma,Site-67,TCGA-55-6984-01Z-00-DX1.d53e387a-9618-4486-98f3-a75604f25a7d +TCGA-NJ-A55A,adenocarcinoma,Site-182,TCGA-NJ-A55A-01Z-00-DX1.42C356FE-52A3-4A60-886B-75F84ADBC534 +TCGA-86-8359,adenocarcinoma,Site-9,TCGA-86-8359-01Z-00-DX1.11B8FDC3-4B84-4A22-B89D-46EA76345F13 +TCGA-50-5939,adenocarcinoma,Site-157,TCGA-50-5939-01Z-00-DX1.745D7503-0744-46B1-BC89-EBB8FCE2D55C +TCGA-86-8585,adenocarcinoma,Site-9,TCGA-86-8585-01Z-00-DX1.bc0b1c69-f71d-46fc-8195-2d205002e6e5 +TCGA-49-AAR3,adenocarcinoma,Site-69,TCGA-49-AAR3-01Z-00-DX1.B4D85F13-CEB4-46DA-BF39-0200B9E55835 +TCGA-44-7660,adenocarcinoma,Site-29,TCGA-44-7660-01Z-00-DX1.f96c114b-f0fc-498a-97cc-e262344de357 +TCGA-55-7913,adenocarcinoma,Site-67,TCGA-55-7913-01Z-00-DX1.2295fcc5-70b0-412e-a1fd-b5366fce0739 +TCGA-80-5608,adenocarcinoma,Site-93,TCGA-80-5608-01Z-00-DX1.CB85BA53-AF00-4C7D-8489-C2FF0F4F49AB 
+TCGA-55-8087,adenocarcinoma,Site-67,TCGA-55-8087-01Z-00-DX1.548f2800-8caf-4c0e-a7b5-6d3d28315d9c +TCGA-NJ-A55O,adenocarcinoma,Site-182,TCGA-NJ-A55O-01Z-00-DX1.8E23C821-B8BB-4D89-9E38-E97424A685CE +TCGA-78-7154,adenocarcinoma,Site-96,TCGA-78-7154-01Z-00-DX1.596c20c2-e811-4b5e-961c-98343afcccde +TCGA-86-8673,adenocarcinoma,Site-9,TCGA-86-8673-01Z-00-DX1.21191041-c10e-4166-ad00-1243656feaa7 +TCGA-55-8085,adenocarcinoma,Site-67,TCGA-55-8085-01Z-00-DX1.ff220eba-3515-4d2e-b317-76337dd3d206 +TCGA-35-3615,adenocarcinoma,Site-33,TCGA-35-3615-01Z-00-DX1.585128eb-6652-4b05-9a83-dc8f242904a6 +TCGA-75-5122,adenocarcinoma,Site-93,TCGA-75-5122-01Z-00-DX1.627FD712-3881-44CC-83E0-6C4DFC6C88C1 +TCGA-50-7109,adenocarcinoma,Site-157,TCGA-50-7109-01Z-00-DX1.08f6420c-baa2-4e79-908e-ebbdeec503a6 +TCGA-05-4417,adenocarcinoma,Site-61,TCGA-05-4417-01Z-00-DX1.6E534856-996F-4BD9-8AFC-7B1BF627E505 +TCGA-97-A4M2,adenocarcinoma,Site-181,TCGA-97-A4M2-01Z-00-DX1.5C72522C-1538-4D68-B517-5FB05A9D401A +TCGA-86-8076,adenocarcinoma,Site-9,TCGA-86-8076-01Z-00-DX1.e7378b2f-e20e-4d2f-a86c-3a8ead08a385 +TCGA-71-8520,adenocarcinoma,Site-59,TCGA-71-8520-01Z-00-DX1.EAE99132-A397-4B4D-8B09-CBB774A7B62F +TCGA-L9-A743,adenocarcinoma,Site-23,TCGA-L9-A743-01Z-00-DX1.27ED2955-E8B5-4A3C-ADAA-82568ECCFB83 +TCGA-44-3918,adenocarcinoma,Site-29,TCGA-44-3918-01Z-00-DX1.6da70a8b-6307-423a-9d2d-380c16962855 +TCGA-49-6744,adenocarcinoma,Site-69,TCGA-49-6744-01Z-00-DX4.a3d7995d-399f-4c53-aab8-adc4ea4dbfa8 +TCGA-MN-A4N4,adenocarcinoma,Site-14,TCGA-MN-A4N4-01Z-00-DX2.9550732D-8FB1-43D9-B094-7C0CD310E9C0 +TCGA-05-4250,adenocarcinoma,Site-61,TCGA-05-4250-01Z-00-DX1.90f67fdf-dff9-46ca-af71-0978d7c221ba +TCGA-64-5778,adenocarcinoma,Site-40,TCGA-64-5778-01Z-00-DX1.96C39819-8A65-4651-BE83-39959F6FAD05 +TCGA-93-8067,adenocarcinoma,Site-183,TCGA-93-8067-01Z-00-DX1.325d0cd5-9bb5-4f51-a197-75d3c3d19aaf +TCGA-MP-A4T7,adenocarcinoma,Site-180,TCGA-MP-A4T7-01Z-00-DX1.C74D7A5B-1D1A-423B-85CA-C11E0CCF5873 
+TCGA-62-A46R,adenocarcinoma,Site-124,TCGA-62-A46R-01Z-00-DX1.AD823FBA-A63F-4D36-8B84-2C995DE5FC47 +TCGA-05-4382,adenocarcinoma,Site-61,TCGA-05-4382-01Z-00-DX1.76b49a4c-dbbb-48b0-b677-6d3037e5ce88 +TCGA-44-6144,adenocarcinoma,Site-29,TCGA-44-6144-01Z-00-DX1.604b3c7c-92e8-474a-bae8-e48415ea6196 +TCGA-55-8511,adenocarcinoma,Site-67,TCGA-55-8511-01Z-00-DX1.8EDFB05B-5B59-46EA-973C-1048B1E284D2 +TCGA-78-7220,adenocarcinoma,Site-96,TCGA-78-7220-01Z-00-DX1.3df84ce0-4395-4c9e-ba62-76b55676a440 +TCGA-38-4629,adenocarcinoma,Site-130,TCGA-38-4629-01Z-00-DX1.d00cc280-5370-4b9f-9655-6c35deb94647 +TCGA-55-8505,adenocarcinoma,Site-67,TCGA-55-8505-01Z-00-DX1.D364C30D-BFB8-486B-A0D3-948FF8E90C3E +TCGA-78-7167,adenocarcinoma,Site-96,TCGA-78-7167-01Z-00-DX1.f79e1a9b-a3eb-4c91-a1fd-7bb58b1620b1 +TCGA-MP-A4T9,adenocarcinoma,Site-180,TCGA-MP-A4T9-01Z-00-DX1.F7B341C4-EBCD-455F-BE90-3B77AC6B76EC +TCGA-62-A46U,adenocarcinoma,Site-124,TCGA-62-A46U-01Z-00-DX1.5129535F-8B6A-4647-8A88-3EE125642874 +TCGA-55-1592,adenocarcinoma,Site-67,TCGA-55-1592-01Z-00-DX1.ee15e261-2f9c-4a93-a19f-27cb1c3bae15 +TCGA-75-6214,adenocarcinoma,Site-93,TCGA-75-6214-01Z-00-DX1.00424C13-65B8-4872-81EB-EE207AB3F78C +TCGA-05-4420,adenocarcinoma,Site-61,TCGA-05-4420-01Z-00-DX1.2df0552e-87dc-491e-9be2-be8fbce2660d +TCGA-78-7539,adenocarcinoma,Site-96,TCGA-78-7539-01Z-00-DX1.b838d859-2fb7-48d1-a442-d366ac404099 +TCGA-86-8056,adenocarcinoma,Site-9,TCGA-86-8056-01Z-00-DX1.eee7e03a-842c-4a44-bff8-f7b906725605 +TCGA-S2-AA1A,adenocarcinoma,Site-5,TCGA-S2-AA1A-01Z-00-DX1.4B5D5FAE-8305-4D2D-B245-584589352886 +TCGA-05-4396,adenocarcinoma,Site-61,TCGA-05-4396-01Z-00-DX1.49DD5F68-7473-4945-B384-EA6D5AE383CB +TCGA-55-8506,adenocarcinoma,Site-67,TCGA-55-8506-01Z-00-DX1.A908E3D2-91F7-4DE2-98C9-95D1038B9F49 +TCGA-55-8615,adenocarcinoma,Site-67,TCGA-55-8615-01Z-00-DX1.928CFCA6-059D-4901-8D6A-BC7D9486C29F +TCGA-55-8096,adenocarcinoma,Site-67,TCGA-55-8096-01Z-00-DX1.c833417d-10c1-4430-a241-d6f5496e1cd9 
+TCGA-MP-A4SV,adenocarcinoma,Site-180,TCGA-MP-A4SV-01Z-00-DX1.5430F2CF-BE4B-42D6-9CD4-92E6FB3F6E99 +TCGA-NJ-A4YG,adenocarcinoma,Site-182,TCGA-NJ-A4YG-01Z-00-DX1.507BAE41-5EE9-4657-AAD1-AD456750BA2F +TCGA-44-5645,adenocarcinoma,Site-29,TCGA-44-5645-01Z-00-DX1.8b0d89d7-7849-49a6-a2f5-a8faf2c5aab4 +TCGA-05-4426,adenocarcinoma,Site-61,TCGA-05-4426-01Z-00-DX1.4c9da782-f4aa-4c4f-b3aa-fae2259f236f +TCGA-50-5045,adenocarcinoma,Site-157,TCGA-50-5045-01Z-00-DX1.ec50bbfc-1721-4e94-ac11-9ebd843e11ba +TCGA-75-5147,adenocarcinoma,Site-93,TCGA-75-5147-01Z-00-DX1.3156FA5C-9737-4EAA-989C-FD578FBF15A1 +TCGA-55-A494,adenocarcinoma,Site-67,TCGA-55-A494-01Z-00-DX1.8C7C4BE0-791A-47CA-B0A8-1935366350E0 +TCGA-L4-A4E5,adenocarcinoma,Site-49,TCGA-L4-A4E5-01Z-00-DX1.C2A5EF71-1F40-4E34-AC30-58D7121C1338 +TCGA-97-8179,adenocarcinoma,Site-181,TCGA-97-8179-01Z-00-DX1.5501338a-dd89-4c41-8267-5baaaea71643 +TCGA-73-4676,adenocarcinoma,Site-103,TCGA-73-4676-01Z-00-DX1.4d781bbc-a45e-4f9d-b6b6-2265282dff99 +TCGA-MP-A4TK,adenocarcinoma,Site-180,TCGA-MP-A4TK-01Z-00-DX1.57494698-D9D9-4C04-AAB2-16616CCFDCC9 +TCGA-97-7941,adenocarcinoma,Site-181,TCGA-97-7941-01Z-00-DX1.6845ef95-2231-44a2-aeae-004a117f1964 +TCGA-75-7030,adenocarcinoma,Site-93,TCGA-75-7030-01Z-00-DX1.5DDF24B5-00D1-4418-A067-A9B609E15314 +TCGA-49-AAR2,adenocarcinoma,Site-69,TCGA-49-AAR2-01Z-00-DX1.1F09F896-446E-4C55-8D01-6C34A98AB1D2 +TCGA-55-7995,adenocarcinoma,Site-67,TCGA-55-7995-01Z-00-DX1.fef24d04-35a0-4f57-8f51-7ad602a78871 +TCGA-64-1676,adenocarcinoma,Site-40,TCGA-64-1676-01Z-00-DX1.3B8F1FAC-0FB2-45DE-AFA1-6ED451A4A61B +TCGA-44-2665,adenocarcinoma,Site-29,TCGA-44-2665-01Z-00-DX1.d429e37e-3915-4da6-b5bc-bf2f09f4fbf7 +TCGA-44-8119,adenocarcinoma,Site-29,TCGA-44-8119-01Z-00-DX1.1EBEBFA7-22DB-4365-9DF8-C4E679C11312 +TCGA-99-8025,adenocarcinoma,Site-175,TCGA-99-8025-01Z-00-DX1.76d4e012-91a7-4f1d-ba0f-4cecd812ec49 +TCGA-50-6590,adenocarcinoma,Site-157,TCGA-50-6590-01Z-00-DX1.B08B2656-3DC7-4D35-A54D-9C57A7A773B3 
+TCGA-35-4123,adenocarcinoma,Site-33,TCGA-35-4123-01Z-00-DX1.990553a8-657e-49ec-9782-e6d30a4c4909 +TCGA-95-7567,adenocarcinoma,Site-179,TCGA-95-7567-01Z-00-DX1.982FF122-A440-4949-926C-B58AF8381090 +TCGA-64-1679,adenocarcinoma,Site-40,TCGA-64-1679-01Z-00-DX1.f9fd24ef-d51c-4530-9fb5-e8a8f09dba73 +TCGA-69-7973,adenocarcinoma,Site-178,TCGA-69-7973-01Z-00-DX1.aefb015a-cb51-425f-a80f-cd51091a0ee3 +TCGA-97-7546,adenocarcinoma,Site-181,TCGA-97-7546-01Z-00-DX1.dad9871f-cdfb-424d-a5b2-4e0905c29137 +TCGA-44-7669,adenocarcinoma,Site-29,TCGA-44-7669-01Z-00-DX1.34d0c9ac-d3dc-4a5e-b99c-a8c1c1e120d1 +TCGA-78-7153,adenocarcinoma,Site-96,TCGA-78-7153-01Z-00-DX1.5190321a-f1aa-4c6e-b4b9-6767b779682b +TCGA-97-7937,adenocarcinoma,Site-181,TCGA-97-7937-01Z-00-DX1.6db4b2ff-722c-4bb8-b12f-15f1a38b4ab1 +TCGA-97-8172,adenocarcinoma,Site-181,TCGA-97-8172-01Z-00-DX1.812ab977-692e-42f8-af57-7ce9084ab045 +TCGA-55-8090,adenocarcinoma,Site-67,TCGA-55-8090-01Z-00-DX1.83f86d5f-7cb9-443a-a812-b19ba4aea836 +TCGA-49-4510,adenocarcinoma,Site-69,TCGA-49-4510-01Z-00-DX2.e0552b15-6bd7-4660-8a60-9edd0d638e7e +TCGA-MP-A4TI,adenocarcinoma,Site-180,TCGA-MP-A4TI-01Z-00-DX1.08AB4C0B-F953-41B5-A9BB-7452A8E34A54 +TCGA-91-A4BC,adenocarcinoma,Site-1,TCGA-91-A4BC-01Z-00-DX1.8D8783B1-D1BA-4970-B77B-3AED8506C487 +TCGA-69-8253,adenocarcinoma,Site-178,TCGA-69-8253-01Z-00-DX1.631c741f-46bd-4d4b-8f6c-64787fb4c402 +TCGA-95-8039,adenocarcinoma,Site-179,TCGA-95-8039-01Z-00-DX1.38b07435-987d-495d-9940-ba5f20e6ef97 +TCGA-44-6774,adenocarcinoma,Site-29,TCGA-44-6774-01Z-00-DX1.f169485b-f863-4be0-9844-258d78170b64 +TCGA-99-AA5R,adenocarcinoma,Site-175,TCGA-99-AA5R-01Z-00-DX1.4DE7FC30-D338-49CE-B8D9-EB0F34A3DD0B +TCGA-78-8655,adenocarcinoma,Site-96,TCGA-78-8655-01Z-00-DX1.DC81D146-A4A6-411E-A5DC-C0689A8D445F +TCGA-50-6591,adenocarcinoma,Site-157,TCGA-50-6591-01Z-00-DX1.12e1050b-75e9-4059-b945-291995b3e93c +TCGA-55-8205,adenocarcinoma,Site-67,TCGA-55-8205-01Z-00-DX1.6df6d499-f325-409d-acb1-9e13172f2bb4 
+TCGA-80-5607,adenocarcinoma,Site-93,TCGA-80-5607-01Z-00-DX1.94C9CF78-2F79-4A0B-8939-1CD9B915222D +TCGA-50-5068,adenocarcinoma,Site-157,TCGA-50-5068-01Z-00-DX1.13f1c31e-4d4c-4562-8447-c5bbfbf670a9 +TCGA-55-6979,adenocarcinoma,Site-67,TCGA-55-6979-01Z-00-DX1.9238436e-3886-4a10-86fe-26658967dff2 +TCGA-MP-A4SW,adenocarcinoma,Site-180,TCGA-MP-A4SW-01Z-00-DX1.A6651B25-A35A-4BDE-9EBF-D264A02003C8 +TCGA-MP-A5C7,adenocarcinoma,Site-180,TCGA-MP-A5C7-01Z-00-DX1.26E5FA5A-3E6D-4CD3-8FA2-F331B7012311 +TCGA-MP-A4TJ,adenocarcinoma,Site-180,TCGA-MP-A4TJ-01Z-00-DX1.14EDBE5C-5D0C-4002-BE95-AF5C9D9F3D43 +TCGA-97-7938,adenocarcinoma,Site-181,TCGA-97-7938-01Z-00-DX1.d774d0a2-ee34-43ba-9d9d-34bf9718292e +TCGA-55-6969,adenocarcinoma,Site-67,TCGA-55-6969-01Z-00-DX1.713df9f6-1a91-4e0e-ad43-42e66dcca191 +TCGA-53-A4EZ,adenocarcinoma,Site-149,TCGA-53-A4EZ-01Z-00-DX1.5D155F0B-A677-4589-AF00-A4C451F5B6B6 +TCGA-78-7536,adenocarcinoma,Site-96,TCGA-78-7536-01Z-00-DX1.6b066140-8b82-40a5-bd48-430eeb4463fe +TCGA-67-3773,adenocarcinoma,Site-108,TCGA-67-3773-01Z-00-DX1.3E9DFC22-E962-4C18-BF5B-27EBEA089F5D +TCGA-62-A471,adenocarcinoma,Site-124,TCGA-62-A471-01Z-00-DX1.51A235BB-7F67-4C0D-8B15-2E26116D8822 +TCGA-05-4245,adenocarcinoma,Site-61,TCGA-05-4245-01Z-00-DX1.36ff5403-d4bb-4415-b2c5-7c750d655cde +TCGA-55-A491,adenocarcinoma,Site-67,TCGA-55-A491-01Z-00-DX1.E5F3B4E5-18EA-4067-AE28-119DABCE739A +TCGA-67-4679,adenocarcinoma,Site-108,TCGA-67-4679-01Z-00-DX1.a3d84341-fed7-4235-a053-638ed5294954 +TCGA-73-7499,adenocarcinoma,Site-103,TCGA-73-7499-01Z-00-DX1.81f5c1c1-3d40-4773-a1fb-e1f8eb9f6730 +TCGA-95-7944,adenocarcinoma,Site-179,TCGA-95-7944-01Z-00-DX1.E66A3565-7B0D-412B-9C0C-81D7397E8B87 +TCGA-91-A4BD,adenocarcinoma,Site-1,TCGA-91-A4BD-01Z-00-DX1.2E575CAC-6AEB-4049-972A-EA43457A9C9D +TCGA-49-AAR9,adenocarcinoma,Site-69,TCGA-49-AAR9-01Z-00-DX1.B6A9966E-89B8-404B-8ABC-1F8F8553D4DA +TCGA-86-8279,adenocarcinoma,Site-9,TCGA-86-8279-01Z-00-DX1.fd12b60e-d181-454b-a655-298a973a849d 
+TCGA-50-5044,adenocarcinoma,Site-157,TCGA-50-5044-01Z-00-DX1.7E0E651F-411B-4784-B0FA-6EB612527430 +TCGA-50-6593,adenocarcinoma,Site-157,TCGA-50-6593-01Z-00-DX1.a63e298c-cbe7-44e8-8d8e-34ebb93530ca +TCGA-62-A46P,adenocarcinoma,Site-124,TCGA-62-A46P-01Z-00-DX1.9136CC96-51B9-4FEB-8BCF-2C3676C708B1 +TCGA-MP-A4T6,adenocarcinoma,Site-180,TCGA-MP-A4T6-01Z-00-DX1.085C4F5A-DB1B-434A-9D62-E2187D133B0A +TCGA-69-7765,adenocarcinoma,Site-178,TCGA-69-7765-01Z-00-DX1.ac389366-febb-488c-9190-fe00bc07cd20 +TCGA-55-7724,adenocarcinoma,Site-67,TCGA-55-7724-01Z-00-DX1.31a194ac-62e3-4225-8e32-8c2a83dcdd10 +TCGA-05-4434,adenocarcinoma,Site-61,TCGA-05-4434-01Z-00-DX1.f3f5c89c-306f-4876-98fa-f7bd231f2e10 +TCGA-95-A4VN,adenocarcinoma,Site-179,TCGA-95-A4VN-01Z-00-DX1.B997A80F-C396-40AB-808A-B70EB83A2B74 +TCGA-86-8055,adenocarcinoma,Site-9,TCGA-86-8055-01Z-00-DX1.546dc42e-3742-4da5-8f9b-80732180ce76 +TCGA-49-6767,adenocarcinoma,Site-69,TCGA-49-6767-01Z-00-DX1.53459c0e-b8ec-4893-9910-87b63c503134 +TCGA-05-4395,adenocarcinoma,Site-61,TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9 +TCGA-49-6742,adenocarcinoma,Site-69,TCGA-49-6742-01Z-00-DX2.2c6b4df0-867d-40c5-8bee-14e2d219224b +TCGA-55-A48X,adenocarcinoma,Site-67,TCGA-55-A48X-01Z-00-DX1.A46C6373-8458-4D55-88C3-4C70A05F9F47 +TCGA-35-4122,adenocarcinoma,Site-33,TCGA-35-4122-01Z-00-DX1.2ac022e4-e796-49e5-9a24-f0ff3f76a527 +TCGA-99-8028,adenocarcinoma,Site-175,TCGA-99-8028-01Z-00-DX1.23de89b1-67f8-41fb-980a-010ea190d687 +TCGA-55-8614,adenocarcinoma,Site-67,TCGA-55-8614-01Z-00-DX1.043DE2B5-A453-4570-8830-99170450658C +TCGA-55-7815,adenocarcinoma,Site-67,TCGA-55-7815-01Z-00-DX1.288408e6-f6b3-4de4-a1ce-cb2498d9d46d +TCGA-62-A472,adenocarcinoma,Site-124,TCGA-62-A472-01Z-00-DX1.C2F76C9A-C51A-4521-9B95-2A20063E9931 +TCGA-38-4626,adenocarcinoma,Site-130,TCGA-38-4626-01Z-00-DX1.142bc018-bd79-4db9-84e2-f83a617ea92a +TCGA-73-4658,adenocarcinoma,Site-103,TCGA-73-4658-01Z-00-DX1.d5beb44f-9d76-485a-8af4-407b0f1a610e 
+TCGA-55-6971,adenocarcinoma,Site-67,TCGA-55-6971-01Z-00-DX1.6c790768-8252-466a-95e1-97d2086a8132 +TCGA-53-7626,adenocarcinoma,Site-149,TCGA-53-7626-01Z-00-DX1.FD4CAF75-07B5-4117-83DD-99233DAA3EBC +TCGA-75-6205,adenocarcinoma,Site-93,TCGA-75-6205-01Z-00-DX1.B75BC6BA-5196-4F62-BDA2-3F1D320ABD7C +TCGA-44-2666,adenocarcinoma,Site-29,TCGA-44-2666-01Z-00-DX1.f81a89cf-6033-47ea-9a43-290a5d6a28c1 +TCGA-55-8616,adenocarcinoma,Site-67,TCGA-55-8616-01Z-00-DX1.4F53B2AB-8221-4E67-80D2-7A8E3815B730 +TCGA-05-4405,adenocarcinoma,Site-61,TCGA-05-4405-01Z-00-DX1.D57EC2B2-3A59-4954-86A7-61782938BCC5 +TCGA-69-7978,adenocarcinoma,Site-178,TCGA-69-7978-01Z-00-DX1.90a2bf2f-18e2-4bfe-b858-d2de619e2493 +TCGA-95-7562,adenocarcinoma,Site-179,TCGA-95-7562-01Z-00-DX1.C722891A-C5C6-4F0D-A73F-259FA8DB2394 +TCGA-78-7150,adenocarcinoma,Site-96,TCGA-78-7150-01Z-00-DX1.46e6e65c-521c-49de-b106-419dc840b235 +TCGA-69-7760,adenocarcinoma,Site-178,TCGA-69-7760-01Z-00-DX1.7fc295e3-5bfc-4017-801c-491489d0eb34 +TCGA-L4-A4E6,adenocarcinoma,Site-49,TCGA-L4-A4E6-01Z-00-DX1.1B537DE0-2CE3-4B90-9ED3-71F1F8747EE9 +TCGA-69-8254,adenocarcinoma,Site-178,TCGA-69-8254-01Z-00-DX1.83a81d08-8c70-4f07-b4da-cbd17c4fe4d4 +TCGA-67-3774,adenocarcinoma,Site-108,TCGA-67-3774-01Z-00-DX1.FE85FDCB-91D2-47C5-8D13-D622A51A7BE0 +TCGA-55-7576,adenocarcinoma,Site-67,TCGA-55-7576-01Z-00-DX1.3a6142e9-c90a-4938-a1d1-061c2ac261f0 +TCGA-78-7540,adenocarcinoma,Site-96,TCGA-78-7540-01Z-00-DX1.25c29857-0d5f-4e52-be0c-32c6686b653f +TCGA-69-7980,adenocarcinoma,Site-178,TCGA-69-7980-01Z-00-DX1.8bbf8cc0-eca7-49e5-a022-c22e3e6ed6dc +TCGA-62-A470,adenocarcinoma,Site-124,TCGA-62-A470-01Z-00-DX1.3983A979-E756-413C-941A-C8B83AD3BBA7 +TCGA-MN-A4N5,adenocarcinoma,Site-14,TCGA-MN-A4N5-01Z-00-DX1.D0FBA0D2-6245-4EFD-AF2B-849215600DBE +TCGA-97-A4M6,adenocarcinoma,Site-181,TCGA-97-A4M6-01Z-00-DX1.01D83621-3D8E-4339-9F9B-CBD3083D5AE0 +TCGA-86-A4P7,adenocarcinoma,Site-9,TCGA-86-A4P7-01Z-00-DX1.37026DCF-5A81-4098-B444-C8A40CA168D0 
+TCGA-97-7552,adenocarcinoma,Site-181,TCGA-97-7552-01Z-00-DX1.a5ab8959-ff7a-4287-b014-87439957fb67 +TCGA-44-A47B,adenocarcinoma,Site-29,TCGA-44-A47B-01Z-00-DX1.177D0531-E037-435B-BFD4-382B2150B10D +TCGA-75-6212,adenocarcinoma,Site-93,TCGA-75-6212-01Z-00-DX1.FCB36956-DF6C-485D-8E7A-922A2BDAE669 +TCGA-44-A4SS,adenocarcinoma,Site-29,TCGA-44-A4SS-01Z-00-DX1.F3D85B63-D963-42D2-815F-CB9D791A0F7B +TCGA-75-6203,adenocarcinoma,Site-93,TCGA-75-6203-01Z-00-DX1.9796ACAB-9E1A-4BC5-B80C-4EC856FA14B7 +TCGA-44-2657,adenocarcinoma,Site-29,TCGA-44-2657-01Z-00-DX1.b7f608bd-2381-40a2-a757-ca4648a7de8a +TCGA-44-2656,adenocarcinoma,Site-29,TCGA-44-2656-01Z-00-DX1.5f247f31-4bb5-460d-a1f7-7dcdba33c835 +TCGA-55-6543,adenocarcinoma,Site-67,TCGA-55-6543-01Z-00-DX1.08806fe0-84d3-4fd6-8746-6cf557241958 +TCGA-78-7163,adenocarcinoma,Site-96,TCGA-78-7163-01Z-00-DX1.42880bdf-271d-41dd-aada-e7ec1156ff31 +TCGA-55-7911,adenocarcinoma,Site-67,TCGA-55-7911-01Z-00-DX1.081bc002-06ee-4218-a25f-26c0e6c4b333 +TCGA-55-8620,adenocarcinoma,Site-67,TCGA-55-8620-01Z-00-DX1.6591EEC4-D696-4826-80BD-99BE1D99EBB8 +TCGA-MP-A4TA,adenocarcinoma,Site-180,TCGA-MP-A4TA-01Z-00-DX1.11DDADA4-B7A2-4216-9B18-7AD993AE12E4 +TCGA-97-A4M3,adenocarcinoma,Site-181,TCGA-97-A4M3-01Z-00-DX1.16330A66-2501-48D2-AF6A-713698BF6918 +TCGA-64-1677,adenocarcinoma,Site-40,TCGA-64-1677-01Z-00-DX1.7720D42B-1F57-4C23-9BC5-87C456E7737B +TCGA-49-4505,adenocarcinoma,Site-69,TCGA-49-4505-01Z-00-DX4.623c4278-fc3e-4c80-bb4d-000e24fbb1c2 +TCGA-78-8640,adenocarcinoma,Site-96,TCGA-78-8640-01Z-00-DX1.6F929E56-34D5-45A6-8A9E-ACB4B88A68B1 +TCGA-55-1595,adenocarcinoma,Site-67,TCGA-55-1595-01Z-00-DX1.6270c9cf-cd18-4fa5-b313-33350ad55f89 +TCGA-NJ-A4YP,adenocarcinoma,Site-182,TCGA-NJ-A4YP-01Z-00-DX1.148BFF66-4DC7-468C-8783-97F27C2E1245 +TCGA-44-2661,adenocarcinoma,Site-29,TCGA-44-2661-01Z-00-DX1.20cfa0f8-e3ca-4c26-9dfe-b9d416cd94b1 +TCGA-86-8054,adenocarcinoma,Site-9,TCGA-86-8054-01Z-00-DX1.2c4c08f6-be1c-46d1-a719-5983cade0c54 
+TCGA-55-7574,adenocarcinoma,Site-67,TCGA-55-7574-01Z-00-DX1.09639e6a-d85f-4d84-abbc-0f5a6d679683 +TCGA-49-4501,adenocarcinoma,Site-69,TCGA-49-4501-01Z-00-DX3.b6c2cc84-1c94-4816-92e7-8cf4446ac9ac +TCGA-86-8278,adenocarcinoma,Site-9,TCGA-86-8278-01Z-00-DX1.6500afef-f0f0-4e7c-a6c0-60da81993641 +TCGA-44-A47A,adenocarcinoma,Site-29,TCGA-44-A47A-01Z-00-DX1.62448803-3A7B-41EC-9794-67916DA0792E +TCGA-55-7725,adenocarcinoma,Site-67,TCGA-55-7725-01Z-00-DX1.4d678777-63b1-4f4a-932a-7fccabf504c7 +TCGA-49-AARR,adenocarcinoma,Site-69,TCGA-49-AARR-01Z-00-DX1.9BCAFE3D-447C-40D6-85DD-AFABF6978376 +TCGA-55-8208,adenocarcinoma,Site-67,TCGA-55-8208-01Z-00-DX1.6eccb7e2-16e4-4d25-9a1e-b370e016020f +TCGA-97-8171,adenocarcinoma,Site-181,TCGA-97-8171-01Z-00-DX1.a55cbfd1-6674-4206-b3e0-2a4f12a74e5d +TCGA-55-A4DF,adenocarcinoma,Site-67,TCGA-55-A4DF-01Z-00-DX1.C945E1AD-BEF4-403F-BB11-BAC831D24CDA +TCGA-78-7158,adenocarcinoma,Site-96,TCGA-78-7158-01Z-00-DX1.74db3dcd-2715-4ce7-8e48-30c8b7946ba0 +TCGA-55-A492,adenocarcinoma,Site-67,TCGA-55-A492-01Z-00-DX1.31B23842-7F8B-4311-88DF-C51CB4F54AED +TCGA-05-4397,adenocarcinoma,Site-61,TCGA-05-4397-01Z-00-DX1.00e9cdb3-b50e-439c-86b0-d7b73b802c0d +TCGA-78-7148,adenocarcinoma,Site-96,TCGA-78-7148-01Z-00-DX1.aac94f79-77f2-45e6-b0c3-0b103fecbb22 +TCGA-55-6972,adenocarcinoma,Site-67,TCGA-55-6972-01Z-00-DX1.0b441ad0-c30f-4f63-849a-36c98d6e2d3b +TCGA-50-5942,adenocarcinoma,Site-157,TCGA-50-5942-01Z-00-DX1.da1b1274-8740-4c79-adf9-25800913eafa +TCGA-73-4659,adenocarcinoma,Site-103,TCGA-73-4659-01Z-00-DX1.578400cd-d284-451b-8fdf-9f82c1458276 +TCGA-93-7347,adenocarcinoma,Site-183,TCGA-93-7347-01Z-00-DX1.a22b7163-07bd-4f6a-97c7-f1561c4d73cb +TCGA-73-4670,adenocarcinoma,Site-103,TCGA-73-4670-01Z-00-DX1.a92a23d5-01aa-45a2-9f83-4784cd65fd58 +TCGA-55-7727,adenocarcinoma,Site-67,TCGA-55-7727-01Z-00-DX1.0c7a4953-16dc-41e0-845b-b89167ec17d7 +TCGA-55-8089,adenocarcinoma,Site-67,TCGA-55-8089-01Z-00-DX1.da4d99ff-4a7c-45a3-b79c-039c0c9e9712 
+TCGA-MP-A4TE,adenocarcinoma,Site-180,TCGA-MP-A4TE-01Z-00-DX1.B4680570-FD2E-47A5-8D50-A508CEB6A62C +TCGA-55-7570,adenocarcinoma,Site-67,TCGA-55-7570-01Z-00-DX1.9020ae67-cf60-4bac-8c60-403c8891c4a3 +TCGA-35-5375,adenocarcinoma,Site-33,TCGA-35-5375-01Z-00-DX1.10708170-750A-451B-BC10-0B4042540033 +TCGA-78-8660,adenocarcinoma,Site-96,TCGA-78-8660-01Z-00-DX1.11E923ED-C01B-439D-8796-08C9E3DC2D93 +TCGA-55-8097,adenocarcinoma,Site-67,TCGA-55-8097-01Z-00-DX1.2f847b65-a5dc-41be-9dd0-a1e11df3cd10 +TCGA-86-6562,adenocarcinoma,Site-9,TCGA-86-6562-01Z-00-DX1.5dea3015-e606-4837-9f99-ac14f0aa091b +TCGA-21-A5DI,squamous,Site-40,TCGA-21-A5DI-01Z-00-DX1.E9123261-ADE7-468C-9E9A-334E131FFF97 +TCGA-43-5670,squamous,Site-29,TCGA-43-5670-01Z-00-DX1.1b5d262e-1f39-4f6f-883c-52101b57791f +TCGA-18-3415,squamous,Site-97,TCGA-18-3415-01Z-00-DX1.8C62F2CD-4A2F-4D1E-A662-D7D5AFE557AB +TCGA-43-2576,squamous,Site-29,TCGA-43-2576-01Z-00-DX1.779df209-95e1-4303-9c32-4083e8088d8e +TCGA-33-4533,squamous,Site-69,TCGA-33-4533-01Z-00-DX1.ee36717d-0571-40b3-8ab5-5465d2cca920 +TCGA-NC-A5HT,squamous,Site-177,TCGA-NC-A5HT-01Z-00-DX1.9295B0E3-37FE-4914-AFB3-78B56C893B6D +TCGA-56-8628,squamous,Site-67,TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895 +TCGA-63-A5MM,squamous,Site-93,TCGA-63-A5MM-01Z-00-DX1.F385687A-3741-4E73-87F1-D9B00B1B6186 +TCGA-21-1081,squamous,Site-40,TCGA-21-1081-01Z-00-DX1.fce8927a-2c5f-4a64-8414-da66424b3859 +TCGA-60-2707,squamous,Site-103,TCGA-60-2707-01Z-00-DX1.4aafd76b-eb0e-4ab9-a740-682c169a3c3d +TCGA-56-8625,squamous,Site-67,TCGA-56-8625-01Z-00-DX1.11AADFB3-33FF-4EAA-B423-CD0F07163747 +TCGA-33-4586,squamous,Site-69,TCGA-33-4586-01Z-00-DX2.a9798fe4-8db9-41d7-8d00-eb2a9eef1bc0 +TCGA-22-5471,squamous,Site-72,TCGA-22-5471-01Z-00-DX1.AACEB098-E9B8-4A2B-905E-7D66BE962922 +TCGA-34-2604,squamous,Site-157,TCGA-34-2604-01Z-00-DX1.C04E4FF6-6E62-432D-AD1E-D0AACAA66875 +TCGA-NC-A5HE,squamous,Site-177,TCGA-NC-A5HE-01Z-00-DX1.57DE9D24-24C8-4EDF-942B-06777B003F68 
+TCGA-85-A53L,squamous,Site-9,TCGA-85-A53L-01Z-00-DX1.4B1C0D74-CF56-4B52-9EA0-9DC72EA07B1C +TCGA-43-2581,squamous,Site-29,TCGA-43-2581-01Z-00-DX1.0d1203cf-9ac9-49a2-8bd0-063ceae7b1d3 +TCGA-60-2720,squamous,Site-103,TCGA-60-2720-01Z-00-DX1.5ff16714-210d-44c3-856a-5877fd06e986 +TCGA-85-7698,squamous,Site-9,TCGA-85-7698-01Z-00-DX1.bc8a89a7-356e-4913-9632-627774730f48 +TCGA-51-4080,squamous,Site-130,TCGA-51-4080-01Z-00-DX1.fd660ce3-3e30-4907-b54a-81002ec071f2 +TCGA-39-5031,squamous,Site-78,TCGA-39-5031-01Z-00-DX1.f9264b3b-9fdf-4fb7-8752-2d3ffd3de3ea +TCGA-77-8136,squamous,Site-96,TCGA-77-8136-01Z-00-DX1.15cdacc3-ee04-4323-b5e4-4f6d7085bd38 +TCGA-18-3421,squamous,Site-97,TCGA-18-3421-01Z-00-DX1.3777597A-8B36-48A9-B950-C8233392A6D5 +TCGA-90-A59Q,squamous,Site-1,TCGA-90-A59Q-01Z-00-DX1.9F4ABA20-E9F7-4524-874E-E8C42D84AFFE +TCGA-52-7622,squamous,Site-149,TCGA-52-7622-01Z-00-DX1.cb3bb056-27dd-4c15-9004-b06cc8923663 +TCGA-96-A4JL,squamous,Site-181,TCGA-96-A4JL-01Z-00-DX1.AB297723-8AF1-418D-ADBE-46B5D7703C34 +TCGA-22-5472,squamous,Site-72,TCGA-22-5472-01Z-00-DX1.1E2A1D5E-C7D3-449E-910F-1199A396804C +TCGA-85-8350,squamous,Site-9,TCGA-85-8350-01Z-00-DX1.F9A32453-2E24-4874-A1FA-67B4173ABF03 +TCGA-60-2697,squamous,Site-103,TCGA-60-2697-01Z-00-DX1.e11a5b5b-c4be-4fe1-b7db-08879e55d552 +TCGA-85-8287,squamous,Site-9,TCGA-85-8287-01Z-00-DX1.3bd80053-5eb1-40f0-bdbd-9d4944c04328 +TCGA-66-2790,squamous,Site-61,TCGA-66-2790-01Z-00-DX1.86f436e3-e233-4986-90bb-aaa2adc4596f +TCGA-68-7755,squamous,Site-178,TCGA-68-7755-01Z-00-DX1.62BC3770-5E81-4006-8F60-47BC66150AB3 +TCGA-56-A4BX,squamous,Site-67,TCGA-56-A4BX-01Z-00-DX1.60E64193-473C-4220-9E38-0F5D952BD157 +TCGA-22-0940,squamous,Site-72,TCGA-22-0940-01Z-00-DX1.775CB101-6A55-40DD-B19F-7BFCA7820F5F +TCGA-56-A4ZK,squamous,Site-67,TCGA-56-A4ZK-01Z-00-DX1.D6B1AC68-BF70-4C49-A5A1-233D6DA4AE27 +TCGA-21-1077,squamous,Site-40,TCGA-21-1077-01Z-00-DX1.e7d0d3ca-b24c-4b2b-bb8f-13cc6991fb78 
+TCGA-96-A4JK,squamous,Site-181,TCGA-96-A4JK-01Z-00-DX1.DD4071B6-75CA-475D-B742-925EA6BE9F0A +TCGA-68-7756,squamous,Site-178,TCGA-68-7756-01Z-00-DX1.15BEAECF-8C36-4761-85A8-EC9E54B61A0C +TCGA-96-8170,squamous,Site-181,TCGA-96-8170-01Z-00-DX1.8283cd4c-f374-4320-88bd-36a462866d24 +TCGA-O2-A52V,squamous,Site-176,TCGA-O2-A52V-01Z-00-DX1.561ADDAE-EC55-461A-84B5-535C93E39C56 +TCGA-66-2773,squamous,Site-61,TCGA-66-2773-01Z-00-DX1.70fe76fc-749d-4720-a97e-40cd097f2340 +TCGA-56-A4BY,squamous,Site-67,TCGA-56-A4BY-01Z-00-DX1.DCA0D153-7DB7-44FB-A87A-20931D36856A +TCGA-98-A53C,squamous,Site-175,TCGA-98-A53C-01Z-00-DX1.1E06B7B5-32C3-436C-AD66-27E5829ADA85 +TCGA-O2-A52W,squamous,Site-176,TCGA-O2-A52W-01Z-00-DX1.3D388EB0-8402-4540-9E50-7C3757FA5D83 +TCGA-77-8133,squamous,Site-96,TCGA-77-8133-01Z-00-DX1.fcc117be-cc86-437a-8500-edb045d4195a +TCGA-34-2605,squamous,Site-157,TCGA-34-2605-01Z-00-DX1.4DC249E9-26AD-4C0A-918C-3BC6C2DACDD0 +TCGA-96-7545,squamous,Site-181,TCGA-96-7545-01Z-00-DX1.AAEDB4A3-4E12-40E7-BE9F-A8F4ADA61C23 +TCGA-56-8309,squamous,Site-67,TCGA-56-8309-01Z-00-DX1.5FC5FA81-405A-48AB-AA7D-2D38003A8942 +TCGA-66-2780,squamous,Site-61,TCGA-66-2780-01Z-00-DX1.895b75f9-6127-49f9-8106-0161b5de18ec +TCGA-85-8584,squamous,Site-9,TCGA-85-8584-01Z-00-DX1.859bb615-578f-4aad-b1e3-8c92b7e3ef2d +TCGA-O2-A52S,squamous,Site-176,TCGA-O2-A52S-01Z-00-DX1.65C217A1-6757-433D-AF35-7F0923023095 +TCGA-77-8154,squamous,Site-96,TCGA-77-8154-01Z-00-DX1.5ca9afa5-e5d8-46d7-b542-c27649bd214e +TCGA-21-1080,squamous,Site-40,TCGA-21-1080-01Z-00-DX1.7afa55df-e460-406e-ae9a-43ac4a71d0b0 +TCGA-63-A5MI,squamous,Site-93,TCGA-63-A5MI-01Z-00-DX1.D9A91E42-FF46-4A88-90A0-2A0214634EB4 +TCGA-O2-A52N,squamous,Site-176,TCGA-O2-A52N-01Z-00-DX1.FF1DD149-4DBE-44C7-872D-FA7E8FE770A8 +TCGA-NC-A5HI,squamous,Site-177,TCGA-NC-A5HI-01Z-00-DX1.7C40C0F4-E3A3-4B86-A54E-3347E8CB8BBA +TCGA-66-2765,squamous,Site-61,TCGA-66-2765-01Z-00-DX1.c9d8b0f7-b2c8-45ea-8bd1-81e231fd6e91 
+TCGA-33-AASI,squamous,Site-69,TCGA-33-AASI-01Z-00-DX1.71BC119D-9916-4214-9294-26301F2E430F +TCGA-66-2787,squamous,Site-61,TCGA-66-2787-01Z-00-DX1.5a4bb006-181e-417c-9868-4767f68f691a +TCGA-60-2710,squamous,Site-103,TCGA-60-2710-01Z-00-DX1.45f8ef49-d5bb-404c-9ac5-c7882bc59b45 +TCGA-NC-A5HG,squamous,Site-177,TCGA-NC-A5HG-01Z-00-DX1.129E83E5-1C4D-4B31-8750-64D10A65267F +TCGA-NK-A5CX,squamous,Site-182,TCGA-NK-A5CX-01Z-00-DX1.CA1C230C-7A29-491B-A124-7D3B600DAFB7 +TCGA-66-2800,squamous,Site-61,TCGA-66-2800-01Z-00-DX1.19a7cec8-8894-4755-aa2c-ae5266800355 +TCGA-60-2703,squamous,Site-103,TCGA-60-2703-01Z-00-DX1.13cdede5-0135-4e05-9478-3b728cad247e +TCGA-66-2785,squamous,Site-61,TCGA-66-2785-01Z-00-DX1.b9439ee1-d22b-4ccd-b53b-ce7717a37a17 +TCGA-63-A5MW,squamous,Site-93,TCGA-63-A5MW-01Z-00-DX1.DEDF1D36-BBFA-4B25-BE54-262025F9F571 +TCGA-43-5668,squamous,Site-29,TCGA-43-5668-01Z-00-DX1.0d024872-fbcc-4e91-948b-acc96a00d5eb +TCGA-39-5035,squamous,Site-78,TCGA-39-5035-01Z-00-DX1.ad5d9143-f194-4cef-9a13-03190cf97ccf +TCGA-85-A4CN,squamous,Site-9,TCGA-85-A4CN-01Z-00-DX1.051538DE-0F7E-4CE1-9D87-18155D9DCC8B +TCGA-66-2753,squamous,Site-61,TCGA-66-2753-01Z-00-DX1.ae9d8372-adb8-46f9-a140-2f0a1ddff579 +TCGA-18-3417,squamous,Site-97,TCGA-18-3417-01Z-00-DX1.B424DA03-9007-4FD0-9A92-BCB40319711C +TCGA-63-A5MH,squamous,Site-93,TCGA-63-A5MH-01Z-00-DX1.596077FF-9CE1-4EA7-9BB6-222C69872CA4 +TCGA-94-8035,squamous,Site-179,TCGA-94-8035-01Z-00-DX1.6962cc61-cb5f-44c9-95fe-1f382efec77a +TCGA-43-3920,squamous,Site-29,TCGA-43-3920-01Z-00-DX1.33ab7101-e3e6-4156-9c16-efae940c8156 +TCGA-85-8049,squamous,Site-9,TCGA-85-8049-01Z-00-DX1.59554c0f-d04a-41f2-b152-654a848b0443 +TCGA-56-7579,squamous,Site-67,TCGA-56-7579-01Z-00-DX1.627f65b9-ac66-4f71-a6f4-394338b647f0 +TCGA-NK-A5D1,squamous,Site-182,TCGA-NK-A5D1-01Z-00-DX1.CE05BFD3-F0C6-4440-AD09-8569D2BB89FA +TCGA-22-4607,squamous,Site-72,TCGA-22-4607-01Z-00-DX1.5C521672-8405-43A2-ACB5-16174A74BCB7 
+TCGA-43-7658,squamous,Site-29,TCGA-43-7658-01Z-00-DX1.db69fbb6-59d7-49f5-adbf-f9c030b85b73 +TCGA-22-A5C4,squamous,Site-72,TCGA-22-A5C4-01Z-00-DX1.54058689-5CA5-4F92-B18A-86208C24C87D +TCGA-NC-A5HQ,squamous,Site-177,TCGA-NC-A5HQ-01Z-00-DX1.E7FA7F88-2B81-4EBD-8A48-883BD6953428 +TCGA-56-7223,squamous,Site-67,TCGA-56-7223-01Z-00-DX1.dd01c478-6dfa-4757-85fd-d956d70ec372 +TCGA-34-8454,squamous,Site-157,TCGA-34-8454-01Z-00-DX1.A2308ED3-E430-4448-853F-B51412354279 +TCGA-22-4593,squamous,Site-72,TCGA-22-4593-01Z-00-DX1.B573D1D5-88CE-4989-8A18-F9545E8ACA82 +TCGA-58-A46L,squamous,Site-124,TCGA-58-A46L-01Z-00-DX1.DA77E0D9-23A2-4D54-81E9-36849E04FBC8 +TCGA-90-A4ED,squamous,Site-1,TCGA-90-A4ED-01Z-00-DX1.4F9CF7EC-D7CC-4AF6-A9C2-6BA7A02A9F84 +TCGA-68-A59J,squamous,Site-178,TCGA-68-A59J-01Z-00-DX1.B143855E-2B30-4562-B809-68AA723F5BD9 +TCGA-85-8666,squamous,Site-9,TCGA-85-8666-01Z-00-DX1.d856233a-beef-4da2-a724-759fe23e6227 +TCGA-94-8491,squamous,Site-179,TCGA-94-8491-01Z-00-DX1.CC486AFA-3215-4CED-9000-EE4095FF7947 +TCGA-56-A4BW,squamous,Site-67,TCGA-56-A4BW-01Z-00-DX1.8ED20788-092B-4C21-9D83-5423D853422F +TCGA-22-4595,squamous,Site-72,TCGA-22-4595-01Z-00-DX1.42F28D1E-7AC2-4113-B4EC-F1E7850EA14D +TCGA-52-7812,squamous,Site-149,TCGA-52-7812-01Z-00-DX1.dd6fa49a-f9fe-40a1-80b2-0824b128f3b2 +TCGA-63-A5MV,squamous,Site-93,TCGA-63-A5MV-01Z-00-DX1.72B97C7E-0AE9-468B-9E85-C98BA40CBBCB +TCGA-85-8277,squamous,Site-9,TCGA-85-8277-01Z-00-DX1.ffad077e-e969-40ba-8504-2f2103911758 +TCGA-43-6647,squamous,Site-29,TCGA-43-6647-01Z-00-DX1.ac38a9bb-d5af-42c1-b96c-c1d6b958dd17 +TCGA-56-7221,squamous,Site-67,TCGA-56-7221-01Z-00-DX1.f897f1ee-2796-4183-9312-f47871383327 +TCGA-NC-A5HH,squamous,Site-177,TCGA-NC-A5HH-01Z-00-DX1.4138DC62-3487-4F4B-8D5C-D7693B303387 +TCGA-34-7107,squamous,Site-157,TCGA-34-7107-01Z-00-DX1.c72202e6-740d-4891-945c-f360f7c56bc9 +TCGA-22-4591,squamous,Site-72,TCGA-22-4591-01Z-00-DX1.8C89ABEB-A1E3-47D1-B247-02539EF3F959 
+TCGA-33-AASB,squamous,Site-69,TCGA-33-AASB-01Z-00-DX1.7AAF5628-3B4D-4230-8375-AB1A51054466 +TCGA-22-4596,squamous,Site-72,TCGA-22-4596-01Z-00-DX1.0B754257-5EBB-4ABF-A847-7EFA71C7D9E7 +TCGA-56-7822,squamous,Site-67,TCGA-56-7822-01Z-00-DX1.c387e241-ff4c-44ef-af45-12a85066f0ce +TCGA-22-5485,squamous,Site-72,TCGA-22-5485-01Z-00-DX1.952DBCF0-08DA-49BD-B020-DEC1A77A1D81 +TCGA-39-5021,squamous,Site-78,TCGA-39-5021-01Z-00-DX1.4d22d3ca-cb06-43d0-aba7-583f1d706105 +TCGA-37-4129,squamous,Site-33,TCGA-37-4129-01Z-00-DX1.0b52aa1e-20d4-4084-8fa1-2f1b7f09dd5b +TCGA-77-6843,squamous,Site-96,TCGA-77-6843-01Z-00-DX1.5ced4995-81a1-4dfd-82b2-0cabf4538bc5 +TCGA-43-6143,squamous,Site-29,TCGA-43-6143-01Z-00-DX1.52d974a8-3f07-4e7f-8d3f-5d47f321298c +TCGA-21-1076,squamous,Site-40,TCGA-21-1076-01Z-00-DX1.533ca070-93e0-4d49-8770-8f42f341ca38 +TCGA-85-6560,squamous,Site-9,TCGA-85-6560-01Z-00-DX1.bf857c85-ebe7-4135-a4ec-038a57efc319 +TCGA-98-A53D,squamous,Site-175,TCGA-98-A53D-01Z-00-DX1.C83DC622-2C62-40D6-AC07-E8275D414B22 +TCGA-LA-A446,squamous,Site-23,TCGA-LA-A446-01Z-00-DX1.0DC9ABAF-2FED-4F4E-B453-299441CEC607 +TCGA-98-8021,squamous,Site-175,TCGA-98-8021-01Z-00-DX1.2ab0e341-e248-4df3-84c4-df5e60dc0372 +TCGA-85-A4CL,squamous,Site-9,TCGA-85-A4CL-01Z-00-DX1.05BC31FF-3B88-489F-9BD2-5B2B31C72EDB +TCGA-85-A4JB,squamous,Site-9,TCGA-85-A4JB-01Z-00-DX1.3CEC3662-68C1-4874-8273-540459B8F138 +TCGA-63-7021,squamous,Site-93,TCGA-63-7021-01Z-00-DX1.568DD2DB-2196-4817-AA00-8843F9A24A62 +TCGA-39-5024,squamous,Site-78,TCGA-39-5024-01Z-00-DX1.b159135e-c9d5-4768-83a6-1de9961d20ad +TCGA-39-5028,squamous,Site-78,TCGA-39-5028-01Z-00-DX1.7994ec22-746d-4c30-8138-e6c9bc67c71f +TCGA-96-8169,squamous,Site-181,TCGA-96-8169-01Z-00-DX1.d2e29d5e-abd4-4f36-958c-d9d22b836a34 +TCGA-98-8020,squamous,Site-175,TCGA-98-8020-01Z-00-DX1.fcf8e956-c413-40ae-9f09-610403e70fb3 +TCGA-21-5787,squamous,Site-40,TCGA-21-5787-01Z-00-DX1.FEE037E3-B9B0-4C2E-97EF-D6E4F64E1DF9 
+TCGA-63-A5MR,squamous,Site-93,TCGA-63-A5MR-01Z-00-DX1.CCF83112-AEEA-4F76-A43F-2AADE321CE68 +TCGA-94-7943,squamous,Site-179,TCGA-94-7943-01Z-00-DX1.361fc645-89ae-4934-8c27-12907bc2a9ee +TCGA-22-0944,squamous,Site-72,TCGA-22-0944-01Z-00-DX1.3DC7624F-2D59-40D5-9E1F-95FCCD80D0C4 +TCGA-6A-AB49,squamous,Site-146,TCGA-6A-AB49-01Z-00-DX1.FDF2EED7-57A3-4019-A382-21DED11780F6 +TCGA-56-1622,squamous,Site-67,TCGA-56-1622-01Z-00-DX1.d4178664-1e21-452f-9dbf-1b23467ddabc +TCGA-46-3767,squamous,Site-112,TCGA-46-3767-01Z-00-DX1.1f1899f9-7d6e-438c-b7ac-701ea088734d +TCGA-66-2766,squamous,Site-61,TCGA-66-2766-01Z-00-DX1.05729d5f-7c12-44e1-9a61-90296f555e89 +TCGA-85-A510,squamous,Site-9,TCGA-85-A510-01Z-00-DX1.F51CBC2D-EAD4-48B8-9444-C864012562F4 +TCGA-22-1012,squamous,Site-72,TCGA-22-1012-01Z-00-DX1.053F81FA-F91A-42B0-8F12-CD2495FA9E99 +TCGA-60-2714,squamous,Site-103,TCGA-60-2714-01Z-00-DX1.13f280a8-c8d3-48bb-8a38-49ce9ad9e1a2 +TCGA-63-A5M9,squamous,Site-93,TCGA-63-A5M9-01Z-00-DX1.4E04AC12-6DAC-4DA2-A991-AFCD42FA1CF2 +TCGA-NC-A5HF,squamous,Site-177,TCGA-NC-A5HF-01Z-00-DX1.6371242D-227A-4531-8BB3-FD1AABF770C9 +TCGA-77-8007,squamous,Site-96,TCGA-77-8007-01Z-00-DX1.06f0e4a1-936d-4bc2-acbf-cfec25fed190 +TCGA-46-3768,squamous,Site-112,TCGA-46-3768-01Z-00-DX1.226f6cbc-f1f4-4df8-8366-c94fd78e037b +TCGA-60-2695,squamous,Site-103,TCGA-60-2695-01Z-00-DX1.4cc6c566-d60d-4ff9-9ea8-18a9da46ae6b +TCGA-39-5011,squamous,Site-78,TCGA-39-5011-01Z-00-DX1.be58f65a-0b90-4ba4-bc5f-df74d4a3df22 +TCGA-18-4083,squamous,Site-97,TCGA-18-4083-01Z-00-DX1.337D4583-326F-4108-9441-DC74FAED6AE2 +TCGA-77-8153,squamous,Site-96,TCGA-77-8153-01Z-00-DX1.E8E40968-E7AD-4EA2-A832-8EC04D5CB7A1 +TCGA-85-7699,squamous,Site-9,TCGA-85-7699-01Z-00-DX1.418bb5e3-9721-41b0-82f6-21ce3cc29de2 +TCGA-56-8626,squamous,Site-67,TCGA-56-8626-01Z-00-DX1.C9836EAF-F005-40E9-B781-1D5586141F0D +TCGA-63-A5MB,squamous,Site-93,TCGA-63-A5MB-01Z-00-DX1.BA2648E1-4683-4992-A00C-96179C51A002 
+TCGA-56-A49D,squamous,Site-67,TCGA-56-A49D-01Z-00-DX1.1ECF40A0-1490-42B3-8924-A80E4ED35F93 +TCGA-77-8130,squamous,Site-96,TCGA-77-8130-01Z-00-DX1.e9f78b45-ed59-4e80-83c2-c6d5bb68e622 +TCGA-21-5782,squamous,Site-40,TCGA-21-5782-01Z-00-DX1.E0C856C0-9055-4437-9393-695D699E4F57 +TCGA-33-6738,squamous,Site-69,TCGA-33-6738-01Z-00-DX2.3a2fa17d-0297-4b0c-8a5f-aa18c5342ef4 +TCGA-56-6545,squamous,Site-67,TCGA-56-6545-01Z-00-DX1.24a6678b-2dbf-4539-844e-1e3786c3fac3 +TCGA-77-8143,squamous,Site-96,TCGA-77-8143-01Z-00-DX1.e844a7e1-ebba-4acb-8fe2-08aee2102848 +TCGA-66-2794,squamous,Site-61,TCGA-66-2794-01Z-00-DX1.933c35d1-b63f-44ba-928a-1f49c7fbc452 +TCGA-66-2742,squamous,Site-61,TCGA-66-2742-01Z-00-DX1.8fdd6990-a08c-457b-80e4-586c619a784e +TCGA-66-2770,squamous,Site-61,TCGA-66-2770-01Z-00-DX2.58f25170-ac3a-471a-80a0-0586f2ae1669 +TCGA-22-4609,squamous,Site-72,TCGA-22-4609-01Z-00-DX1.35112DF6-5AD4-4EF5-8D00-41AEED41C246 +TCGA-22-5473,squamous,Site-72,TCGA-22-5473-01Z-00-DX1.C452FE31-443B-43AE-A5FA-451DF0127F8F +TCGA-37-3783,squamous,Site-33,TCGA-37-3783-01Z-00-DX1.08a2c536-e48d-416c-bcad-9db7b2c04c1a +TCGA-21-5784,squamous,Site-40,TCGA-21-5784-01Z-00-DX1.E50E7F4B-BE37-4171-94A7-E824CFF4B3BB +TCGA-94-A5I4,squamous,Site-179,TCGA-94-A5I4-01Z-00-DX1.F0CE4558-63A7-45FB-BA8C-1B20AB29A847 +TCGA-21-1082,squamous,Site-40,TCGA-21-1082-01Z-00-DX1.cae5bbaa-76f5-416d-b532-16cf773e017e +TCGA-85-6561,squamous,Site-9,TCGA-85-6561-01Z-00-DX1.7bb9f936-00e1-4014-a6ae-8dc1c64d7f42 +TCGA-85-A4QQ,squamous,Site-9,TCGA-85-A4QQ-01Z-00-DX1.321B5149-FC07-434D-9819-3F8821D77849 +TCGA-58-A46N,squamous,Site-124,TCGA-58-A46N-01Z-00-DX1.5D51252C-5294-4E40-ABB2-31C3BE1039F8 +TCGA-77-A5G7,squamous,Site-96,TCGA-77-A5G7-01Z-00-DX1.F1FECF5D-C940-43D0-B49B-66C95AD9C85A +TCGA-34-5929,squamous,Site-157,TCGA-34-5929-01Z-00-DX1.a461aed7-0ba9-44dc-8f5b-941cb57aa855 +TCGA-85-7950,squamous,Site-9,TCGA-85-7950-01Z-00-DX1.ddf7d987-7e4f-4327-83d5-0047e2806ed1 
+TCGA-46-6026,squamous,Site-112,TCGA-46-6026-01Z-00-DX1.e0248ea6-ae04-4960-95a8-f39b8e168a38 +TCGA-60-2721,squamous,Site-103,TCGA-60-2721-01Z-00-DX1.8ea139d5-018b-4a66-915e-4784c808742e +TCGA-21-5783,squamous,Site-40,TCGA-21-5783-01Z-00-DX1.3FAB28DF-5748-42D5-8257-3D440C4FB5FB +TCGA-56-8624,squamous,Site-67,TCGA-56-8624-01Z-00-DX1.18DFDAB8-1701-4E15-9321-7D99CD15E4B0 +TCGA-22-5491,squamous,Site-72,TCGA-22-5491-01Z-00-DX1.19908BEC-3DDA-4BCD-9EB1-0D2C4A1E5110 +TCGA-34-5241,squamous,Site-157,TCGA-34-5241-01Z-00-DX1.7abc4737-9a9c-4270-8297-58ea167cea63 +TCGA-56-8304,squamous,Site-67,TCGA-56-8304-01Z-00-DX1.F7A7975D-C8AB-49C0-B9CD-18CFD01A0655 +TCGA-52-7809,squamous,Site-149,TCGA-52-7809-01Z-00-DX1.91f184ce-e4e1-42da-bdbc-36d6781bbf31 +TCGA-60-2719,squamous,Site-103,TCGA-60-2719-01Z-00-DX1.e834671a-7468-4517-bf95-4e100f87e9a3 +TCGA-60-2713,squamous,Site-103,TCGA-60-2713-01Z-00-DX1.8cc5fb8d-be9c-4f25-b088-85e1feff834a +TCGA-98-A539,squamous,Site-175,TCGA-98-A539-01Z-00-DX1.420AA419-0146-4991-B98D-28E647E45F07 +TCGA-22-5474,squamous,Site-72,TCGA-22-5474-01Z-00-DX1.8736FB24-7E65-4ACB-9325-382D7F864F62 +TCGA-56-7582,squamous,Site-67,TCGA-56-7582-01Z-00-DX1.81e6c419-f6a1-49cf-9a83-7c1a679fbfc6 +TCGA-77-7140,squamous,Site-96,TCGA-77-7140-01Z-00-DX1.e75db0f0-839e-4ade-aa52-aa4736318d6c +TCGA-18-5592,squamous,Site-97,TCGA-18-5592-01Z-00-DX1.41BD6380-A3A0-4ED1-8752-42BBC3B4680C +TCGA-43-3394,squamous,Site-29,TCGA-43-3394-01Z-00-DX1.4c2f49b9-9dac-41d8-a62b-64c8928caa3c +TCGA-85-8351,squamous,Site-9,TCGA-85-8351-01Z-00-DX1.E3340212-A6C4-4CB4-89A3-E632D3481BE4 +TCGA-33-A5GW,squamous,Site-69,TCGA-33-A5GW-01Z-00-DX1.6F0833B7-EDAB-4DF4-83D9-393D8ADBFCF3 +TCGA-60-2711,squamous,Site-103,TCGA-60-2711-01Z-00-DX1.186b9a92-fcae-45b6-a52c-e8da0c1a419c +TCGA-77-8131,squamous,Site-96,TCGA-77-8131-01Z-00-DX1.dcb8e2c7-0d2f-4b38-9db8-cc026a61699e +TCGA-60-2706,squamous,Site-103,TCGA-60-2706-01Z-00-DX1.db003232-66ab-437e-81c7-f490310ca03a 
+TCGA-39-5036,squamous,Site-78,TCGA-39-5036-01Z-00-DX1.5a601fbd-a711-478e-a471-2046d2b63f7c +TCGA-37-4135,squamous,Site-33,TCGA-37-4135-01Z-00-DX1.3f193ee1-530c-4f37-a669-56af75bba8db +TCGA-22-1005,squamous,Site-72,TCGA-22-1005-01Z-00-DX1.4A151CC3-E57F-4345-817A-2B9DBAF98477 +TCGA-77-7138,squamous,Site-96,TCGA-77-7138-01Z-00-DX1.8c912762-0829-4692-92a2-545fea2b3e48 +TCGA-66-2758,squamous,Site-61,TCGA-66-2758-01Z-00-DX1.23611c66-bd21-47d1-abb0-2f3da38c377f +TCGA-63-5131,squamous,Site-93,TCGA-63-5131-01Z-00-DX1.C1C3724A-D9FC-46D6-9D7B-9357A58ACAEF +TCGA-94-A5I6,squamous,Site-179,TCGA-94-A5I6-01Z-00-DX1.55C3DD79-D988-47FA-81FE-2BA20C291530 +TCGA-63-A5MT,squamous,Site-93,TCGA-63-A5MT-01Z-00-DX1.9F168173-804C-49CA-B157-7970CB55ED93 +TCGA-77-8145,squamous,Site-96,TCGA-77-8145-01Z-00-DX1.9d76de05-3cb7-4cd8-879b-11ea7f88d583 +TCGA-18-3407,squamous,Site-97,TCGA-18-3407-01Z-00-DX1.E7DDA7E3-45A2-4E06-A081-E6A81A672155 +TCGA-51-4081,squamous,Site-130,TCGA-51-4081-01Z-00-DX1.5971d87d-5181-4c7a-91ea-2dfad1bc5fca +TCGA-22-5481,squamous,Site-72,TCGA-22-5481-01Z-00-DX1.6BB77BD0-92FF-43FF-A86C-9EF41B2921DA +TCGA-56-8083,squamous,Site-67,TCGA-56-8083-01Z-00-DX1.140c8d5b-f660-4fef-b8da-6bb2c119c021 +TCGA-22-1000,squamous,Site-72,TCGA-22-1000-01Z-00-DX1.D2ACF3D3-6187-49A1-ABBC-8D8E611EEE55 +TCGA-63-6202,squamous,Site-93,TCGA-63-6202-01Z-00-DX1.A0A13923-E41B-4A6D-BC55-532D8E90DE0F +TCGA-92-8063,squamous,Site-183,TCGA-92-8063-01Z-00-DX1.692a8000-846e-4ed7-b790-b3281acb9dec +TCGA-98-8023,squamous,Site-175,TCGA-98-8023-01Z-00-DX1.d04d976f-0a9f-44c6-a5f1-4b06885847f8 +TCGA-56-8308,squamous,Site-67,TCGA-56-8308-01Z-00-DX1.48CAE02B-05B1-4681-B68C-DD32C414FD2D +TCGA-43-6770,squamous,Site-29,TCGA-43-6770-01Z-00-DX1.466dd07f-b147-48bb-9349-fe1f5f3bcae5 +TCGA-39-5040,squamous,Site-78,TCGA-39-5040-01Z-00-DX1.F314DFF3-AEBE-4171-8F04-5D2A61FDE208 +TCGA-63-7023,squamous,Site-93,TCGA-63-7023-01Z-00-DX1.B2435594-037A-40F2-B90F-60B76D2C859D 
+TCGA-18-3408,squamous,Site-97,TCGA-18-3408-01Z-00-DX1.B4AFC08A-7460-4EE6-B033-629C6A6CA6E8 +TCGA-56-8504,squamous,Site-67,TCGA-56-8504-01Z-00-DX1.032AF45F-89C3-4265-8287-A455780DA887 +TCGA-18-3406,squamous,Site-97,TCGA-18-3406-01Z-00-DX1.8D07F006-425C-4724-BBB3-5BA099401234 +TCGA-85-8352,squamous,Site-9,TCGA-85-8352-01Z-00-DX1.47426016-95BC-49C0-9844-06A3A7EAE5D6 +TCGA-56-A62T,squamous,Site-67,TCGA-56-A62T-01Z-00-DX1.05B9E747-DF22-4FFB-A79A-C023AB667CA0 +TCGA-77-8140,squamous,Site-96,TCGA-77-8140-01Z-00-DX1.b35a0672-9f04-4c34-a894-0ccc38151639 +TCGA-63-A5MS,squamous,Site-93,TCGA-63-A5MS-01Z-00-DX1.F58D7779-B8E0-4086-AF14-256B812F83CE +TCGA-77-7338,squamous,Site-96,TCGA-77-7338-01Z-00-DX1.52f2f410-a5cb-47e8-a4f9-242616a21e8e +TCGA-60-2708,squamous,Site-103,TCGA-60-2708-01Z-00-DX1.3b59f6dc-02f8-4118-9037-f18ebb9bd539 +TCGA-33-AASJ,squamous,Site-69,TCGA-33-AASJ-01Z-00-DX1.0201CE2A-4DE6-4B4E-A010-37D773207F52 +TCGA-66-2792,squamous,Site-61,TCGA-66-2792-01Z-00-DX1.988d556c-f65e-4cae-892d-4dfe046ea626 +TCGA-18-4086,squamous,Site-97,TCGA-18-4086-01Z-00-DX1.1D06B771-F978-4763-8D4E-7B540E02C55A +TCGA-77-8144,squamous,Site-96,TCGA-77-8144-01Z-00-DX1.5194d537-c942-45f3-967b-eb213646ff24 +TCGA-37-4133,squamous,Site-33,TCGA-37-4133-01Z-00-DX1.3dfcb0df-6f4b-4df7-9d04-085b36e03024 +TCGA-85-7710,squamous,Site-9,TCGA-85-7710-01Z-00-DX1.8bcfe326-1c19-4ebe-879d-83e77739f006 +TCGA-39-5030,squamous,Site-78,TCGA-39-5030-01Z-00-DX1.09d96fb6-9a19-40ae-967e-49f378340f9b +TCGA-56-8307,squamous,Site-67,TCGA-56-8307-01Z-00-DX1.20EE4B4C-403F-44AA-8B0E-711003E60B8F +TCGA-22-5489,squamous,Site-72,TCGA-22-5489-01Z-00-DX1.0518AF53-5642-40FB-A631-4C50D7707C8F +TCGA-37-4130,squamous,Site-33,TCGA-37-4130-01Z-00-DX1.37ccfdfe-fc20-4809-8dc7-e342242d0bfa +TCGA-85-8481,squamous,Site-9,TCGA-85-8481-01Z-00-DX1.0f626863-3eb5-4b35-b4fe-22a03d767f15 +TCGA-18-3409,squamous,Site-97,TCGA-18-3409-01Z-00-DX1.3E32CC78-B066-4F85-A781-EDAF39DE0704 
+TCGA-60-2704,squamous,Site-103,TCGA-60-2704-01Z-00-DX1.e904afb3-7935-4df5-90dc-f7151e15b31b +TCGA-39-5037,squamous,Site-78,TCGA-39-5037-01Z-00-DX1.02636c3e-c843-4741-8fbf-406957fe9d87 +TCGA-66-2744,squamous,Site-61,TCGA-66-2744-01Z-00-DX1.e67796a0-fc9e-4137-8cbf-fae91ea90a7c +TCGA-33-A4WN,squamous,Site-69,TCGA-33-A4WN-01Z-00-DX2.0AC8FFA9-BD94-4B28-8066-5430D6053432 +TCGA-34-2600,squamous,Site-157,TCGA-34-2600-01Z-00-DX1.CDCAAE91-3056-4952-B988-00F3722B520B +TCGA-66-2783,squamous,Site-61,TCGA-66-2783-01Z-00-DX1.34df2ea9-f8d8-448e-bd54-02364f434355 +TCGA-43-6771,squamous,Site-29,TCGA-43-6771-01Z-00-DX1.445d4d3f-6a28-4ea7-b7ab-612366c7bb4b +TCGA-18-3416,squamous,Site-97,TCGA-18-3416-01Z-00-DX1.EDAD3DEB-8A86-4F34-82EC-6691BFDE900D +TCGA-56-7823,squamous,Site-67,TCGA-56-7823-01Z-00-DX1.354861d2-79f1-4b70-a74e-20cb0f78468e +TCGA-22-4605,squamous,Site-72,TCGA-22-4605-01Z-00-DX1.5B725941-80CE-497B-8A1E-856D16DD2870 +TCGA-60-2726,squamous,Site-103,TCGA-60-2726-01Z-00-DX1.3c30934b-c93a-4dad-9c44-b417eae17137 +TCGA-77-6844,squamous,Site-96,TCGA-77-6844-01Z-00-DX1.5ace8702-ea00-4e0d-84e8-6615fc361f5a +TCGA-60-2715,squamous,Site-103,TCGA-60-2715-01Z-00-DX1.22710dc4-eaf5-44c2-8438-c6dcfae1d452 +TCGA-77-8139,squamous,Site-96,TCGA-77-8139-01Z-00-DX1.242d68e3-5ee0-4484-8403-e2ffd1ede20b +TCGA-33-4538,squamous,Site-69,TCGA-33-4538-01Z-00-DX2.62dcecc8-576b-406b-81db-1651e5ea589e +TCGA-94-7557,squamous,Site-179,TCGA-94-7557-01Z-00-DX1.ac61f900-6f9f-44eb-91ad-98c98af8b741 +TCGA-66-2791,squamous,Site-61,TCGA-66-2791-01Z-00-DX1.7ae2b6ab-5284-4772-bc1e-fd608f70e583 +TCGA-77-8150,squamous,Site-96,TCGA-77-8150-01Z-00-DX1.8ae93dac-2038-4426-9ddf-6fb857f6938a +TCGA-21-1078,squamous,Site-40,TCGA-21-1078-01Z-00-DX1.12d4a290-04fa-413e-b274-7c143d42c266 +TCGA-85-8288,squamous,Site-9,TCGA-85-8288-01Z-00-DX1.0e68535f-a1f3-450c-b8fa-a859f2269da8 +TCGA-66-2767,squamous,Site-61,TCGA-66-2767-01Z-00-DX1.7bcc7633-3e70-46e3-8689-542c23577ed1 
+TCGA-46-3765,squamous,Site-112,TCGA-46-3765-01Z-00-DX1.f45e4e30-e60c-40e5-a0b7-4513c0c37fda +TCGA-56-8503,squamous,Site-67,TCGA-56-8503-01Z-00-DX1.87FEB654-FC88-4CDA-8F52-ACF0D9A19191 +TCGA-66-2781,squamous,Site-61,TCGA-66-2781-01Z-00-DX1.ed9ff5b7-7c66-4bf7-bfb5-ca48b9e1a54e +TCGA-63-A5MN,squamous,Site-93,TCGA-63-A5MN-01Z-00-DX1.34F9BFA5-847A-456B-8159-7993A0F78543 +TCGA-85-7844,squamous,Site-9,TCGA-85-7844-01Z-00-DX1.9035209b-75ae-4c9b-8edf-5851af490f29 +TCGA-NK-A5CR,squamous,Site-182,TCGA-NK-A5CR-01Z-00-DX1.A7C57B30-E2C6-4A23-AE71-7E4D7714F8EA +TCGA-56-5898,squamous,Site-67,TCGA-56-5898-01Z-00-DX1.d539fc8b-ae31-4820-8c2e-5cf587958eb3 +TCGA-66-2759,squamous,Site-61,TCGA-66-2759-01Z-00-DX1.10135ce3-22d5-4802-907d-057fe54f5554 +TCGA-77-A5G1,squamous,Site-96,TCGA-77-A5G1-01Z-00-DX1.ABE53201-3FAD-4F05-9F2B-91BA6257569C +TCGA-NC-A5HK,squamous,Site-177,TCGA-NC-A5HK-01Z-00-DX1.9CC2445A-D1EA-4E6E-96CF-710D2CDA073B +TCGA-43-6773,squamous,Site-29,TCGA-43-6773-01Z-00-DX1.a05f9597-51b5-4a5a-9936-c61535b115bb +TCGA-85-A4QR,squamous,Site-9,TCGA-85-A4QR-01Z-00-DX1.B5E18F89-D5B4-4F51-BC7E-9FA868A6A443 +TCGA-92-7340,squamous,Site-183,TCGA-92-7340-01Z-00-DX1.e0b0240c-9b6c-4da9-b9cb-bc634051fc45 +TCGA-60-2722,squamous,Site-103,TCGA-60-2722-01Z-00-DX1.f3781266-e8dc-4386-9702-5b29e6f2cfa3 +TCGA-60-2709,squamous,Site-103,TCGA-60-2709-01Z-00-DX1.ded10e7c-c1ca-4402-be0e-f3978994d4af +TCGA-56-8622,squamous,Site-67,TCGA-56-8622-01Z-00-DX1.62B552C4-5CBE-4469-8DBB-E27497FF95F4 +TCGA-18-3410,squamous,Site-97,TCGA-18-3410-01Z-00-DX1.DB186D75-4AEE-4E1B-83D5-5A1970F03581 +TCGA-33-AASL,squamous,Site-69,TCGA-33-AASL-01Z-00-DX1.B6BE8062-EFB3-4615-8F9A-52E2C6D4617F +TCGA-96-7544,squamous,Site-181,TCGA-96-7544-01Z-00-DX1.44ce4169-c232-4a9b-ac40-55226179c4fa +TCGA-63-7022,squamous,Site-93,TCGA-63-7022-01Z-00-DX1.93EE6221-32E1-4E97-ABDB-E1814B655852 +TCGA-34-A5IX,squamous,Site-157,TCGA-34-A5IX-01Z-00-DX1.E10EE7FC-91EA-4A39-8B1B-1312C8D9620B 
+TCGA-66-2771,squamous,Site-61,TCGA-66-2771-01Z-00-DX1.b84a28ec-2726-4386-9445-0f5ca351d985 +TCGA-43-A56V,squamous,Site-29,TCGA-43-A56V-01Z-00-DX1.AA93FE03-FA7D-42C4-A118-B98C2400D9DA +TCGA-NC-A5HL,squamous,Site-177,TCGA-NC-A5HL-01Z-00-DX1.E40E17C1-42BA-4C02-AA01-0E2891B3174A +TCGA-77-7465,squamous,Site-96,TCGA-77-7465-01Z-00-DX1.25e4b0b4-4948-432f-8010-a6c6e4652cab +TCGA-22-4601,squamous,Site-72,TCGA-22-4601-01Z-00-DX1.2EABA7B1-1E31-4AFE-B97B-D40EEAD66D85 +TCGA-92-8064,squamous,Site-183,TCGA-92-8064-01Z-00-DX1.72c1a8db-146c-4408-8d31-43baba769fdf +TCGA-39-5022,squamous,Site-78,TCGA-39-5022-01Z-00-DX1.1bf6a4da-0b59-4478-bcf1-b004f0356342 +TCGA-43-8115,squamous,Site-29,TCGA-43-8115-01Z-00-DX1.1CFF45F8-6155-498C-AA52-269E6547FE0D +TCGA-39-5016,squamous,Site-78,TCGA-39-5016-01Z-00-DX1.b568cdcd-f8bf-4917-ae40-dbd505c36d1e +TCGA-66-2763,squamous,Site-61,TCGA-66-2763-01Z-00-DX1.99bc6dec-4e57-45bb-8d94-4dc38657d852 +TCGA-MF-A522,squamous,Site-151,TCGA-MF-A522-01Z-00-DX1.4934A011-8C04-43B7-A358-16AA64BB6B0C +TCGA-85-A4PA,squamous,Site-9,TCGA-85-A4PA-01Z-00-DX1.00AF818F-39BA-4EA7-8873-9B6C4F2D2141 +TCGA-66-2769,squamous,Site-61,TCGA-66-2769-01Z-00-DX1.101b6493-a7b3-42cf-ae2f-a90948f3fb8f +TCGA-56-A5DR,squamous,Site-67,TCGA-56-A5DR-01Z-00-DX1.F8F196F2-9380-490E-9D77-ABAC1E58BC5C +TCGA-33-4583,squamous,Site-69,TCGA-33-4583-01Z-00-DX2.8fc10fa1-82b8-49c5-972b-e4303b4293a7 +TCGA-92-8065,squamous,Site-183,TCGA-92-8065-01Z-00-DX1.cfadb520-1653-4082-9fc3-0680bf39253b +TCGA-60-2724,squamous,Site-103,TCGA-60-2724-01Z-00-DX1.98f9e48f-9c09-4969-aa34-05666616ee9a +TCGA-66-2727,squamous,Site-61,TCGA-66-2727-01Z-00-DX1.cab55166-3a9a-4f2e-9f7a-33859ab32d12 +TCGA-77-7142,squamous,Site-96,TCGA-77-7142-01Z-00-DX1.699e0ebe-b37b-425f-a0a5-ddb0263da84c +TCGA-33-AASD,squamous,Site-69,TCGA-33-AASD-01Z-00-DX1.DAB3D05A-A6CE-47D1-90B6-79EBABAC3454 +TCGA-37-5819,squamous,Site-33,TCGA-37-5819-01Z-00-DX1.3647F836-7BB4-4D41-A85E-CFC0B5CBA216 
+TCGA-22-4613,squamous,Site-72,TCGA-22-4613-01Z-00-DX1.4BC3B7B8-99E7-4744-8F5A-089ECCDE144E +TCGA-43-A56U,squamous,Site-29,TCGA-43-A56U-01Z-00-DX1.B60C921A-4EE4-4CAE-9E55-311C7D2DA9E4 +TCGA-37-3789,squamous,Site-33,TCGA-37-3789-01Z-00-DX1.d7a4923e-988e-45e9-a64e-30c983740614 +TCGA-22-4604,squamous,Site-72,TCGA-22-4604-01Z-00-DX1.C398D192-8564-42AB-89B8-461F0830D8A1 +TCGA-34-5232,squamous,Site-157,TCGA-34-5232-01Z-00-DX1.7483c36b-8473-439b-a726-2821ec834fc2 +TCGA-85-8072,squamous,Site-9,TCGA-85-8072-01Z-00-DX1.3a0ad5a6-c93e-428c-94e7-809ceaf01ef1 +TCGA-77-6845,squamous,Site-96,TCGA-77-6845-01Z-00-DX1.1d8e2711-a82c-4d9b-97e2-d966417731b7 +TCGA-NC-A5HR,squamous,Site-177,TCGA-NC-A5HR-01Z-00-DX1.B8B85EDD-96A6-40DF-B3BF-C97311A7E552 +TCGA-85-8070,squamous,Site-9,TCGA-85-8070-01Z-00-DX1.54e7edf1-28d3-4fd5-bab3-951e58620386 +TCGA-39-5027,squamous,Site-78,TCGA-39-5027-01Z-00-DX1.54a3e035-7178-4786-870b-1f6adfb82331 +TCGA-56-7731,squamous,Site-67,TCGA-56-7731-01Z-00-DX1.e205943f-cb2e-4006-b3be-b07d95fea663 +TCGA-46-3769,squamous,Site-112,TCGA-46-3769-01Z-00-DX1.97260244-fe08-42da-b5db-80da7f7d8766 +TCGA-77-A5GH,squamous,Site-96,TCGA-77-A5GH-01Z-00-DX1.2F4C2F84-17AE-415A-BF6C-01A899BAAE68 +TCGA-21-1079,squamous,Site-40,TCGA-21-1079-01Z-00-DX1.fb8e44cd-f73f-48a4-82d2-34f334d79f53 +TCGA-56-8082,squamous,Site-67,TCGA-56-8082-01Z-00-DX1.f0056ce7-58cf-4fc7-b249-ddeadeafafb3 +TCGA-77-8146,squamous,Site-96,TCGA-77-8146-01Z-00-DX1.19213f57-1890-4e56-9909-86e506be1833 +TCGA-68-A59I,squamous,Site-178,TCGA-68-A59I-01Z-00-DX1.6EE9E466-AE08-449D-A830-98667E6D48C7 +TCGA-18-3419,squamous,Site-97,TCGA-18-3419-01Z-00-DX1.69597BCF-2D4E-4D9B-B4E8-C03137445AF4 +TCGA-60-2723,squamous,Site-103,TCGA-60-2723-01Z-00-DX1.1c12de5a-5d31-468e-9169-feb23d24e7d6 +TCGA-85-8048,squamous,Site-9,TCGA-85-8048-01Z-00-DX1.1a663caa-dff1-4f03-b06b-c8b8b5ce08c6 +TCGA-18-3412,squamous,Site-97,TCGA-18-3412-01Z-00-DX1.699A17A5-356B-42F8-815A-5DBD60293242 
+TCGA-NC-A5HJ,squamous,Site-177,TCGA-NC-A5HJ-01Z-00-DX1.0BC0D0C7-F3D3-463E-BA94-C4289D97F1D9 +TCGA-NK-A5CT,squamous,Site-182,TCGA-NK-A5CT-01Z-00-DX1.D41A2900-E398-434F-AAC4-5CEA1FDC0479 +TCGA-66-2754,squamous,Site-61,TCGA-66-2754-01Z-00-DX1.3cd19122-06f3-420e-9c3f-362225ff49eb +TCGA-37-A5EM,squamous,Site-33,TCGA-37-A5EM-01Z-00-DX1.FF7B9A1C-9D2C-43E4-9AE9-711214ACF77D +TCGA-56-8623,squamous,Site-67,TCGA-56-8623-01Z-00-DX1.119C11E2-16E4-48A9-B8BD-6F1E8CBFEBA1 +TCGA-66-2789,squamous,Site-61,TCGA-66-2789-01Z-00-DX1.93719e52-7dd4-4bbb-b908-c91db1d99b76 +TCGA-90-A4EE,squamous,Site-1,TCGA-90-A4EE-01Z-00-DX1.3FC7A945-7D20-4A50-9D9E-436924F1A103 +TCGA-39-5039,squamous,Site-78,TCGA-39-5039-01Z-00-DX1.29f1281d-93d7-4499-bfce-19eeeb1870a4 +TCGA-68-7757,squamous,Site-178,TCGA-68-7757-01Z-00-DX1.4eece254-a119-436a-af3c-fad8e8b7f74b +TCGA-22-5483,squamous,Site-72,TCGA-22-5483-01Z-00-DX1.79EF4B70-4703-4FF1-81CA-20014A932437 +TCGA-85-A5B5,squamous,Site-9,TCGA-85-A5B5-01Z-00-DX1.88097FC2-BAAD-4986-AECD-C07C9F3ED856 +TCGA-22-1011,squamous,Site-72,TCGA-22-1011-01Z-00-DX1.A67E3D3F-4B94-4335-8199-03B7CB75608E +TCGA-56-5897,squamous,Site-67,TCGA-56-5897-01Z-00-DX1.776271b1-77a9-4653-8ed3-13294c9fc056 +TCGA-J1-A4AH,squamous,Site-2,TCGA-J1-A4AH-01Z-00-DX1.702A5CC6-0BFA-42F4-995B-8567A1DE0D09 +TCGA-22-4599,squamous,Site-72,TCGA-22-4599-01Z-00-DX1.2178D5C0-A444-44F5-B878-F5D1FB2943C9 +TCGA-NC-A5HN,squamous,Site-177,TCGA-NC-A5HN-01Z-00-DX1.51BA5504-7EFE-47F9-82A3-DCE0DCCF76D1 +TCGA-85-A4JC,squamous,Site-9,TCGA-85-A4JC-01Z-00-DX1.6B4ED63F-ADF9-49F4-A30B-AB3A5EEFDDB7 +TCGA-56-8201,squamous,Site-67,TCGA-56-8201-01Z-00-DX1.883903fb-d70d-4c72-be76-6788b1bc3b35 +TCGA-21-1070,squamous,Site-40,TCGA-21-1070-01Z-00-DX1.06363f8a-ef29-4d73-95da-a3172d7873c0 +TCGA-85-6798,squamous,Site-9,TCGA-85-6798-01Z-00-DX1.c8ca3a56-b337-4345-b55a-83b0ae9a75f0 +TCGA-22-5480,squamous,Site-72,TCGA-22-5480-01Z-00-DX1.A6E25FF1-9616-4F5F-97BD-BA1912C46008 
+TCGA-22-4594,squamous,Site-72,TCGA-22-4594-01Z-00-DX1.3FCEBC89-8473-4841-87A2-F84AF58A7793 +TCGA-63-A5MU,squamous,Site-93,TCGA-63-A5MU-01Z-00-DX1.C80B700E-D924-4A6F-AC8E-D9C91D452A57 +TCGA-68-8250,squamous,Site-178,TCGA-68-8250-01Z-00-DX1.a8669a85-9262-4c37-b61c-ca7019a3677c +TCGA-34-5240,squamous,Site-157,TCGA-34-5240-01Z-00-DX1.03785f14-494a-4898-9ae0-59e64250c33b +TCGA-98-A53B,squamous,Site-175,TCGA-98-A53B-01Z-00-DX1.5429DF3F-9F1C-4DEE-8A7A-CCB3E2BBDC8A +TCGA-94-8490,squamous,Site-179,TCGA-94-8490-01Z-00-DX1.5A44A8C6-38F4-47A4-80C6-7BB805E9116C +TCGA-34-5928,squamous,Site-157,TCGA-34-5928-01Z-00-DX1.d336efc9-3ebb-4457-a81b-1e448781f999 +TCGA-63-A5MP,squamous,Site-93,TCGA-63-A5MP-01Z-00-DX1.BBA62567-2E99-4AE4-83DB-DCF044F21C5E +TCGA-85-6175,squamous,Site-9,TCGA-85-6175-01Z-00-DX1.102f9bad-4084-4e3a-99e9-91e026ad9a62 +TCGA-22-5478,squamous,Site-72,TCGA-22-5478-01Z-00-DX1.6368788A-96D5-499B-A199-89BF1413AE43 +TCGA-33-4532,squamous,Site-69,TCGA-33-4532-01Z-00-DX3.a9eb5d3f-e5e4-45e0-8fca-3e50396445eb +TCGA-85-8479,squamous,Site-9,TCGA-85-8479-01Z-00-DX1.28fef288-9691-47dd-ad51-b2dc0cd84832 +TCGA-33-4566,squamous,Site-69,TCGA-33-4566-01Z-00-DX1.C256546D-D3EC-41DA-9681-06BA4B011DD2 +TCGA-85-A511,squamous,Site-9,TCGA-85-A511-01Z-00-DX1.24345C5F-BDC2-4D9D-BAE5-686B4C8C66DE +TCGA-66-2755,squamous,Site-61,TCGA-66-2755-01Z-00-DX1.ccc904f0-4cd3-44a5-8b68-7272f2a7558f +TCGA-51-6867,squamous,Site-130,TCGA-51-6867-01Z-00-DX1.5f3a0562-efbe-413f-8e13-9826aaefa298 +TCGA-33-4547,squamous,Site-69,TCGA-33-4547-01Z-00-DX3.9f19ec7f-6469-463a-a027-124c3fe5a06c +TCGA-66-2734,squamous,Site-61,TCGA-66-2734-01Z-00-DX1.2c23bc49-e2be-480a-b6a8-d3259a584930 +TCGA-77-7335,squamous,Site-96,TCGA-77-7335-01Z-00-DX1.DDE20A50-7303-4A2D-936A-D02FE15A0752 +TCGA-98-A53H,squamous,Site-175,TCGA-98-A53H-01Z-00-DX1.B05E6339-440C-4F2E-95B5-F2C96413D488 +TCGA-43-A475,squamous,Site-29,TCGA-43-A475-01Z-00-DX1.48694B33-6070-4D6E-9CE4-BDEF072500A1 
+TCGA-37-4141,squamous,Site-33,TCGA-37-4141-01Z-00-DX1.65256656-b0e0-4705-ad00-38f6c8922cfc +TCGA-37-A5EL,squamous,Site-33,TCGA-37-A5EL-01Z-00-DX1.DFD49A5B-27F6-407E-8ADA-54EDACAA0788 +TCGA-98-A53J,squamous,Site-175,TCGA-98-A53J-01Z-00-DX1.EEC6256E-D331-4731-B00C-08622C725F61 +TCGA-60-2716,squamous,Site-103,TCGA-60-2716-01Z-00-DX1.28642302-a114-4e80-b9db-c04eee7d92d8 +TCGA-56-8305,squamous,Site-67,TCGA-56-8305-01Z-00-DX1.2C1D792D-79E0-48D7-9938-DFD44428874E +TCGA-37-3792,squamous,Site-33,TCGA-37-3792-01Z-00-DX1.d95a9f5f-5bad-48fc-9f90-ed2580328b80 +TCGA-85-A50Z,squamous,Site-9,TCGA-85-A50Z-01Z-00-DX1.A8911EBB-EAB6-4787-9F6E-3E8A9F1078F3 +TCGA-66-2737,squamous,Site-61,TCGA-66-2737-01Z-00-DX1.3c508c22-fadc-4b05-ada4-b447b21e098c +TCGA-58-A46M,squamous,Site-124,TCGA-58-A46M-01Z-00-DX1.DCC711DA-EB83-498E-9AFA-33E874EA537C +TCGA-98-A53A,squamous,Site-175,TCGA-98-A53A-01Z-00-DX1.CB328332-8AA9-483D-8BBF-5FD752A2A6B6 +TCGA-77-8009,squamous,Site-96,TCGA-77-8009-01Z-00-DX1.54adaaba-ddce-4480-9b22-af8e8a85d1d9 +TCGA-46-6025,squamous,Site-112,TCGA-46-6025-01Z-00-DX1.05600f09-0b48-448c-b305-cb881970fc52 +TCGA-34-2596,squamous,Site-157,TCGA-34-2596-01Z-00-DX1.be778d17-d06c-4300-aaf1-c0acbef8fa41 +TCGA-94-7033,squamous,Site-179,TCGA-94-7033-01Z-00-DX1.43146ed9-30a5-420d-bd92-b1acdce11103 +TCGA-22-5479,squamous,Site-72,TCGA-22-5479-01Z-00-DX1.3DCD4654-459D-4046-87C3-E59C61A1E7FA +TCGA-66-2778,squamous,Site-61,TCGA-66-2778-01Z-00-DX1.5806ab4d-fdde-45b6-a131-fd5f71bcde0b +TCGA-21-1083,squamous,Site-40,TCGA-21-1083-01Z-00-DX1.8288ca92-b611-4acf-986e-0b8a3f620b94 +TCGA-22-5477,squamous,Site-72,TCGA-22-5477-01Z-00-DX1.D4CA03F8-FFA8-4926-B93C-EBFDC6791B1F +TCGA-18-3411,squamous,Site-97,TCGA-18-3411-01Z-00-DX1.D9228674-5B13-44CB-A7AE-C64FD97A3BBB +TCGA-66-2795,squamous,Site-61,TCGA-66-2795-01Z-00-DX1.4141cb69-b675-49c4-85c5-21d630b5143d +TCGA-66-2757,squamous,Site-61,TCGA-66-2757-01Z-00-DX1.fa182184-fbf1-46ab-bdae-6f383441fd85 
+TCGA-56-7222,squamous,Site-67,TCGA-56-7222-01Z-00-DX1.f932abe9-4f5c-46d8-908b-29669681d2db +TCGA-39-5034,squamous,Site-78,TCGA-39-5034-01Z-00-DX1.fb4191aa-07dc-4615-ac71-b9e4c67ed28b +TCGA-NC-A5HO,squamous,Site-177,TCGA-NC-A5HO-01Z-00-DX1.0DA8CF7E-0B14-44BA-BFA1-DC2E35B8A20C +TCGA-39-5019,squamous,Site-78,TCGA-39-5019-01Z-00-DX1.FE0485DE-AEB3-443F-AFCA-C186C336F7EC +TCGA-98-A53I,squamous,Site-175,TCGA-98-A53I-01Z-00-DX1.B245E2C2-8D75-45EE-8930-44CFFF8B2106 +TCGA-85-8355,squamous,Site-9,TCGA-85-8355-01Z-00-DX1.146ECC8D-858B-4DDA-B7FE-DBC22CC8C24D +TCGA-21-5786,squamous,Site-40,TCGA-21-5786-01Z-00-DX1.928370C0-0BD4-43D6-94EB-FE7CB172265E +TCGA-66-2777,squamous,Site-61,TCGA-66-2777-01Z-00-DX1.78b3c4b0-e7e5-427b-b4d7-3ca9c5b7bbd6 +TCGA-43-7656,squamous,Site-29,TCGA-43-7656-01Z-00-DX1.6076ce10-fcb2-4f29-b3f3-c3f5649dc469 +TCGA-85-7696,squamous,Site-9,TCGA-85-7696-01Z-00-DX1.d8756b4c-819f-4a5c-b148-125b8c6b3c27 +TCGA-22-1016,squamous,Site-72,TCGA-22-1016-01Z-00-DX1.807510FD-222D-4E65-B261-3EB0815F8A50 +TCGA-90-7964,squamous,Site-1,TCGA-90-7964-01Z-00-DX1.25D2CC5D-2383-41FE-B5D5-0ACFC0F8209B +TCGA-56-7580,squamous,Site-67,TCGA-56-7580-01Z-00-DX1.dc642f9d-20ed-411b-8697-4e33bf7db57e +TCGA-63-A5MY,squamous,Site-93,TCGA-63-A5MY-01Z-00-DX1.D3753889-13DC-40D4-8845-22A168F35256 +TCGA-77-A5GF,squamous,Site-96,TCGA-77-A5GF-01Z-00-DX1.8E70F660-4492-4BEB-A688-217DE27C1E6D +TCGA-77-A5GA,squamous,Site-96,TCGA-77-A5GA-01Z-00-DX1.8A1605A0-A0FF-479C-BFAC-AD6BD7081AC9 +TCGA-85-8276,squamous,Site-9,TCGA-85-8276-01Z-00-DX1.6306624d-e197-4744-a872-0a20413bd9fa +TCGA-L3-A4E7,squamous,Site-49,TCGA-L3-A4E7-01Z-00-DX1.69F3B1AF-F6DA-41FE-A72E-609F9D42E766 +TCGA-56-A4ZJ,squamous,Site-67,TCGA-56-A4ZJ-01Z-00-DX1.6BB25D9D-75E7-4B44-8560-8A3C3EC9F118 +TCGA-46-3766,squamous,Site-112,TCGA-46-3766-01Z-00-DX1.5870b2ed-78f0-4f68-b01e-2ef6d5353b57 +TCGA-66-2793,squamous,Site-61,TCGA-66-2793-01Z-00-DX1.133bc2be-1407-422c-a356-2477f1020c50 
+TCGA-39-5029,squamous,Site-78,TCGA-39-5029-01Z-00-DX1.74c5cb38-650e-4fe7-bc63-f6e88761346d +TCGA-22-5482,squamous,Site-72,TCGA-22-5482-01Z-00-DX1.901F90D4-1F4E-4344-925B-37B7DD78B751 +TCGA-37-4132,squamous,Site-33,TCGA-37-4132-01Z-00-DX1.5e5a564d-6561-48e7-a3a2-81a523392d8b +TCGA-77-8128,squamous,Site-96,TCGA-77-8128-01Z-00-DX1.5831331a-8c82-4817-977e-1842250d9c7b +TCGA-56-7730,squamous,Site-67,TCGA-56-7730-01Z-00-DX1.6f127663-c517-4b5d-99a6-a3cd314304b1 +TCGA-77-A5GB,squamous,Site-96,TCGA-77-A5GB-01Z-00-DX1.5EB72C3D-40BE-4ABD-9CD6-ADF5769BC295 +TCGA-85-A50M,squamous,Site-9,TCGA-85-A50M-01Z-00-DX1.CFB67271-FC1B-4BA0-95DB-51D859D68D18 +TCGA-56-8629,squamous,Site-67,TCGA-56-8629-01Z-00-DX1.6F79BFD6-36B0-46F4-918B-2F4FF083E3E3 +TCGA-66-2756,squamous,Site-61,TCGA-66-2756-01Z-00-DX1.e6bda67d-2647-4751-9c16-b711933a1d6c +TCGA-22-1002,squamous,Site-72,TCGA-22-1002-01Z-00-DX1.A41CF987-B2E7-4ADE-8580-735631EEE963 +TCGA-92-7341,squamous,Site-183,TCGA-92-7341-01Z-00-DX1.6ecbb768-fde4-4f41-8e28-bc5be860825f +TCGA-66-2786,squamous,Site-61,TCGA-66-2786-01Z-00-DX1.5fd5c25f-cb17-4650-b59e-122c6ddafe86 +TCGA-85-7843,squamous,Site-9,TCGA-85-7843-01Z-00-DX1.0aa90a1d-6039-482c-a489-8ee41cc70161 +TCGA-18-3414,squamous,Site-97,TCGA-18-3414-01Z-00-DX1.B76AF657-7A93-4F33-BC1A-2E8BF652FC17 +TCGA-22-1017,squamous,Site-72,TCGA-22-1017-01Z-00-DX1.9562FE79-A261-42D3-B394-F3E0E2FF7DDA +TCGA-37-A5EN,squamous,Site-33,TCGA-37-A5EN-01Z-00-DX1.C389C26A-2B88-4DCA-9B32-FEECDFFC35A6 +TCGA-34-8455,squamous,Site-157,TCGA-34-8455-01Z-00-DX1.9B7A1CBA-0B53-4F2A-9AEC-D72FDB6553F7 +TCGA-52-7810,squamous,Site-149,TCGA-52-7810-01Z-00-DX1.54fcdfff-c1ac-4427-bb33-54959695134f +TCGA-58-A46J,squamous,Site-124,TCGA-58-A46J-01Z-00-DX1.4BF4AAC3-2020-41EA-A181-5E90A23DA024 +TCGA-XC-AA0X,squamous,Site-5,TCGA-XC-AA0X-01Z-00-DX1.61A34BE0-F16B-4EC1-8E7F-7BF94F6629F4 +TCGA-22-5492,squamous,Site-72,TCGA-22-5492-01Z-00-DX1.A17DE349-CD76-4B12-9C5C-DB8DFB23C003 
+TCGA-21-1072,squamous,Site-40,TCGA-21-1072-01Z-00-DX1.9e6c90ad-0529-4b8b-a4f8-1b05430600f9 +TCGA-56-6546,squamous,Site-67,TCGA-56-6546-01Z-00-DX1.4fecf41b-6757-4c0d-98c4-24b9a65144f1 +TCGA-85-A513,squamous,Site-9,TCGA-85-A513-01Z-00-DX1.9C4C6BA2-BBEB-4EC6-9A70-65B1F6198E1F +TCGA-60-2698,squamous,Site-103,TCGA-60-2698-01Z-00-DX1.738ae69e-5170-46d6-b691-9f761f9a794b +TCGA-60-2712,squamous,Site-103,TCGA-60-2712-01Z-00-DX1.97003dfc-4b37-4491-86e8-0801e6825aae +TCGA-77-7337,squamous,Site-96,TCGA-77-7337-01Z-00-DX1.e15727e3-e9c8-44a0-87bb-caeaeb00840c +TCGA-77-8008,squamous,Site-96,TCGA-77-8008-01Z-00-DX1.29c3c1c6-7602-40c6-bc86-b40132df9309 +TCGA-L3-A524,squamous,Site-49,TCGA-L3-A524-01Z-00-DX1.F3C6DE3B-C59E-4805-A03F-4A0CA3AB2A5E +TCGA-LA-A7SW,squamous,Site-23,TCGA-LA-A7SW-01Z-00-DX1.A65863F5-EA8F-46F5-A4D1-52FAD3D93E26 +TCGA-60-2696,squamous,Site-103,TCGA-60-2696-01Z-00-DX1.17748315-09b2-4abd-97f1-93c9951b0a70 +TCGA-77-8138,squamous,Site-96,TCGA-77-8138-01Z-00-DX1.fac6dcf9-7367-4345-a764-38e5471763c0 +TCGA-85-7697,squamous,Site-9,TCGA-85-7697-01Z-00-DX1.b33aa550-81b0-4fab-8243-d041b0beaab5 +TCGA-85-8582,squamous,Site-9,TCGA-85-8582-01Z-00-DX1.ac10318b-f56e-4897-8355-076366fe8581 +TCGA-66-2788,squamous,Site-61,TCGA-66-2788-01Z-00-DX1.1a57324a-5b00-44ed-912d-9522d0141500 +TCGA-21-1071,squamous,Site-40,TCGA-21-1071-01Z-00-DX1.a9bba825-1c92-4101-9086-c4d1c91117af +TCGA-85-8664,squamous,Site-9,TCGA-85-8664-01Z-00-DX1.d83b183e-4cc6-4608-93fb-46e69d36a2fa +TCGA-85-8354,squamous,Site-9,TCGA-85-8354-01Z-00-DX1.C9188133-08F4-43DD-BB60-6F7293CDB5B9 +TCGA-63-A5ML,squamous,Site-93,TCGA-63-A5ML-01Z-00-DX1.14703756-F6F7-4EEF-A462-7D40910AE04D +TCGA-52-7811,squamous,Site-149,TCGA-52-7811-01Z-00-DX1.7093a626-0e8f-4e4c-80c6-6cf32a2f725e +TCGA-77-7139,squamous,Site-96,TCGA-77-7139-01Z-00-DX1.b8b79b0c-4ea0-4f6c-8e6e-29ad76eca78f +TCGA-77-8156,squamous,Site-96,TCGA-77-8156-01Z-00-DX1.9ed8c07d-61f3-4f6b-9c8b-312917c64e04 
+TCGA-33-4582,squamous,Site-69,TCGA-33-4582-01Z-00-DX1.629AEDB6-E9AA-4615-92E8-5DDAAFF6103E +TCGA-56-A5DS,squamous,Site-67,TCGA-56-A5DS-01Z-00-DX1.952C8777-4B8F-415A-9D27-C155D1623A86 +TCGA-63-A5MJ,squamous,Site-93,TCGA-63-A5MJ-01Z-00-DX1.4F060025-3A21-4FEB-8A63-F55A95BFBA30 +TCGA-85-8353,squamous,Site-9,TCGA-85-8353-01Z-00-DX1.2A333CFA-3D8A-41B2-9D08-8AAF431BFE54 +TCGA-77-6842,squamous,Site-96,TCGA-77-6842-01Z-00-DX1.934f1e6a-9c30-40ad-a764-7d0b6fdb92e9 +TCGA-70-6723,squamous,Site-59,TCGA-70-6723-01Z-00-DX1.1e6468e8-0d39-48a9-af32-9968f9207810 +TCGA-94-A4VJ,squamous,Site-179,TCGA-94-A4VJ-01Z-00-DX1.FBC6E75E-80D8-42EA-9292-181EC4D5D170 +TCGA-85-A512,squamous,Site-9,TCGA-85-A512-01Z-00-DX1.154BBD51-803C-46F0-840E-D2597E648365 +TCGA-77-8148,squamous,Site-96,TCGA-77-8148-01Z-00-DX1.2066e518-636c-4917-9dd4-78ab3605e8f6 +TCGA-63-A5MG,squamous,Site-93,TCGA-63-A5MG-01Z-00-DX1.EC6B6782-C52F-42B5-AD91-AB718B7EE1A6 +TCGA-85-8052,squamous,Site-9,TCGA-85-8052-01Z-00-DX1.26b66ae7-b73f-4263-85f7-e82dab5b657b +TCGA-33-AAS8,squamous,Site-69,TCGA-33-AAS8-01Z-00-DX1.774A64E2-17E4-468C-BFE9-85F0D4F52D0E +TCGA-43-7657,squamous,Site-29,TCGA-43-7657-01Z-00-DX1.d8a5d257-c5ca-4192-b6a7-4698b8390fca +TCGA-18-5595,squamous,Site-97,TCGA-18-5595-01Z-00-DX1.75235488-2F6F-42CE-8D7D-0E9A360E1C61 +TCGA-63-7020,squamous,Site-93,TCGA-63-7020-01Z-00-DX1.33435AC2-9EBC-422C-89DB-962153E8F3F6 +TCGA-79-5596,squamous,Site-93,TCGA-79-5596-01Z-00-DX1.FE977268-C9D7-4812-887C-C9FF9943C127 +TCGA-NC-A5HP,squamous,Site-177,TCGA-NC-A5HP-01Z-00-DX1.655093A9-AAA7-4637-A33E-90FE3AE2FC43 +TCGA-77-A5G3,squamous,Site-96,TCGA-77-A5G3-01Z-00-DX1.7ED6F76A-333F-43DB-B8DF-86BF259DCEFB +TCGA-18-4721,squamous,Site-97,TCGA-18-4721-01Z-00-DX1.C4952E79-071A-49C5-8E28-474442832D4E +TCGA-NC-A5HM,squamous,Site-177,TCGA-NC-A5HM-01Z-00-DX1.CBAF06E0-5185-4A9C-B005-F7D2C0945032 +TCGA-NC-A5HD,squamous,Site-177,TCGA-NC-A5HD-01Z-00-DX1.881DF35E-F816-4705-9589-6FB72784A905 
+TCGA-43-A474,squamous,Site-29,TCGA-43-A474-01Z-00-DX1.9BCC325D-875A-4285-85E9-2EA550A10949 +TCGA-51-4079,squamous,Site-130,TCGA-51-4079-01Z-00-DX1.111f7796-3797-4334-b543-918e374dbc22 +TCGA-58-A46K,squamous,Site-124,TCGA-58-A46K-01Z-00-DX1.A293339A-A592-4CA2-AE49-CDDD0810C624 +TCGA-98-7454,squamous,Site-175,TCGA-98-7454-01Z-00-DX1.2d81415b-5edd-4dd8-b17b-849d732da4cd +TCGA-21-1075,squamous,Site-40,TCGA-21-1075-01Z-00-DX1.937872ae-4d6f-4d7a-b54f-b7e797cb84b0 +TCGA-77-7463,squamous,Site-96,TCGA-77-7463-01Z-00-DX1.c255e325-1ba7-4108-b8b8-22c02241c83f +TCGA-85-8580,squamous,Site-9,TCGA-85-8580-01Z-00-DX1.1b942bbe-2c6e-4411-8609-1725e5cb93da +TCGA-60-2725,squamous,Site-103,TCGA-60-2725-01Z-00-DX1.b9d038d6-c2fb-4e46-a4ef-80fdbfb56d0a +TCGA-77-A5G8,squamous,Site-96,TCGA-77-A5G8-01Z-00-DX1.4F66CA44-C238-4293-882C-5357ABD7115F +TCGA-43-2578,squamous,Site-29,TCGA-43-2578-01Z-00-DX1.3314c57b-2875-4cb3-8fff-cb9cf65aa697 +TCGA-70-6722,squamous,Site-59,TCGA-70-6722-01Z-00-DX1.2f8498bb-bbd9-4eae-bb9e-6fe939ca0551 +TCGA-85-8071,squamous,Site-9,TCGA-85-8071-01Z-00-DX1.876f6b0b-f615-43b2-8923-ee21f94568b7 +TCGA-66-2768,squamous,Site-61,TCGA-66-2768-01Z-00-DX1.02fadcc4-9d05-4b37-9114-d8e80c09ef1a +TCGA-68-8251,squamous,Site-178,TCGA-68-8251-01Z-00-DX1.9d453881-d5b3-4527-93fa-799f34ba2c8f +TCGA-77-7141,squamous,Site-96,TCGA-77-7141-01Z-00-DX1.3263490a-cfba-43a8-8198-5139dbf00bf9 +TCGA-77-A5G6,squamous,Site-96,TCGA-77-A5G6-01Z-00-DX1.5AE97930-9554-40BD-A24A-17CA4273F2ED +TCGA-63-5128,squamous,Site-93,TCGA-63-5128-01Z-00-DX1.7D6C6ABF-D035-4F48-811B-629A0B4AD597 +TCGA-98-A538,squamous,Site-175,TCGA-98-A538-01Z-00-DX1.42BC06E8-8E8F-4CF0-A757-B5201BB2794F diff --git a/datasets/lung_adeno_squam/lung_labels_mini.csv b/datasets/lung_adeno_squam/lung_labels_mini.csv new file mode 100644 index 000000000..3db76a3a6 --- /dev/null +++ b/datasets/lung_adeno_squam/lung_labels_mini.csv @@ -0,0 +1,21 @@ +patient,subtype,site,slide 
+TCGA-83-5908,adenocarcinoma,Site-28,TCGA-83-5908-01Z-00-DX1.381c8f82-61a0-4e9d-982d-1ad0af7bead9 +TCGA-62-A46V,adenocarcinoma,Site-124,TCGA-62-A46V-01Z-00-DX1.631E54D0-9E57-4932-B4EF-81820E56A95B +TCGA-44-2655,adenocarcinoma,Site-29,TCGA-44-2655-01Z-00-DX1.ee255271-780c-461c-ab23-5cd3504b5e4a +TCGA-05-4418,adenocarcinoma,Site-61,TCGA-05-4418-01Z-00-DX1.f3863ea5-564f-482f-9878-cc104cf69401 +TCGA-49-4487,adenocarcinoma,Site-69,TCGA-49-4487-01Z-00-DX1.3a3a0720-463c-430e-849b-e2f8991bdfa5 +TCGA-38-4631,adenocarcinoma,Site-130,TCGA-38-4631-01Z-00-DX1.5e0c873a-9c4c-4e0b-bf2e-e3cd8b760761 +TCGA-55-1594,adenocarcinoma,Site-67,TCGA-55-1594-01Z-00-DX1.bd90c500-7c0b-4c45-a3f7-2d9177384b1d +TCGA-75-6207,adenocarcinoma,Site-93,TCGA-75-6207-01Z-00-DX1.837B7B0F-424C-423B-9045-A905E7C1C54C +TCGA-MP-A4TD,adenocarcinoma,Site-180,TCGA-MP-A4TD-01Z-00-DX1.937DEBC9-F5D5-4682-AA9A-13D8226EE06C +TCGA-78-7537,adenocarcinoma,Site-96,TCGA-78-7537-01Z-00-DX1.e5597e41-ebba-4d6f-8a1f-15cd81d8f026 +TCGA-21-A5DI,squamous,Site-40,TCGA-21-A5DI-01Z-00-DX1.E9123261-ADE7-468C-9E9A-334E131FFF97 +TCGA-43-5670,squamous,Site-29,TCGA-43-5670-01Z-00-DX1.1b5d262e-1f39-4f6f-883c-52101b57791f +TCGA-18-3415,squamous,Site-97,TCGA-18-3415-01Z-00-DX1.8C62F2CD-4A2F-4D1E-A662-D7D5AFE557AB +TCGA-43-2576,squamous,Site-29,TCGA-43-2576-01Z-00-DX1.779df209-95e1-4303-9c32-4083e8088d8e +TCGA-33-4533,squamous,Site-69,TCGA-33-4533-01Z-00-DX1.ee36717d-0571-40b3-8ab5-5465d2cca920 +TCGA-NC-A5HT,squamous,Site-177,TCGA-NC-A5HT-01Z-00-DX1.9295B0E3-37FE-4914-AFB3-78B56C893B6D +TCGA-56-8628,squamous,Site-67,TCGA-56-8628-01Z-00-DX1.AAC57164-E0F9-4DF0-87EA-5C50FB201895 +TCGA-63-A5MM,squamous,Site-93,TCGA-63-A5MM-01Z-00-DX1.F385687A-3741-4E73-87F1-D9B00B1B6186 +TCGA-21-1081,squamous,Site-40,TCGA-21-1081-01Z-00-DX1.fce8927a-2c5f-4a64-8414-da66424b3859 +TCGA-60-2707,squamous,Site-103,TCGA-60-2707-01Z-00-DX1.4aafd76b-eb0e-4ab9-a740-682c169a3c3d \ No newline at end of file diff --git a/datasets/thyroid_brs/thyroid_brs.json 
b/datasets/thyroid_brs/thyroid_brs.json new file mode 100644 index 000000000..fb32f520c --- /dev/null +++ b/datasets/thyroid_brs/thyroid_brs.json @@ -0,0 +1,4 @@ +{ + "name": "TCGA_THCA", + "annotations": "./thyroid_labels.csv" +} \ No newline at end of file diff --git a/datasets/thyroid_brs/thyroid_labels.csv b/datasets/thyroid_brs/thyroid_labels.csv new file mode 100644 index 000000000..3e8d3cc01 --- /dev/null +++ b/datasets/thyroid_brs/thyroid_labels.csv @@ -0,0 +1,370 @@ +patient,site,brs_class,brs,slide +TCGA-ET-A25G,ET,Braf-like,-0.893408,TCGA-ET-A25G-01Z-00-DX1.379CE9E3-B8D5-4685-80F0-CF09389187B5 +TCGA-DJ-A2PU,DJ,Braf-like,-0.919824,TCGA-DJ-A2PU-01Z-00-DX1.CC7B3820-A0B4-439C-9655-D02C8A3ADCCF +TCGA-DJ-A3UX,DJ,Braf-like,-0.366275,TCGA-DJ-A3UX-01Z-00-DX1.2A0F7F44-1001-42BB-AC2D-6A29C18C9579 +TCGA-EL-A3CU,EL,Braf-like,-0.981478,TCGA-EL-A3CU-01Z-00-DX1.84ACFD07-06AA-4A3E-AF78-DBB5E6A7D8D3 +TCGA-EL-A3MW,EL,Braf-like,-0.7927,TCGA-EL-A3MW-01Z-00-DX1.9DB87308-75E4-433F-B356-04AA6B5B15D3 +TCGA-EM-A3O7,EM,Braf-like,-0.727123,TCGA-EM-A3O7-01Z-00-DX1.D330057C-AC41-4701-88DC-6D5835B7F8EB +TCGA-ET-A3BS,ET,Braf-like,-0.789875,TCGA-ET-A3BS-01Z-00-DX1.2D1A6BE1-99C5-40C3-BD7C-8C064566368E +TCGA-EM-A2OX,EM,Braf-like,-0.853937,TCGA-EM-A2OX-01Z-00-DX1.A290BD1E-9956-46A2-B5AE-06DA950391BE +TCGA-BJ-A18Z,BJ,Braf-like,-0.964513,TCGA-BJ-A18Z-01Z-00-DX1.F7A94C3D-C0E3-4F82-BC69-5B89497B3E40 +TCGA-DJ-A13U,DJ,Braf-like,-0.474075,TCGA-DJ-A13U-01Z-00-DX1.A5687750-5446-47B4-BC08-E15DB34286A6 +TCGA-MK-A4N6,MK,Braf-like,-0.968534,TCGA-MK-A4N6-01Z-00-DX1.CC83778C-73A1-4B17-8FB3-06446682F20A +TCGA-ET-A3BQ,ET,Braf-like,-0.900744,TCGA-ET-A3BQ-01Z-00-DX1.77513C07-5135-4C46-AE68-6F844096A947 +TCGA-EL-A4KG,EL,Braf-like,-0.986305,TCGA-EL-A4KG-01Z-00-DX1.8B2BA254-DCF3-45ED-BACB-28822906B7A5 +TCGA-BJ-A45J,BJ,Braf-like,-0.901585,TCGA-BJ-A45J-01Z-00-DX1.F3646444-749B-4583-A45D-17C580FCB866 +TCGA-EM-A2P1,EM,Braf-like,-0.886163,TCGA-EM-A2P1-01Z-00-DX1.FDE3FF4D-8EA8-411A-AF7D-165CFCFBFAD4 
+TCGA-E8-A44K,E8,Braf-like,-0.851178,TCGA-E8-A44K-01Z-00-DX1.990DA204-B846-4FF8-B0A7-C47C2AF5FE0E +TCGA-ET-A39P,ET,Braf-like,-0.84908,TCGA-ET-A39P-01Z-00-DX1.DB1669F1-194B-45E5-AEE1-256ABCFE1544 +TCGA-ET-A3BX,ET,Braf-like,-0.874599,TCGA-ET-A3BX-01Z-00-DX1.237EA6E4-435A-4918-9BC0-1471352FB1A9 +TCGA-EL-A3GP,EL,Braf-like,-0.273283,TCGA-EL-A3GP-01Z-00-DX1.BD923F73-E189-4F82-B233-345A3C1B42F3 +TCGA-BJ-A4O9,BJ,Braf-like,-0.541191,TCGA-BJ-A4O9-01Z-00-DX1.E815BF13-A03C-4687-BE1A-C874567A1450 +TCGA-BJ-A0ZB,BJ,Braf-like,-0.872257,TCGA-BJ-A0ZB-01Z-00-DX1.B4D4627A-2727-47E9-B1A6-1FFCAFAAE303 +TCGA-FE-A22Z,FE,Braf-like,-0.948756,TCGA-FE-A22Z-01Z-00-DX1.0ABEAB66-697F-4F56-9C5A-7505A66B71B7 +TCGA-DJ-A2Q5,DJ,Braf-like,-0.892712,TCGA-DJ-A2Q5-01Z-00-DX1.EB4657EF-E3DF-42FC-AEBF-3358E4B142C2 +TCGA-ET-A3BO,ET,Braf-like,-0.906416,TCGA-ET-A3BO-01Z-00-DX1.4A864B23-7323-4009-BF35-34E3BA399138 +TCGA-EL-A3ZT,EL,Braf-like,-0.809832,TCGA-EL-A3ZT-01Z-00-DX1.289FA1E1-B2FA-49A6-90DE-83EE38C84DEA +TCGA-EM-A3FO,EM,Braf-like,-0.716374,TCGA-EM-A3FO-01Z-00-DX1.84929069-A2FF-4588-98AF-77DE5D66F1C9 +TCGA-BJ-A2NA,BJ,Braf-like,-0.973946,TCGA-BJ-A2NA-01Z-00-DX1.E1EEB1B7-D419-447F-864D-4F2B1111585F +TCGA-DJ-A2PN,DJ,Braf-like,-0.802997,TCGA-DJ-A2PN-01Z-00-DX1.FB5701AA-F9C8-4CF9-9ABE-943E8CE614F2 +TCGA-EL-A3GX,EL,Braf-like,-0.991931,TCGA-EL-A3GX-01Z-00-DX1.AB2B045E-E8EC-46AF-B606-5C47A37B1938 +TCGA-ET-A2MY,ET,Braf-like,-0.695716,TCGA-ET-A2MY-01Z-00-DX1.85287036-F72C-4C39-AE7C-1163B8921DBB +TCGA-EL-A3N2,EL,Braf-like,-0.89154,TCGA-EL-A3N2-01Z-00-DX1.BB63A917-16BC-40C5-B440-1EB272648188 +TCGA-EM-A22P,EM,Braf-like,-0.793809,TCGA-EM-A22P-01Z-00-DX1.9F479C1F-2F35-42F4-95D8-BA1E0D37B9BF +TCGA-ET-A25K,ET,Braf-like,-0.835366,TCGA-ET-A25K-01Z-00-DX1.EB013E6B-E39A-4BD8-9866-43FEC9E45D82 +TCGA-E8-A419,E8,Braf-like,-0.973538,TCGA-E8-A419-01Z-00-DX1.0C4735A8-1231-4CE4-A2C2-844C033C1B48 +TCGA-BJ-A290,BJ,Braf-like,-0.719254,TCGA-BJ-A290-01Z-00-DX1.46379CAE-4849-4AAC-A439-1B2A7D541241 
+TCGA-BJ-A0ZH,BJ,Braf-like,-0.757276,TCGA-BJ-A0ZH-01Z-00-DX1.F7128B8F-FC4F-405F-BF7B-E4F9D7225760 +TCGA-EM-A22O,EM,Braf-like,-0.664753,TCGA-EM-A22O-01Z-00-DX1.54357B5E-FDE4-4C7A-B2D9-3E26B4D9DDD2 +TCGA-ET-A3DO,ET,Braf-like,-0.916653,TCGA-ET-A3DO-01Z-00-DX1.1AD06D3D-E50E-4724-8554-F05F329EBFFC +TCGA-KS-A4I5,KS,Braf-like,-0.945451,TCGA-KS-A4I5-01Z-00-DX1.2D6DF7C9-9798-4DB4-B73E-511F9E2981C9 +TCGA-DJ-A2PW,DJ,Braf-like,-0.861761,TCGA-DJ-A2PW-01Z-00-DX1.57FC77A8-DDCC-40DC-9B08-79DE5EFD02F0 +TCGA-EL-A3CR,EL,Braf-like,-0.911079,TCGA-EL-A3CR-01Z-00-DX1.2CF29E90-83B4-442E-A3D7-CB3DF0A66CB6 +TCGA-EM-A3AK,EM,Braf-like,-0.759437,TCGA-EM-A3AK-01Z-00-DX1.7E48B297-0CAB-4A56-8C60-B2E55CE50295 +TCGA-EL-A3GU,EL,Braf-like,-0.948626,TCGA-EL-A3GU-01Z-00-DX1.D4950D69-FE48-4273-BD87-55258D0249BA +TCGA-DJ-A4UW,DJ,Braf-like,-0.636283,TCGA-DJ-A4UW-01Z-00-DX1.14066FEA-5232-4891-81BB-CB3A7CB6D65F +TCGA-BJ-A3EZ,BJ,Braf-like,-0.884384,TCGA-BJ-A3EZ-01Z-00-DX1.2BB3F70F-4395-43CB-B0C9-84AD68369DFB +TCGA-DJ-A3VE,DJ,Braf-like,-0.966657,TCGA-DJ-A3VE-01Z-00-DX1.5994BC7B-5E99-497F-A2CE-F19532041AD3 +TCGA-ET-A2MZ,ET,Braf-like,-0.813789,TCGA-ET-A2MZ-01Z-00-DX1.EB533122-18F3-40DD-AF7B-B49629F089BC +TCGA-EL-A3D6,EL,Braf-like,-0.959818,TCGA-EL-A3D6-01Z-00-DX1.7495CC76-097C-4C31-989E-C557EEBC2C6B +TCGA-ET-A3BT,ET,Braf-like,-0.983568,TCGA-ET-A3BT-01Z-00-DX1.3BF0EE2E-27FA-4E9D-819E-E6F22B29D877 +TCGA-DJ-A4V2,DJ,Braf-like,-0.957361,TCGA-DJ-A4V2-01Z-00-DX1.5C8BADF7-031B-472D-8B8C-83C833D3EC4D +TCGA-ET-A25J,ET,Braf-like,-0.671524,TCGA-ET-A25J-01Z-00-DX1.93E413F5-DDAE-4C04-B3C0-699A189893CC +TCGA-DJ-A2PV,DJ,Braf-like,-0.781557,TCGA-DJ-A2PV-01Z-00-DX1.78F6C030-472B-4053-BF19-8C7A98BC671A +TCGA-EL-A3CT,EL,Braf-like,-0.841709,TCGA-EL-A3CT-01Z-00-DX1.B72AA981-135A-48D0-8E43-01720BC2E1DB +TCGA-E8-A436,E8,Braf-like,-0.900369,TCGA-E8-A436-01Z-00-DX1.46EA840B-403D-466A-8596-52B29F6AE5FA +TCGA-E8-A2EA,E8,Braf-like,-0.786704,TCGA-E8-A2EA-01Z-00-DX1.12EF1A77-B87D-4B85-9356-3836F0459EDE 
+TCGA-FK-A3SB,FK,Braf-like,-0.852314,TCGA-FK-A3SB-01Z-00-DX1.F8150243-853F-4B0C-A1B7-A4F8A8AA8E77 +TCGA-EL-A3D0,EL,Braf-like,-0.717953,TCGA-EL-A3D0-01Z-00-DX1.BCD4DC0B-A1B4-48EA-BEB3-EB2D5120F81B +TCGA-DE-A0XZ,DE,Braf-like,-0.78269,TCGA-DE-A0XZ-01Z-00-DX1.BC86FEE6-AC9D-4911-ACDB-16CD95812161 +TCGA-EL-A3MZ,EL,Braf-like,-0.936245,TCGA-EL-A3MZ-01Z-00-DX1.17AF6075-AB95-4A2F-936C-44624151384F +TCGA-MK-A4N9,MK,Braf-like,-0.913604,TCGA-MK-A4N9-01Z-00-DX1.3F93373F-10BD-4CE6-9864-D8D23DCCD422 +TCGA-DJ-A1QO,DJ,Braf-like,-0.826551,TCGA-DJ-A1QO-01Z-00-DX1.97439F9B-2A25-4CCF-B478-B27B389F6704 +TCGA-FE-A234,FE,Braf-like,-0.974811,TCGA-FE-A234-01Z-00-DX1.BF4731E1-E4CC-4E85-8EB7-A2CBE3DD9066 +TCGA-EM-A1CT,EM,Braf-like,-0.786888,TCGA-EM-A1CT-01Z-00-DX1.1196BDE9-C901-4EDF-845E-B64AB399D52E +TCGA-KS-A4IB,KS,Braf-like,-0.512277,TCGA-KS-A4IB-01Z-00-DX1.26B2EF79-A42A-493A-B7B9-F1A6450041EE +TCGA-E8-A433,E8,Braf-like,-0.828123,TCGA-E8-A433-01Z-00-DX1.151B861C-33DF-45C4-8F14-E1468A40DDB6 +TCGA-BJ-A3PU,BJ,Braf-like,-0.942972,TCGA-BJ-A3PU-01Z-00-DX1.CD5EC67F-EA55-47F7-B6E5-C0A525C883F1 +TCGA-DJ-A1QD,DJ,Braf-like,-0.8797,TCGA-DJ-A1QD-01Z-00-DX1.69F164F2-DC35-473E-9DF6-142F83CD424F +TCGA-DJ-A3UW,DJ,Braf-like,-0.904424,TCGA-DJ-A3UW-01Z-00-DX1.55E1DC4B-A4B8-4D2A-A1DD-8EAC9D11C59F +TCGA-IM-A3EB,IM,Braf-like,-0.888465,TCGA-IM-A3EB-01Z-00-DX1.2E9C1C0E-99BC-4BA3-BF97-2AD737D37E4B +TCGA-E3-A3E5,E3,Braf-like,-0.627178,TCGA-E3-A3E5-01Z-00-DX1.E7E8AB8B-695F-4158-A3C0-E2B801E07D2A +TCGA-BJ-A28X,BJ,Braf-like,-0.858294,TCGA-BJ-A28X-01Z-00-DX1.20CE2F8C-3775-4176-B14B-E1D9E7835A53 +TCGA-EL-A4K0,EL,Braf-like,-0.830363,TCGA-EL-A4K0-01Z-00-DX1.BE70226D-E2AB-4EA9-9EC3-AEB5C7A42DCC +TCGA-DJ-A2PY,DJ,Braf-like,-0.966951,TCGA-DJ-A2PY-01Z-00-DX1.205D876C-9E47-4421-9461-592D452AB13E +TCGA-ET-A39M,ET,Braf-like,-0.642037,TCGA-ET-A39M-01Z-00-DX1.C6BB562B-64D4-4977-8631-04C548CC4067 +TCGA-DJ-A3UR,DJ,Braf-like,-0.72756,TCGA-DJ-A3UR-01Z-00-DX1.5A6B0168-DEB6-45FF-A4CC-E36172B741C7 
+TCGA-EM-A3FM,EM,Braf-like,-0.891201,TCGA-EM-A3FM-01Z-00-DX1.3EF4F9FB-1FD7-4434-A3DC-BDBE4551399E +TCGA-EM-A4FV,EM,Braf-like,-0.440775,TCGA-EM-A4FV-01Z-00-DX1.AC4D700A-246E-4569-A443-4DDFEB4B4FCC +TCGA-EL-A3H8,EL,Braf-like,-0.988366,TCGA-EL-A3H8-01Z-00-DX1.CD331473-DF2A-4C61-9647-6097DA3E0C89 +TCGA-DJ-A3VF,DJ,Braf-like,-0.956348,TCGA-DJ-A3VF-01Z-00-DX1.96025060-08FB-457A-838B-008C081AA90C +TCGA-FY-A3BL,FY,Braf-like,-0.797235,TCGA-FY-A3BL-01Z-00-DX1.66EA8860-B2BC-47E0-87B7-3D3ACC9D5835 +TCGA-IM-A3ED,IM,Braf-like,-0.938517,TCGA-IM-A3ED-01Z-00-DX1.B02872A6-BA1B-4647-A5D3-474316E8AD5B +TCGA-DJ-A1QQ,DJ,Braf-like,-0.846099,TCGA-DJ-A1QQ-01Z-00-DX1.767572EE-6D88-4D45-B421-B02B5EDCBCCA +TCGA-L6-A4EU,L6,Braf-like,-0.873687,TCGA-L6-A4EU-01Z-00-DX1.4678DBF4-2471-4F17-8088-966F79772837 +TCGA-EM-A4FO,EM,Braf-like,-0.493643,TCGA-EM-A4FO-01Z-00-DX1.2B587AF2-5DE5-48A2-8E27-ABE9E1091F35 +TCGA-EL-A3CN,EL,Braf-like,-0.937594,TCGA-EL-A3CN-01Z-00-DX1.D02466E1-F323-446A-846F-67DE1253870C +TCGA-FY-A3NN,FY,Braf-like,-0.657767,TCGA-FY-A3NN-01Z-00-DX1.05A99BEA-DCB0-4358-8AF7-CC49ED8F71E1 +TCGA-EL-A3TA,EL,Braf-like,-0.552805,TCGA-EL-A3TA-01Z-00-DX1.272C4B72-AB1F-4ADB-83FF-98D301B22977 +TCGA-FE-A230,FE,Braf-like,-0.87619,TCGA-FE-A230-01Z-00-DX1.0800310D-CBF4-44B3-9F3C-AAA1B88B0574 +TCGA-IM-A3U3,IM,Braf-like,-0.922026,TCGA-IM-A3U3-01Z-00-DX1.D4723099-93F2-416C-9905-D12BB1F2AD85 +TCGA-DJ-A2PQ,DJ,Braf-like,-0.888841,TCGA-DJ-A2PQ-01Z-00-DX1.392C1B22-8D97-4924-9968-61747529C9AD +TCGA-EL-A3GZ,EL,Braf-like,-0.864168,TCGA-EL-A3GZ-01Z-00-DX1.D042AE25-2A1B-40AB-A7DE-2BF17BD230B7 +TCGA-MK-A4N7,MK,Braf-like,-0.895786,TCGA-MK-A4N7-01Z-00-DX1.830E125F-78B8-4D18-8523-5A82175030A0 +TCGA-EL-A3GQ,EL,Braf-like,-0.734238,TCGA-EL-A3GQ-01Z-00-DX1.2CD53F67-3FD1-4F26-A3AF-0DE8F79DFC39 +TCGA-FE-A3PC,FE,Braf-like,-0.868136,TCGA-FE-A3PC-01Z-00-DX1.EBF30F16-EDC4-4324-BB4E-2F5A9F9AF05E +TCGA-ET-A39K,ET,Braf-like,-0.875385,TCGA-ET-A39K-01Z-00-DX1.FAB5071F-9BC6-4FA3-ACC4-2FB0B4AFFE5F 
+TCGA-EM-A2P3,EM,Braf-like,-0.937071,TCGA-EM-A2P3-01Z-00-DX1.7A8DC9FF-F03F-4273-8DCE-EAB412118945 +TCGA-DJ-A13V,DJ,Braf-like,-0.988934,TCGA-DJ-A13V-01Z-00-DX1.88661EBD-6B7A-4EF3-95B1-9CAF38B4BCF2 +TCGA-ET-A3DP,ET,Braf-like,-0.883102,TCGA-ET-A3DP-01Z-00-DX1.D334E83B-E0DC-47EB-A38D-55DEC94C689B +TCGA-EL-A3CV,EL,Braf-like,-0.889018,TCGA-EL-A3CV-01Z-00-DX1.08D2DB43-A49B-42F9-B484-D7028E29DBE3 +TCGA-FY-A3I4,FY,Braf-like,-0.632563,TCGA-FY-A3I4-01Z-00-DX1.9DDE959C-EB74-4BDB-8CAE-D72E6694FF05 +TCGA-BJ-A45I,BJ,Braf-like,-0.972833,TCGA-BJ-A45I-01Z-00-DX1.29C9DC34-228A-4712-A028-07A396B4D1BD +TCGA-EL-A3H7,EL,Braf-like,-0.978796,TCGA-EL-A3H7-01Z-00-DX1.BBCE50FD-96BF-4AEB-AA57-072C5FE2A1DF +TCGA-ET-A25O,ET,Braf-like,-0.697294,TCGA-ET-A25O-01Z-00-DX1.74BDAA2B-FD15-4B49-A809-804846DF619D +TCGA-FE-A3PB,FE,Braf-like,-0.869745,TCGA-FE-A3PB-01Z-00-DX1.FBB5CA1C-6B8D-4F71-A4D2-382B055CF008 +TCGA-EM-A3AR,EM,Braf-like,-0.275776,TCGA-EM-A3AR-01Z-00-DX1.80FCE1EC-6D5C-4762-8AB4-B0384AA69965 +TCGA-BJ-A4O8,BJ,Braf-like,-0.973757,TCGA-BJ-A4O8-01Z-00-DX1.3E719E3C-DE12-4893-AEEA-7CF11A2633DA +TCGA-E3-A3DY,E3,Braf-like,-0.874333,TCGA-E3-A3DY-01Z-00-DX1.5FFB8A1E-3AC8-494D-8112-5F1FBDC3F597 +TCGA-EL-A3GS,EL,Braf-like,-0.795377,TCGA-EL-A3GS-01Z-00-DX1.7C41E4B8-0245-4F3D-BA24-9F10147C4C19 +TCGA-DJ-A1QI,DJ,Braf-like,-0.909918,TCGA-DJ-A1QI-01Z-00-DX1.EBF4808E-746B-460F-88D2-19D03FAB54A5 +TCGA-FY-A3R7,FY,Braf-like,-0.497794,TCGA-FY-A3R7-01Z-00-DX1.C1049915-5AB3-44B9-858E-14DFB23F6BD1 +TCGA-EM-A4FQ,EM,Braf-like,-0.897698,TCGA-EM-A4FQ-01Z-00-DX1.EDAA6AF4-8736-46C8-AF49-F4901AC5D0C2 +TCGA-EL-A3GR,EL,Braf-like,-0.777184,TCGA-EL-A3GR-01Z-00-DX1.EA842796-37C5-4868-A278-C16ED66CF1CE +TCGA-ET-A2N4,ET,Braf-like,-0.346263,TCGA-ET-A2N4-01Z-00-DX1.726A5148-0065-4494-9352-3CBFB9E58536 +TCGA-ET-A3BP,ET,Braf-like,-0.524359,TCGA-ET-A3BP-01Z-00-DX1.31AF28CB-251A-4DE8-9A82-8C1BDDAF2438 +TCGA-DJ-A1QE,DJ,Braf-like,-0.973941,TCGA-DJ-A1QE-01Z-00-DX1.B8EB5453-9B8D-49B2-9AAF-2145C61D493D 
+TCGA-FE-A23A,FE,Braf-like,-0.893298,TCGA-FE-A23A-01Z-00-DX1.87FE71D9-087F-45B2-8B1B-935566E8489A +TCGA-DJ-A3UQ,DJ,Braf-like,-0.97541,TCGA-DJ-A3UQ-01Z-00-DX1.36B3728D-E0AB-42B3-8C2F-3F2D9BAED50D +TCGA-DJ-A2QA,DJ,Braf-like,-0.504226,TCGA-DJ-A2QA-01Z-00-DX1.0CC34156-B3F5-43DD-B740-A6F51BF45693 +TCGA-DJ-A2PZ,DJ,Braf-like,-0.843451,TCGA-DJ-A2PZ-01Z-00-DX1.9CD11DC2-DE3F-4C05-846B-AE9E4BBFC000 +TCGA-ET-A3DW,ET,Braf-like,-0.798633,TCGA-ET-A3DW-01Z-00-DX1.02B1407D-BD72-42C1-A8D4-85D7F14CC3F1 +TCGA-EL-A4JX,EL,Braf-like,-0.91825,TCGA-EL-A4JX-01Z-00-DX1.818961CB-8328-484A-A1D8-274D3A7F5D28 +TCGA-ET-A2N0,ET,Braf-like,-0.697888,TCGA-ET-A2N0-01Z-00-DX1.C134C410-A5CB-4CBB-A9FC-F6C16DDD66DF +TCGA-FY-A3ON,FY,Braf-like,-0.786769,TCGA-FY-A3ON-01Z-00-DX1.972E576B-DA72-4DA1-BCE2-0873118C1ACD +TCGA-EM-A3FK,EM,Braf-like,-0.937697,TCGA-EM-A3FK-01Z-00-DX1.BED5487B-9BE8-49DA-BE00-CC94A40A7B2F +TCGA-DJ-A1QF,DJ,Braf-like,-0.782804,TCGA-DJ-A1QF-01Z-00-DX1.31091089-431C-477E-BBAD-774432E05449 +TCGA-EL-A3H4,EL,Braf-like,-0.828942,TCGA-EL-A3H4-01Z-00-DX1.CB5C38FE-EBF3-449D-8608-683744AAC5A4 +TCGA-GE-A2C6,GE,Braf-like,-0.717834,TCGA-GE-A2C6-01Z-00-DX1.CD093DB7-F4AA-4D24-AEB6-7B353FF0A9A4 +TCGA-EL-A3GY,EL,Braf-like,-1,TCGA-EL-A3GY-01Z-00-DX1.77EB800B-4CE0-4AD3-816B-EA6D945C1156 +TCGA-DJ-A2PR,DJ,Braf-like,-0.8304,TCGA-DJ-A2PR-01Z-00-DX1.CE0FA852-C086-421C-A19A-BAF52A25E8DD +TCGA-EL-A4JZ,EL,Braf-like,-0.707075,TCGA-EL-A4JZ-01Z-00-DX1.E7D26ECB-3EE9-4364-95C8-9C2419467CFB +TCGA-ET-A25R,ET,Braf-like,-0.962697,TCGA-ET-A25R-01Z-00-DX1.1D13E4C2-31FB-4A97-B99C-42A857522301 +TCGA-DJ-A2Q7,DJ,Braf-like,-0.763196,TCGA-DJ-A2Q7-01Z-00-DX1.AA5FC72B-4C57-4235-8C90-D17FE2FA5D80 +TCGA-DJ-A3UK,DJ,Braf-like,-0.701099,TCGA-DJ-A3UK-01Z-00-DX1.E216A23C-1F6F-4898-807D-9373EEC736FE +TCGA-DJ-A13L,DJ,Braf-like,-0.938449,TCGA-DJ-A13L-01Z-00-DX1.38ECFDE5-B5A8-4E90-A296-3F17A8D61C32 +TCGA-DJ-A2PO,DJ,Braf-like,-0.772315,TCGA-DJ-A2PO-01Z-00-DX1.75B88F27-FE4F-47AE-85AA-9291BC387205 
+TCGA-EL-A3CS,EL,Braf-like,-0.506641,TCGA-EL-A3CS-01Z-00-DX1.C54BAF21-0565-4AD8-AEBA-E7C6BF1BA3BA +TCGA-FE-A231,FE,Braf-like,-0.708029,TCGA-FE-A231-01Z-00-DX1.4507C446-ECBC-40CB-ADBE-61EC5005E794 +TCGA-DJ-A2PT,DJ,Braf-like,-0.934725,TCGA-DJ-A2PT-01Z-00-DX1.8C28F7F7-426A-4AAC-8AC6-D082F85C4D34 +TCGA-DO-A1K0,DO,Braf-like,-0.927687,TCGA-DO-A1K0-01Z-00-DX1.5ED4011C-6AAA-4197-8044-1F69D55CEAEE +TCGA-FY-A40K,FY,Braf-like,-0.855699,TCGA-FY-A40K-01Z-00-DX1.58543A84-2621-4FAA-BC1F-5B5267F2BA13 +TCGA-DJ-A3VB,DJ,Braf-like,-0.94818,TCGA-DJ-A3VB-01Z-00-DX1.061A9E26-FFC3-4811-83CC-AC5CCE6DCEC2 +TCGA-L6-A4ET,L6,Braf-like,-0.815732,TCGA-L6-A4ET-01Z-00-DX1.C1AFDAD4-C9D5-423D-9647-ADBBEB693611 +TCGA-DJ-A3UU,DJ,Braf-like,-0.876604,TCGA-DJ-A3UU-01Z-00-DX1.3A1C8C64-9C18-4276-93B6-09A33C376A61 +TCGA-EM-A1CV,EM,Braf-like,-0.660165,TCGA-EM-A1CV-01Z-00-DX1.BF664F6E-7B7B-4333-95CA-C5FC79BE39BB +TCGA-EL-A3T8,EL,Braf-like,-0.86836,TCGA-EL-A3T8-01Z-00-DX1.1D627E88-11CB-4118-973F-FE7573CF91B6 +TCGA-DO-A2HM,DO,Braf-like,-0.698948,TCGA-DO-A2HM-01Z-00-DX1.20D6ED5E-AD06-464B-9311-8D470E236931 +TCGA-DJ-A2QC,DJ,Braf-like,-0.660874,TCGA-DJ-A2QC-01Z-00-DX1.877FA32E-9E78-4ABF-A7C2-A61C875749A1 +TCGA-EL-A3CL,EL,Braf-like,-0.790992,TCGA-EL-A3CL-01Z-00-DX1.CBC51645-8820-4806-9794-2C0B4D38D897 +TCGA-DJ-A4UL,DJ,Braf-like,-0.83713,TCGA-DJ-A4UL-01Z-00-DX1.60C4EF70-91EB-4F50-BE60-1ABB40AA2524 +TCGA-EL-A3T1,EL,Braf-like,-0.95923,TCGA-EL-A3T1-01Z-00-DX1.628A003F-51F6-457E-B5E5-CD5B6F6E00D6 +TCGA-J8-A3YE,J8,Braf-like,-0.562045,TCGA-J8-A3YE-01Z-00-DX1.83286B2F-6D9C-4C11-8224-24D86BF517FA +TCGA-E8-A242,E8,Braf-like,-0.985567,TCGA-E8-A242-01Z-00-DX1.9DDBB5BB-696E-4C61-BF4A-464062403F04 +TCGA-E8-A418,E8,Braf-like,-0.853855,TCGA-E8-A418-01Z-00-DX1.53DB7DD7-9A4D-4990-8667-71C0E3225686 +TCGA-FE-A235,FE,Braf-like,-0.763714,TCGA-FE-A235-01Z-00-DX1.65E15569-F4B8-4A36-AF7A-9C9DB128A22B +TCGA-DJ-A2PS,DJ,Braf-like,-0.681399,TCGA-DJ-A2PS-01Z-00-DX1.9740CBCF-6650-455F-9CC2-80BEA0A1B033 
+TCGA-DJ-A3UN,DJ,Braf-like,-0.506253,TCGA-DJ-A3UN-01Z-00-DX1.D9D7B498-7FF4-4F21-8843-92202398E89A +TCGA-DJ-A13T,DJ,Braf-like,-0.759873,TCGA-DJ-A13T-01Z-00-DX1.E66E0E98-535C-485E-9A26-9FA5D8A6766C +TCGA-BJ-A28R,BJ,Braf-like,-0.860162,TCGA-BJ-A28R-01Z-00-DX1.604684C7-A61F-4EA5-9E6E-545CE8420B73 +TCGA-DE-A0Y3,DE,Braf-like,-0.834052,TCGA-DE-A0Y3-01Z-00-DX1.DEB10F5F-4EF9-4346-8909-89659D20451C +TCGA-FE-A237,FE,Braf-like,-0.968668,TCGA-FE-A237-01Z-00-DX1.4D6E9100-0586-43A9-9070-B24F1BD1E7A9 +TCGA-ET-A39S,ET,Braf-like,-0.626978,TCGA-ET-A39S-01Z-00-DX1.078746BC-33B7-47A2-A6D8-F5B2E6180E30 +TCGA-DJ-A2Q6,DJ,Braf-like,-0.910275,TCGA-DJ-A2Q6-01Z-00-DX1.105881C5-0EB7-4602-AC1A-B061595CDF90 +TCGA-ET-A39T,ET,Braf-like,-0.623684,TCGA-ET-A39T-01Z-00-DX1.7006B750-108F-4C2F-A934-80A2434FA029 +TCGA-EL-A4K4,EL,Braf-like,-0.923064,TCGA-EL-A4K4-01Z-00-DX1.77AEEDD8-FFA3-468C-9925-E38348A00381 +TCGA-EM-A4FM,EM,Braf-like,-0.970594,TCGA-EM-A4FM-01Z-00-DX1.B48D42B6-53F0-4270-82F0-143EE1573F69 +TCGA-KS-A41J,KS,Braf-like,-0.924332,TCGA-KS-A41J-01Z-00-DX1.BD832F56-8236-4B75-9F27-C0CD57C27BB2 +TCGA-E3-A3E2,E3,Braf-like,-0.881561,TCGA-E3-A3E2-01Z-00-DX1.E21626F3-0840-40ED-B853-E3E2992EBB24 +TCGA-EL-A3T7,EL,Braf-like,-0.951634,TCGA-EL-A3T7-01Z-00-DX1.7697FAF4-272A-44C3-B1E2-D88C2A0B3D05 +TCGA-EL-A3ZQ,EL,Braf-like,-0.738984,TCGA-EL-A3ZQ-01Z-00-DX1.7E275EC2-1F61-4639-B88D-97FAEE877846 +TCGA-EL-A3CP,EL,Braf-like,-0.88999,TCGA-EL-A3CP-01Z-00-DX1.D0F4C535-AF19-4F30-9440-5B6656B0C3F0 +TCGA-ET-A3BU,ET,Braf-like,-0.888346,TCGA-ET-A3BU-01Z-00-DX1.AEBD5734-FBDA-44ED-A173-751904429C8B +TCGA-EL-A3CM,EL,Braf-like,-0.865545,TCGA-EL-A3CM-01Z-00-DX1.7AC2A490-AEF9-456E-9B29-CB3E252ED947 +TCGA-BJ-A0Z9,BJ,Braf-like,-0.831401,TCGA-BJ-A0Z9-01Z-00-DX1.35FB4049-9A3D-4C72-9969-148B5402A090 +TCGA-EL-A3GV,EL,Braf-like,-0.911532,TCGA-EL-A3GV-01Z-00-DX1.45E8B373-5E7D-42BA-A2FB-A918A465AD08 +TCGA-KS-A4I9,KS,Braf-like,-0.97727,TCGA-KS-A4I9-01Z-00-DX1.D40436B2-5EDA-42C8-9AC5-EF8FF36DADC9 
+TCGA-IM-A3U2,IM,Braf-like,-0.984396,TCGA-IM-A3U2-01Z-00-DX1.CECE491E-95AF-4D3A-8579-358280639B0B +TCGA-EM-A3FJ,EM,Braf-like,-0.93615,TCGA-EM-A3FJ-01Z-00-DX1.EAC876AD-5163-4412-B49A-C7FBDA332AF6 +TCGA-DJ-A13P,DJ,Braf-like,-0.734512,TCGA-DJ-A13P-01Z-00-DX1.3F10C782-D5D3-4B7C-A898-F6F061C78EB8 +TCGA-DJ-A13O,DJ,Braf-like,-0.530847,TCGA-DJ-A13O-01Z-00-DX1.2BABDA0A-47B6-4B79-B0C1-F98112867FD3 +TCGA-EL-A3H5,EL,Braf-like,-0.893705,TCGA-EL-A3H5-01Z-00-DX1.7712A1C9-A8BB-48C9-A57F-30E44D258CE5 +TCGA-EM-A22I,EM,Braf-like,-0.965868,TCGA-EM-A22I-01Z-00-DX1.D356C061-F2F9-4519-B50F-332CDCEFCEAE +TCGA-EM-A2P0,EM,Braf-like,-0.725985,TCGA-EM-A2P0-01Z-00-DX1.35368BFA-CBC0-403D-9A5A-10E9AAA8E23E +TCGA-EM-A2OZ,EM,Braf-like,-0.704196,TCGA-EM-A2OZ-01Z-00-DX1.3146F96A-8BFE-46D2-AFFA-8593A56B58CD +TCGA-E8-A413,E8,Braf-like,-0.752532,TCGA-E8-A413-01Z-00-DX1.6307C66E-527C-4F67-AF4B-11953E3CE6DB +TCGA-FK-A3SH,FK,Braf-like,-0.80378,TCGA-FK-A3SH-01Z-00-DX1.2A218FE8-CA0B-4780-AB2C-03AD4569D10C +TCGA-ET-A39J,ET,Braf-like,-0.911863,TCGA-ET-A39J-01Z-00-DX1.AEAA1775-236A-437A-9561-2C21906BB444 +TCGA-DJ-A3UO,DJ,Braf-like,-0.798929,TCGA-DJ-A3UO-01Z-00-DX1.4113A0BA-FD12-4DE9-8DC0-911BBF5DBC89 +TCGA-DJ-A2Q3,DJ,Braf-like,-0.783963,TCGA-DJ-A2Q3-01Z-00-DX1.F6B4372C-8371-463A-9BFC-F9702DA80AEA +TCGA-E8-A415,E8,Braf-like,-0.694544,TCGA-E8-A415-01Z-00-DX1.99DCF277-DF60-456C-AD70-003C08D9FA01 +TCGA-FE-A236,FE,Braf-like,-0.635745,TCGA-FE-A236-01Z-00-DX1.1F62866A-9D4C-4FE3-AE63-E26C39BB518A +TCGA-FY-A3RA,FY,Braf-like,-0.862337,TCGA-FY-A3RA-01Z-00-DX1.E57FFC9F-1F66-4AA2-A2A9-2D2CD160B2E5 +TCGA-DJ-A3UY,DJ,Braf-like,-0.630139,TCGA-DJ-A3UY-01Z-00-DX1.18014546-386C-4EBE-95FA-590B1BA9AA50 +TCGA-H2-A421,H2,Braf-like,-0.927033,TCGA-H2-A421-01Z-00-DX1.1D6462C1-2B95-4BC4-A563-45CFE5C0C17E +TCGA-FY-A3R8,FY,Braf-like,-0.933308,TCGA-FY-A3R8-01Z-00-DX1.400E7E59-3895-4FC0-9A22-CE43C45BC8E7 +TCGA-J8-A3NZ,J8,Braf-like,-0.965077,TCGA-J8-A3NZ-01Z-00-DX1.56D16F1E-FF2A-403D-9E40-1F69C33B0CAF 
+TCGA-ET-A3DT,ET,Braf-like,-0.886036,TCGA-ET-A3DT-01Z-00-DX1.068E4B9B-BC5E-413D-B2C4-28A102BE019D +TCGA-DJ-A2Q4,DJ,Braf-like,-0.950607,TCGA-DJ-A2Q4-01Z-00-DX1.A815116F-298A-41D9-B6A8-ADCD0A0DCC01 +TCGA-DJ-A3VJ,DJ,Braf-like,-0.911663,TCGA-DJ-A3VJ-01Z-00-DX1.84EE18A7-A597-4D0E-80E6-FB4A9361D512 +TCGA-EL-A3T6,EL,Braf-like,-0.882245,TCGA-EL-A3T6-01Z-00-DX1.F60B9CA1-1EA7-4D28-8117-820597C290B0 +TCGA-DJ-A3UM,DJ,Braf-like,-0.981724,TCGA-DJ-A3UM-01Z-00-DX1.4D56B74F-3F8E-4A4F-B7EA-B91DC97936BD +TCGA-EM-A1CU,EM,Braf-like,-0.961017,TCGA-EM-A1CU-01Z-00-DX1.3867DBBD-1507-4C14-A504-8AF2DEC28077 +TCGA-DJ-A3VA,DJ,Braf-like,-0.9598,TCGA-DJ-A3VA-01Z-00-DX1.FA6843C7-BB03-4B4D-B6CE-FA93800F9351 +TCGA-EL-A3MX,EL,Braf-like,-0.526882,TCGA-EL-A3MX-01Z-00-DX1.4E7683C9-AA35-4A12-BE04-948A609446FB +TCGA-H2-A3RI,H2,Braf-like,-0.801542,TCGA-H2-A3RI-01Z-00-DX1.AD845C14-4A49-4322-8AA4-6BD565CDFD90 +TCGA-EL-A3T3,EL,Braf-like,-0.957673,TCGA-EL-A3T3-01Z-00-DX1.E36CE749-0277-4CB5-9FC4-E219B991D054 +TCGA-DJ-A4V4,DJ,Braf-like,-0.80648,TCGA-DJ-A4V4-01Z-00-DX1.EC5A9BA5-DEF4-494A-8190-3835ECEABAF3 +TCGA-DJ-A3V7,DJ,Braf-like,-0.851342,TCGA-DJ-A3V7-01Z-00-DX1.113E18FD-C19B-44FD-BA76-650802222B47 +TCGA-EL-A4KH,EL,Braf-like,-0.706837,TCGA-EL-A4KH-01Z-00-DX1.76D56F70-3279-43C3-A53D-BF2D3CF5051A +TCGA-DJ-A1QM,DJ,Braf-like,-0.394584,TCGA-DJ-A1QM-01Z-00-DX1.3A58434F-A1A0-4FF4-B126-214C02CCEC7A +TCGA-DJ-A1QH,DJ,Braf-like,-0.970023,TCGA-DJ-A1QH-01Z-00-DX1.20353932-D2BA-4A54-A1C1-834D0841606A +TCGA-FE-A233,FE,Braf-like,-0.888174,TCGA-FE-A233-01Z-00-DX1.76314C1C-D914-4B1D-B72C-70E75874487A +TCGA-E3-A3E1,E3,Braf-like,-0.823903,TCGA-E3-A3E1-01Z-00-DX1.03D78A51-DE27-4E64-A69F-A04AF2570F77 +TCGA-CE-A3ME,CE,Braf-like,-0.790653,TCGA-CE-A3ME-01Z-00-DX1.BCF7C127-B617-43C3-9D54-E10310EE9DA5 +TCGA-E8-A437,E8,Braf-like,-0.662654,TCGA-E8-A437-01Z-00-DX1.46C70FF7-302E-4AF3-B589-54EA6FE65C20 +TCGA-EM-A3O3,EM,Braf-like,-0.718779,TCGA-EM-A3O3-01Z-00-DX1.F1D69AEB-CBBD-49EE-A2F4-411A11E3B3AC 
+TCGA-EL-A3N3,EL,Braf-like,-0.902508,TCGA-EL-A3N3-01Z-00-DX1.7EBD18A4-3A24-4AFD-8CD4-F2BAF60BE32A +TCGA-E3-A3E3,E3,Braf-like,-0.911362,TCGA-E3-A3E3-01Z-00-DX1.81DD51D2-1888-4BF7-B1F6-B75A132DE424 +TCGA-EL-A3MY,EL,Braf-like,-0.946879,TCGA-EL-A3MY-01Z-00-DX1.3EAC1ABA-C2BB-4045-A561-9792D30BC438 +TCGA-ET-A39O,ET,Braf-like,-0.983016,TCGA-ET-A39O-01Z-00-DX1.8036EEAA-916A-48F0-A47E-7A4EDFD5066B +TCGA-DJ-A1QN,DJ,Braf-like,-0.890174,TCGA-DJ-A1QN-01Z-00-DX1.92ECB15F-0DF0-4678-B569-663B674113C6 +TCGA-CE-A482,CE,Braf-like,-0.48547,TCGA-CE-A482-01Z-00-DX1.0F46244F-5C28-41EA-B189-EF93D8FC8B3B +TCGA-ET-A40S,ET,Braf-like,-0.345056,TCGA-ET-A40S-01Z-00-DX1.CA1AE742-654A-4C5C-A5BC-D467AC8B287B +TCGA-J8-A4HW,J8,Braf-like,-0.614643,TCGA-J8-A4HW-01Z-00-DX1.51EEBB5E-DD1E-4071-9885-C100581AD924 +TCGA-EL-A3T0,EL,Braf-like,-0.307026,TCGA-EL-A3T0-01Z-00-DX1.CB54731B-689F-473D-B7DE-2109197CC1CA +TCGA-EL-A3ZK,EL,Braf-like,-0.585901,TCGA-EL-A3ZK-01Z-00-DX1.2A501355-DD58-40E1-B31A-A412AEEAFD4C +TCGA-E8-A44M,E8,Braf-like,-0.33961,TCGA-E8-A44M-01Z-00-DX1.270C42A2-EA51-4937-9106-ECBBB5B7681D +TCGA-DE-A3KN,DE,Braf-like,-0.177356,TCGA-DE-A3KN-01Z-00-DX1.DD8CFAA4-96DD-4940-8405-CA1BBE641B47 +TCGA-DJ-A2PX,DJ,Braf-like,-0.43636,TCGA-DJ-A2PX-01Z-00-DX1.3C7F4F1E-8D23-4F10-B6FA-FD557F608EBB +TCGA-CE-A13K,CE,Braf-like,-0.329836,TCGA-CE-A13K-01Z-00-DX1.7D293931-179E-4C5E-ACE5-D87D7ECAC0A4 +TCGA-ET-A3DR,ET,Braf-like,-0.551962,TCGA-ET-A3DR-01Z-00-DX1.702323F9-77A9-4D26-8236-F009C6CFADE4 +TCGA-EM-A1CS,EM,Braf-like,-0.73176,TCGA-EM-A1CS-01Z-00-DX1.37618D38-55A6-4D9C-AE95-4B42239ECC5E +TCGA-BJ-A3PR,BJ,Braf-like,-0.52556,TCGA-BJ-A3PR-01Z-00-DX1.BD741B06-31EC-48ED-871E-55B2378CB547 +TCGA-DO-A1JZ,DO,Braf-like,-0.609057,TCGA-DO-A1JZ-01Z-00-DX1.CD2D8D52-8E7A-4187-A5BA-DF3E0FA7984A +TCGA-EM-A3AQ,EM,Braf-like,-0.518428,TCGA-EM-A3AQ-01Z-00-DX1.3B5348E7-FD6B-49A9-B211-E3934E0BDC92 +TCGA-CE-A485,CE,Braf-like,-0.591397,TCGA-CE-A485-01Z-00-DX1.F3AF4627-CD5F-441E-80E6-1C9E1FB502D7 
+TCGA-DJ-A4UT,DJ,Braf-like,-0.535886,TCGA-DJ-A4UT-01Z-00-DX1.45F8A6C0-E432-4350-8211-BC1D65CC148F +TCGA-DJ-A4V5,DJ,Braf-like,-0.839449,TCGA-DJ-A4V5-01Z-00-DX1.3AE4DEBF-2DE1-42E4-9A29-118A1A8159BC +TCGA-FK-A3S3,FK,Braf-like,-0.254331,TCGA-FK-A3S3-01Z-00-DX1.27050211-E584-4001-B256-AC113B08FAA0 +TCGA-DJ-A4UP,DJ,Braf-like,-0.12368,TCGA-DJ-A4UP-01Z-00-DX1.46F90739-541F-4EBA-9564-DF0FC2C8D2B9 +TCGA-BJ-A28T,BJ,Braf-like,-0.107335,TCGA-BJ-A28T-01Z-00-DX1.99800524-4762-48C3-BC79-E409ED4034DC +TCGA-EM-A3AO,EM,Braf-like,-0.312015,TCGA-EM-A3AO-01Z-00-DX1.1D622410-D778-4BDC-84F5-90958A9D8488 +TCGA-EL-A3T9,EL,Braf-like,-0.687835,TCGA-EL-A3T9-01Z-00-DX1.E2795595-E82F-4DFC-A6AC-A4AA45753A00 +TCGA-BJ-A0ZJ,BJ,Braf-like,-0.2757,TCGA-BJ-A0ZJ-01Z-00-DX1.4C019A61-37AC-411E-8C19-C4FE369A99D5 +TCGA-DJ-A2Q1,DJ,Braf-like,-0.640941,TCGA-DJ-A2Q1-01Z-00-DX1.F9F8C711-07DF-4726-9D27-0C6C648EB261 +TCGA-DJ-A2Q9,DJ,Braf-like,-0.601835,TCGA-DJ-A2Q9-01Z-00-DX1.75717084-0627-47F0-91DF-66B5BE73F667 +TCGA-EL-A4KD,EL,Braf-like,-0.077944,TCGA-EL-A4KD-01Z-00-DX1.D2B50901-4108-453B-8022-CBAE12ADD621 +TCGA-BJ-A28Z,BJ,Braf-like,-0.716829,TCGA-BJ-A28Z-01Z-00-DX1.628686AD-2E17-49F4-BF26-F21750E5E578 +TCGA-FY-A3R6,FY,Braf-like,-0.135262,TCGA-FY-A3R6-01Z-00-DX1.B51917D1-3A4C-4374-A22C-03A339C7582C +TCGA-EL-A3CW,EL,Braf-like,-0.213847,TCGA-EL-A3CW-01Z-00-DX1.7D2D5FD8-0588-4556-9DC2-9FB06107F99C +TCGA-DE-A0Y2,DE,Braf-like,-0.761516,TCGA-DE-A0Y2-01Z-00-DX1.E662700E-7C41-4467-8629-C5309C3CD722 +TCGA-EL-A3TB,EL,Braf-like,-0.501318,TCGA-EL-A3TB-01Z-00-DX1.15F169A5-CC16-4DBD-94BD-E4EB982E52BE +TCGA-ET-A39R,ET,Braf-like,-0.466172,TCGA-ET-A39R-01Z-00-DX1.C43C669B-49FF-4583-B6C7-0BB4FC721D1A +TCGA-EM-A3AN,EM,Braf-like,-0.654739,TCGA-EM-A3AN-01Z-00-DX1.ED0EA679-333D-4990-89FB-42459708A036 +TCGA-ET-A4KN,ET,Braf-like,-0.653121,TCGA-ET-A4KN-01Z-00-DX1.E6A2CA33-9ED9-45E7-B9C1-9B89DDE78337 +TCGA-J8-A3O1,J8,Braf-like,-0.34327,TCGA-J8-A3O1-01Z-00-DX1.0BDEF136-AA2A-424F-B8E4-61B6C8C4F93A 
+TCGA-EL-A3H3,EL,Braf-like,-0.340655,TCGA-EL-A3H3-01Z-00-DX1.573AA976-FCD7-400B-B99A-6C87A9549F94 +TCGA-EM-A2CU,EM,Braf-like,-0.163783,TCGA-EM-A2CU-01Z-00-DX1.7939ADB1-8224-47AC-900B-FFFCB294275D +TCGA-ET-A3BN,ET,Braf-like,-0.664205,TCGA-ET-A3BN-01Z-00-DX1.10409ECA-0CB5-4081-9AEC-9B1F3F10B364 +TCGA-FE-A3PD,FE,Braf-like,-0.260311,TCGA-FE-A3PD-01Z-00-DX1.92071070-3B46-499C-9721-FB6677F20182 +TCGA-EL-A3ZN,EL,Braf-like,-0.089849,TCGA-EL-A3ZN-01Z-00-DX1.F3BA0497-1C01-4A26-86C7-1B7BA603AEDC +TCGA-E3-A3E0,E3,Braf-like,-0.470741,TCGA-E3-A3E0-01Z-00-DX1.E3F4B935-37E1-43B1-AFD9-202222201CC7 +TCGA-FK-A3SE,FK,Braf-like,-0.65092,TCGA-FK-A3SE-01Z-00-DX1.D32CC876-F322-4C96-A3A6-232197DFFEFB +TCGA-EM-A3FQ,EM,Braf-like,-0.625594,TCGA-EM-A3FQ-01Z-00-DX2.139276A5-880A-4CBD-AD7C-9B4E51D9E71C +TCGA-FK-A3SG,FK,Braf-like,-0.735658,TCGA-FK-A3SG-01Z-00-DX1.79234024-8AD3-49F7-88FD-7E488CE77793 +TCGA-DJ-A3VL,DJ,Ras-like,0.685421,TCGA-DJ-A3VL-01Z-00-DX1.B4873C68-3405-4944-AFF4-C87AED853BDE +TCGA-BJ-A28S,BJ,Ras-like,0.91873,TCGA-BJ-A28S-01Z-00-DX1.2BD77575-3791-4AF8-917C-4AED156954E9 +TCGA-EM-A3O8,EM,Ras-like,1,TCGA-EM-A3O8-01Z-00-DX1.B164A20B-7433-420A-B947-C78CEB49B7D2 +TCGA-DJ-A3VK,DJ,Ras-like,0.526318,TCGA-DJ-A3VK-01Z-00-DX1.27EC85CB-86A8-4E18-A6F6-0B1540E9B7F0 +TCGA-EM-A2P2,EM,Ras-like,0.637899,TCGA-EM-A2P2-01Z-00-DX1.832D5A5A-D75D-4BBB-8C22-45A64D42569F +TCGA-BJ-A0ZG,BJ,Ras-like,0.864665,TCGA-BJ-A0ZG-01Z-00-DX1.99FBAFA8-F009-4291-8217-26C64A1A470B +TCGA-EM-A3FN,EM,Ras-like,0.925728,TCGA-EM-A3FN-01Z-00-DX1.B93AB1DC-1F4A-46FE-A269-0BEC3256E76F +TCGA-BJ-A0Z2,BJ,Ras-like,0.945972,TCGA-BJ-A0Z2-01Z-00-DX1.A3B544CF-A1DE-4887-9CBA-69EE3486CDFD +TCGA-EM-A22N,EM,Ras-like,0.929506,TCGA-EM-A22N-01Z-00-DX1.DB853528-DCF5-4463-B196-AFA8670D67FF +TCGA-EM-A3OA,EM,Ras-like,0.961311,TCGA-EM-A3OA-01Z-00-DX1.6EDE47B4-8926-4598-AB68-439590DE7CA8 +TCGA-EL-A3CX,EL,Ras-like,0.168972,TCGA-EL-A3CX-01Z-00-DX1.C99D49E4-5982-4F65-A4A4-E83C5CF8E693 
+TCGA-EM-A2OY,EM,Ras-like,0.926467,TCGA-EM-A2OY-01Z-00-DX1.A5DD274E-0EBD-4859-809E-AE5CC8D75016 +TCGA-EM-A2CL,EM,Ras-like,0.965824,TCGA-EM-A2CL-01Z-00-DX1.32F1BCF4-7F07-4653-A859-056DE18DE6CE +TCGA-EM-A2CO,EM,Ras-like,0.934784,TCGA-EM-A2CO-01Z-00-DX1.D4390B8C-50DD-414C-906A-3338C97BAD10 +TCGA-BJ-A2N7,BJ,Ras-like,0.24223,TCGA-BJ-A2N7-01Z-00-DX1.40C11DD3-6FF0-4D0D-8229-33DE2127686D +TCGA-BJ-A0ZA,BJ,Ras-like,0.908714,TCGA-BJ-A0ZA-01Z-00-DX1.D68C0A32-C99B-442B-BA7B-4CC39E810115 +TCGA-BJ-A0YZ,BJ,Ras-like,0.532719,TCGA-BJ-A0YZ-01Z-00-DX1.FFDC24FB-D8A1-40F8-89CD-DDB0ED010870 +TCGA-EL-A3ZR,EL,Ras-like,0.855064,TCGA-EL-A3ZR-01Z-00-DX1.290FBBAA-BD0E-4010-AA05-68341C606678 +TCGA-EM-A1CW,EM,Ras-like,0.863256,TCGA-EM-A1CW-01Z-00-DX1.741F4189-C8C2-4FE7-B9B8-9B7E782CB91A +TCGA-DJ-A2QB,DJ,Ras-like,0.864233,TCGA-DJ-A2QB-01Z-00-DX1.3C514B32-31C5-4BFF-8DDD-9BA583B33E19 +TCGA-EM-A3AP,EM,Ras-like,0.76704,TCGA-EM-A3AP-01Z-00-DX1.1CB5C0D0-FBDA-4B42-928B-937D096235E7 +TCGA-EL-A3D5,EL,Ras-like,0.816008,TCGA-EL-A3D5-01Z-00-DX1.C83AC393-806B-4B3F-BD7F-05D753E179A1 +TCGA-DJ-A13S,DJ,Ras-like,0.823449,TCGA-DJ-A13S-01Z-00-DX1.D92F9C56-8477-4C3E-848F-3948AD224015 +TCGA-EM-A2CN,EM,Ras-like,0.682923,TCGA-EM-A2CN-01Z-00-DX1.8A82A101-19A6-4809-B873-B67014DD9E0A +TCGA-ET-A39L,ET,Ras-like,0.043379,TCGA-ET-A39L-01Z-00-DX1.B006F8A2-B64C-4052-90B5-953BF543E290 +TCGA-H2-A2K9,H2,Ras-like,0.860725,TCGA-H2-A2K9-01Z-00-DX1.4FA56AC1-CC13-43B4-A8D8-06EDC96D0968 +TCGA-EM-A1YD,EM,Ras-like,0.772551,TCGA-EM-A1YD-01Z-00-DX1.476300DE-BEE0-4B66-8128-2C06A2A25D0A +TCGA-BJ-A0Z0,BJ,Ras-like,0.828065,TCGA-BJ-A0Z0-01Z-00-DX1.65834934-8E06-426D-A20A-B0D20A0C7C7A +TCGA-EM-A4G1,EM,Ras-like,0.508692,TCGA-EM-A4G1-01Z-00-DX1.285286C5-567C-48AE-B741-F5433241E096 +TCGA-BJ-A191,BJ,Ras-like,0.463801,TCGA-BJ-A191-01Z-00-DX1.E65056B8-C7D0-46A6-98B8-067E25548912 +TCGA-EL-A3GW,EL,Ras-like,0.664622,TCGA-EL-A3GW-01Z-00-DX1.006A8444-7894-4864-A181-D57EE97B4303 
+TCGA-ET-A2N5,ET,Ras-like,0.936133,TCGA-ET-A2N5-01Z-00-DX1.07BFF458-3765-4256-96DA-63DEA49A475D +TCGA-EM-A3AL,EM,Ras-like,0.91961,TCGA-EM-A3AL-01Z-00-DX1.BF94702E-FEE4-4684-A26E-3D405119B3D5 +TCGA-ET-A39N,ET,Ras-like,0.508438,TCGA-ET-A39N-01Z-00-DX1.26D1D111-73BF-476F-8DCA-82BD5A56061C +TCGA-EM-A4FK,EM,Ras-like,0.941166,TCGA-EM-A4FK-01Z-00-DX1.3D8685DA-332D-4F12-A33E-F23E33E837D8 +TCGA-DJ-A3VM,DJ,Ras-like,0.686611,TCGA-DJ-A3VM-01Z-00-DX1.C2613B37-AE47-42CD-809C-28B9F2204247 +TCGA-CE-A3MD,CE,Ras-like,0.061702,TCGA-CE-A3MD-01Z-00-DX1.0CCB9625-0F74-433F-A59D-EFEB8AB7D55B +TCGA-EL-A3CZ,EL,Ras-like,0.846831,TCGA-EL-A3CZ-01Z-00-DX1.F88B74C8-28FF-4163-9461-FD2A12F07AA0 +TCGA-EL-A4K6,EL,Ras-like,0.860847,TCGA-EL-A4K6-01Z-00-DX1.264DBA6F-C203-4273-9E8A-D414E180909F +TCGA-DJ-A3UP,DJ,Ras-like,0.716826,TCGA-DJ-A3UP-01Z-00-DX1.D418AE3C-66C4-40DA-9EC7-5A8F4FB73A08 +TCGA-BJ-A45K,BJ,Ras-like,0.926735,TCGA-BJ-A45K-01Z-00-DX1.3074A445-BE1B-491D-918B-AB65B54DBAD4 +TCGA-EL-A4K2,EL,Ras-like,0.60103,TCGA-EL-A4K2-01Z-00-DX1.19EF765A-512F-4229-8702-AD56D1C48128 +TCGA-EL-A3T2,EL,Ras-like,0.773815,TCGA-EL-A3T2-01Z-00-DX1.D632E0A6-9363-4935-A86B-29AF98C2BD37 +TCGA-EL-A3CO,EL,Ras-like,0.085781,TCGA-EL-A3CO-01Z-00-DX1.7BF5F004-E7E6-4320-BA89-39D05657BBCB +TCGA-BJ-A28V,BJ,Ras-like,0.926256,TCGA-BJ-A28V-01Z-00-DX1.97B52F9D-2A71-4405-8E76-833BF738B399 +TCGA-FY-A3WA,FY,Ras-like,0.903838,TCGA-FY-A3WA-01Z-00-DX1.1DFC59E4-7114-4EA5-8038-224E353F6499 +TCGA-EM-A3FP,EM,Ras-like,0.874122,TCGA-EM-A3FP-01Z-00-DX1.AE63796E-37CE-41B8-B15C-FDCA8BA4DF2D +TCGA-EM-A2CK,EM,Ras-like,0.974134,TCGA-EM-A2CK-01Z-00-DX1.DAB0A75D-45B9-41F3-8358-CAC283D10EA9 +TCGA-BJ-A18Y,BJ,Ras-like,0.697406,TCGA-BJ-A18Y-01Z-00-DX1.6985A7F1-2791-413B-B60F-51A334E47634 +TCGA-DJ-A4V0,DJ,Ras-like,0.287804,TCGA-DJ-A4V0-01Z-00-DX1.2D217B1F-07C9-4DAE-B055-5285EE5F667B +TCGA-CE-A484,CE,Ras-like,0.635489,TCGA-CE-A484-01Z-00-DX1.985833BD-7C94-4B91-A6CB-617E3962CCCA 
+TCGA-EM-A22K,EM,Ras-like,0.670533,TCGA-EM-A22K-01Z-00-DX1.7E55DD32-B9F2-40ED-B231-4839A68647CD +TCGA-EM-A3O9,EM,Ras-like,0.613097,TCGA-EM-A3O9-01Z-00-DX1.D875DF72-EBDE-43FD-9FAE-3F696D4C0F03 +TCGA-EL-A3H1,EL,Ras-like,0.886898,TCGA-EL-A3H1-01Z-00-DX1.85FD4C4E-A333-41D7-B945-67A1CECAAA20 +TCGA-ET-A25I,ET,Ras-like,0.880331,TCGA-ET-A25I-01Z-00-DX1.4FCAEF61-6919-4F97-BC38-4D6C27B8FA53 +TCGA-EM-A2OW,EM,Ras-like,0.927822,TCGA-EM-A2OW-01Z-00-DX1.8495B3A9-2FD1-4ADF-868C-816C897FAD14 +TCGA-EM-A2CT,EM,Ras-like,0.924334,TCGA-EM-A2CT-01Z-00-DX1.717B3037-9B23-42A2-BB3E-6710FCEF8D58 +TCGA-EM-A4FR,EM,Ras-like,0.313948,TCGA-EM-A4FR-01Z-00-DX1.40D1A347-D2EE-4708-8DC2-742B2E674B8B +TCGA-BJ-A0ZC,BJ,Ras-like,0.955108,TCGA-BJ-A0ZC-01Z-00-DX1.1E22C86F-CE70-4D3C-8D13-9DAAE07DAE12 +TCGA-EL-A3ZH,EL,Ras-like,0.874972,TCGA-EL-A3ZH-01Z-00-DX1.AE63AF56-1691-435C-85A6-DC1F9F0A6800 +TCGA-DJ-A2Q2,DJ,Ras-like,0.632689,TCGA-DJ-A2Q2-01Z-00-DX1.3CA0FEAB-FFA1-46C6-9558-E98F7746B451 +TCGA-EM-A3FR,EM,Ras-like,0.88844,TCGA-EM-A3FR-01Z-00-DX1.93C92E7F-C656-4777-AAAC-646833D83C65 +TCGA-EM-A3AJ,EM,Ras-like,0.482084,TCGA-EM-A3AJ-01Z-00-DX1.20774305-2AB9-41A6-93C4-ACF1F8D55BAA +TCGA-EM-A3O6,EM,Ras-like,0.976137,TCGA-EM-A3O6-01Z-00-DX1.D6CB7107-7A56-4C76-A65C-033FD214EBA5 +TCGA-DJ-A2Q0,DJ,Ras-like,0.882789,TCGA-DJ-A2Q0-01Z-00-DX1.C6E15B18-FF8B-4E89-A59A-3116233567D3 +TCGA-EM-A1YC,EM,Ras-like,0.776,TCGA-EM-A1YC-01Z-00-DX1.B44A38E2-6D85-4519-A187-4E3142916E5A +TCGA-EM-A22Q,EM,Ras-like,0.907169,TCGA-EM-A22Q-01Z-00-DX1.C9DC2543-BAA0-4930-A871-0BAFE5A391F7 +TCGA-ET-A39I,ET,Ras-like,0.905774,TCGA-ET-A39I-01Z-00-DX1.1720ECA0-1C10-4796-96FA-261C32BA67F1 +TCGA-BJ-A2N8,BJ,Ras-like,0.143012,TCGA-BJ-A2N8-01Z-00-DX1.14ADDBE0-A61A-4572-8CD9-7768DEB9BF76 +TCGA-DJ-A1QG,DJ,Ras-like,0.83257,TCGA-DJ-A1QG-01Z-00-DX1.C51220F0-9C8A-46D5-99E9-F66B7BCB67DD +TCGA-BJ-A3F0,BJ,Ras-like,0.916076,TCGA-BJ-A3F0-01Z-00-DX1.BCFBD53F-BA83-4C96-9A2E-D99B3220E6B1 
+TCGA-DE-A2OL,DE,Ras-like,0.548634,TCGA-DE-A2OL-01Z-00-DX1.A7E6EC42-A184-415A-9531-D1E83436FAE2 +TCGA-EM-A2CR,EM,Ras-like,0.571588,TCGA-EM-A2CR-01Z-00-DX1.EC73EBFF-0EC3-4153-9F9B-CDEF918C89DF +TCGA-DJ-A3UT,DJ,Ras-like,0.967352,TCGA-DJ-A3UT-01Z-00-DX1.781CB82B-24FE-468A-8087-C5CB4D71B94B +TCGA-DJ-A13W,DJ,Ras-like,0.819932,TCGA-DJ-A13W-01Z-00-DX1.02059A44-7DF1-420D-BA48-587D611F34F5 +TCGA-H2-A3RH,H2,Ras-like,0.380757,TCGA-H2-A3RH-01Z-00-DX1.444D8191-ADFF-4A59-BC33-B08F246594AE +TCGA-DJ-A2PP,DJ,Ras-like,0.935955,TCGA-DJ-A2PP-01Z-00-DX1.5BC2A5F2-1918-44E9-9544-1972974BA7BC +TCGA-BJ-A45D,BJ,Ras-like,0.602317,TCGA-BJ-A45D-01Z-00-DX1.671AA845-0931-4830-B837-5E121339A7AB +TCGA-EM-A2CP,EM,Ras-like,0.557796,TCGA-EM-A2CP-01Z-00-DX1.85D99C40-97C0-45AC-92AE-975F6FF847FE +TCGA-EM-A3OB,EM,Ras-like,0.894003,TCGA-EM-A3OB-01Z-00-DX1.CCD73BD9-C5C0-4429-9F74-701DC8B54860 +TCGA-ET-A3DS,ET,Ras-like,0.880515,TCGA-ET-A3DS-01Z-00-DX1.AEF2D46B-B024-4ED9-AFD6-5CE00EACF330 +TCGA-DJ-A3US,DJ,Ras-like,0.151637,TCGA-DJ-A3US-01Z-00-DX1.8863BFC5-E606-4C6B-B4A7-5E1525E126BE +TCGA-EM-A1YA,EM,Ras-like,0.87619,TCGA-EM-A1YA-01Z-00-DX1.6CEACBCA-9D05-4BF4-BFF2-725F869ABFA8 +TCGA-DJ-A1QL,DJ,Ras-like,0.908372,TCGA-DJ-A1QL-01Z-00-DX1.A4A16266-C153-47AB-A53F-40C015FF088D +TCGA-E8-A416,E8,Ras-like,0.39298,TCGA-E8-A416-01Z-00-DX1.F09B2F6F-B7BA-4E05-AB88-187080F35CC7 +TCGA-FY-A3NM,FY,Ras-like,0.860873,TCGA-FY-A3NM-01Z-00-DX1.5CC1829A-1C03-46A0-8D57-E68F5D31D7A6 +TCGA-BJ-A2P4,BJ,Ras-like,0.807677,TCGA-BJ-A2P4-01Z-00-DX1.897FBD3D-9988-4A04-863C-710928929D5A +TCGA-EL-A4JV,EL,Ras-like,0.971214,TCGA-EL-A4JV-01Z-00-DX1.A3DD7ACA-CA51-429D-A10E-3EB5062160E5 +TCGA-ET-A3DQ,ET,Ras-like,0.171704,TCGA-ET-A3DQ-01Z-00-DX1.A9648057-5AC1-4A77-9A82-846BF4D5804E +TCGA-FY-A3W9,FY,Ras-like,0.933162,TCGA-FY-A3W9-01Z-00-DX1.14465C4E-0D39-4742-91B8-FEB6F5B11807 +TCGA-BJ-A3PT,BJ,Ras-like,0.461707,TCGA-BJ-A3PT-01Z-00-DX1.A307F39F-AE85-42F4-B705-11AF06F391D9 
+TCGA-BJ-A0ZE,BJ,Ras-like,0.858861,TCGA-BJ-A0ZE-01Z-00-DX1.8D51B2CD-3AB1-425D-AC80-C22B592A847F +TCGA-EL-A3H2,EL,Ras-like,0.865174,TCGA-EL-A3H2-01Z-00-DX1.D71C7B5D-1DCB-4343-AA27-F558E24933B1 +TCGA-FY-A2QD,FY,Ras-like,0.908047,TCGA-FY-A2QD-01Z-00-DX1.6E1E148E-4A94-4F4D-AFA4-3AE22473A960 +TCGA-EL-A3GO,EL,Ras-like,0.835283,TCGA-EL-A3GO-01Z-00-DX1.BD32876E-09B7-4457-98FD-F018E7E44DA0 +TCGA-BJ-A45G,BJ,Ras-like,0.805651,TCGA-BJ-A45G-01Z-00-DX1.044854D2-C2A7-4011-97BD-F42214B9031B +TCGA-BJ-A2N9,BJ,Ras-like,0.775214,TCGA-BJ-A2N9-01Z-00-DX1.CFCB1FA9-7890-4B1B-93AB-4066E160FBF5 +TCGA-FK-A3SD,FK,Ras-like,0.990031,TCGA-FK-A3SD-01Z-00-DX1.62417D7B-3565-49FD-964F-F3C0C4C01CA1 +TCGA-EM-A2OV,EM,Ras-like,0.89583,TCGA-EM-A2OV-01Z-00-DX1.9E2596F8-5380-443C-888E-270809144429 +TCGA-EM-A22J,EM,Ras-like,0.967266,TCGA-EM-A22J-01Z-00-DX1.F07E49C3-58CF-423F-A887-93AD0DBAFF61 +TCGA-BJ-A45F,BJ,Ras-like,0.928693,TCGA-BJ-A45F-01Z-00-DX1.33739120-5AAA-409E-98A1-6B0F72AFEF80 +TCGA-EM-A3FL,EM,Ras-like,0.93053,TCGA-EM-A3FL-01Z-00-DX1.8B0828AC-4EFF-4C9A-8B11-B249C97C6F59 +TCGA-EL-A4KI,EL,Ras-like,0.763466,TCGA-EL-A4KI-01Z-00-DX1.D906569B-AA2A-46FE-B094-AE05F8CDEF4C +TCGA-FY-A3NP,FY,Ras-like,0.966956,TCGA-FY-A3NP-01Z-00-DX1.AD8D020E-2B02-4A62-B9F2-1DBDCA8DB8A1 +TCGA-EM-A1YE,EM,Ras-like,0.916881,TCGA-EM-A1YE-01Z-00-DX1.63939AFE-8917-42DC-B393-94FEE3AA3C29 diff --git a/docs-source/pytorch_sphinx_theme/.circleci/config.yml b/docs-source/pytorch_sphinx_theme/.circleci/config.yml new file mode 100644 index 000000000..54ad90932 --- /dev/null +++ b/docs-source/pytorch_sphinx_theme/.circleci/config.yml @@ -0,0 +1,39 @@ +version: 2 +jobs: + build: + docker: + - image: circleci/node:7.10 + + working_directory: ~/repo + + steps: + - add_ssh_keys: + fingerprints: + - "e0:f1:7b:8c:b1:4c:49:6f:b9:bd:af:84:6d:dd:93:cb" + - checkout + + - restore_cache: + keys: + - v1-dependencies-{{ checksum "package.json" }} + - v1-dependencies- + + - run: yarn install + + - save_cache: + paths: + - node_modules + key: 
v1-dependencies-{{ checksum "package.json" }} + - run: cp .circleci/mock.env.json .env.json + - run: grunt build + - run: git config credential.helper 'cache --timeout=120' + - run: git config user.email "ericnakagawa@gmail.com" + - run: git config user.name "CircleCI Bot" + - run: git add . + - run: git commit -m "Deploying theme build via CircleCI" + - run: git push -q git@github.com:pytorch/pytorch_sphinx_theme.git master + +workflows: + version: 2 + commit-and-build: + jobs: + - build diff --git a/docs-source/pytorch_sphinx_theme/.circleci/mock.env.json b/docs-source/pytorch_sphinx_theme/.circleci/mock.env.json new file mode 100644 index 000000000..273a22b55 --- /dev/null +++ b/docs-source/pytorch_sphinx_theme/.circleci/mock.env.json @@ -0,0 +1,4 @@ +{ + "TUTORIALS_DIR": "../tutorials", + "DOCS_DIR": "../pytorch/docs/source" +} diff --git a/docs-source/pytorch_sphinx_theme/.gitattributes b/docs-source/pytorch_sphinx_theme/.gitattributes new file mode 100644 index 000000000..2be5a44cb --- /dev/null +++ b/docs-source/pytorch_sphinx_theme/.gitattributes @@ -0,0 +1,15 @@ +# Document global line endings settings +# https://help.github.com/articles/dealing-with-line-endings/ +* text eol=lf + + +# Denote all files that are truly binary and should not be modified. 
+*.ai binary +*.jpg binary +*.otf binary +*.png binary +*.eot binary +*.ttf binary +*.whl binary +*.woff binary +*.woff2 binary diff --git a/docs-source/pytorch_sphinx_theme/.gitignore b/docs-source/pytorch_sphinx_theme/.gitignore new file mode 100644 index 000000000..ebcff98a3 --- /dev/null +++ b/docs-source/pytorch_sphinx_theme/.gitignore @@ -0,0 +1,12 @@ +*build/ +*.DS_Store +*.map +node_modules +npm-debug.log +yarn-error.log +package-lock.json +__pycache__ +.env.json +dist/ +*.egg-info/ +scss/vendor/* diff --git a/docs-source/pytorch_sphinx_theme/.nvmrc b/docs-source/pytorch_sphinx_theme/.nvmrc new file mode 100644 index 000000000..a2f28f43b --- /dev/null +++ b/docs-source/pytorch_sphinx_theme/.nvmrc @@ -0,0 +1 @@ +8.4.0 diff --git a/docs-source/pytorch_sphinx_theme/Gruntfile.js b/docs-source/pytorch_sphinx_theme/Gruntfile.js index 6b9101405..eeac919ed 100644 --- a/docs-source/pytorch_sphinx_theme/Gruntfile.js +++ b/docs-source/pytorch_sphinx_theme/Gruntfile.js @@ -39,8 +39,8 @@ module.exports = function(grunt) { { expand: true, flatten: true, - src: ['fonts/FreightSans/*'], - dest: 'pytorch_sphinx_theme/static/fonts/FreightSans', + src: ['fonts/IBMPlexSans/*'], + dest: 'pytorch_sphinx_theme/static/fonts/IBMPlexSans', filter: 'isFile' }, diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold-italic.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold-italic.woff deleted file mode 100755 index e31724842..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold-italic.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold-italic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold-italic.woff2 deleted file mode 100755 index cec2dc94f..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold-italic.woff2 and /dev/null differ diff --git 
a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold.woff deleted file mode 100755 index de46625ed..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold.woff2 deleted file mode 100755 index dc05cd82b..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-bold.woff2 and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book-italic.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book-italic.woff deleted file mode 100755 index a50e5038a..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book-italic.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book-italic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book-italic.woff2 deleted file mode 100755 index fe284db66..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book-italic.woff2 and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book.woff deleted file mode 100755 index 6ab8775f0..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book.woff2 deleted file mode 100755 index 2688739f1..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-book.woff2 and /dev/null differ diff 
--git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light-italic.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light-italic.woff deleted file mode 100755 index beda58d4e..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light-italic.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light-italic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light-italic.woff2 deleted file mode 100755 index e2fa0134b..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light-italic.woff2 and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light.woff deleted file mode 100755 index 226a0bf83..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light.woff2 deleted file mode 100755 index 6d8ff2c04..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-light.woff2 and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium-italic.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium-italic.woff deleted file mode 100644 index a42115d63..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium-italic.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium-italic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium-italic.woff2 deleted file mode 100644 index 16a7713a4..000000000 Binary files 
a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium-italic.woff2 and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium.woff b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium.woff deleted file mode 100755 index 5ea34539c..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium.woff and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium.woff2 b/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium.woff2 deleted file mode 100755 index c58b6a528..000000000 Binary files a/docs-source/pytorch_sphinx_theme/fonts/FreightSans/freight-sans-medium.woff2 and /dev/null differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Bold.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Bold.woff2 new file mode 100644 index 000000000..40c56d40f Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Bold.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff2 new file mode 100644 index 000000000..32dea0b48 Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Italic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Italic.woff2 new file mode 100644 index 000000000..c17e30da3 Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Italic.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Light.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Light.woff2 new file mode 100644 index 000000000..277251d49 Binary files /dev/null and 
b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Light.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2 new file mode 100644 index 000000000..6f54993aa Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Medium.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Medium.woff2 new file mode 100644 index 000000000..55c7f5fbc Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Medium.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-MediumItalic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-MediumItalic.woff2 new file mode 100644 index 000000000..b2e190faa Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-MediumItalic.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Regular.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Regular.woff2 new file mode 100644 index 000000000..3149ce5d7 Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Regular.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Thin.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Thin.woff2 new file mode 100644 index 000000000..f2669835a Binary files /dev/null and b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-Thin.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2 b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2 new file mode 100644 index 000000000..e7eb57b77 Binary files /dev/null and 
b/docs-source/pytorch_sphinx_theme/fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2 differ diff --git a/docs-source/pytorch_sphinx_theme/images/slideflow-banner.png b/docs-source/pytorch_sphinx_theme/images/slideflow-banner.png index de3656990..30e3dac6c 100644 Binary files a/docs-source/pytorch_sphinx_theme/images/slideflow-banner.png and b/docs-source/pytorch_sphinx_theme/images/slideflow-banner.png differ diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/fonts.html b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/fonts.html index 7b32d32e8..3e38fec6c 100644 --- a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/fonts.html +++ b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/fonts.html @@ -1,10 +1,10 @@ - - + + - - + + diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/layout.html b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/layout.html index 047c39114..5c25e2480 100644 --- a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/layout.html +++ b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/layout.html @@ -78,14 +78,7 @@ {%- block extrahead %} {% if theme_analytics_id %} - - + {% endif %} {% endblock %} @@ -95,15 +88,7 @@ {% include "fonts.html" %} - - - +
@@ -286,7 +271,7 @@ {% endif %} {% endif %} - + @@ -338,7 +323,7 @@ diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/css/theme.css b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/css/theme.css index 46a5dabed..bfda1d3ae 100644 --- a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/css/theme.css +++ b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/css/theme.css @@ -6321,6 +6321,9 @@ button.bg-dark:focus { height: 100%; border: 0; } +video { + width: 100%; +} .embed-responsive-21by9::before { padding-top: 42.8571428571%; @@ -9450,52 +9453,52 @@ a.text-dark:hover, a.text-dark:focus { } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 700; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-bold.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-bold.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Bold.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Bold.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 700; font-style: italic; - src: url("../fonts/FreightSans/freight-sans-bold-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-bold-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-BoldItalic.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 500; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-medium.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-medium.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Regular.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Regular.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 500; font-style: italic; - src: 
url("../fonts/FreightSans/freight-sans-medium-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-medium-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Italic.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Italic.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 100; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-light.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-light.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Light.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Light.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 100; font-style: italic; - src: url("../fonts/FreightSans/freight-sans-light-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-light-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-LightItalic.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 400; font-style: italic; - src: url("../fonts/FreightSans/freight-sans-book-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-book-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Italic.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-BookItalic.woff") format("woff"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 400; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-book.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-book.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Regular.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Book.woff") format("woff"); } @font-face { font-family: IBMPlexMono; @@ -9542,7 +9545,7 @@ html { } body { 
- font-family: FreightSans, Helvetica Neue, Helvetica, Arial, sans-serif; + font-family: IBMPlexSans, Helvetica Neue, Helvetica, Arial, sans-serif; } a:link, @@ -10995,7 +10998,7 @@ article.pytorch-article .admonition > p:last-of-type { color: #262626; } .pytorch-article div.sphx-glr-download a code, .pytorch-article div.sphx-glr-download a kbd, .pytorch-article div.sphx-glr-download a pre, .pytorch-article div.sphx-glr-download a samp, .pytorch-article div.sphx-glr-download a span.pre { - font-family: FreightSans, Helvetica Neue, Helvetica, Arial, sans-serif; + font-family: IBMPlexSans, Helvetica Neue, Helvetica, Arial, sans-serif; } .pytorch-article p.sphx-glr-script-out { diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/logo-icon.svg b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/logo-icon.svg index d9c2b2d6d..8dababf4e 100644 --- a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/logo-icon.svg +++ b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/logo-icon.svg @@ -1 +1,27 @@ - \ No newline at end of file + + + + + + + + + + + + + + + + + + + + diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/slideflow-logo-name-large.png b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/slideflow-logo-name-large.png index 23bb07023..4d122173a 100644 Binary files a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/slideflow-logo-name-large.png and b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/images/slideflow-logo-name-large.png differ diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/theme.js b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/theme.js index 798a8fe3c..0da7237b5 100644 --- a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/theme.js +++ b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/theme.js @@ -515,7 +515,6 @@ 
window.sideMenus = { "#pytorch-right-menu #pytorch-side-scroll-right \ > ul > li > a.reference.internal" ); - for (var i = 0; i < titleLinks.length; i++) { var link = titleLinks[i]; @@ -534,7 +533,6 @@ window.sideMenus = { var menuLinks = document.querySelectorAll( "#pytorch-right-menu ul li ul li a.reference.internal" ); - for (var i = 0; i < menuLinks.length; i++) { if ( menuLinks[i].nextElementSibling && @@ -633,16 +631,7 @@ window.sideMenus = { }, handleLeftMenu: function () { - var windowHeight = utilities.windowHeight(); - var topOfFooterRelativeToWindow = document.getElementById("docs-tutorials-resources").getBoundingClientRect().top; - - if (topOfFooterRelativeToWindow >= windowHeight) { - document.getElementById("pytorch-left-menu").style.height = "100%"; - } else { - var howManyPixelsOfTheFooterAreInTheWindow = windowHeight - topOfFooterRelativeToWindow; - var leftMenuDifference = howManyPixelsOfTheFooterAreInTheWindow; - document.getElementById("pytorch-left-menu").style.height = (windowHeight - leftMenuDifference) + "px"; - } + document.getElementById("pytorch-left-menu").style.height = "100%"; }, handleRightMenu: function() { diff --git a/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/vendor/jquery-3.6.3.min.js b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/vendor/jquery-3.6.3.min.js new file mode 100644 index 000000000..b5329e9ae --- /dev/null +++ b/docs-source/pytorch_sphinx_theme/pytorch_sphinx_theme/static/js/vendor/jquery-3.6.3.min.js @@ -0,0 +1,2 @@ +/*! 
jQuery v3.6.3 | (c) OpenJS Foundation and other contributors | jquery.org/license */ +!function(e,t){"use strict";"object"==typeof module&&"object"==typeof module.exports?module.exports=e.document?t(e,!0):function(e){if(!e.document)throw new Error("jQuery requires a window with a document");return t(e)}:t(e)}("undefined"!=typeof window?window:this,function(C,e){"use strict";var t=[],r=Object.getPrototypeOf,s=t.slice,g=t.flat?function(e){return t.flat.call(e)}:function(e){return t.concat.apply([],e)},u=t.push,i=t.indexOf,n={},o=n.toString,y=n.hasOwnProperty,a=y.toString,l=a.call(Object),v={},m=function(e){return"function"==typeof e&&"number"!=typeof e.nodeType&&"function"!=typeof e.item},x=function(e){return null!=e&&e===e.window},S=C.document,c={type:!0,src:!0,nonce:!0,noModule:!0};function b(e,t,n){var r,i,o=(n=n||S).createElement("script");if(o.text=e,t)for(r in c)(i=t[r]||t.getAttribute&&t.getAttribute(r))&&o.setAttribute(r,i);n.head.appendChild(o).parentNode.removeChild(o)}function w(e){return null==e?e+"":"object"==typeof e||"function"==typeof e?n[o.call(e)]||"object":typeof e}var f="3.6.3",E=function(e,t){return new E.fn.init(e,t)};function p(e){var t=!!e&&"length"in e&&e.length,n=w(e);return!m(e)&&!x(e)&&("array"===n||0===t||"number"==typeof t&&0+~]|"+M+")"+M+"*"),U=new RegExp(M+"|>"),X=new RegExp(F),V=new RegExp("^"+I+"$"),G={ID:new RegExp("^#("+I+")"),CLASS:new RegExp("^\\.("+I+")"),TAG:new RegExp("^("+I+"|[*])"),ATTR:new RegExp("^"+W),PSEUDO:new RegExp("^"+F),CHILD:new RegExp("^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\("+M+"*(even|odd|(([+-]|)(\\d*)n|)"+M+"*(?:([+-]|)"+M+"*(\\d+)|))"+M+"*\\)|)","i"),bool:new RegExp("^(?:"+R+")$","i"),needsContext:new RegExp("^"+M+"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\("+M+"*((?:-\\d)?\\d*)"+M+"*\\)|)(?=[^-]|$)","i")},Y=/HTML$/i,Q=/^(?:input|select|textarea|button)$/i,J=/^h\d$/i,K=/^[^{]+\{\s*\[native \w/,Z=/^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/,ee=/[+~]/,te=new 
RegExp("\\\\[\\da-fA-F]{1,6}"+M+"?|\\\\([^\\r\\n\\f])","g"),ne=function(e,t){var n="0x"+e.slice(1)-65536;return t||(n<0?String.fromCharCode(n+65536):String.fromCharCode(n>>10|55296,1023&n|56320))},re=/([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g,ie=function(e,t){return t?"\0"===e?"\ufffd":e.slice(0,-1)+"\\"+e.charCodeAt(e.length-1).toString(16)+" ":"\\"+e},oe=function(){T()},ae=be(function(e){return!0===e.disabled&&"fieldset"===e.nodeName.toLowerCase()},{dir:"parentNode",next:"legend"});try{H.apply(t=O.call(p.childNodes),p.childNodes),t[p.childNodes.length].nodeType}catch(e){H={apply:t.length?function(e,t){L.apply(e,O.call(t))}:function(e,t){var n=e.length,r=0;while(e[n++]=t[r++]);e.length=n-1}}}function se(t,e,n,r){var i,o,a,s,u,l,c,f=e&&e.ownerDocument,p=e?e.nodeType:9;if(n=n||[],"string"!=typeof t||!t||1!==p&&9!==p&&11!==p)return n;if(!r&&(T(e),e=e||C,S)){if(11!==p&&(u=Z.exec(t)))if(i=u[1]){if(9===p){if(!(a=e.getElementById(i)))return n;if(a.id===i)return n.push(a),n}else if(f&&(a=f.getElementById(i))&&v(e,a)&&a.id===i)return n.push(a),n}else{if(u[2])return H.apply(n,e.getElementsByTagName(t)),n;if((i=u[3])&&d.getElementsByClassName&&e.getElementsByClassName)return H.apply(n,e.getElementsByClassName(i)),n}if(d.qsa&&!N[t+" "]&&(!y||!y.test(t))&&(1!==p||"object"!==e.nodeName.toLowerCase())){if(c=t,f=e,1===p&&(U.test(t)||z.test(t))){(f=ee.test(t)&&ve(e.parentNode)||e)===e&&d.scope||((s=e.getAttribute("id"))?s=s.replace(re,ie):e.setAttribute("id",s=E)),o=(l=h(t)).length;while(o--)l[o]=(s?"#"+s:":scope")+" "+xe(l[o]);c=l.join(",")}try{if(d.cssSupportsSelector&&!CSS.supports("selector(:is("+c+"))"))throw new Error;return H.apply(n,f.querySelectorAll(c)),n}catch(e){N(t,!0)}finally{s===E&&e.removeAttribute("id")}}}return g(t.replace(B,"$1"),e,n,r)}function ue(){var r=[];return function e(t,n){return r.push(t+" ")>b.cacheLength&&delete e[r.shift()],e[t+" "]=n}}function le(e){return e[E]=!0,e}function ce(e){var 
t=C.createElement("fieldset");try{return!!e(t)}catch(e){return!1}finally{t.parentNode&&t.parentNode.removeChild(t),t=null}}function fe(e,t){var n=e.split("|"),r=n.length;while(r--)b.attrHandle[n[r]]=t}function pe(e,t){var n=t&&e,r=n&&1===e.nodeType&&1===t.nodeType&&e.sourceIndex-t.sourceIndex;if(r)return r;if(n)while(n=n.nextSibling)if(n===t)return-1;return e?1:-1}function de(t){return function(e){return"input"===e.nodeName.toLowerCase()&&e.type===t}}function he(n){return function(e){var t=e.nodeName.toLowerCase();return("input"===t||"button"===t)&&e.type===n}}function ge(t){return function(e){return"form"in e?e.parentNode&&!1===e.disabled?"label"in e?"label"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&ae(e)===t:e.disabled===t:"label"in e&&e.disabled===t}}function ye(a){return le(function(o){return o=+o,le(function(e,t){var n,r=a([],e.length,o),i=r.length;while(i--)e[n=r[i]]&&(e[n]=!(t[n]=e[n]))})})}function ve(e){return e&&"undefined"!=typeof e.getElementsByTagName&&e}for(e in d=se.support={},i=se.isXML=function(e){var t=e&&e.namespaceURI,n=e&&(e.ownerDocument||e).documentElement;return!Y.test(t||n&&n.nodeName||"HTML")},T=se.setDocument=function(e){var t,n,r=e?e.ownerDocument||e:p;return r!=C&&9===r.nodeType&&r.documentElement&&(a=(C=r).documentElement,S=!i(C),p!=C&&(n=C.defaultView)&&n.top!==n&&(n.addEventListener?n.addEventListener("unload",oe,!1):n.attachEvent&&n.attachEvent("onunload",oe)),d.scope=ce(function(e){return a.appendChild(e).appendChild(C.createElement("div")),"undefined"!=typeof e.querySelectorAll&&!e.querySelectorAll(":scope fieldset div").length}),d.cssSupportsSelector=ce(function(){return CSS.supports("selector(*)")&&C.querySelectorAll(":is(:jqfake)")&&!CSS.supports("selector(:is(*,:jqfake))")}),d.attributes=ce(function(e){return e.className="i",!e.getAttribute("className")}),d.getElementsByTagName=ce(function(e){return 
e.appendChild(C.createComment("")),!e.getElementsByTagName("*").length}),d.getElementsByClassName=K.test(C.getElementsByClassName),d.getById=ce(function(e){return a.appendChild(e).id=E,!C.getElementsByName||!C.getElementsByName(E).length}),d.getById?(b.filter.ID=function(e){var t=e.replace(te,ne);return function(e){return e.getAttribute("id")===t}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&S){var n=t.getElementById(e);return n?[n]:[]}}):(b.filter.ID=function(e){var n=e.replace(te,ne);return function(e){var t="undefined"!=typeof e.getAttributeNode&&e.getAttributeNode("id");return t&&t.value===n}},b.find.ID=function(e,t){if("undefined"!=typeof t.getElementById&&S){var n,r,i,o=t.getElementById(e);if(o){if((n=o.getAttributeNode("id"))&&n.value===e)return[o];i=t.getElementsByName(e),r=0;while(o=i[r++])if((n=o.getAttributeNode("id"))&&n.value===e)return[o]}return[]}}),b.find.TAG=d.getElementsByTagName?function(e,t){return"undefined"!=typeof t.getElementsByTagName?t.getElementsByTagName(e):d.qsa?t.querySelectorAll(e):void 0}:function(e,t){var n,r=[],i=0,o=t.getElementsByTagName(e);if("*"===e){while(n=o[i++])1===n.nodeType&&r.push(n);return r}return o},b.find.CLASS=d.getElementsByClassName&&function(e,t){if("undefined"!=typeof t.getElementsByClassName&&S)return t.getElementsByClassName(e)},s=[],y=[],(d.qsa=K.test(C.querySelectorAll))&&(ce(function(e){var 
t;a.appendChild(e).innerHTML="",e.querySelectorAll("[msallowcapture^='']").length&&y.push("[*^$]="+M+"*(?:''|\"\")"),e.querySelectorAll("[selected]").length||y.push("\\["+M+"*(?:value|"+R+")"),e.querySelectorAll("[id~="+E+"-]").length||y.push("~="),(t=C.createElement("input")).setAttribute("name",""),e.appendChild(t),e.querySelectorAll("[name='']").length||y.push("\\["+M+"*name"+M+"*="+M+"*(?:''|\"\")"),e.querySelectorAll(":checked").length||y.push(":checked"),e.querySelectorAll("a#"+E+"+*").length||y.push(".#.+[+~]"),e.querySelectorAll("\\\f"),y.push("[\\r\\n\\f]")}),ce(function(e){e.innerHTML="";var t=C.createElement("input");t.setAttribute("type","hidden"),e.appendChild(t).setAttribute("name","D"),e.querySelectorAll("[name=d]").length&&y.push("name"+M+"*[*^$|!~]?="),2!==e.querySelectorAll(":enabled").length&&y.push(":enabled",":disabled"),a.appendChild(e).disabled=!0,2!==e.querySelectorAll(":disabled").length&&y.push(":enabled",":disabled"),e.querySelectorAll("*,:x"),y.push(",.*:")})),(d.matchesSelector=K.test(c=a.matches||a.webkitMatchesSelector||a.mozMatchesSelector||a.oMatchesSelector||a.msMatchesSelector))&&ce(function(e){d.disconnectedMatch=c.call(e,"*"),c.call(e,"[s!='']:x"),s.push("!=",F)}),d.cssSupportsSelector||y.push(":has"),y=y.length&&new RegExp(y.join("|")),s=s.length&&new RegExp(s.join("|")),t=K.test(a.compareDocumentPosition),v=t||K.test(a.contains)?function(e,t){var n=9===e.nodeType&&e.documentElement||e,r=t&&t.parentNode;return e===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):e.compareDocumentPosition&&16&e.compareDocumentPosition(r)))}:function(e,t){if(t)while(t=t.parentNode)if(t===e)return!0;return!1},j=t?function(e,t){if(e===t)return l=!0,0;var n=!e.compareDocumentPosition-!t.compareDocumentPosition;return 
n||(1&(n=(e.ownerDocument||e)==(t.ownerDocument||t)?e.compareDocumentPosition(t):1)||!d.sortDetached&&t.compareDocumentPosition(e)===n?e==C||e.ownerDocument==p&&v(p,e)?-1:t==C||t.ownerDocument==p&&v(p,t)?1:u?P(u,e)-P(u,t):0:4&n?-1:1)}:function(e,t){if(e===t)return l=!0,0;var n,r=0,i=e.parentNode,o=t.parentNode,a=[e],s=[t];if(!i||!o)return e==C?-1:t==C?1:i?-1:o?1:u?P(u,e)-P(u,t):0;if(i===o)return pe(e,t);n=e;while(n=n.parentNode)a.unshift(n);n=t;while(n=n.parentNode)s.unshift(n);while(a[r]===s[r])r++;return r?pe(a[r],s[r]):a[r]==p?-1:s[r]==p?1:0}),C},se.matches=function(e,t){return se(e,null,null,t)},se.matchesSelector=function(e,t){if(T(e),d.matchesSelector&&S&&!N[t+" "]&&(!s||!s.test(t))&&(!y||!y.test(t)))try{var n=c.call(e,t);if(n||d.disconnectedMatch||e.document&&11!==e.document.nodeType)return n}catch(e){N(t,!0)}return 0":{dir:"parentNode",first:!0}," ":{dir:"parentNode"},"+":{dir:"previousSibling",first:!0},"~":{dir:"previousSibling"}},preFilter:{ATTR:function(e){return e[1]=e[1].replace(te,ne),e[3]=(e[3]||e[4]||e[5]||"").replace(te,ne),"~="===e[2]&&(e[3]=" "+e[3]+" "),e.slice(0,4)},CHILD:function(e){return e[1]=e[1].toLowerCase(),"nth"===e[1].slice(0,3)?(e[3]||se.error(e[0]),e[4]=+(e[4]?e[5]+(e[6]||1):2*("even"===e[3]||"odd"===e[3])),e[5]=+(e[7]+e[8]||"odd"===e[3])):e[3]&&se.error(e[0]),e},PSEUDO:function(e){var t,n=!e[6]&&e[2];return G.CHILD.test(e[0])?null:(e[3]?e[2]=e[4]||e[5]||"":n&&X.test(n)&&(t=h(n,!0))&&(t=n.indexOf(")",n.length-t)-n.length)&&(e[0]=e[0].slice(0,t),e[2]=n.slice(0,t)),e.slice(0,3))}},filter:{TAG:function(e){var t=e.replace(te,ne).toLowerCase();return"*"===e?function(){return!0}:function(e){return e.nodeName&&e.nodeName.toLowerCase()===t}},CLASS:function(e){var t=m[e+" "];return t||(t=new RegExp("(^|"+M+")"+e+"("+M+"|$)"))&&m(e,function(e){return t.test("string"==typeof e.className&&e.className||"undefined"!=typeof e.getAttribute&&e.getAttribute("class")||"")})},ATTR:function(n,r,i){return function(e){var t=se.attr(e,n);return 
null==t?"!="===r:!r||(t+="","="===r?t===i:"!="===r?t!==i:"^="===r?i&&0===t.indexOf(i):"*="===r?i&&-1:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i;function j(e,n,r){return m(n)?E.grep(e,function(e,t){return!!n.call(e,t,e)!==r}):n.nodeType?E.grep(e,function(e){return e===n!==r}):"string"!=typeof n?E.grep(e,function(e){return-1)[^>]*|#([\w-]+))$/;(E.fn.init=function(e,t,n){var r,i;if(!e)return this;if(n=n||D,"string"==typeof e){if(!(r="<"===e[0]&&">"===e[e.length-1]&&3<=e.length?[null,e,null]:q.exec(e))||!r[1]&&t)return!t||t.jquery?(t||n).find(e):this.constructor(t).find(e);if(r[1]){if(t=t instanceof E?t[0]:t,E.merge(this,E.parseHTML(r[1],t&&t.nodeType?t.ownerDocument||t:S,!0)),N.test(r[1])&&E.isPlainObject(t))for(r in t)m(this[r])?this[r](t[r]):this.attr(r,t[r]);return this}return(i=S.getElementById(r[2]))&&(this[0]=i,this.length=1),this}return e.nodeType?(this[0]=e,this.length=1,this):m(e)?void 0!==n.ready?n.ready(e):e(E):E.makeArray(e,this)}).prototype=E.fn,D=E(S);var L=/^(?:parents|prev(?:Until|All))/,H={children:!0,contents:!0,next:!0,prev:!0};function O(e,t){while((e=e[t])&&1!==e.nodeType);return e}E.fn.extend({has:function(e){var t=E(e,this),n=t.length;return this.filter(function(){for(var e=0;e\x20\t\r\n\f]*)/i,he=/^$|^module$|\/(?:java|ecma)script/i;ce=S.createDocumentFragment().appendChild(S.createElement("div")),(fe=S.createElement("input")).setAttribute("type","radio"),fe.setAttribute("checked","checked"),fe.setAttribute("name","t"),ce.appendChild(fe),v.checkClone=ce.cloneNode(!0).cloneNode(!0).lastChild.checked,ce.innerHTML="",v.noCloneChecked=!!ce.cloneNode(!0).lastChild.defaultValue,ce.innerHTML="",v.option=!!ce.lastChild;var ge={thead:[1,"","
"],col:[2,"","
"],tr:[2,"","
"],td:[3,"","
"],_default:[0,"",""]};function ye(e,t){var n;return n="undefined"!=typeof e.getElementsByTagName?e.getElementsByTagName(t||"*"):"undefined"!=typeof e.querySelectorAll?e.querySelectorAll(t||"*"):[],void 0===t||t&&A(e,t)?E.merge([e],n):n}function ve(e,t){for(var n=0,r=e.length;n",""]);var me=/<|&#?\w+;/;function xe(e,t,n,r,i){for(var o,a,s,u,l,c,f=t.createDocumentFragment(),p=[],d=0,h=e.length;d\s*$/g;function je(e,t){return A(e,"table")&&A(11!==t.nodeType?t:t.firstChild,"tr")&&E(e).children("tbody")[0]||e}function De(e){return e.type=(null!==e.getAttribute("type"))+"/"+e.type,e}function qe(e){return"true/"===(e.type||"").slice(0,5)?e.type=e.type.slice(5):e.removeAttribute("type"),e}function Le(e,t){var n,r,i,o,a,s;if(1===t.nodeType){if(Y.hasData(e)&&(s=Y.get(e).events))for(i in Y.remove(t,"handle events"),s)for(n=0,r=s[i].length;n").attr(n.scriptAttrs||{}).prop({charset:n.scriptCharset,src:n.url}).on("load error",i=function(e){r.remove(),i=null,e&&t("error"===e.type?404:200,e.type)}),S.head.appendChild(r[0])},abort:function(){i&&i()}}});var Ut,Xt=[],Vt=/(=)\?(?=&|$)|\?\?/;E.ajaxSetup({jsonp:"callback",jsonpCallback:function(){var e=Xt.pop()||E.expando+"_"+Ct.guid++;return this[e]=!0,e}}),E.ajaxPrefilter("json jsonp",function(e,t,n){var r,i,o,a=!1!==e.jsonp&&(Vt.test(e.url)?"url":"string"==typeof e.data&&0===(e.contentType||"").indexOf("application/x-www-form-urlencoded")&&Vt.test(e.data)&&"data");if(a||"jsonp"===e.dataTypes[0])return r=e.jsonpCallback=m(e.jsonpCallback)?e.jsonpCallback():e.jsonpCallback,a?e[a]=e[a].replace(Vt,"$1"+r):!1!==e.jsonp&&(e.url+=(St.test(e.url)?"&":"?")+e.jsonp+"="+r),e.converters["script json"]=function(){return o||E.error(r+" was not called"),o[0]},e.dataTypes[0]="json",i=C[r],C[r]=function(){o=arguments},n.always(function(){void 0===i?E(C).removeProp(r):C[r]=i,e[r]&&(e.jsonpCallback=t.jsonpCallback,Xt.push(r)),o&&m(i)&&i(o[0]),o=i=void 0}),"script"}),v.createHTMLDocument=((Ut=S.implementation.createHTMLDocument("").body).innerHTML="
",2===Ut.childNodes.length),E.parseHTML=function(e,t,n){return"string"!=typeof e?[]:("boolean"==typeof t&&(n=t,t=!1),t||(v.createHTMLDocument?((r=(t=S.implementation.createHTMLDocument("")).createElement("base")).href=S.location.href,t.head.appendChild(r)):t=S),o=!n&&[],(i=N.exec(e))?[t.createElement(i[1])]:(i=xe([e],t,o),o&&o.length&&E(o).remove(),E.merge([],i.childNodes)));var r,i,o},E.fn.load=function(e,t,n){var r,i,o,a=this,s=e.indexOf(" ");return-1").append(E.parseHTML(e)).find(r):e)}).always(n&&function(e,t){a.each(function(){n.apply(this,o||[e.responseText,t,e])})}),this},E.expr.pseudos.animated=function(t){return E.grep(E.timers,function(e){return t===e.elem}).length},E.offset={setOffset:function(e,t,n){var r,i,o,a,s,u,l=E.css(e,"position"),c=E(e),f={};"static"===l&&(e.style.position="relative"),s=c.offset(),o=E.css(e,"top"),u=E.css(e,"left"),("absolute"===l||"fixed"===l)&&-1<(o+u).indexOf("auto")?(a=(r=c.position()).top,i=r.left):(a=parseFloat(o)||0,i=parseFloat(u)||0),m(t)&&(t=t.call(e,n,E.extend({},s))),null!=t.top&&(f.top=t.top-s.top+a),null!=t.left&&(f.left=t.left-s.left+i),"using"in t?t.using.call(e,f):c.css(f)}},E.fn.extend({offset:function(t){if(arguments.length)return void 0===t?this:this.each(function(e){E.offset.setOffset(this,t,e)});var e,n,r=this[0];return r?r.getClientRects().length?(e=r.getBoundingClientRect(),n=r.ownerDocument.defaultView,{top:e.top+n.pageYOffset,left:e.left+n.pageXOffset}):{top:0,left:0}:void 0},position:function(){if(this[0]){var 
e,t,n,r=this[0],i={top:0,left:0};if("fixed"===E.css(r,"position"))t=r.getBoundingClientRect();else{t=this.offset(),n=r.ownerDocument,e=r.offsetParent||n.documentElement;while(e&&(e===n.body||e===n.documentElement)&&"static"===E.css(e,"position"))e=e.parentNode;e&&e!==r&&1===e.nodeType&&((i=E(e).offset()).top+=E.css(e,"borderTopWidth",!0),i.left+=E.css(e,"borderLeftWidth",!0))}return{top:t.top-i.top-E.css(r,"marginTop",!0),left:t.left-i.left-E.css(r,"marginLeft",!0)}}},offsetParent:function(){return this.map(function(){var e=this.offsetParent;while(e&&"static"===E.css(e,"position"))e=e.offsetParent;return e||re})}}),E.each({scrollLeft:"pageXOffset",scrollTop:"pageYOffset"},function(t,i){var o="pageYOffset"===i;E.fn[t]=function(e){return B(this,function(e,t,n){var r;if(x(e)?r=e:9===e.nodeType&&(r=e.defaultView),void 0===n)return r?r[i]:e[t];r?r.scrollTo(o?r.pageXOffset:n,o?n:r.pageYOffset):e[t]=n},t,e,arguments.length)}}),E.each(["top","left"],function(e,n){E.cssHooks[n]=_e(v.pixelPosition,function(e,t){if(t)return t=Be(e,n),Pe.test(t)?E(e).position()[n]+"px":t})}),E.each({Height:"height",Width:"width"},function(a,s){E.each({padding:"inner"+a,content:s,"":"outer"+a},function(r,o){E.fn[o]=function(e,t){var n=arguments.length&&(r||"boolean"!=typeof e),i=r||(!0===e||!0===t?"margin":"border");return B(this,function(e,t,n){var r;return x(e)?0===o.indexOf("outer")?e["inner"+a]:e.document.documentElement["client"+a]:9===e.nodeType?(r=e.documentElement,Math.max(e.body["scroll"+a],r["scroll"+a],e.body["offset"+a],r["offset"+a],r["client"+a])):void 0===n?E.css(e,t,i):E.style(e,t,n,i)},s,n?e:void 0,n)}})}),E.each(["ajaxStart","ajaxStop","ajaxComplete","ajaxError","ajaxSuccess","ajaxSend"],function(e,t){E.fn[t]=function(e){return this.on(t,e)}}),E.fn.extend({bind:function(e,t,n){return this.on(e,null,t,n)},unbind:function(e,t){return this.off(e,null,t)},delegate:function(e,t,n,r){return this.on(t,e,n,r)},undelegate:function(e,t,n){return 
1===arguments.length?this.off(e,"**"):this.off(t,e||"**",n)},hover:function(e,t){return this.mouseenter(e).mouseleave(t||e)}}),E.each("blur focus focusin focusout resize scroll click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup contextmenu".split(" "),function(e,n){E.fn[n]=function(e,t){return 0 1). If desired, the core model is initialized with pre-trained weights, either from ImageNet or from a pre-trained model specified by the user. - -The model core is then optionally connected to an additional set of fully-connected hidden layers as specified in the hyperparameter options, which then connects to outputs with softmax (categorical models) or linear (linear models) activations. - -.. _balancing: - -A Note on Input Balancing -************************* - -When training, it is important to consider whether category-level balancing should be performed on your input in order to reduce bias against sparse categories. There is no established best practice for input balancing when training on histology images; the balancing method you choose to use is up to you. - -Suppose you have five slides, labeled A through E. Slides A and B belong to category 1, while C, D, E belong to category 2. Let's suppose tumors in all the slides are roughly the same physical size, except for B which is three times as large. - -You perform tile extraction, and all the patients except B produce roughly the same number of image tiles. The training optimizer is ready for the next batch of images. Let’s say the batch size is 32. How does it select the next 32 images? - -If **tile-level balancing** ("tile") is used, tiles will be selected randomly. Because slide B has so many more tiles than the other slides, B will be over-represented in the batch. This means that the model will inherently learn a bias towards patient B. 
If patients like patient B are truly of greater prevalence in the real-world population, this is fine; the model is learning an appropriate bias. Otherwise, it is learning a bias which will hurt the model’s generalizability, which will result in poor performance on our test set. - -If **patient-based balancing** ("patient") is used, the input stream will balance tiles in a given batch across the patients. Now the model has no bias towards any given patient. However, you’ll notice that category 1 (patients A and B) only has 13 tiles, whereas category 2 (patients C, D, and E) has 19 tiles. With this type of balancing, models will learn bias towards categories with more patients (in this case category 2). - -If **category-based balancing** ("category") is used, the input stream balances tiles based on the category. There are now an equal number of tiles from category 1 and category 2, 16 from both. We are still unbalanced within category 1, as slide B has more tiles than slide A. However, because this unbalance is not occurring between categories, which is what the algorithm is training on, the bias effect is less prominent. The algorithm will expect category 1 to look more like slide B than slide A, but it is not clear whether this is avoidable. Unless you dispose of excess tiles, your model will be exposed to more tiles from B than from A, whether it happens on a per-batch basis or throughout its training across epochs. 
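The three balancing strategies described in the removed documentation above can be sketched numerically. This is a minimal illustration, not Slideflow's implementation: the tile counts (100 per slide, 300 for slide B), the dictionaries, and the function names `slide_probabilities` and `expected_tiles_per_category` are all hypothetical, chosen only to mirror the five-slide example with a batch size of 32.

```python
from collections import defaultdict

# Hypothetical tile counts for the five-slide example: slide B is three
# times as large as the others. These numbers are assumptions.
TILE_COUNTS = {"A": 100, "B": 300, "C": 100, "D": 100, "E": 100}
CATEGORY = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 2}


def slide_probabilities(strategy, tile_counts=TILE_COUNTS, category=CATEGORY):
    """Return the probability that one sampled tile comes from each slide."""
    if strategy == "tile":
        # Tiles are drawn uniformly, so larger slides dominate.
        total = sum(tile_counts.values())
        return {s: n / total for s, n in tile_counts.items()}
    if strategy == "patient":
        # Every slide (patient) is equally likely, regardless of size.
        return {s: 1 / len(tile_counts) for s in tile_counts}
    if strategy == "category":
        # Every category is equally likely; within a category, tiles are
        # drawn uniformly, so larger slides still dominate their category.
        per_category = defaultdict(int)
        for s, n in tile_counts.items():
            per_category[category[s]] += n
        n_categories = len(per_category)
        return {
            s: n / per_category[category[s]] / n_categories
            for s, n in tile_counts.items()
        }
    raise ValueError(f"unknown balancing strategy: {strategy!r}")


def expected_tiles_per_category(strategy, batch_size=32):
    """Return the expected number of tiles per category in one batch."""
    expected = defaultdict(float)
    for s, p in slide_probabilities(strategy).items():
        expected[CATEGORY[s]] += p * batch_size
    return dict(expected)


if __name__ == "__main__":
    for strategy in ("tile", "patient", "category"):
        print(strategy, slide_probabilities(strategy),
              expected_tiles_per_category(strategy))
```

Under these assumed counts, patient-level balancing gives category 1 an expected 12.8 of 32 tiles (about 13) and category 2 an expected 19.2 (about 19), while category-level balancing gives 16 to each but still draws three times as many tiles from slide B as from slide A.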
diff --git a/docs-source/source/att_heatmap.jpg b/docs-source/source/att_heatmap.jpg new file mode 100644 index 000000000..37abb737d Binary files /dev/null and b/docs-source/source/att_heatmap.jpg differ diff --git a/docs-source/source/balancing_case.png b/docs-source/source/balancing_case.png index e5e8a4c12..a0ef4fa7c 100644 Binary files a/docs-source/source/balancing_case.png and b/docs-source/source/balancing_case.png differ diff --git a/docs-source/source/balancing_category.png b/docs-source/source/balancing_category.png index 76a08e99e..95b8f10e3 100644 Binary files a/docs-source/source/balancing_category.png and b/docs-source/source/balancing_category.png differ diff --git a/docs-source/source/balancing_extract.png b/docs-source/source/balancing_extract.png index 9feda9128..8b0d00bde 100644 Binary files a/docs-source/source/balancing_extract.png and b/docs-source/source/balancing_extract.png differ diff --git a/docs-source/source/balancing_none.png b/docs-source/source/balancing_none.png index 1b052cf91..772642ef0 100644 Binary files a/docs-source/source/balancing_none.png and b/docs-source/source/balancing_none.png differ diff --git a/docs-source/source/biscuit.rst b/docs-source/source/biscuit.rst new file mode 100644 index 000000000..3d1ddeea3 --- /dev/null +++ b/docs-source/source/biscuit.rst @@ -0,0 +1,66 @@ +.. currentmodule:: slideflow.biscuit + +slideflow.biscuit +================= + +This module contains an official implementation of `BISCUIT `__, an uncertainty quantification and confidence thresholding algorithm for whole-slide images. The original implementation, which includes instructions for reproducing experimental results reported in the manuscript, is available on `GitHub `__. + +This module requires the ``slideflow-noncommercial`` package, which can be installed with: + +.. code-block:: bash + + pip install slideflow-noncommercial + +See :ref:`uncertainty` for more information. + +.. autofunction:: find_cv +.. 
autofunction:: get_model_results + +biscuit.Experiment +****************** +.. autoclass:: Experiment +.. autofunction:: slideflow.biscuit.Experiment.display +.. autofunction:: slideflow.biscuit.Experiment.plot_uq_calibration +.. autofunction:: slideflow.biscuit.Experiment.results +.. autofunction:: slideflow.biscuit.Experiment.thresholds_from_nested_cv +.. autofunction:: slideflow.biscuit.Experiment.train +.. autofunction:: slideflow.biscuit.Experiment.train_nested_cv + +biscuit.hp +********** + +.. autofunction:: slideflow.biscuit.hp.nature2022 + +biscuit.threshold +***************** +.. autofunction:: slideflow.biscuit.threshold.apply +.. autofunction:: slideflow.biscuit.threshold.detect +.. autofunction:: slideflow.biscuit.threshold.from_cv +.. autofunction:: slideflow.biscuit.threshold.plot_uncertainty +.. autofunction:: slideflow.biscuit.threshold.process_group_predictions +.. autofunction:: slideflow.biscuit.threshold.process_tile_predictions + +biscuit.utils +************* + +.. autofunction:: slideflow.biscuit.utils.auc +.. autofunction:: slideflow.biscuit.utils.auc_and_threshold +.. autofunction:: slideflow.biscuit.utils.df_from_cv +.. autofunction:: slideflow.biscuit.utils.eval_exists +.. autofunction:: slideflow.biscuit.utils.find_cv +.. autofunction:: slideflow.biscuit.utils.find_cv_early_stop +.. autofunction:: slideflow.biscuit.utils.find_eval +.. autofunction:: slideflow.biscuit.utils.find_model +.. autofunction:: slideflow.biscuit.utils.get_model_results +.. autofunction:: slideflow.biscuit.utils.get_eval_results +.. autofunction:: slideflow.biscuit.utils.model_exists +.. autofunction:: slideflow.biscuit.utils.prediction_metrics +.. autofunction:: slideflow.biscuit.utils.read_group_predictions +.. autofunction:: slideflow.biscuit.utils.truncate_colormap + +biscuit.delong +************** + +.. autofunction:: slideflow.biscuit.delong.fastDeLong +.. autofunction:: slideflow.biscuit.delong.delong_roc_variance +.. 
autofunction:: slideflow.biscuit.delong.delong_roc_test diff --git a/docs-source/source/blur.png b/docs-source/source/blur.png new file mode 100644 index 000000000..2424eeb90 --- /dev/null +++ b/docs-source/source/blur.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ecbccd7c14cada309e67ae92efe4e36a39cfb7073723550c2946529a3f81fff +size 567214 diff --git a/docs-source/source/boxplot_example.png b/docs-source/source/boxplot_example.png index fa8eb3f19..b0ee036ef 100644 Binary files a/docs-source/source/boxplot_example.png and b/docs-source/source/boxplot_example.png differ diff --git a/docs-source/source/cell_masked.png b/docs-source/source/cell_masked.png new file mode 100644 index 000000000..b015a17b3 --- /dev/null +++ b/docs-source/source/cell_masked.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:296adb68984a3739d11e595fa6166c147db74108668d32a8b0bae7778b39055b +size 3777 diff --git a/docs-source/source/cell_segmentation.png b/docs-source/source/cell_segmentation.png new file mode 100644 index 000000000..ae47041fd --- /dev/null +++ b/docs-source/source/cell_segmentation.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:847a121a426953788a0953a6624b97b81324e54ce584d0e1c7fd79abfa9ec463 +size 71302 diff --git a/docs-source/source/cell_unmasked.png b/docs-source/source/cell_unmasked.png new file mode 100644 index 000000000..6752617fe --- /dev/null +++ b/docs-source/source/cell_unmasked.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f5249151446e398ba0a62d98825343e31400028bed0054ee21871314d191edb +size 12341 diff --git a/docs-source/source/cellseg.rst b/docs-source/source/cellseg.rst new file mode 100644 index 000000000..b10794725 --- /dev/null +++ b/docs-source/source/cellseg.rst @@ -0,0 +1,292 @@ +.. currentmodule:: slideflow.cellseg + +.. 
_cellseg: + +Cell Segmentation +================= + +Many tasks in digital pathology rely on analysis of cellular features, as opposed to higher-level architectural features. Slideflow supports whole-slide analysis of cellular features with a cell detection and segmentation pipeline based on `Cellpose `_. To start, ensure ``cellpose`` has been installed via pip: + +.. code-block:: bash + + pip install cellpose + +Approach +******** + +.. figure:: cell_segmentation.png + +The general approach for cell detection and segmentation in Slideflow is illustrated above, and will be discussed in the following sections. In short, the general approach is to tune the cell segmentation parameters on a single slide, use these parameters to detect cells in all of your slides, then extract cell images at these locations. + +Slideflow Studio +***************** + +Cellpose models have several configurable parameters which will affect the quality of your segmentation masks, namely the **pretrained model** and **cell diameter**. The best way to determine the optimal parameters to use for your dataset is through interactive visualization using :ref:`Slideflow Studio `. + +Use Cellpose-based cell segmentation in Slideflow Studio by :ref:`enabling the extension `, or start Studio with the ``--cellpose`` flag: + +.. code-block:: bash + + python -m slideflow.studio --cellpose + +Control panel +------------- + +Open the Cell Segmentation section in the control panel to access the segmentation controls. + +.. figure:: cellseg_workbench_panel.png + +The **Model & Cell Diameter** subsection is used to customize the segmentation model (defaults to +'cyto2') and cell diameter (defaults to 10 microns). Selecting "Auto-detect diameter" then +clicking "Preview" will perform cell segmentation on the portion of the slide currently in view. Once complete, the diameter text box will be updated with the detected cell diameter. Any `user-trained models `_ will be listed in the model dropdown selection. 
+ +Viewing cell segmentations +-------------------------- + +.. figure:: cellseg_workbench_masks.png + +The **View Controls** subsection provides options for customizing how cell segmentations are displayed. By default, cell segmentation masks are shown in cyan on a black background. The black +background can be removed by unchecking "Black BG". You can add a green dot at each cell's detected centroid by selecting the "Centroid" option. The "Alpha" slider controls transparency for the mask overlay. + +You can also choose to view the segmentation masks as outlines. The "Outline" button will +convert any masks currently in view to outlines, allowing you to more easily see how the +masks match cells visible on the slide. + +.. figure:: cellseg_workbench_outlines.png + +Finally, the "gradXY" option will show the flow gradients calculated during cell segmentation. + +.. figure:: cellseg_workbench_flows.png + +Preparing WSI segmentation +-------------------------- + +Once you are satisfied with a chosen model and cell diameter, set the cell diameter to a +manual value in microns. Once the cell diameter has been set, the middle control panel will +activate, allowing you to perform whole-slide segmentation. + +The **Otsu threshold** option will perform strict Otsu's thresholding on the whole-slide image, +only performing cell segmentation in non-background areas (reducing computational time). +You can preview Otsu's thresholding in the :ref:`Slide section `. This option is disabled by default, as Otsu's thresholding does not +work well for all slides (particularly cytology slides). + +The **Save flows** option saves gradients during cell segmentation, allowing you to generate +visualizations as shown with the **gradXY** option above. This is disabled by default, as +the calculation uses a large amount of RAM and may not be practical on all systems. + +.. 
list-table:: + :widths: 60 40 + + * - The **Advanced** subsection provides additional options for controlling the cell segmentation process. + + **Window** controls the window size during cell segmentation; cell segmentation is performed + on images of this pixel size and then stitched together. The **Tile** option permits further + sub-tiling of each window, reducing GPU and CPU memory utilization. + + **Downscale** will scale down the final generated cell segmentation mask, reducing memory + utilization (both RAM and disk). **Enable spawn workers** enables a multiprocessing technique that improves cell segmentation speed at the cost of higher RAM usage. + + - .. image:: cellseg_workbench_advanced.png + :width: 245 + :align: right + +Running WSI segmentation +------------------------ + +Once you are satisfied with the settings, whole-slide cell segmentation can be started by +clicking **Segment**. You will see a notification in the bottom-right corner of the screen when +segmentation is complete. In the meantime, a progress bar will be shown in the terminal +along with an ETA. + +Exporting results +----------------- + +Once segmentation is complete, masks can be saved to disk for later use with **Export**. +Masks are saved in \*.zip format, and can be loaded in Studio with drag-and-drop. + +Segmenting cells +**************** + +Single slide segmentation +------------------------- + +Once the segmentation parameters have been determined, you can run segmentation for a single slide using :func:`slideflow.cellseg.segment_slide`. + +.. code-block:: python + + import slideflow as sf + from slideflow.cellseg import segment_slide + + segmentation = segment_slide( + '.../slide.svs', + model='cyto2', + diam_um=10, + ... + ) + segmentation.save('...masks.zip') + +Project-wide segmentation +------------------------- + +Cell segmentation can also be performed automatically for all slides in a Slideflow project. 
+Cell segmentation masks (and associated cell centroids) are calculated for all slides in the project using :meth:`slideflow.Project.cell_segmentation`. + +.. code-block:: + + import slideflow as sf + + # Load a slideflow project + P = sf.Project(...) + + # Perform cell segmentation + P.cell_segmentation( + model='cyto2', + diam_um=10 + ) + +Relevant arguments for this function include: + +- ``model`` : Cell segmentation model. All cellpose models are supported, including 'cyto', + 'cyto2', 'nuclei', and more. +- ``diam_um`` : Cell diameter, in microns. +- ``buffer`` : Path to a buffer, significantly speeds up segmentation if running from a HDD + (same as P.extract_tiles()) +- ``window_size`` : Integer. Defaults to 256. Increasing this to 512 will make things slightly + faster, but will use a bit more GPU memory. +- ``downscale`` : Factor by which to downscale the masks, to save memory. Defaults to 1 + (no downscaling, full resolution). Downscale of 2 is a nice balance between memory + size and fidelity. + +Depending on the size of the slide, this may take between 5-25 minutes per slide. + +Masks will be saved in the project subfolder ``masks/`` . As described above, +these masks can be loaded in Studio for interactive visualization via drag-and-drop. +They can also be used for downstream analysis and cell extraction, as described in the next +section. + +Accessing segmentation masks +---------------------------- + +Saved cell segmentation masks (in \*.zip format) can be loaded with :class:`slideflow.cellseg.Segmentation`. + +.. code-block:: python + + from slideflow.cellseg import Segmentation + seg = Segmentation.load('.../slide-masks.zip') + +The mask array, ``Segmentation.masks`` , is a ``np.ndarray`` with dtype of np.uint32. Zero values are background, and masks for each cell are represented by a unique integer. Flows/gradients, +if calculated, will be available in ``Segmentation.flows``. 
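As a minimal sketch of how a label array in this format can be queried, the NumPy snippet below uses a small hand-written array as a stand-in for ``Segmentation.masks``. The array is not loaded from a real slide, and the variable names are illustrative only.

```python
import numpy as np

# Toy stand-in for Segmentation.masks: 0 is background,
# and each cell is marked with its own unique integer label.
masks = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
    [3, 0, 0, 2, 2],
], dtype=np.uint32)

# Unique non-zero labels and the pixel area of each cell.
labels, areas = np.unique(masks[masks > 0], return_counts=True)
n_cells = len(labels)

# Boolean mask selecting the pixels that belong to a single cell.
cell_2 = masks == 2

print(f"{n_cells} cells detected")
for label, area in zip(labels, areas):
    print(f"cell {label}: {area} px")
```

The same indexing pattern applies to a full-size mask array, although memory use grows with the mask dimensions.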
+ +Centroids for detected cells can be calculated with Segmentation.centroids(), returning an array of centroid locations. By default, coordinates are returned in mask dimension space. With the argument ``wsi_dim=True``, centroid coordinates will be in the slide dimension space. + +Caveats +------- + +There are some caveats to the cell segmentation process, including: + +- **Memory usage**: Cell segmentation requires at minimum 32 GB of RAM. Larger slides (particularly cytology) may require up to 64 GB of RAM. +- **Stitching artifacts**: At present, due to the algorithm by which whole-slide cell segmentations are stitched together, you may see some cells that are not detected, missing in a grid-like pattern. Work is ongoing to reduce these stitching artifacts. +- **Cell diameter**: The quality of cell segmentation results is highly dependent on an appropriately chosen cell diameter. Use Slideflow Studio to find the best cell diameter for your application. + +Extracting cells from slides +**************************** + +Once segmentation masks have been calculated, images of individual cells can be extracted from a whole-slide image. This can be performed for either a single slide, or all slides in a project. + +From a single slide +------------------- + +Start by loading the saved segmentation, as described above. Then, use :meth:`slideflow.WSI.apply_segmentation`, followed by :meth:`slideflow.WSI.extract_cells`. + +.. code-block:: python + + import slideflow as sf + from slideflow.cellseg import Segmentation + + # Load WSI. + wsi = sf.WSI('../slide.svs', tile_px=96, tile_um='40x') + + # Load cell segmentations. + seg = Segmentation.load('.../slide-masks.zip') + + # Apply segmentations to the slide. + wsi.apply_segmentation(seg) + + # Extract images of cells. + wsi.extract_cells(tiles_dir=...) + + +.. list-table:: + :widths: 80 20 + + * - By default, segmentation masks will be applied to the extracted cell images: + + - .. 
image:: cell_masked.png + + * - However, you can choose not to apply masks by using the argument ``apply_masks=False``. + + + - .. image:: cell_unmasked.png + +Tile extraction is then performed as usual. Cell images (tiles) can either be saved as loose images or in TFRecord format. See :meth:`slideflow.WSI.extract_cells` for more information. + +From all slides +--------------- + +Additionally, cell images can be extracted from all slides in a project. This should only be +done after running :meth:`slideflow.Project.cell_segmentation`. + +.. code-block:: python + + P.extract_cells( + tile_px=96, + tile_um='40x', + apply_masks=True + ) + +Extracted cell images are saved by default in TFRecord format, and are otherwise handled +identically to tile images generated through :meth:`slideflow.Project.extract_tiles`. + +Complete example +**************** + +An example of a complete cell segmentation pipeline is shown below, from parameter tuning +to final tile extraction from detected cells. + +1. Slideflow Studio +------------------- + +Determine optimal cell segmentation parameters using Studio, as described above: + +.. code-block:: bash + + python -m slideflow.studio --cellpose + +2. Cell segmentation +-------------------- + +Segment cells for all slides in a Slideflow project. + +.. code-block:: python + + import slideflow as sf + + P = sf.Project(...) + P.cell_segmentation( + model='cyto2', + diam_um=10, + window_size=512, + downscale=2 + ) + +3. Cell image extraction +------------------------ + +Extract image tiles of segmented cells, in this case using segmentation masks. + +.. 
code-block:: python + + P.extract_cells( + tile_px=96, + tile_um='40x', + apply_masks=True, + grayspace_fraction=1 + ) diff --git a/docs-source/source/cellseg_workbench_advanced.png b/docs-source/source/cellseg_workbench_advanced.png new file mode 100644 index 000000000..dd97790a7 --- /dev/null +++ b/docs-source/source/cellseg_workbench_advanced.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03e7c0cbf448ecab3ee6f5aa03bf45c6cd6f91dafc4c319c595505ff9cf87d59 +size 14255 diff --git a/docs-source/source/cellseg_workbench_flows.png b/docs-source/source/cellseg_workbench_flows.png new file mode 100644 index 000000000..dc87fe412 --- /dev/null +++ b/docs-source/source/cellseg_workbench_flows.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e687ab61bdff26f65b987a58876764ca14292d1a03ae92494402f7f36176148 +size 2594693 diff --git a/docs-source/source/cellseg_workbench_masks.png b/docs-source/source/cellseg_workbench_masks.png new file mode 100644 index 000000000..236a3a57f --- /dev/null +++ b/docs-source/source/cellseg_workbench_masks.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b311f58427e31cba4376b62339f331da9ab000b6ff1425bd98fb6cd25f495640 +size 288187 diff --git a/docs-source/source/cellseg_workbench_outlines.png b/docs-source/source/cellseg_workbench_outlines.png new file mode 100644 index 000000000..4ecbf1528 --- /dev/null +++ b/docs-source/source/cellseg_workbench_outlines.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5686e4875b6e7d5f7bae0cea98e8af0849cd31de35e60614be03b1f416fa01b9 +size 3523146 diff --git a/docs-source/source/cellseg_workbench_panel.png b/docs-source/source/cellseg_workbench_panel.png new file mode 100644 index 000000000..e42fb2c6d --- /dev/null +++ b/docs-source/source/cellseg_workbench_panel.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8964e4841939be4ee80546ec78c6197eeb7c7408545784ae035d75328f51873 
+size 839179 diff --git a/docs-source/source/clam.rst b/docs-source/source/clam.rst deleted file mode 100644 index b9dcc0453..000000000 --- a/docs-source/source/clam.rst +++ /dev/null @@ -1,56 +0,0 @@ -CLAM -==== - -In addition to standard Tensorflow/Keras model applications, slideflow supports training models with `CLAM `_. A slightly modified version of CLAM which supports slideflow dataset and input pipelines is included in ``slideflow.clam``. - -Creating slide activations -************************** - -The first step in the CLAM pipeline is generating tile-level activations across whole-slide images. While the original `CLAM paper `_ used features generated from an imagenet-trained model, we have found it useful to generate feature activations from models pretrained with histology images. To this end, the project function :func:`slideflow.Project.generate_features_for_clam` accepts any model as input and will generate feature vectors from the specified intermediate layers. For example: - -.. code-block:: python - - P.generate_features_for_clam( - model='/path/to/saved/model', - outdir='/clam/path', - layers=['postconv'] - ) - -Training -******** - -To train a CLAM model, use the project function :func:`slideflow.Project.train_clam`. Clam arguments are configured with :func:`slideflow.clam.get_args`: - -.. code-block:: python - - dataset = P.dataset(tile_px=299, tile_um=302) - P.generate_features_for_clam(..., outdir='/clam/path') - - clam_args = sf.clam.get_args(k=3, bag_loss='svm', ...) - - P.train_clam( - exp_name='test_experiment', - pt_files='/clam/path', - outcomes='category1', - dataset=dataset, - clam_args=clam_args - ) - -The training function will, by default, save heatmaps of the attention layers for each of the validation slides. This behavior can be disabled by passing ``attention_heatmaps=False``. 
- -Evaluation -********** - -To evaluate a saved CLAM model on an external dataset, first extract features from this dataset, then use the project function :func:`slideflow.Project.evaluate_clam`: - -.. code-block:: python - - P.generate_features_for_clam(..., outdir='/eval/clam/path') - - P.evaluate_clam( - exp_name='evaluation', - pt_files='/eval/clam/path', - outcomes='category1', - tile_px=299, - tile_um=302 - ) \ No newline at end of file diff --git a/docs-source/source/conf.py b/docs-source/source/conf.py index b9333cfa2..71ec115d5 100644 --- a/docs-source/source/conf.py +++ b/docs-source/source/conf.py @@ -19,8 +19,8 @@ # import os import sys +sys.path.insert(0, os.path.abspath('../../')) import slideflow as sf -sys.path.insert(0, os.path.abspath('../../source/')) # -- General configuration ------------------------------------------------ @@ -32,11 +32,21 @@ # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. -extensions = ['sphinx.ext.autodoc', +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.autosummary', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.mathjax', - 'sphinx.ext.napoleon' ] + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode', + 'sphinx_markdown_tables', + 'sphinxcontrib.video' +] + +autoclass_content = 'both' +autosummary_generate = False +add_module_names = False # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] @@ -52,7 +62,7 @@ # General information about the project. project = 'slideflow' -copyright = '2021, James M Dolezal' +copyright = '2023, James M Dolezal' author = 'James M Dolezal' # The version info for the project you're documenting, acts as replacement for @@ -60,16 +70,16 @@ # built documents. # # The short X.Y version. -version = '1.1' +version = '.'.join(sf.__version__.split('.')[:2]) # The full version, including alpha/beta/rc tags. 
-release = sf.__version__ +release = sf.__version__.split('+')[0] # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. # # This is also used if you do content translation via gettext catalogs. # Usually you set "language" from the command line for these cases. -language = None +language = 'en' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. @@ -113,6 +123,9 @@ ] } +# Remove the .html from links +#html_link_suffix = '' + # -- Options for HTMLHelp output ------------------------------------------ # Output file base name for HTML help builder. @@ -171,7 +184,7 @@ # Example configuration for intersphinx: refer to the Python standard library. -intersphinx_mapping = {'https://docs.python.org/': None} +intersphinx_mapping = {'python': ('https://docs.python.org/', None)} def skip(app, what, name, obj, would_skip, options): if name == "__init__": @@ -179,4 +192,4 @@ def skip(app, what, name, obj, would_skip, options): return would_skip def setup(app): - app.connect("autodoc-skip-member", skip) \ No newline at end of file + app.connect("autodoc-skip-member", skip) diff --git a/docs-source/source/custom_extractors.rst b/docs-source/source/custom_extractors.rst new file mode 100644 index 000000000..200f30245 --- /dev/null +++ b/docs-source/source/custom_extractors.rst @@ -0,0 +1,274 @@ +.. _custom_extractors: + +Custom Feature Extractors +========================= + +Slideflow includes several :ref:`pretrained feature extractors ` for converting image tiles into feature vectors as well as tools to assist with building your own feature extractor. In this note, we'll walk through the process of building a custom feature extractor from both a PyTorch and Tensorflow model. + +PyTorch +******* + +Feature extractors are implemented as a subclass of :class:`slideflow.model.extractors._factory_torch.TorchFeatureExtractor`. 
The base class provides core functionality and helper methods for generating features from image tiles (dtype uint8) or whole-slide images (type :class:`slideflow.WSI`). + +The initializer should create the feature extraction model and move it to the appropriate device (*e.g.*, a GPU). The model should be a :class:`torch.nn.Module` that accepts an image tensor as input and returns a feature tensor as output. + +.. code-block:: python + + # Import your custom torch.nn.Module, + # which generates features from an image. + from my_module import MyModel + + from slideflow.model.extractors._factory_torch import TorchFeatureExtractor + + class MyFeatureExtractor(TorchFeatureExtractor): + + tag = 'my_feature_extractor' # Human-readable identifier + + def __init__(self): + super().__init__() + + # Create the model, move it to the GPU, and set it to evaluation mode. + self.model = MyModel() + self.model.to('cuda') + self.model.eval() + +Next, the initializer should set the number of features expected to be returned by the model. + +.. code-block:: python + + ... + + def __init__(self): + ... + + self.num_features = 1024 + +The initializer is also responsible for registering image preprocessing. The image preprocessing transformation, a function which converts a raw ``uint8`` image to a ``float32`` tensor for model input, should be stored in ``self.transform``. If the transformation standardizes the images, then the parameter ``self.preprocess_kwargs`` should be set to ``{'standardize': False}``, indicating that Slideflow should not perform any additional standardization. You can use the class method ``.build_transform()`` to build the standard preprocessing pipeline. + +.. code-block:: python + + from torchvision import transforms + + ... + + def __init__(self): + ... + + # Image preprocessing. 
+ self.transform = self.build_transform(img_size=256) + # Disable Slideflow standardization, + # as we are standardizing with transforms.Normalize + self.preprocess_kwargs = {'standardize': False} + +The final required method is ``.dump_config()``, which returns a dictionary of configuration parameters needed to regenerate this class. It should return a dictionary with ``"class"`` and ``"kwargs"`` keys. This configuration is saved to a JSON configuration file when generating bags for MIL training. + +.. code-block:: python + + ... + + def dump_config(self): + return self._dump_config( + class_name='my_module.MyFeatureExtractor' + ) + +The final class should look like this: + +.. code-block:: python + + from my_module import MyModel + from slideflow.model.extractors._factory_torch import TorchFeatureExtractor + from torchvision import transforms + + class MyFeatureExtractor(TorchFeatureExtractor): + + tag = 'my_feature_extractor' # Human-readable identifier + + def __init__(self): + super().__init__() + + # Create the model, move it to the GPU, and set it to evaluation mode. + self.model = MyModel() + self.model.to('cuda') + self.model.eval() + self.num_features = 1024 + + # Image preprocessing. + self.transform = self.build_transform(img_size=256) + # Disable Slideflow standardization, + # as we are standardizing with transforms.Normalize + self.preprocess_kwargs = {'standardize': False} + + def dump_config(self): + return self._dump_config( + class_name='my_module.MyFeatureExtractor' + ) + +You can then use the feature extractor for generating bags for MIL training, as described in :ref:`mil`. + +.. code-block:: python + + import slideflow + + # Build the feature extractor. + myfeatures = MyFeatureExtractor() + + # Load a dataset. + project = slideflow.load_project(...) + dataset = project.dataset(...) + + # Generate bags. + project.generate_feature_bags(myfeatures, dataset) + +You can also generate features across whole-slide images, returning a grid of features for each slide. 
The size of the returned grid reflects the slide's tile grid. For example, for a slide with 24 columns and 33 rows of tiles, the returned grid will have shape ``(24, 33, n_features)``. + +.. code-block:: python + + >>> myfeatures = MyFeatureExtractor() + >>> wsi = sf.WSI('path/to/wsi', tile_px=256, tile_um=302) + >>> features = myfeatures(wsi) + >>> features.shape + (24, 33, 1024) + +Finally, the feature extractor can also be used to perform latent space analysis and generate mosaic maps, as described in :ref:`activations`. + +Slideflow includes a registration system for keeping track of all available feature extractors. To register your feature extractor, use the :func:`slideflow.model.extractors.register_torch` decorator. + +.. code-block:: python + + from slideflow.model.extractors import register_torch + + @register_torch + def my_feature_extractor(**kwargs): + return MyFeatureExtractor(**kwargs) + +Once registered, a feature extractor can be built by name: + +.. code-block:: python + + import slideflow as sf + extractor = sf.build_feature_extractor('my_feature_extractor') + + +Tensorflow +********** + +Tensorflow feature extractors are implemented very similarly to PyTorch feature extractors, extended from :class:`slideflow.model.extractors._tensorflow_base.TensorflowFeatureExtractor`. + +The initializer should create the model and set the expected number of features. + +.. code-block:: python + + from my_module import MyModel + from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor + + class MyFeatureExtractor(TensorflowFeatureExtractor): + + tag = 'my_feature_extractor' # Unique identifier + + def __init__(self): + super().__init__() + + # Create the model. + self.model = MyModel() + self.num_features = 1024 + +.. |per_image_standardization| replace:: ``tf.image.per_image_standardization`` +.. 
_per_image_standardization: https://www.tensorflow.org/api_docs/python/tf/image/per_image_standardization + + +The initializer is also responsible for registering image preprocessing and transformations. Preprocessing steps are stored in the ``.preprocess_kwargs`` dictionary, which should have the keys ``standardize`` and ``transform``. If ``standardize=True``, images will be standardized using |per_image_standardization|_. If ``transform`` is not ``None``, it should be a callable that accepts a single image tensor and returns a transformed image tensor. + +For example, to only perform standardization and no further preprocessing: + +.. code-block:: python + + ... + + def __init__(self): + ... + + # Image preprocessing. + self.preprocess_kwargs = { + 'standardize': True, + 'transform': None + } + +To perform standardization and resize images to 256x256: + +.. code-block:: python + + import tensorflow as tf + + @tf.function + def resize_256(x): + # Resize the image tensor to 256 x 256 pixels. + return tf.image.resize(x, (256, 256)) + + ... + + def __init__(self): + ... + + # Image preprocessing. + self.preprocess_kwargs = { + 'standardize': True, + 'transform': resize_256 + } + +Next, define the ``.dump_config()`` method, which returns a dictionary of configuration parameters needed to regenerate this class. It should return a dictionary with ``"class"`` and ``"kwargs"`` keys. This configuration is saved to a JSON configuration file when generating bags for MIL training. + +.. code-block:: python + + ... + + def dump_config(self): + return { + 'class': 'MyFeatureExtractor', + 'kwargs': {} + } + +The final class should look like this: + +.. code-block:: python + + from my_module import MyModel + from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor + + class MyFeatureExtractor(TensorflowFeatureExtractor): + + tag = 'my_feature_extractor' # Unique identifier + + def __init__(self): + super().__init__() + + # Create the model. 
+ self.model = MyModel() + self.num_features = 1024 + + # Image preprocessing. + self.preprocess_kwargs = { + 'standardize': True, + 'transform': None + } + + def dump_config(self): + return { + 'class': 'MyFeatureExtractor', + 'kwargs': {} + } + +As described above, this feature extractor can then be used to create bags for MIL training, generate features across whole-slide images, or perform feature space analysis across a dataset. + +To register your feature extractor, use the :func:`slideflow.model.extractors.register_tf` decorator. + +.. code-block:: python + + from slideflow.model.extractors import register_tf + + @register_tf + def my_feature_extractor(**kwargs): + return MyFeatureExtractor(**kwargs) + +...which will allow the feature extractor to be built by name: + +.. code-block:: python + + import slideflow as sf + extractor = sf.build_feature_extractor('my_feature_extractor') \ No newline at end of file diff --git a/docs-source/source/custom_loops.rst b/docs-source/source/custom_loops.rst index 83d143645..da3b1bf5b 100644 --- a/docs-source/source/custom_loops.rst +++ b/docs-source/source/custom_loops.rst @@ -1,4 +1,4 @@ -Custom training loops +Custom Training Loops ===================== To use ``*.tfrecords`` from extracted tiles in a custom training loop or entirely separate architecture (such as `StyleGAN2 `_ or `YoloV5 `_), Tensorflow ``tf.data.Dataset`` or PyTorch ``torch.utils.data.DataLoader`` objects can be created for easily serving processed images to your custom trainer. @@ -15,7 +15,7 @@ The :class:`slideflow.Dataset` class includes functions to prepare a Tensorflow P = Project('/project/path', ...) dts = P.dataset(tile_px=299, tile_um=302) -If you want to perform any balancing, use the ``.balance()`` method: +If you want to perform any mini-batch balancing, use the ``.balance()`` method: .. 
code-block:: python @@ -53,4 +53,4 @@ or the :meth:`slideflow.Dataset.tensorflow` method to create a ``tf.data.Dataset standardize = True, # Standardize images ) -The returned dataloaders can then be used directly with your external applications. \ No newline at end of file +The returned dataloaders can then be used directly with your external applications. Read more about :ref:`creating and using dataloaders `. \ No newline at end of file diff --git a/docs-source/source/dataloaders.rst b/docs-source/source/dataloaders.rst new file mode 100644 index 000000000..f09de8855 --- /dev/null +++ b/docs-source/source/dataloaders.rst @@ -0,0 +1,441 @@ +.. _dataloaders: + +Dataloaders: Sampling and Augmentation +====================================== + +With support for both Tensorflow and PyTorch, Slideflow provides several options for dataset sampling, processing, and augmentation. Here, we'll review the options for creating dataloaders - objects that read and process TFRecord data and return images and labels - in each framework. In all cases, data are read from TFRecords generated through :ref:`filtering`. The TFRecord data format is discussed in more detail in the :ref:`tfrecords` note. + +Tensorflow +********** + +.. |TFRecordDataset| replace:: ``tf.data.TFRecordDataset`` +.. _TFRecordDataset: https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset + +The :meth:`slideflow.Dataset.tensorflow()` method provides an easy interface for creating a ``tf.data.Dataset`` that reads and interleaves from tfrecords in a Slideflow dataset. Behind the scenes, this method uses the |TFRecordDataset|_ class for reading and parsing each TFRecord. + +The returned ``tf.data.Dataset`` object is an iterable-only dataset whose returned values depend on the arguments provided to the ``.tensorflow()`` function. 
+ +If no arguments are provided, the returned dataset will yield a tuple of ``(image, None)``, where the image is a ``tf.Tensor`` of shape ``[tile_height, tile_width, num_channels]`` and type ``tf.uint8``. + +If the ``labels`` argument is provided (dictionary mapping slide names to a numeric label), the returned dataset will yield a tuple of ``(image, label)``, where the label is a ``tf.Tensor`` with a shape and type that matches the provided labels. + +.. code-block:: python + + import slideflow as sf + + # Create a dataset object + project = sf.load_project(...) + dataset = project.dataset(...) + + # Get the labels + labels, unique_labels = dataset.labels('HPV_status') + + # Create a tensorflow dataset + # that yields (image, label) tuples + tf_dataset = dataset.tensorflow(labels=labels) + + for image, label in tf_dataset: + # Do something with the image and label... + ... + +Slide names and tile locations +------------------------------ + +Dataloaders can be configured to return slide names and tile locations in addition to the image and label. This is done by providing the ``incl_slidenames`` and ``incl_loc`` arguments to the ``.tensorflow()`` method. Both arguments are boolean values and default to ``False``. + +Setting ``incl_slidenames=True`` will return the slidename as a Tensor (dtype=string) after the label. Setting ``incl_loc=True`` will return the x and y locations, both as Tensors (dtype=int64), as the last two values of the tuple. + +.. code-block:: python + + tf_dataset = dataset.tensorflow(incl_slidenames=True, incl_loc=True) + + for image, label, slide, loc_x, loc_y in tf_dataset: + ... + +Image preprocessing +------------------- + +.. |per_image_standardization| replace:: ``tf.image.per_image_standardization()`` +.. _per_image_standardization: https://www.tensorflow.org/api_docs/python/tf/image/per_image_standardization + +Dataloaders created with ``.tensorflow()`` include several image preprocessing options. 
These options are provided as keyword arguments to the ``.tensorflow()`` method and are executed in the order listed below: + +- **crop_left** (int): Crop images to this top-left x/y coordinate. Default is ``None``. +- **crop_width** (int): Crop images to this width. Default is ``None``. +- **resize_target** (int): Resize images to this width/height. Default is ``None``. +- **resize_method** (str): Resize method. Default is ``"lanczos3"``. +- **resize_aa** (bool): Enable antialiasing if resizing. Defaults to ``True``. +- **normalizer** (``StainNormalizer``): Perform stain normalization. +- **augment** (str): Perform augmentations based on the provided string. Combine characters to perform multiple augmentations (e.g. ``'xyrj'``). Options include: + - ``'n'``: Perform :ref:`stain_augmentation` (done concurrently with stain normalization) + - ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + - ``'r'``: Random 90-degree rotation + - ``'x'``: Random horizontal flip + - ``'y'``: Random vertical flip + - ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) +- **transform** (Any): Arbitrary function to apply to each image. The function must accept a single argument (the image) and return a single value (the transformed image). +- **standardize** (bool): Standardize images with |per_image_standardization|_, returning a ``tf.float32`` image. Default is ``False``, returning a ``tf.uint8`` image. + +Dataset sharding +---------------- + +Tensorflow dataloaders can be sharded into multiple partitions, ensuring that data is not duplicated when performing distributed training across multiple processes or nodes. This is done by providing the ``shard_idx`` and ``num_shards`` arguments to the ``.tensorflow()`` method. The ``shard_idx`` argument is an integer specifying the shard number, and ``num_shards`` is an integer specifying the total number of shards. + +.. 
code-block:: python + + # Shard the dataset for GPU 1 of 4 + tf_dataset = dataset.tensorflow( + ..., + shard_idx=0, + num_shards=4 + ) + +PyTorch +******* + +.. |DataLoader| replace:: ``torch.utils.data.DataLoader`` +.. _DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader + +As with Tensorflow, the :meth:`slideflow.Dataset.torch()` method creates a |DataLoader|_ that reads images from TFRecords. In the backend, TFRecords are read using :func:`slideflow.tfrecord.torch.MultiTFRecordDataset` and processed as described in :ref:`tfrecords`. + +The returned |DataLoader|_ is an iterable-only dataloader whose returned values depend on the arguments provided to the ``.torch()`` function. An indexable, map-style dataset is also available when using PyTorch, as described in :ref:`indexable_dataloader`. + +If no arguments are provided, the returned dataloader will yield a tuple of ``(image, None)``, where the image is a ``torch.Tensor`` of shape ``[num_channels, tile_height, tile_width]`` and type ``torch.uint8``. Labels are assigned as described above. Slide names and tile locations can also be returned, using the same arguments as `described above `_. + + +.. code-block:: python + + import slideflow as sf + + # Create a dataset object + project = sf.load_project(...) + dataset = project.dataset(...) + + # Create a PyTorch dataloader + torch_dl = dataset.torch() + + for image, label in torch_dl: + # Do something with the image... + ... + +Image preprocessing +------------------- + +Dataloaders created with ``.torch()`` include several image preprocessing options, provided as keyword arguments to the ``.torch()`` method. These preprocessing steps are executed in the order listed below: + +- **normalizer** (``StainNormalizer``): Perform stain normalization. +- **augment** (str): Perform augmentations based on the provided string. Combine characters to perform multiple augmentations (e.g. ``'xyrj'``).
Augmentations are executed in the order characters appear in the string. Options include: + - ``'n'``: Perform :ref:`stain_augmentation` (done concurrently with stain normalization) + - ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + - ``'r'``: Random 90-degree rotation + - ``'x'``: Random horizontal flip + - ``'y'``: Random vertical flip + - ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) +- **transform** (Any): Arbitrary function to apply to each image, including `torchvision transforms `_. The function must accept a single argument (the image, in ``(num_channels, height, width)`` format) and return a single value (the transformed image). +- **standardize** (bool): Standardize images with ``image / 127.5 - 1``, returning a ``torch.float32`` image. Default is ``False``, returning a ``torch.uint8`` image. + +Below is an example of using the ``transform`` argument to apply a torchvision transform to each image: + +.. code-block:: python + + import torchvision.transforms as T + + # Create a torch dataloader + torch_dataloader = dataset.torch( + transform=T.Compose([ + T.RandomResizedCrop(size=(224, 224), antialias=True), + T.Normalize(mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]), + ]) + ) + + for image, label in torch_dataloader: + # Do something with the image and label... + ... + +Dataset sharding +---------------- + +PyTorch dataloaders can similarly be sharded into multiple partitions, ensuring that data is not duplicated when performing distributed training across multiple processes or nodes. + +Sharding is done in two stages. First, dataloaders can be split into partitions using the ``rank`` and ``num_replicas`` arguments to the ``.torch()`` method. The ``rank`` argument is an integer specifying the rank of the current process, and ``num_replicas`` is an integer specifying the total number of processes. + +.. 
code-block:: python + + # Shard the dataset for GPU 1 of 4 + torch_dataloader = dataset.torch( + ..., + rank=0, + num_replicas=4 + ) + +The second stage of sharding happens in the background: if a dataloader is built with multiple worker processes (``Dataset.torch(num_workers=...)``), partitions will be automatically further subdivided into smaller chunks, ensuring that each worker process reads a unique subset of the data. + +Labeling +******** + +The ``labels`` argument to the ``.tensorflow()`` and ``.torch()`` methods accepts a dictionary mapping slide names to a numeric label. During TFRecord reading, the slide name is used to look up the label from the provided dictionary. + +.. warning:: + + Labels are assigned to image tiles based on the slide names inside a :ref:`tfrecord ` file, not by the filename of the tfrecord. This means that renaming a TFRecord file will not change the label of the tiles inside the file. If you need to change the slide names associated with tiles inside a TFRecord, the TFRecord file must be regenerated. + +The most common way to generate labels is to use the :meth:`slideflow.Dataset.labels()` method, which returns a dictionary mapping slide names to numeric labels, along with a list of the unique labels. For categorical labels, the numeric labels correspond to the index of the label in the ``unique_labels`` list. For example, if the ``unique_labels`` list is ``['HPV-', 'HPV+']``, then the mapping of numeric labels would be ``{ 'HPV-': 0, 'HPV+': 1 }``. + +.. code-block:: python + + >>> labels, unique_labels = dataset.labels('HPV_status') + >>> unique_labels + ['HPV-', 'HPV+'] + >>> labels + {'slide1': 0, + 'slide2': 1, + ... + } + >>> tf_dataset = dataset.tensorflow(labels=labels) + +.. _sampling: + +Sampling +******** + +Dataloaders created with ``.tensorflow()`` and ``.torch()`` are iterable-only dataloaders, meaning that they cannot be indexed directly.
This is because the underlying TFRecords are sampled in a streaming fashion, and the dataloader does not know what the next record will be until it has been read. This is in contrast to the :ref:`indexable_dataloader` method described below, which creates an indexable, map-style dataset. + +Dataloaders created with ``.tensorflow()`` and ``.torch()`` can be configured to sample from TFRecords in several ways, with options for infinite vs. finite sampling, oversampling, and undersampling. These sampling methods are described below. + +Infinite dataloaders +-------------------- + +By default, dataloaders created with ``.tensorflow()`` and ``.torch()`` will sample from TFRecords in an infinite loop. This is useful for training, where the dataloader should continue to yield images until the training process is complete. By default, images are sampled from TFRecords with uniform sampling, meaning that each TFRecord has an equal chance of yielding an image. This sampling strategy can be configured, as described below. + +.. note:: + + When training :ref:`tile-based models `, a dataloader is considered to have yielded one "epoch" of data when it has yielded the number of images equal to the number of tiles in the dataset. Due to the random sampling from TFRecords, this means that some images will be overrepresented (images from TFRecords with fewer tiles) and some will be underrepresented (images from TFRecords with many tiles). + +Finite dataloaders +------------------ + +Dataloaders can also be configured with finite sampling, yielding tiles from TFRecords exactly once. This is accomplished by passing the argument ``infinite=False`` to the ``.tensorflow()`` or ``.torch()`` methods. + +.. _balancing: + +Oversampling with balancing +--------------------------- + +Oversampling methods control the probability that tiles are read from each TFRecord, affecting the balance of data across slides, patients, and outcome categories. 
Oversampling is configured at the Dataset level, using the :meth:`slideflow.Dataset.balance` method. This method returns a copy of the dataset with the specified oversampling strategy. + +**Slide-level balancing**: By default, images are sampled from TFRecords with uniform probability, meaning that each TFRecord has an equal chance of yielding an image. This is equivalent to both ``.balance(strategy='slide')`` and ``.balance(strategy=None)``. This strategy will oversample images from slides with fewer tiles, and undersample images from slides with more tiles. + +.. code-block:: python + + # Sample from TFRecords with equal probability + dataset = dataset.balance(strategy='slide') + +**Patient-level balancing**: To sample from TFRecords so that each patient has an equal chance of yielding an image, use ``.balance(strategy='patient')``. This strategy will oversample images from patients with fewer tiles, and undersample images from patients with more tiles. + +.. code-block:: python + + # Sample from TFRecords so that each patient + # is equally likely to yield an image. + dataset = dataset.balance(strategy='patient') + +**Tile-level balancing**: To sample from TFRecords with uniform probability across image tiles, use ``.balance(strategy='tile')``. This strategy will sample from TFRecords with probability proportional to the number of tiles in the TFRecord, resulting in higher representation of slides with more tiles. + +.. code-block:: python + + # Sample from TFRecords with probability proportional + # to the number of tiles in each TFRecord. + dataset = dataset.balance(strategy='tile') + +**Category-level balancing**: To sample from TFRecords so that each outcome category has an equal chance of yielding an image, use ``.balance(strategy='category')``. This strategy will oversample images from outcome categories with fewer tiles, and undersample images from outcome categories with more tiles.
This strategy will also perform slide-level balancing within each category. Category-level balancing is only available when using categorical labels. + +.. code-block:: python + + # Sample from TFRecords so that the categories + # "HPV-" and "HPV+" are equally likely + # to yield an image. + dataset = dataset.balance("HPV_status", strategy='category') + +**Custom balancing**: The ``.balance()`` method saves sampling probability weights to ``Dataset.prob_weights``, a dictionary mapping TFRecord paths to sampling weights. Custom balancing can be performed by overriding this dictionary with custom weights. + +.. code-block:: python + + >>> dataset = dataset.balance(strategy='slide') + >>> dataset.prob_weights + {'/path/to/tfrecord1': 0.002, + '/path/to/tfrecord2': 0.003, + ... + } + >>> dataset.prob_weights = {...} + +Balancing is automatically applied to dataloaders created with the ``.tensorflow()`` and ``.torch()`` methods. + +Undersampling with clipping +--------------------------- + +Datasets can also be configured to undersample TFRecords using :meth:`slideflow.Dataset.clip`. Several undersampling strategies are available. + +**Slide-level clipping**: TFRecords can be clipped to a maximum number of tiles per slide using ``.clip(max_tiles)``. This strategy will clip TFRecords with more tiles than the specified ``max_tiles`` value, resulting in a maximum of ``max_tiles`` tiles per slide. + +**Patient-level clipping**: TFRecords can be clipped to a maximum number of tiles per patient using ``.clip(max_tiles, strategy='patient')``. For patients with more than one slide/TFRecord, TFRecords will be clipped proportionally. + +**Outcome-level clipping**: TFRecords can also be clipped to a maximum number of tiles per outcome category using ``.clip(max_tiles, strategy='category', headers=...)``. The outcome category is specified by the ``headers`` argument, which can be a single header name or a list of header names.
Within each category, TFRecords will be clipped proportionally. + +**Custom clipping**: The ``.clip()`` method saves clipping values to ``Dataset._clip``, a dictionary mapping TFRecord paths to counts of how many tiles should be sampled from the TFRecord. Custom clipping can be performed by overriding this dictionary with custom weights. + +.. code-block:: python + + >>> dataset = dataset.clip(100) + >>> dataset._clip + {'/path/to/tfrecord1': 76, + '/path/to/tfrecord2': 100, + ... + } + >>> dataset._clip = {...} + +Undersampling via dataset clipping is automatically applied to dataloaders created with ``.tensorflow()`` and ``.torch()``. + +During training +--------------- + +If you are training a Slideflow model by directly providing a training and validation dataset to the :meth:`slideflow.Project.train` method, you can configure the datasets to perform oversampling and undersampling as described above. For example: + +.. code-block:: python + + import slideflow as sf + + # Load a project + project = sf.load_project(...) + + # Configure a training dataset with tile-level balancing + # and clipping to max 100 tiles per TFRecord + train = project.dataset(...).balance(strategy='tile').clip(100) + + # Get a validation dataset + val = project.dataset(...) + + # Train a model + project.train( + ..., + dataset=train, + val_dataset=val, + ) + +Alternatively, you can configure oversampling during training through the ``training_balance`` and ``validation_balance`` hyperparameters, as described in the :ref:`ModelParams ` documentation. Undersampling with dataset clipping can be performed with the ``max_tiles`` argument. Configuring oversampling/undersampling with this method propagates the configuration to all datasets generated during cross-validation. + +.. code-block:: python + + import slideflow as sf + + # Load a project + project = sf.load_project(...) 
+ + # Configure hyperparameters with tile-level + # balancing/oversampling for the training data + hp = sf.ModelParams( + ..., + training_balance='tile', + validation_balance=None, + ) + + # Train a model. + # Undersample/clip data to max 100 tiles per TFRecord. + project.train( + ..., + params=hp, + max_tiles=100 + ) + + +.. _indexable_dataloader: + +Direct indexing +*************** + +An indexable, map-style dataloader can be created for PyTorch using :class:`slideflow.io.torch.IndexedInterleaver`, which returns a ``torch.utils.data.Dataset``. Indexable datasets are only available for the PyTorch backend. + +This indexable dataset is created from a list of TFRecords and accepts many arguments for controlling labels, augmentation and image transformations. + +.. code-block:: python + + from slideflow.io.torch import IndexedInterleaver + + # Create a dataset object + project = sf.load_project(...) + dataset = project.dataset(...) + + # Get the TFRecords + tfrecords = dataset.tfrecords() + + # Assemble labels + labels, _ = dataset.labels("HPV_status") + + # Create an indexable dataset + dts = IndexedInterleaver( + tfrecords, + labels=labels, + augment="xyrj", + transform=T.Compose([ + T.RandomResizedCrop(size=(224, 224), + antialias=True), + ]), + normalizer=None, + standardize=True, + shuffle=True, + seed=42, + ) + +The returned dataset is indexable, meaning that it can be indexed directly to retrieve a single image and label. + +.. code-block:: python + + >>> len(dts) + 284114 + >>> image, label = dts[0] + >>> image.shape + torch.Size([3, 224, 224]) + >>> image.dtype + torch.float32 + +The dataset can be configured to return slide names and tile locations by setting the ``incl_slidenames`` and ``incl_loc`` arguments to ``True``, as described above. + +Dataset sharding is supported with the same ``rank`` and ``num_replicas`` arguments as described above. + +.. 
code-block:: python + + # Shard for GPU 1 of 4 + dts = IndexedInterleaver( + ..., + rank=0, + num_replicas=4 + ) + +:class:`slideflow.io.torch.IndexedInterleaver` supports undersampling via the ``clip`` argument (array of clipping values for each TFRecord), but does not support oversampling or balancing. + +.. code-block:: python + + # Specify TFRecord clipping values + dts = IndexedInterleaver( + tfrecords=..., + clip=[100, 75, ...], # Same length as tfrecords + ... + ) + +A |DataLoader|_ can then be created from the indexable dataset using the ``torch.utils.data.DataLoader`` class, as described in the PyTorch documentation. + +.. code-block:: python + + from torch.utils.data import DataLoader + + # Create a dataloader + dl = DataLoader( + dts, + batch_size=32, + num_workers=4, + pin_memory=True, + drop_last=True, + ) + + for image, label in dl: + # Do something with the image and label... + ... diff --git a/docs-source/source/dataset.rst b/docs-source/source/dataset.rst index 0d60b5504..eee70bbd2 100644 --- a/docs-source/source/dataset.rst +++ b/docs-source/source/dataset.rst @@ -1,132 +1,80 @@ -.. currentmodule:: slideflow.dataset +.. currentmodule:: slideflow .. _dataset: -slideflow.dataset -===================== - -The :class:`Dataset` class in this module is used to organize dataset sources, ROI annotations, -clinical annotations, and dataset processing. - -Dataset Organization ---------------------- - -A *source* is a set of slides, corresponding Regions of Interest (ROI) annotations (if available), and any tiles -extracted from these slides, either as loose tiles or in the binary TFRecord format. Sources are defined in the -project dataset configuration JSON file, with the following format: - -.. code-block:: json - - { - "SOURCE": - { - "slides": "/directory", - "roi": "/directory", - "tiles": "/directory", - "tfrecords": "/directory", - } - } - -A single *dataset* can have multiple sources.
One example of this might be if you were performing a pan-cancer analysis; -you would likely have a unique source for each cancer subtype, in order to keep each set of slides and tiles distinct. -Another example might be if you are analyzing slides from multiple institutions, and you want to ensure that you are -not mixing your training and evaluation datasets. - -The :class:`Dataset` class is initialized from a dataset configuration file, a list of source names -to include from the configuration file, and tile size parameters (``tile_px`` and ``tile_um``). Clinical annotations can be -provided to this object, which can then be used to filter slides according to outcomes and perform a variety of other -class-aware functions. - -Filtering ---------- - -Datasets can be filtered with several different filtering mechanisms: - -- **filters**: A dictionary can be passed via the ``filters`` argument to a Dataset to perform filtering. The keys of this dictionary should be annotation headers, and the values of this dictionary indicate the categorical outcomes which should be included. Any slides with an outcome other than what is provided by this dict will be excluded. -- **filter_blank**: A list of headers can be provided to the ``filter_blank`` argument; any slide with a blank annotation in one of these columns will be excluded. -- **min_tiles**: An int can be provided to ``min_tiles``; any tfrecords with fewer than this number of tiles will be excluded. - -Filters can be provided at the time of Dataset instantiation by passing to the initializer: - -.. code-block:: python - - dataset = Dataset(..., filters={'HPV_status': ['negative', 'positive']}) - -... or with the :meth:`Dataset.filter` method: - -.. code-block:: python - - dataset = dataset.filter(min_tiles=50) - -Once applied, all dataset functions and parameters will reflect this filtering criteria, including the :attr:`Dataset.num_tiles` parameter. 
- -Dataset Manipulation --------------------- - -A number of different functions can be applied to Datasets in order to manipulate filters (:meth:`Dataset.filter`, :meth:`Dataset.remove_filter`, :meth:`Dataset.clear_filters`), balance datasets (:meth:`Dataset.balance`), or clip tfrecords to a maximum number of tiles (:meth:`Dataset.clip`). The full documentation of these functions is given below. Note: these functions return a Dataset copy with the functions applied, not to the original dataset. Thus, for proper use, assign the result of the function to the original dataset variable: - -.. code-block:: python - - dataset = dataset.clip(50) - -This also means that these functions can be chained for simplicity: - -.. code-block:: python - - dataset = dataset.balance('HPV_status').clip(50) - - -Manifest --------- - -The Dataset manifest is a dictionary mapping tfrecords to both the total number of slides, as well as the number of slides after any clipping or balancing. For example, after clipping: - -.. code-block:: python - - dataset = dataset.clip(500) - -... the :meth:`Dataset.manifest` function would return something like: - -.. code-block:: json - - { - "/path/tfrecord1.tfrecords": - { - "total": 1526, - "clipped": 500 - }, - "/path/tfrecord2.tfrecords": - { - "total": 455, - "clipped": 455 - } - } - -Training/Validation Splitting ------------------------------ - -Datasets can be split into training and validation datasets with :meth:`Dataset.train_val_split`, with full documentation given below. The result of this function is two datasets - the first training, the second validation - each a separate instance of :class:`Dataset`. - -Tile and TFRecord Processing ----------------------------- - -Datasets can also be used to process and extract tiles. Some example methods support tile and tfrecord processing include: - -- :meth:`Dataset.extract_tiles`: Performs tile extraction for all slides in the dataset. 
-- :meth:`Dataset.extract_tiles_from_tfrecords`: Extract tiles from saved TFRecords, saving in loose .jpg or .png format to a folder. -- :meth:`Dataset.resize_tfrecords`: Resizes all images in TFRecords to a new size. -- :meth:`Dataset.split_tfrecords_by_roi`: Splits a set of extracted tfrecords according to whether tiles are inside or outside the slide's ROI. -- :meth:`Dataset.tfrecord_report`: Generates a PDF report of the tiles inside a collection of TFRecords. - -Tensorflow & PyTorch Datasets ------------------------------ - -Finally, Datasets can also return either a ``tf.data.Datasets`` or ``torch.utils.data.Dataloader`` object to quickly and easily create a deep learning dataset ready to be used as model input, with the :meth:`Dataset.tensorflow` and :meth:`Dataset.torch` methods, respectively. - -.. automodule: slideflow.dataset - -Dataset --------- - -.. autoclass:: slideflow.Dataset - :inherited-members: \ No newline at end of file +slideflow.Dataset +================= + +.. autoclass:: Dataset + +Attributes +---------- + +.. autosummary:: + + Dataset.annotations + + Dataset.filters + Dataset.filter_blank + Dataset.filtered_annotations + Dataset.img_format + Dataset.min_tiles + Dataset.num_tiles + +Methods +------- + +.. autofunction:: slideflow.Dataset.balance +.. autofunction:: slideflow.Dataset.build_index +.. autofunction:: slideflow.Dataset.cell_segmentation +.. autofunction:: slideflow.Dataset.check_duplicates +.. autofunction:: slideflow.Dataset.clear_filters +.. autofunction:: slideflow.Dataset.clip +.. autofunction:: slideflow.Dataset.convert_xml_rois +.. autofunction:: slideflow.Dataset.extract_cells +.. autofunction:: slideflow.Dataset.extract_tiles +.. autofunction:: slideflow.Dataset.extract_tiles_from_tfrecords +.. autofunction:: slideflow.Dataset.filter +.. autofunction:: slideflow.Dataset.find_slide +.. autofunction:: slideflow.Dataset.find_tfrecord +.. autofunction:: slideflow.Dataset.generate_feature_bags +.. 
autofunction:: slideflow.Dataset.get_tfrecord_locations +.. autofunction:: slideflow.Dataset.get_tile_dataframe +.. autofunction:: slideflow.Dataset.harmonize_labels +.. autofunction:: slideflow.Dataset.is_float +.. autofunction:: slideflow.Dataset.kfold_split +.. autofunction:: slideflow.Dataset.labels +.. autofunction:: slideflow.Dataset.load_annotations +.. autofunction:: slideflow.Dataset.load_indices +.. autofunction:: slideflow.Dataset.manifest +.. autofunction:: slideflow.Dataset.manifest_histogram +.. autofunction:: slideflow.Dataset.patients +.. autofunction:: slideflow.Dataset.get_bags +.. autofunction:: slideflow.Dataset.read_tfrecord_by_location +.. autofunction:: slideflow.Dataset.remove_filter +.. autofunction:: slideflow.Dataset.rebuild_index +.. autofunction:: slideflow.Dataset.resize_tfrecords +.. autofunction:: slideflow.Dataset.rois +.. autofunction:: slideflow.Dataset.slide_manifest +.. autofunction:: slideflow.Dataset.slide_paths +.. autofunction:: slideflow.Dataset.slides +.. autofunction:: slideflow.Dataset.split +.. autofunction:: slideflow.Dataset.split_tfrecords_by_roi +.. autofunction:: slideflow.Dataset.summary +.. autofunction:: slideflow.Dataset.tensorflow +.. autofunction:: slideflow.Dataset.tfrecord_report +.. autofunction:: slideflow.Dataset.tfrecord_heatmap +.. autofunction:: slideflow.Dataset.tfrecords +.. autofunction:: slideflow.Dataset.tfrecords_by_subfolder +.. autofunction:: slideflow.Dataset.tfrecords_folders +.. autofunction:: slideflow.Dataset.tfrecords_from_tiles +.. autofunction:: slideflow.Dataset.tfrecords_have_locations +.. autofunction:: slideflow.Dataset.transform_tfrecords +.. autofunction:: slideflow.Dataset.thumbnails +.. autofunction:: slideflow.Dataset.torch +.. autofunction:: slideflow.Dataset.unclip +.. autofunction:: slideflow.Dataset.update_manifest +.. autofunction:: slideflow.Dataset.update_annotations_with_slidenames +.. autofunction:: slideflow.Dataset.verify_annotations_slides +.. 
autofunction:: slideflow.Dataset.verify_img_format +.. autofunction:: slideflow.Dataset.verify_slide_names diff --git a/docs-source/source/dataset_features.rst b/docs-source/source/dataset_features.rst new file mode 100644 index 000000000..8d9a79bdd --- /dev/null +++ b/docs-source/source/dataset_features.rst @@ -0,0 +1,28 @@ +.. currentmodule:: slideflow + +slideflow.DatasetFeatures +========================= + +.. autoclass:: DatasetFeatures + +Methods +------- + +.. autofunction:: slideflow.DatasetFeatures.activations_by_category +.. autofunction:: slideflow.DatasetFeatures.box_plots +.. autofunction:: slideflow.DatasetFeatures.concat +.. autofunction:: slideflow.DatasetFeatures.from_df +.. autofunction:: slideflow.DatasetFeatures.load_cache +.. autofunction:: slideflow.DatasetFeatures.map_activations +.. autofunction:: slideflow.DatasetFeatures.map_predictions +.. autofunction:: slideflow.DatasetFeatures.merge +.. autofunction:: slideflow.DatasetFeatures.remove_slide +.. autofunction:: slideflow.DatasetFeatures.save_cache +.. autofunction:: slideflow.DatasetFeatures.save_example_tiles +.. autofunction:: slideflow.DatasetFeatures.softmax_mean +.. autofunction:: slideflow.DatasetFeatures.softmax_percent +.. autofunction:: slideflow.DatasetFeatures.softmax_predict +.. autofunction:: slideflow.DatasetFeatures.stats +.. autofunction:: slideflow.DatasetFeatures.to_csv +.. autofunction:: slideflow.DatasetFeatures.to_df +.. autofunction:: slideflow.DatasetFeatures.to_torch \ No newline at end of file diff --git a/docs-source/source/datasets_and_val.rst b/docs-source/source/datasets_and_val.rst new file mode 100644 index 000000000..b2644f498 --- /dev/null +++ b/docs-source/source/datasets_and_val.rst @@ -0,0 +1,290 @@ +.. currentmodule:: slideflow.dataset + +.. 
_datasets_and_validation: + +Datasets +======== + +Working with large-scale imaging data can be both challenging and messy, so Slideflow provides the :class:`Dataset` class to assist with managing, splitting, filtering, and transforming your data for easy downstream use. :class:`Dataset` organizes a set of image tiles extracted at a specific size, along with their associated slides and clinical annotations. Datasets are used for many Slideflow functions, and can quickly generate ``torch.utils.data.DataLoader`` and ``tf.data.Dataset`` objects that provide preprocessed slide images for external applications. + +Dataset Sources +*************** + +Datasets are composed of one or more *sources*, each of which is a set of slides, Regions of Interest (if available), and any tiles extracted from these slides. You might choose to organize your data into separate sources if slides are stored in distinct locations on disk - for example, if you are using multiple sets of slides from different institutions, with data from each institution stored separately. + +Loading a Dataset +***************** + +Datasets can be created either from a :ref:`Project ` - using the project's dataset configuration file - or directly by providing paths to slides, annotations, and image tile destinations. In the next sections, we'll take a look at how to create a :class:`Dataset` with each method. + +From a project +-------------- + +If you are working in the context of a :ref:`Project `, a dataset can be quickly created using :meth:`Project.dataset`. A dataset can be loaded from a given ``Project`` with the following parameters: + +- ``tile_px`` is the tile size, in pixels +- ``tile_um`` is the tile size, in microns (``int``) or magnification (``'40x'``) +- ``sources`` is an optional list of dataset sources to use + +.. 
code-block:: python + + import slideflow as sf + + P = sf.load_project('/project/path') + dataset = P.dataset(tile_px=299, tile_um='10x', sources=['Source1']) + +If ``sources`` is not provided, all available sources will be used. + +Alternatively, you can accomplish the same by creating a :class:`Dataset` object directly, passing in the project :ref:`dataset configuration file ` to the ``config`` argument, and a path to the annotations file to ``annotations``: + +.. code-block:: python + + dataset = sf.Dataset( + config='config.json', + sources=['Source1'], + annotations='annotations.csv', + tile_px=299, + tile_um='10x' + ) + +Manually from paths +------------------- + +You can also create a dataset by manually supplying paths to slides, destination for image tiles, and clinical annotations. A single dataset source will be created from the provided arguments, which include: + +- ``tile_px`` is the tile size, in pixels +- ``tile_um`` is the size in microns or magnification +- ``slides`` is the directory containing whole-slide images +- ``roi`` is the directory containing Regions of Interest \*.csv files +- ``tfrecords`` is the path to where image tiles should be stored in TFRecords +- ``tiles`` is the path to where image tiles should be stored as \*.jpg images +- ``annotations`` is either an annotations file (CSV) or Pandas DataFrame. + +For example, to create a dataset from a set of slides, with a configured TFRecord directory and annotations provided via Pandas DataFrame: + +.. code-block:: python + + import pandas as pd + + # Create some clinical annotations + df = pd.DataFrame(...) + + # Create a dataset + dataset = sf.Dataset( + slides='/slides', + tfrecords='/tfrecords', + annotations=df, + tile_px=299, + tile_um='10x' + ) + +When creating a :class:`Dataset` manually from paths, tfrecords should be organized into subdirectories named according to tile size. Using the above example, the tfrecords directory should look like: + +.. 
code-block:: none + + /tfrecords + └── 299px_10x + ├── slide1.tfrecords + ├── slide2.tfrecords + ├── slide3.tfrecords + └── ... + + +Filtering +********* + +Datasets can be filtered through several mechanisms: + +- **filters**: A dictionary, where keys are clinical annotation headers and values are the variable states which should be included. All remaining slides are removed from the dataset. +- **filter_blank**: A list of headers; any slide with a blank value in the clinical annotations in one of these columns will be excluded. +- **min_tiles**: An ``int``; any tfrecords with fewer than this number of tiles will be excluded. + +Filters can be provided at the time of Dataset creation by passing to the initializer: + +.. code-block:: python + + dataset = Dataset(..., filters={'HPV_status': ['negative', 'positive']}) + +or by using the :meth:`Dataset.filter` method: + +.. code-block:: python + + dataset = dataset.filter(min_tiles=50) + +Dataset Manipulation +******************** + +A number of functions can be applied to Datasets to manipulate patient filters (:meth:`Dataset.filter`, :meth:`Dataset.remove_filter`, :meth:`Dataset.clear_filters`), clip tfrecords to a maximum number of tiles (:meth:`Dataset.clip`), or prepare mini-batch balancing (:meth:`Dataset.balance`). The full documentation for these functions is given :ref:`in the API `. Each of these manipulations returns an altered copy of the dataset for easy chaining: + +.. code-block:: python + + dataset = dataset.balance('HPV_status').clip(50) + +Each of these manipulations is performed in memory and will not affect data stored on disk. + + +Dataset Inspection +****************** + +The fastest way to inspect a :class:`Dataset` and the dataset sources loaded, number of slides found, clinical annotation columns available, and number of tiles extracted into TFRecords is the :meth:`Dataset.summary` method. + +.. code-block:: python + + dataset.summary() + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + Overview: + ╒===============================================╕ + │ Configuration file: │ /mnt/data/datasets.json │ + │ Tile size (px): │ 299 │ + │ Tile size (um): │ 10x │ + │ Slides: │ 941 │ + │ Patients: │ 941 │ + │ Slides with ROIs: │ 941 │ + │ Patients with ROIs: │ 941 │ + ╘===============================================╛ + + Filters: + ╒====================╕ + │ Filters: │ {} │ + ├--------------------┤ + │ Filter Blank: │ [] │ + ├--------------------┤ + │ Min Tiles: │ 0 │ + ╘====================╛ + + Sources: + + TCGA_LUNG + ╒==============================================╕ + │ slides │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ roi │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ tiles │ /mnt/rocket/tiles/TCGA_LUNG │ + │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │ + │ label │ 299px_10x │ + ╘==============================================╛ + + Number of tiles in TFRecords: 18354 + Annotation columns: + Index(['patient', 'subtype', 'site', 'slide'], + dtype='object') + +Manifest +******** + +:meth:`Dataset.manifest` provides a dictionary mapping tfrecords to the total number of image tiles and the number of tiles after clipping or mini-batch balancing. For example, after clipping: + +.. code-block:: python + + dataset = dataset.clip(500) + +the manifest may look something like: + +.. code-block:: json + + { + "/path/tfrecord1.tfrecords": + { + "total": 1526, + "clipped": 500 + }, + "/path/tfrecord2.tfrecords": + { + "total": 455, + "clipped": 455 + } + } + +Inspecting a dataset's manifest may be useful to better understand the effects of dataset manipulations. + +.. _validation_planning: + +Training/Validation Splitting +***************************** + +An important step when planning an experiment is to determine your validation and testing data. 
In total, deep learning experiments should have three groups of data: + +1) **Training** - data used for learning during training +2) **Validation** - data used for validating training parameters and early stopping (if applicable) +3) **Evaluation** - held-out data used for final testing once all training and parameter tuning are complete. Preferably an external cohort. + +| + +Slideflow includes tools for flexible training, validation, and evaluation data planning as discussed in the next sections. + +Creating a split +---------------- + +Datasets can be split into training and validation or test datasets with :meth:`Dataset.split`. The result of this function is two datasets - the first training, the second validation - each a separate instance of :class:`Dataset`. + +Slideflow provides several options for preparing a validation plan, including: + +- **strategy**: ``'bootstrap'``, ``'k-fold'``, ``'k-fold-manual'``, ``'k-fold-preserved-site'``, ``'fixed'``, and ``'none'`` +- **fraction**: (float between 0-1) [not used for k-fold validation] +- **k_fold**: int + +The default validation strategy is three-fold cross-validation (``strategy='k-fold'`` and ``k_fold=3``). + +.. code-block:: python + + # Split a dataset into training and validation + # using 5-fold cross-validation, with this being + # the first cross-fold. + train_dataset, test_dataset = dataset.split( + model_type='classification', # Categorical labels + labels='subtype', # Label to balance between datasets + k_fold=5, # Total number of crossfolds + k_fold_iter=1, # Cross-fold iteration + splits='splits.json' # Where to save/load crossfold splits + ) + +You can also use :meth:`Dataset.kfold_split` to iterate through cross-fold splits: + +.. code-block:: python + + # Split a dataset into training and validation + # using 5-fold cross-validation + for train, test in dataset.kfold_split(k=5, labels='subtype'): + ... + + +.. _validation_strategies: + +Validation strategies +--------------------- + +.. 
figure:: validation.png + :width: 100% + :align: center + +The ``strategy`` option determines how the validation data is selected. + +If **fixed**, a certain percentage of your training data is set aside for testing (determined by ``fraction``). + +If **bootstrap**, validation data will be selected at random (percentage determined by ``fraction``), and all training iterations will be repeated a number of times equal to ``k_fold``. When used during training, the reported model training metrics will be an average of all bootstrap iterations. + +If **k-fold**, training data will be automatically separated into *k* groups (where *k* is equal to ``k_fold``), and all training iterations will be repeated *k* times using k-fold cross-validation. The saved and reported model training metrics will be an average of all k-fold iterations. + +Datasets can be separated into manually-curated k-folds using the **k-fold-manual** strategy. Assign each slide to a k-fold cohort in the annotations file, and designate the appropriate column header with ``k_fold_header``. + +The **k-fold-preserved-site** strategy is a cross-validation strategy that ensures site is preserved across the training/validation sets, in order to reduce bias from batch effect as described by `Howard, et al `_. This strategy is recommended when using data from The Cancer Genome Atlas (`TCGA `_). + +.. note:: + Preserved-site cross-validation requires either `CPLEX `_ or `Pyomo/Bonmin `_. The original implementation of the preserved-site cross-validation algorithm described by Howard et al. can be found `on GitHub `_. + +If **none**, no validation testing will be performed. + +Re-using splits +--------------- + +For all validation strategies, training/validation splits can be logged to a JSON file automatically if a splits configuration file is provided to the argument ``splits``. 
When provided, :meth:`Dataset.split` will prioritize using previously-generated training/validation splits rather than generating a new split. This aids with experiment reproducibility and hyperparameter tuning. If training/validation splits are being prepared by a :ref:`Project-level function `, splits will be automatically logged to a ``splits.json`` file in the project root directory. + +Creating Dataloaders +******************** + +Finally, Datasets can also return either a ``tf.data.Dataset`` or ``torch.utils.data.DataLoader`` object to quickly and easily create a deep learning dataset ready to be used as model input, with the :meth:`Dataset.tensorflow` and :meth:`Dataset.torch` methods, respectively. See :ref:`dataloaders` for more detailed information and examples. + +Datasets have many other utility functions for working with and processing data. Read more in the :ref:`Dataset API documentation `. \ No newline at end of file diff --git a/docs-source/source/er_roc_patient.png b/docs-source/source/er_roc_patient.png index 6292be164..53c91914e 100644 Binary files a/docs-source/source/er_roc_patient.png and b/docs-source/source/er_roc_patient.png differ diff --git a/docs-source/source/er_roc_tile.png b/docs-source/source/er_roc_tile.png index 2e5115f1a..587c04c81 100644 Binary files a/docs-source/source/er_roc_tile.png and b/docs-source/source/er_roc_tile.png differ diff --git a/docs-source/source/evaluation.rst b/docs-source/source/evaluation.rst index c0b4dc2f2..ea5d06af4 100644 --- a/docs-source/source/evaluation.rst +++ b/docs-source/source/evaluation.rst @@ -1,48 +1,160 @@ +.. _evaluation: + Evaluation ========== -In addition to examining cross-validation training performance, model performance can be assessed with external dataset evaluation, and visualization of predictions across evaluation slides in the form of a heatmap. +Slideflow includes several tools for evaluating trained models. 
In the next sections, we'll review how to evaluate a model on a held-out test set, generate predictions without ground-truth labels, and visualize predictions with heatmaps. -Model evaluation -**************** +Evaluating a test set +********************* -Once training and hyperparameter tuning is complete, you can test model performance on your held-out evaluation set using the ``evaluate`` function. Specify the path to the saved with the ``model`` argument. For example: +:meth:`slideflow.Project.evaluate` provides an easy interface for evaluating model performance on a held-out test set. Locate the saved model to evaluate (which will be in the project ``models/`` folder). :ref:`As with training `, the dataset to evaluate can be specified using either the ``filters`` or ``dataset`` arguments. If neither is provided, all slides in the project will be evaluated. .. code-block:: python + # Method 1: specify filters P.evaluate( model="/path/to/trained_model_epoch1", - outcomes="category", - filters={"dataset": ["eval"]} + outcomes="tumor_type", + filters={"dataset": ["test"]} ) -.. autofunction:: slideflow.Project.evaluate - :noindex: + # Method 2: specify a dataset + dataset = P.dataset(tile_px=299, tile_um='10x') + test_dataset = dataset.filter({"dataset": ["test"]}) + P.evaluate( + model="/path/to/trained_model_epoch1", + outcomes="tumor_type", + dataset=test_dataset + ) -Heatmaps -******** +Results are returned from the ``Project.evaluate()`` function as a dictionary and saved in the project evaluation directory. Tile-, slide-, and patient-level predictions are also saved in the corresponding project evaluation folder, ``eval/``. 
+ +Generating predictions +********************** -To generate a predictive heatmap for a set of slides, use the ``generate_heatmaps()`` function as below, which will automatically save heatmap images in your project directory: +For a dataset +------------- + +:meth:`slideflow.Project.predict` provides an interface for generating model predictions on an entire dataset. As above, locate the saved model from which to generate predictions, and specify the dataset with either ``filters`` or ``dataset`` arguments. .. code-block:: python - P.generate_heatmaps( + dfs = P.predict( model="/path/to/trained_model_epoch1", - filters={"dataset": ["eval"]} + filters={"dataset": ["test"]} ) + print(dfs['patient']) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + patient ... cohort-y_pred1 + 0 TCGA-05-4244-01Z-00-DX1... ... 0.032608 + 1 TCGA-05-4245-01Z-00-DX1... ... 0.216634 + 2 TCGA-05-4249-01Z-00-DX1... ... 0.000858 + 3 TCGA-05-4250-01Z-00-DX1... ... 0.015915 + 4 TCGA-05-4382-01Z-00-DX1... ... 0.020700 + .. ... ... ... + 936 TCGA-O2-A52S-01Z-00-DX1... ... 0.983500 + 937 TCGA-O2-A52V-01Z-00-DX1... ... 0.773328 + 938 TCGA-O2-A52W-01Z-00-DX1... ... 0.858558 + 939 TCGA-S2-AA1A-01Z-00-DX1... ... 0.000212 + 940 TCGA-XC-AA0X-01Z-00-DX1... ... 0.632612 + +Results are returned as a dictionary of pandas DataFrames (with the keys ``'tile'``, ``'slide'``, and ``'patient'`` for each level of prediction) and saved in the project evaluation directory, ``eval/``. + +For a single slide +------------------ + +You can also generate predictions for a single slide with either :func:`slideflow.slide.predict` or :meth:`slideflow.WSI.predict`. + +.. code-block:: python + + import slideflow as sf + + slide = '/path/to/slide.svs' + model = '/path/to/model_epoch1' + sf.slide.predict(slide, model) + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + array([0.84378019, 0.15622007]) + +The returned array has the shape ``(num_classes,)``, indicating the whole-slide prediction for each outcome category. If the model was trained with uncertainty quantification, this function will return two arrays: the first with predictions, the second with estimated uncertainty. + +.. _generate_heatmaps: + +Heatmaps +******** -.. autofunction:: slideflow.Project.generate_heatmaps - :noindex: +For a dataset +------------- -If you would like to directly interact with the calculated heatmap data, create a :class:`slideflow.Heatmap` object by providing a path to a slide, a path to a model, and tile size information: +Predictive heatmaps can be created for an entire dataset using :meth:`slideflow.Project.generate_heatmaps`. Heatmaps will be saved and exported in the project directory. See the linked API documentation for arguments and customization. .. code-block:: python - from slideflow import Heatmap + P.generate_heatmaps(model="/path/to/trained_model_epoch1") - heatmap = Heatmap( +For a single slide +------------------ + +:class:`slideflow.Heatmap` provides more granular control for calculating and displaying a heatmap for a given slide. The required arguments are: + +- ``slide``: Either a path to a slide, or a :class:`slideflow.WSI` object. +- ``model``: Path to a saved Slideflow model. + +Additional keyword arguments can be used to customize and optimize the heatmap. In this example, we'll increase the batch size to 64 and allow multiprocessing by setting ``num_processes`` equal to our CPU core count, 16. + +.. code-block:: python + + heatmap = sf.Heatmap( slide='/path/to/slide.svs', - model='/path/to/model' + model='/path/to/model', + batch_size=64, + num_processes=16 ) -The spatial map of logits, as calculated across the input slide, can be accessed through ``heatmap.logits``. The spatial map of post-convolution, penultimate activations can be accessed through ``heatmap.postconv``. 
The heatmap can be saved with ``heatmap.save('/path/')``. \ No newline at end of file +If ``slide`` is a :class:`slideflow.WSI`, the heatmap will be calculated only within non-masked areas and ROIs, if applicable. + +.. code-block:: python + + from slideflow.slide import qc + + # Prepare the slide + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302, rois='/path') + wsi.qc([qc.Otsu(), qc.Gaussian()]) + + # Generate a heatmap + heatmap = sf.Heatmap( + slide=wsi, + model='/path/to/model', + batch_size=64, + num_processes=16 + ) + +If ``slide`` is a path to a slide, Regions of Interest can be provided through the optional ``roi_dir`` or ``rois`` arguments. + +Once generated, heatmaps can be rendered and displayed (e.g., in a Jupyter notebook) with :meth:`slideflow.Heatmap.plot`. + +.. code-block:: python + + heatmap.plot(class_idx=0, cmap='inferno') + +Insets showing zoomed-in portions of the heatmap can be added with :meth:`slideflow.Heatmap.add_inset`: + +.. code-block:: python + + heatmap.add_inset(zoom=20, x=(10000, 10500), y=(2500, 3000), loc=1, axes=False) + heatmap.add_inset(zoom=20, x=(12000, 12500), y=(7500, 8000), loc=3, axes=False) + heatmap.plot(class_idx=0, mpp=1) + +.. image:: heatmap_inset.jpg + +| + +Save rendered heatmaps for each outcome category with :meth:`slideflow.Heatmap.save`. The spatial map of predictions, as calculated across the input slide, can be accessed through ``Heatmap.predictions``. You can save the numpy array with calculated predictions (and uncertainty, if applicable) as an \*.npz file using :meth:`slideflow.Heatmap.save_npz`. 
\ No newline at end of file diff --git a/docs-source/source/example_report_small.jpg b/docs-source/source/example_report_small.jpg new file mode 100644 index 000000000..d2c026a43 Binary files /dev/null and b/docs-source/source/example_report_small.jpg differ diff --git a/docs-source/source/extract_tiles.rst b/docs-source/source/extract_tiles.rst deleted file mode 100644 index b421288b3..000000000 --- a/docs-source/source/extract_tiles.rst +++ /dev/null @@ -1,124 +0,0 @@ -.. _filtering: - -Tile extraction -=============== - -The next step is tile extraction, which is accomplished using the ``extract_tiles()`` function. The only arguments required are ``tile_px`` and ``tile_um``, which determine the size of the extracted tiles in pixels and microns, respectively: - -.. code-block:: python - - P.extract_tiles(tile_px=299, tile_um=302) - -To filter according to a columns in your annotations file, pass a dictionary to ``filters``, with keys equal to column names and values equal to a list of all acceptable values you want to include. If this argument is not supplied, all valid slides will be extracted. - -For example, to extract tiles only for slides that are labeled as "train" in the "dataset" column header in your annotations file, do: - -.. code-block:: python - - P.extract_tiles( - tile_px=299, - tile_um=302, - filters={"dataset": ["train"]} - ) - -To further filter by the annotation header "mutation_status", including only slides with the category "braf" or "ras", do: - -.. code-block:: python - - P.extract_tiles( - tile_px=299, - tile_um=302, - filters={ - "dataset": ["train"], - "mutation_status": ["braf", "ras"] - } - ) - -.. note:: - The ``filters`` argument can be also used for filtering input slides in many slideflow functions, including ``train()``, ``evaluate()``, ``generate_heatmaps()``, and ``generate_mosaic()``. - -Tiles will be extracted at the specified pixel and micron size. 
Tiles will be automatically stored in TFRecord format, although loose tiles can also be saved by passing ``save_tiles=True``. - -The full documentation for the ``extract_tiles`` function is given below: - -.. autofunction:: slideflow.Project.extract_tiles - :noindex: - -ROIs -**** - -By default, slides with valid ROIs will only have tiles extracted from within ROIs, and slides without ROIs will have tiles extracted across the whole-slide image. To skip slides that are missing ROIs, use ``skip_missing_roi=True``. To ignore ROIs entirely and extract tiles from whole-slide images, pass ``roi_method='ignore'``. You can alternatively extract *outside* the annotated ROIs by passing ``roi_method='outside'``. - -Stain Normalization -******************* - -Tiles can be normalized to account for differing strengths of H&E staining, which has been shown to improve machine learning accuracy on some datasets. Several normalization algorithms exist, and none have shown clear superiority over the other. However, while tile normalization may improve training performance, some tiles and slides may be prone to artifacts as a result of normalization algorithms. - -If you choose to use normalization, you may either normalize images to an internal H&E-stained control image contained within the pipeline, or you may explicitly provide a reference image for normalization. - -Normalization can be performed at the time of tile extraction or in real-time during training. Real-time normalization adds CPU overhead and may increase training or inference times for some models, although it allows greater flexibility, as normalization strategies can be changed without re-extracting tiles from your entire dataset. - -To normalize tiles during tile extraction, use the ``normalizer`` and ``normalizer_source`` arguments; ``normalizer`` is the name of the algorithm to use. A path to a normalization reference image may optionally be provided through ``normalizer_source``. 
Available stain normalization algorithms include: - -- **macenko**: M. Macenko et al., ‘A method for normalizing histology slides for quantitative analysis’, *IEEE International Symposium on Biomedical Imaging: From Nano to Macro*, 2009, pp. 1107–1110. -- **vahadane**: A. Vahadane et al., ‘Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images’, *IEEE Transactions on Medical Imaging*, vol. 35, no. 8, pp. 1962–1971, Aug. 2016. -- **reinhard**: E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, ‘Color transfer between images’, *IEEE Computer Graphics and Applications*, vol. 21, no. 5, pp. 34–41, Sep. 2001. -- **reinhard_fast**: A modification of the Reinhard algorithm with the brightness standardization step removed for computational efficiency. - -.. code-block:: python - - P.extract_tiles( - tile_px=299, - tile_um=302, - normalizer='reinhard' - ) - -Alternatively, real-time normalization can be performed with all pipeline functions that process TFRecords. For example, real-time normalization during training is enabled by setting the appropriate hyperparameter: - -.. code-block:: python - - from slideflow.model import ModelParams - hp = ModelParams(..., normalizer='reinhard') - -If a normalizer was used during model training, the appropriate information will be stored in the model metadata file, `params.json`, located in the saved model folder. Any function within `slideflow` that uses this model will then process images using the same normalization strategy. - -Background filtering -******************** - -Slide background can be detected and filtered by two types of methods - tile-based methods and slide-based methods. - -Whitespace and grayspace filtering are two tile-based methods that detect the amount of whitespace or grayspace in a given tile, discarding the tile if the content exceeds a set threshold. 
Whitespace is calculated using overall brightness for each pixel, then counting the fraction of pixels with a brightness above some threshold. Grayspace is calculated by converting RGB images to the HSV spectrum, then counting the fraction of pixels with a saturation below some threshold. This filtering is performed separately for each tile as it is being extracted. Grayspace filtering is the default background filtering behavior. The arguments ``whitespace_fraction``, ``whitespace_threshold``, ``grayspace_fraction``, and ``grayspace_threshold`` are used for these methods, as described in the documentation for the tile extraction function (:func:`slideflow.Dataset.extract_tiles`). - -Alternatively, Otsu's thresholding can be performed on the lowest downsample level for a whole slide. This method generates a mask that identifies areas of foreground and marks areas of background to be discarded. Otsu's thresholding is performed in the HSV colorspace, and generally yields identical results to grayspace filtering. Otsu's thresholding is ~30% faster than grayspace filtering for slides with accessible downsample layers, but if downsample layers are not stored in a given slide or are inaccessible (e.g. ``enable_downsample=False``, which should be set for any system that does not have a patched pixman library), grayspace filtering will be significantly faster. To use Otsu's thresholding, set the argument ``qc='otsu'`` (and disable grayspace filtering by setting ``grayspace_threshold=1``) - -If you have pixman>0.38 and use slides with accessible downsample layers, Otsu's thresholding should be used. Otherwise, grayspace filtering will be faster. - -Quality control -*************** - -In addition to background filtering, additional blur-detection quality control can be used to identify out-of-focus areas, or areas with artifact. 
If annotated Regions of Interest (ROIs) are not available for your dataset, blur detection quality control should be enabled in order to ensure that high quality image tiles are extracted. If ROIs *are* available, it may be unnecessary. Blur detection may increase tile extraction time by 50% or more. - -To use blur detection QC, set ``qc='blur'`` (or ``qc='both'`` if also using Otsu's thresholding). - -If both Otsu's thresholding and blur detection are being used, Slideflow will automatically calculate Blur Burden, a metric used to assess the degree to which non-background tiles are either out-of-focus or contain artifact. In the tile extraction PDF report that is generated, the distribution of blur burden for slides in the dataset will be plotted on the first page. The report will contain the number of slides meeting criteria for warning, when the blur burden exceeds 5% for a given slide. A text file containing names of slides with high blur burden will be saved in the exported TFRecords directory. These slides should be manually reviewed to ensure they are of high enough quality to include in the dataset. - -Performance optimization -************************ - -The ``libvips`` library is used for all slide reading and tile extraction. As tile extraction is heavily reliant on random access reading, significant performance gains can be experienced by either 1) moving all slides to an SSD, or 2) utilizing an SSD or ramdisk buffer (to which slides will be copied prior to extraction). The use of a ramdisk buffer can improve tile extraction speed by 10-fold or greater! To maximize performance, pass the buffer path to the argument ``buffer``. - -Multiprocessing and multithreading is used during tile extraction to maximize performance efficiency. The number of process workers and threads per worker can be manually specified with ``num_workers`` and ``num_threads``, respectively. 
Optimal results are generally seen by setting ``num_workers=2`` and ``num_threads`` equal to the number of CPU cores available. Tile extraction speed scales linearly with CPU core availability. - -Extraction reports -****************** - -Once tiles have been extracted, a PDF report will be generated with a summary and sample of tiles extracted from their corresponding slides. An example of such a report is given below. It is generally good practice to review this report, as you may catch slides with data corruption, artifacts with stain normalization, or suboptimal whitespace/grayspace filtering. The report is saved in the project root directory. - -In addition to viewing reports after tile extraction, you may generate new reports on existing tfrecords with :func:`slideflow.Dataset.tfrecord_report`, by calling this function on a given dataset (see :ref:`dataset` for more information on datasets). For example: - -.. code-block:: python - - dataset = P.dataset(tile_px=299, tile_um=302) - dataset.tfrecord_report("/path/to/dest") - -You can also generate reports for slides that have not yet been extracted by passing ``dry_run=True`` to :meth:`slideflow.Dataset.extract_tiles`. \ No newline at end of file diff --git a/docs-source/source/features.rst b/docs-source/source/features.rst new file mode 100644 index 000000000..a7d086a58 --- /dev/null +++ b/docs-source/source/features.rst @@ -0,0 +1,485 @@ +.. _features: + +Generating Features +=================== + +Converting images into feature vectors is a common step for many machine learning tasks, including :ref:`feature space analysis ` and :ref:`multiple-instance learning (MIL) `. Slideflow provides a simple API for generating features from image tiles and includes several pretrained feature extractors. You can see a list of all available feature extractors with :func:`slideflow.list_extractors`. 
+ +Generating Features +******************* + +The first step in generating features from a dataset of images is creating a feature extractor. Many types of feature extractors can be used, including ImageNet-pretrained models, models fine-tuned in Slideflow, histology-specific pretrained feature extractors (i.e., "foundation models"), or fine-tuned SSL models. In all cases, feature extractors are built with :func:`slideflow.build_feature_extractor`, and features are generated for a :ref:`Dataset ` using :meth:`slideflow.Dataset.generate_feature_bags`, as described :ref:`below `. + +.. code-block:: python + + # Build a feature extractor + ctranspath = sf.build_feature_extractor('ctranspath') + + # Generate features for a dataset + dataset.generate_feature_bags(ctranspath, outdir='/path/to/features') + + +Pretrained Extractors +********************* + +Slideflow includes several pathology-specific feature extractors, also referred to as foundation models, pretrained on large-scale histology datasets. + +.. list-table:: **Pretrained feature extractors.** Note: "histossl" was renamed to "phikon" in Slideflow 3.0.
+ :header-rows: 1 + :widths: 14 10 8 8 8 14 28 10 + + * - Model + - Type + - WSIs + - Input size + - Dim + - Source + - Package + - Link + * - **Virchow** + - DINOv2 + - 1.5M + - 224 + - 2560 + - Paige + - ``slideflow`` + - `Paper `__ + * - **CTransPath** + - SRCL + - 32K + - 224 + - 768 + - Tencent AI Lab + - ``slideflow-gpl`` + - `Paper `__ + * - **RetCCL** + - CCL + - 32K + - 256 + - 2048 + - Tencent AI Lab + - ``slideflow-gpl`` + - `Paper `__ + * - **Phikon** + - iBOT + - 6.1K + - 224 + - 768 + - Owkin + - ``slideflow-noncommercial`` + - `Paper `__ + * - **PLIP** + - CLIP + - N/A + - 224 + - 512 + - Zhao Lab + - ``slideflow-noncommercial`` + - `Paper `__ + * - **UNI** + - DINOv2 + - 100K + - 224 + - 1024 + - Mahmood Lab + - ``slideflow-noncommercial`` + - `Paper `__ + * - **GigaPath** + - DINOv2 + - 170K + - 256 + - 1536 + - Microsoft + - ``slideflow-noncommercial`` + - `Paper `__ + + +In order to respect the original licensing agreements, pretrained models are distributed in separate packages. The core ``slideflow`` package provides access to models under the **Apache-2.0** license, while models under **GPL-3.0** are available in the ``slideflow-gpl`` package. Models restricted to non-commercial use are available under the **CC BY-NC 4.0** license through the ``slideflow-noncommercial`` package. + +Loading weights +--------------- + +Pretrained feature extractors will automatically download their weights from Hugging Face upon creation. Some models, such as PLIP, GigaPath, UNI, and Phikon, require approval for access. Request approval on Hugging Face and ensure your local machine has been `authenticated `_. + +All pretrained models can also be loaded using local weights. Use the ``weights`` argument when creating a feature extractor. + +.. 
code-block:: python + + # Load UNI with local weights + uni = sf.build_feature_extractor('uni', weights='../pytorch_model.bin') + +Image preprocessing +------------------- + +Each feature extractor includes a default image preprocessing pipeline that matches the original implementation. However, preprocessing can also be manually adjusted using various keyword arguments when creating a feature extractor. + +- **resize**: ``int`` or ``bool``. If an ``int``, resizes images to this size. If ``True``, resizes images to the input size of the feature extractor. Default is ``False``. +- **center_crop**: ``int`` or ``bool``. If an ``int``, crops images to this size. If ``True``, crops images to the input size of the feature extractor. Center-cropping happens before resizing, if both are used. Default is ``False``. +- **interpolation**: ``str``. Interpolation method for resizing images. Default is ``bilinear`` for most models, but is ``bicubic`` for GigaPath and Virchow. +- **antialias**: ``bool``. Whether to apply antialiasing to resized images. Default is ``False`` (matching the default behavior of torchvision < 0.17). +- **norm_mean**: ``list``. Mean values for image normalization. Default is ``[0.485, 0.456, 0.406]`` for all models except PLIP. +- **norm_std**: ``list``. Standard deviation values for image normalization. Default is ``[0.229, 0.224, 0.225]`` for all models except PLIP. + + +Example: + +.. code-block:: python + + # Load a feature extractor with custom preprocessing + extractor = sf.build_feature_extractor( + 'ctranspath', + resize=224, + interpolation='bicubic', + antialias=True + ) + +Default values for these processing arguments are determined by the feature extractor. One notable exception to the standard preprocessing algorithm is GigaPath, for which images are resized first (default to 256x256) and then center cropped (default to 224x224), which mirrors the official implementation.
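The order of resizing and center-cropping changes what the model sees. The sketch below is purely illustrative and is not part of the Slideflow API: the helper ``trace_pipeline`` is a hypothetical name, it assumes square images, and it only tracks image dimensions rather than touching pixel data. It compares a crop-then-resize pipeline with the resize-then-crop order described above for GigaPath, using a 299 px tile as an assumed input size.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, int]  # ("crop", size) or ("resize", size)


def trace_pipeline(tile_px: int, steps: List[Step]) -> Dict[str, float]:
    """Hypothetical helper: follow a square tile through crop/resize steps."""
    px = tile_px   # current image width in pixels
    fov = 1.0      # fraction of the original tile width still visible
    scale = 1.0    # output pixels per original pixel
    for op, size in steps:
        if op == "crop":
            # Center-cropping discards the border and keeps the resolution.
            if size < px:
                fov *= size / px
                px = size
        elif op == "resize":
            # Resizing keeps the field of view and changes the resolution.
            scale *= size / px
            px = size
        else:
            raise ValueError(f"Unknown step: {op!r}")
    return {"px": px, "fov": round(fov, 3), "scale": round(scale, 3)}


# Crop to 224 px first, then resize to 224 px (the resize is a no-op here).
crop_then_resize = trace_pipeline(299, [("crop", 224), ("resize", 224)])

# Resize to 256 px first, then crop to 224 px (the order described for GigaPath).
resize_then_crop = trace_pipeline(299, [("resize", 256), ("crop", 224)])

print(crop_then_resize)
print(resize_then_crop)
```

Both pipelines produce a 224 px image, but the first keeps the native resolution and a smaller field of view, while the second keeps more of the tile at a reduced resolution.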
+ +For transparency, you can see the current preprocessing pipeline with ``extractor.transform``: + +.. code-block:: python + + >>> import slideflow as sf + >>> ctranspath = sf.build_feature_extractor( + ... 'ctranspath', + ... resize=256, + ... interpolation='bicubic', + ... center_crop=224 + ... ) + >>> ctranspath.transform + Compose( + CenterCrop(size=(224, 224)) + Resize(size=256, interpolation=bicubic, max_size=None, antialias=False) + Lambda() + Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) + ) + + +GigaPath +-------- + +GigaPath is a DINOv2-based model from Microsoft/Providence trained on 170k whole-slide images and is bundled with ``slideflow-noncommercial``. The GigaPath model includes additional dependencies which are not broadly compatible with all OS distributions, and are thus not installed by default. To install the GigaPath dependencies: + +.. code-block:: bash + + pip install slideflow-noncommercial[gigapath] git+ssh://git@github.com/prov-gigapath/prov-gigapath + + +GigaPath has two stages: a tile encoder and a slide-level encoder. The tile encoder (``"gigapath.tile"``) works the same as all other feature extractors in Slideflow. You can build this encoder directly: + +.. code-block:: python + + # Build the tile encoder + gigapath_tile = sf.build_feature_extractor("gigapath.tile") + + # Use the tile encoder + project.generate_feature_bags(gigapath_tile, ...) + + +or you can build the combined tile+slide model, and then use ``gigapath.tile``: + +.. code-block:: python + + # Build the combined tile + slide model + gigapath = sf.build_feature_extractor("gigapath") + + # Use the tile encoder + project.generate_feature_bags(gigapath.tile, ...) + +As there are two stages to GigaPath, there are also separate model weights. As with other pretrained feature extractors, the weights will be auto-downloaded from Hugging Face upon first use if you are logged into Hugging Face and have been granted access to the repository.
If you have manually downloaded the weights, these can be used with the following: + +.. code-block:: python + + # Example of how to supply tile + slide weights + # For the full GigaPath model + gigapath = sf.build_feature_extractor( + 'gigapath', + tile_encoder_weights='../pytorch_model.bin', + slide_encoder_weights='../slide_encoder.pth' + ) + + # Or, just supply the tile weights + gigapath_tile = sf.build_feature_extractor( + 'gigapath.tile', + weights='pytorch_model.bin' + ) + + +Once feature bags have been generated and saved with the GigaPath tile encoder, you can then generate slide-level embeddings with ``gigapath.slide``: + +.. code-block:: python + + # Load GigaPath + gigapath = sf.build_feature_extractor('gigapath') + + # Generate tile-level features + project.generate_feature_bags(gigapath.tile, ..., outdir='/gigapath_bags') + + # Generate slide-level embeddings + gigapath.slide.generate_and_save('/gigapath_bags', outdir='/gigapath_embeddings') + +In addition to running the tile and slide encoder steps separately, you can also run the combined pipeline all at once on a whole-slide image, generating a final slide-level embedding. + +.. code-block:: python + + # Load GigaPath + gigapath = sf.build_feature_extractor('gigapath') + + # Load slide + wsi = sf.WSI('slide.svs', tile_px=256, tile_um=128) + + # Generate slide embedding + embedding = gigapath(wsi) + + +ImageNet Features +***************** + +To calculate features from an ImageNet-pretrained network, first build an imagenet feature extractor with :func:`slideflow.build_feature_extractor`. The first argument should be the name of an architecture followed by ``_imagenet``, and the expected tile size should be passed to the keyword argument ``tile_px``. You can optionally specify the layer from which to generate features with the ``layers`` argument; if not provided, it will default to calculating features from post-convolutional layer activations. 
For example, to build a ResNet50 feature extractor for images at 299 x 299 pixels: + +.. code-block:: python + + resnet50 = sf.build_feature_extractor( + 'resnet50_imagenet', + tile_px=299 + ) + +This will calculate features using activations from the post-convolutional layer. You can also concatenate activations from multiple neural network layers and apply pooling for layers with 2D output shapes. + +.. code-block:: python + + resnet50 = sf.build_feature_extractor( + 'resnet50_imagenet', + layers=['conv1_relu', 'conv3_block1_2_relu'], + pooling='avg', + tile_px=299 + ) + +If a model architecture is available in both the Tensorflow and PyTorch backends, Slideflow will default to using the active backend. You can manually set the feature extractor backend using ``backend``. + +.. code-block:: python + + # Create a PyTorch feature extractor + extractor = sf.build_feature_extractor( + 'resnet50_imagenet', + layers=['layer2.0.conv1', 'layer3.1.conv2'], + pooling='avg', + tile_px=299, + backend='torch' + ) + +You can view all available feature extractors with :func:`slideflow.model.list_extractors`. + +Layer Activations +***************** + +You can also calculate features from any model trained in Slideflow. The first argument to ``build_feature_extractor()`` should be the path of the trained model. You can optionally specify the layer at which to calculate activations using the ``layers`` keyword argument. If not specified, activations are calculated at the post-convolutional layer. + +.. code-block:: python + + # Build a feature extractor from a trained model. + features = sf.build_feature_extractor( + '/path/to/model', + layers='sepconv3_bn' + ) + +Self-Supervised Learning +************************ + +Finally, you can also generate features from a trained :ref:`self-supervised learning ` model (either `SimCLR `_ or `DinoV2 `_).
+ +For SimCLR models, use ``'simclr'`` as the first argument to ``build_feature_extractor()``, and pass the path to a saved model (or saved checkpoint file) via the keyword argument ``ckpt``. + +.. code-block:: python + + simclr = sf.build_feature_extractor( + 'simclr', + ckpt='/path/to/simclr.ckpt' + ) + +For DinoV2 models, use ``'dinov2'`` as the first argument, and pass the model configuration YAML file to ``cfg`` and the teacher checkpoint weights to ``weights``. + +.. code-block:: python + + dinov2 = sf.build_feature_extractor( + 'dinov2', + weights='/path/to/teacher_checkpoint.pth', + cfg='/path/to/config.yaml' + ) + + + +Custom Extractors +***************** + +Slideflow also provides an API for integrating your own custom, pretrained feature extractor. See :ref:`custom_extractors` for additional information. + +.. _bags: + +Exporting Features +****************** + +Feature bags +------------ + +Once you have prepared a feature extractor, features can be generated for a dataset and exported to disk for later use. Pass a feature extractor to the first argument of :meth:`slideflow.Project.generate_feature_bags`, with a :class:`slideflow.Dataset` as the second argument. + +.. code-block:: python + + # Load a project and dataset. + P = sf.Project(...) + dataset = P.dataset(tile_px=299, tile_um=302) + + # Create a feature extractor. + ctranspath = sf.build_feature_extractor('ctranspath', resize=True) + + # Calculate & export feature bags. + P.generate_feature_bags(ctranspath, dataset) + +.. note:: + + If you are generating features from a SimCLR model trained with stain normalization, + you should specify the stain normalizer using the ``normalizer`` argument to :meth:`slideflow.Project.generate_feature_bags` or :class:`slideflow.DatasetFeatures`. + +Features are calculated for slides in batches, keeping memory usage low. 
By default, features are saved to disk in a directory named ``pt_files`` within the project directory, but you can override the destination directory using the ``outdir`` argument. + +Alternatively, you can calculate features for a dataset using :class:`slideflow.DatasetFeatures` and the ``.to_torch()`` method. This will calculate features for your entire dataset at once, which may require a large amount of memory. The first argument should be the feature extractor, and the second argument should be a :class:`slideflow.Dataset`. + +.. code-block:: python + + # Calculate features for the entire dataset. + features = sf.DatasetFeatures(ctranspath, dataset) + + # Export feature bags. + features.to_torch('/path/to/bag_directory/') + + +.. warning:: + + Using :class:`slideflow.DatasetFeatures` directly may result in a large amount of memory usage, particularly for sizable datasets. When generating feature bags for training MIL models, it is recommended to use :meth:`slideflow.Project.generate_feature_bags` instead. + +Feature "bags" are PyTorch tensors of features for all images in a slide, saved to disk as ``.pt`` files. These bags are used to train MIL models. Bags can be manually loaded and inspected using :func:`torch.load`. + +.. code-block:: python + + >>> import torch + >>> bag = torch.load('/path/to/bag.pt') + >>> bag.shape + torch.Size([2310, 768]) + >>> bag.dtype + torch.float32 + +When image features are exported for a dataset, the feature extractor configuration is saved to ``bags_config.json`` in the same directory as the exported features. This configuration file can be used to rebuild the feature extractor. An example file is shown below. + +.. 
code-block:: json + + { + "extractor": { + "class": "slideflow.model.extractors.ctranspath.CTransPathFeatures", + "kwargs": { + "center_crop": true + } + }, + "normalizer": { + "method": "macenko", + "fit": { + "stain_matrix_target": [ + [ + 0.5062568187713623, + 0.22186939418315887 + ], + [ + 0.7532230615615845, + 0.8652154803276062 + ], + [ + 0.4069173336029053, + 0.42241501808166504 + ] + ], + "target_concentrations": [ + 1.7656903266906738, + 1.2797492742538452 + ] + } + }, + "num_features": 2048, + "tile_px": 299, + "tile_um": 302 + } + +The feature extractor can be manually rebuilt using :func:`slideflow.model.rebuild_extractor()`: + +.. code-block:: python + + from slideflow.model import rebuild_extractor + + # Recreate the feature extractor + # and stain normalizer, if applicable + extractor, normalizer = rebuild_extractor('/path/to/bags_config.json') + + +From a TFRecord +--------------- + +In addition to generating and exporting feature bags for a dataset, features can also be generated from a single TFRecord file. This may be useful for debugging or testing purposes. + +.. code-block:: python + + import slideflow as sf + + # Create a feature extractor + ctranspath = sf.build_feature_extractor('ctranspath') + + # Bags is a tensor of shape (n_tiles, n_features) + # Coords is a tensor of shape (n_tiles, 2), containing x/y tile coordinates. + bags, coords = ctranspath('file.tfrecords') + + +From a whole-slide image +------------------------ + +Feature extractors can also create features from a whole-slide image. This is useful for single-slide analysis, MIL inference, and other tasks where features are needed for the entire slide. Features are returned as a 3D tensor, with shape ``(width, height, n_features)``, reflecting the spatial arrangement of features for tiles across the image. + +.. code-block:: python + + # Load a feature extractor. + ctranspath = sf.build_feature_extractor('ctranspath') + + # Load a whole-slide image. 
+ wsi = sf.WSI('slide.svs', tile_px=256, tile_um=128) + + # Generate features for the whole slide. + # Shape: (width, height, n_features) + features = ctranspath(wsi) + + +Mixed precision +--------------- + +All feature extractors will use mixed precision by default. This can be disabled by setting the ``mixed_precision`` argument to ``False`` when creating the feature extractor. + +.. code-block:: python + + # Load a feature extractor without mixed precision + extractor = sf.build_feature_extractor('ctranspath', mixed_precision=False) + + +License & Citation +------------------ + +Licensing and citation information for the pretrained feature extractors is accessible with the ``.license`` and ``.citation`` attributes. + +.. code-block:: python + + >>> ctranspath.license + 'GNU General Public License v3.0' + >>> print(ctranspath.citation) + + @{wang2022, + title={Transformer-based Unsupervised Contrastive Learning for Histopathological Image Classification}, + author={Wang, Xiyue and Yang, Sen and Zhang, Jun and Wang, Minghui and Zhang, Jing and Yang, Wei and Huang, Junzhou and Han, Xiao}, + journal={Medical Image Analysis}, + year={2022}, + publisher={Elsevier} + } diff --git a/docs-source/source/gan.rst b/docs-source/source/gan.rst new file mode 100644 index 000000000..05dabf5f7 --- /dev/null +++ b/docs-source/source/gan.rst @@ -0,0 +1,21 @@ +.. currentmodule:: slideflow.gan + +slideflow.gan +============= + +.. automodule:: slideflow.gan + :members: + +See :ref:`stylegan` for more information on working with GANs. + +StyleGAN2 Interpolator +---------------------- + +.. autoclass:: StyleGAN2Interpolator + :inherited-members: + +Utility functions +----------------- + +.. automodule:: slideflow.gan.utils + :members: \ No newline at end of file diff --git a/docs-source/source/grad.rst b/docs-source/source/grad.rst new file mode 100644 index 000000000..f6cbb7170 --- /dev/null +++ b/docs-source/source/grad.rst @@ -0,0 +1,25 @@ +.. 
currentmodule:: slideflow.grad + +slideflow.grad +============== + +This submodule contains tools for calculating and displaying pixel attribution, or +saliency, maps. See :ref:`saliency` for more information. + +.. autoclass:: SaliencyMap + :inherited-members: + +.. automodule:: slideflow.grad + :members: + +.. autofunction:: comparison_plot + +.. autofunction:: inferno + +.. autofunction:: multi_plot + +.. autofunction:: oranges + +.. autofunction:: overlay + +.. autofunction:: saliency_map_comparison \ No newline at end of file diff --git a/docs-source/source/heatmap.rst b/docs-source/source/heatmap.rst index 2c6a29c25..a4a6804f5 100644 --- a/docs-source/source/heatmap.rst +++ b/docs-source/source/heatmap.rst @@ -1,14 +1,22 @@ .. currentmodule:: slideflow -slideflow.heatmap -===================== +slideflow.Heatmap +================= -:class:`slideflow.Heatmap` uses a model to generate predictions across a whole-slide image through -progressive convolution. These prediction heatmaps can be interactively displayed or saved for later use. - -.. automodule: slideflow.heatmap +.. autoclass:: Heatmap -Heatmap +Methods ------- -.. autoclass:: Heatmap - :inherited-members: \ No newline at end of file + +.. autofunction:: slideflow.Heatmap.add_inset +.. autofunction:: slideflow.Heatmap.clear_insets +.. autofunction:: slideflow.Heatmap.generate +.. autofunction:: slideflow.Heatmap.load +.. autofunction:: slideflow.Heatmap.load_npz +.. autofunction:: slideflow.Heatmap.plot +.. autofunction:: slideflow.Heatmap.plot_thumbnail +.. autofunction:: slideflow.Heatmap.plot_with_logit_cmap +.. autofunction:: slideflow.Heatmap.plot_uncertainty +.. autofunction:: slideflow.Heatmap.save +.. autofunction:: slideflow.Heatmap.save_npz +.. 
autofunction:: slideflow.Heatmap.view \ No newline at end of file diff --git a/docs-source/source/heatmap_example.png b/docs-source/source/heatmap_example.png index f279c7d91..8eaf08c06 100644 Binary files a/docs-source/source/heatmap_example.png and b/docs-source/source/heatmap_example.png differ diff --git a/docs-source/source/heatmap_inset.jpg b/docs-source/source/heatmap_inset.jpg new file mode 100644 index 000000000..b1e0bbc13 Binary files /dev/null and b/docs-source/source/heatmap_inset.jpg differ diff --git a/docs-source/source/index.rst b/docs-source/source/index.rst index 8be9286f0..cd172de66 100644 --- a/docs-source/source/index.rst +++ b/docs-source/source/index.rst @@ -7,44 +7,77 @@ Slideflow Documentation ======================= -``slideflow`` is a Python package that provides a unified API for building and testing deep learning models for histopathology, supporting both Tensorflow/Keras and PyTorch. +Slideflow is a Python package that provides a unified API for building and testing deep learning models for histopathology, supporting both Tensorflow/Keras and PyTorch. -Slideflow includes tools for efficient whole-slide image processing, easy and highly customizable model training with uncertainty quantification (UQ), and a number of functional tools to assist with analysis and interpretability, including predictive heatmaps, mosaic maps, and more. It is built with both `Tensorflow/Keras `_ and `PyTorch `_ backends, with fully cross-compatible TFRecord data storage. +Slideflow includes tools for efficient whole-slide image processing, easy and highly customizable model training with uncertainty quantification (UQ), and a number of functional tools to assist with analysis and interpretability, including predictive heatmaps, mosaic maps, GANs, saliency maps, and more. It is built with both `Tensorflow/Keras `_ and `PyTorch `_ backends, with fully cross-compatible TFRecord data storage. 
-The ``slideflow`` package includes a ``Project`` class to help coordinate project organization and supervise execution of the pipeline. This documentation starts with a high-level overview of the pipeline, and will include examples of how to execute functions using the ``Project`` class. We also provide several tutorials with examples of how Slideflow can be used on your own data. +This documentation starts with a high-level overview of the pipeline and includes examples of how to perform common tasks using the ``Project`` helper class. We also provide several tutorials with examples of how Slideflow can be used and extended for additional functionality. .. toctree:: :maxdepth: 1 - :caption: Overview + :caption: Introduction installation - pipeline + overview + quickstart project_setup - validation - extract_tiles + datasets_and_val + slide_processing training evaluation - layer_activations - custom_loops + posthoc uq - clam + features + mil + ssl + stylegan + saliency + segmentation + cellseg + custom_loops + studio troubleshooting - appendix .. toctree:: :maxdepth: 1 - :caption: Source + :caption: Developer Notes + + tfrecords + dataloaders + custom_extractors + tile_labels + plugins + +.. toctree:: + :maxdepth: 1 + :caption: API + slideflow project dataset + dataset_features heatmap + model_params + mosaic + slidemap + biscuit + slideflow_cellseg + io io_tensorflow io_torch + gan + grad + mil_module model - mosaic + model_tensorflow + model_torch + norm + simclr slide + slide_qc stats util + studio_module .. 
toctree:: :maxdepth: 1 @@ -54,4 +87,7 @@ The ``slideflow`` package includes a ``Project`` class to help coordinate projec tutorial2 tutorial3 tutorial4 - tutorial5 \ No newline at end of file + tutorial5 + tutorial6 + tutorial7 + tutorial8 \ No newline at end of file diff --git a/docs-source/source/installation.rst b/docs-source/source/installation.rst index 9070fb5d3..8ff81844f 100644 --- a/docs-source/source/installation.rst +++ b/docs-source/source/installation.rst @@ -1,79 +1,125 @@ Installation ============ -Slideflow has been tested and is supported on the following systems: +.. figure:: https://github.com/user-attachments/assets/53d5c1f8-8fbc-4e0f-bd62-db16797492b0 -- Ubuntu 18.04 -- Ubuntu 20.04 -- Centos 7 -- Centos 8 -- Centos 8 Stream +Slideflow is tested on **Linux-based systems** (Ubuntu, CentOS, Red Hat, and Raspberry Pi OS) and **macOS** (Intel and Apple). Windows support is experimental. -Software Requirements -********************* +Requirements +************ + +- Python >= 3.7 (<3.10 if using `cuCIM `_) +- `PyTorch `_ (1.9+) *or* `Tensorflow `_ (2.5-2.11) + - Core functionality, including tile extraction, data processing, and tile-based model training, is supported for both PyTorch and Tensorflow. Additional advanced tools, such as Multiple-Instance Learning (MIL), GANs, and pretrained foundation models, require PyTorch. 
+ +Optional +-------- + +- `Libvips >= 8.9 `_ (alternative slide reader, adds support for \*.scn, \*.mrxs, \*.ndpi, \*.vms, and \*.vmu files) +- Linear solver (for site-preserved cross-validation): + + - `CPLEX 20.1.0 `_ with `Python API `_ + - *or* `Pyomo `_ with `Bonmin `_ solver -- Python 3.7 - 3.10 -- `OpenSlide `_ -- `Libvips 8.9+ `_ -- `CPLEX 20.1.0 `_ with `Python API `_ [*optional*] - used for preserved-site cross-validation -- `QuPath `_ [*optional*] - used for ROI annotations -- `Tensorflow 2.5-2.8 `_ or `PyTorch 1.9-1.11 `_ Download with pip ***************** +Slideflow can be installed either with PyPI or as a Docker container. To install via pip: + .. code-block:: bash # Update to latest pip - $ pip install --upgrade pip + pip install --upgrade pip wheel + + # Current stable release, Tensorflow backend + pip install slideflow[tf] cucim cupy-cuda11x + + # Alternatively, install with PyTorch backend + pip install slideflow[torch] cucim cupy-cuda11x + +The ``cupy`` package name depends on the installed CUDA version; `see here `_ for installation instructions. ``cucim`` and ``cupy`` are not required if using Libvips. - # Current stable release - $ pip install slideflow Run a Docker container ********************** -The `Slideflow docker images `_ have been pre-configured with OpenSlide, Libvips, and either PyTorch 1.11 or Tensorflow 2.8. Using a preconfigured `Docker `_ container is the easiest way to get started with compatible dependencies and GPU support. +Alternatively, pre-configured `docker images `_ are available with cuCIM, Libvips, and either PyTorch 1.11 or Tensorflow 2.9 pre-installed. Using a preconfigured `Docker `_ container is the easiest way to get started with compatible dependencies and GPU support. -To install with the Tensorflow 2.8 backend: +To run a Docker container with the Tensorflow backend: .. 
code-block:: bash - $ docker pull jamesdolezal/slideflow:latest-tf - $ docker run -it --gpus all jamesdolezal/slideflow:latest-tf + docker pull jamesdolezal/slideflow:latest-tf + docker run -it --gpus all jamesdolezal/slideflow:latest-tf -To install with the PyTorch 1.11 backend: +To run a Docker container with the PyTorch backend: .. code-block:: bash - $ docker pull jamesdolezal/slideflow:latest-torch - $ docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch + docker pull jamesdolezal/slideflow:latest-torch + docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch Build from source ***************** -To build Slideflow from source, clone the repository from the project `Github page `_: +To build Slideflow from source, clone the repository from the project `Github page `_: .. code-block:: bash - $ git clone https://github.com/jamesdolezal/slideflow - $ cd slideflow - $ pip install -r requirements.txt - $ python setup.py bdist_wheel - $ pip install dist/slideflow-1.X.X-py3-any.whl + git clone https://github.com/slideflow/slideflow + cd slideflow + conda env create -f environment.yml + conda activate slideflow + python setup.py bdist_wheel + pip install dist/slideflow* cupy-cuda11x -.. warning:: - A bug in the pixman library (version=0.38) will corrupt downsampled slide images, resulting in large black boxes across the slide. We have provided a patch for version 0.38 that has been tested for Ubuntu, which is provided in the project `Github page `_ (``pixman_repair.sh``), although it may not be suitable for all environments and we make no guarantees regarding its use. The `Slideflow docker images `_ already have this applied. If you are installing from source, have pixman version 0.38, and are unable to apply this patch, the use of downsampled image layers must be disabled to avoid corruption (pass ``enable_downsample=False`` to tile extraction functions). 
-Changing backends +Extensions +********** + +The core Slideflow package is licensed under the **Apache-2.0** license. Additional functionality, such as pretrained foundation models, are distributed in separate packages according to their licensing terms. Available extensions include: + +- **Slideflow-GPL**: GPL-3.0 licensed extensions (`GitHub `__) + - Includes: `RetCCL `__, `CTransPath `__, and `CLAM `__. +- **Slideflow-NonCommercial**: CC BY-NC 4.0 licensed extensions for non-commercial use (`GitHub `__) + - Includes: `HistoSSL `__, `PLIP `__, `GigaPath `__, `UNI `__, `BISCUIT `__, and `StyleGAN3 `__. + +These extensions can be installed via pip. The GigaPath feature extractor has additional, more restrictive dependencies that must be installed separately. + +.. code-block:: bash + + # Install Slideflow-GPL and Slideflow-NonCommercial + pip install slideflow-gpl slideflow-noncommercial + + # Install GigaPath dependencies, if desired + pip install slideflow-noncommercial[gigapath] git+ssh://git@github.com/prov-gigapath/prov-gigapath + + +.. note:: + The Slideflow-GPL and Slideflow-NonCommercial extensions are not included in the default Slideflow package due to their licensing terms. Please review the licensing terms of each extension before use. + + +PyTorch vs. Tensorflow +********************** + +Slideflow supports both PyTorch and Tensorflow, with cross-compatible TFRecord storage. Slideflow will default to using PyTorch if both are available, but the backend can be manually specified using the environmental variable ``SF_BACKEND``. For example: + +.. code-block:: bash + + export SF_BACKEND=tensorflow + +.. _slide_backend: + +cuCIM vs. Libvips ***************** -The default backend for this package is Tensorflow/Keras, but a full PyTorch backend is also included, with a dedicated TFRecord reader/writer that ensures saved image tiles can be served to both Tensorflow and PyTorch models in cross-compatible fashion. 
+By default, Slideflow reads whole-slide images using `cuCIM `_. Although much faster than other openslide-based frameworks, it supports fewer slide scanner formats. Slideflow also includes a `Libvips `_ backend, which adds support for \*.scn, \*.mrxs, \*.ndpi, \*.vms, and \*.vmu files. You can set the active slide backend with the environmental variable ``SF_SLIDE_BACKEND``: -If using the Tensorflow backend, PyTorch does not need to be installed; the reverse is true as well. +.. code-block:: bash -To switch backends, simply set the environmental variable ``SF_BACKEND`` equal to either ``torch`` or ``tensorflow``: + export SF_SLIDE_BACKEND=libvips -.. code-block:: console - export SF_BACKEND=torch \ No newline at end of file +.. warning:: + A bug in the pixman library (version=0.38) will corrupt downsampled slide images, resulting in large black boxes across the slide. We have provided a patch for version 0.38 that has been tested for Ubuntu, which is provided in the project `Github page `_ (``pixman_repair.sh``), although it may not be suitable for all environments and we make no guarantees regarding its use. The `Slideflow docker images `_ already have this applied. If you are installing from source, have pixman version 0.38, and are unable to apply this patch, the use of downsampled image layers must be disabled to avoid corruption (pass ``enable_downsample=False`` to tile extraction functions). diff --git a/docs-source/source/io.rst b/docs-source/source/io.rst new file mode 100644 index 000000000..7ce32cf68 --- /dev/null +++ b/docs-source/source/io.rst @@ -0,0 +1,34 @@ +.. currentmodule:: slideflow.io + +slideflow.io +============ + +This module contains utility functions for working with TFRecords, cross-compatible +with both Tensorflow and PyTorch. + +Functions included in this module assist with processing TFRecords, detecting image and data format, +extracting tiles, splitting and merging TFrecords, and a variety of other manipulations. 
+ +Additional Tensorflow-specific TFRecord reading/writing utility functions are +available in :py:mod:`slideflow.io.tensorflow`, and additional PyTorch-specific +functions are in :py:mod:`slideflow.io.torch`. + +.. autofunction:: convert_dtype +.. autofunction:: detect_tfrecord_format +.. autofunction:: extract_tiles +.. autofunction:: get_locations_from_tfrecord +.. autofunction:: get_tfrecord_by_index +.. autofunction:: get_tfrecord_by_location +.. autofunction:: get_tfrecord_parser +.. autofunction:: get_tfrecord_length +.. autofunction:: read_and_return_record +.. autofunction:: serialized_record +.. autofunction:: tfrecord_has_locations +.. autofunction:: update_manifest_at_dir +.. autofunction:: write_tfrecords_multi +.. autofunction:: write_tfrecords_single +.. autofunction:: write_tfrecords_merge + +slideflow.io.preservedsite +************************** +.. autofunction:: slideflow.io.preservedsite.generate_crossfolds \ No newline at end of file diff --git a/docs-source/source/io_tensorflow.rst b/docs-source/source/io_tensorflow.rst index 9fcd4ed6f..92dd7c2b8 100644 --- a/docs-source/source/io_tensorflow.rst +++ b/docs-source/source/io_tensorflow.rst @@ -3,17 +3,10 @@ slideflow.io.tensorflow ======================= -This module contains functions for processing TFRecords, including detecting contents and image format of saved -TFRecords, extracting tiles from TFRecords, splitting and merging TFRecrds, and a variety of other manipulations. - -The more important compontent of this module, however, is the :func:`slideflow.io.tensorflow.interleave` function, -which interleaves a set of tfrecords together into a :class:`tf.data.Datasets` object that can be used for training. -This interleaving can include patient or category-level balancing for returned batches (see :ref:`balancing`). 
+TFRecord interleaving in the Tensorflow backend is accomplished with :func:`slideflow.io.tensorflow.interleave`, which interleaves a set of TFRecords together into a :class:`tf.data.Dataset` object that can be used for training. This interleaving can include patient- or category-level balancing for returned batches (see :ref:`balancing`). .. note:: - The TFRecord reading and interleaving implemented in this module is only compatible with Tensorflow models. - The :mod:`slideflow.io.torch` module includes an optimized, PyTorch-specific TFRecord reader based on a modified - version of the tfrecord reader/writer at: https://github.com/vahidk/tfrecord. + The TFRecord reading and interleaving implemented in this module are only compatible with Tensorflow models. The :mod:`slideflow.io.torch` module includes a PyTorch-specific TFRecord reader. .. automodule:: slideflow.io.tensorflow :members: \ No newline at end of file diff --git a/docs-source/source/io_torch.rst b/docs-source/source/io_torch.rst index 76ef6e8d1..3a77a0270 100644 --- a/docs-source/source/io_torch.rst +++ b/docs-source/source/io_torch.rst @@ -10,4 +10,9 @@ interleaving is supervised by :func:`slideflow.io.torch.interleave`, while the :func:`slideflow.io.torch.interleave_dataloader` function provides a PyTorch DataLoader object which can be directly used. .. automodule:: slideflow.io.torch - :members: \ No newline at end of file + :members: + :exclude-members: StyleGAN2Interleaver, TileLabelInterleaver, InterleaveIterator, IndexedInterleaver + +.. autoclass:: slideflow.io.torch.InterleaveIterator + +..
autoclass:: slideflow.io.torch.IndexedInterleaver \ No newline at end of file diff --git a/docs-source/source/layer_activations.rst b/docs-source/source/layer_activations.rst deleted file mode 100644 index dcd8dada9..000000000 --- a/docs-source/source/layer_activations.rst +++ /dev/null @@ -1,136 +0,0 @@ -Features / layer activations -============================ - -Once a model has been fully trained and evaluated, you may use the model to generate features from layer activations to gain better insight into the kinds of image features the model has learned. - -Working with Layer Features -*************************** - -To work with features / intermediate layer activations calculated from a model, the :class:`slideflow.model.Features` class will generate features on a tile or slide level, and the :class:`slideflow.DatasetFeatures` class will generate features for an entire dataset. - -DatasetFeatures ---------------- - -The easiest way to get started with intermediate layer activations is the :class:`slideflow.DatasetFeatures` class, which is used to calculate and examine activations across an entire dataset. Instancing the class supervises the calculation and caching of layer activations, which can then be exported, viewed (as a mosaic map), or analyzed with various statistical methods. The project function :func:`slideflow.Project.generate_features` creates and returns an instance of this class. - -.. code-block:: python - - features = P.generate_features('/path/to/trained_model') - -Alternatively, you can create an instance of this class directly: - -.. 
code-block:: python - - from slideflow.model import DatasetFeatures - - dataset = P.dataset(299, 302) - labels, unique_outcomes = dataset.labels('HPV') - - features = DatasetFeatures( - model='/path/to/trained_model', - dataset=dataset, - annotations=labels - ) - -Tile-level feature activations for each slide can be accessed directly from ``slideflow.DatasetFeatures.activations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_features)``. Logits are stored in ``slideflow.DatasetFeatures.logits``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_logits)``. Tile-level location data (coordinates from which the tiles were taken from their respective source slides) is stored in ``slideflow.DatasetFeatures.locations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, 2)`` (``x``, ``y``). - -To return the average logits value for each slide (averaged across constituent tiles), use :func:`slideflow.DatasetFeatures.logits_mean`. Similarly, :func:`slideflow.DatasetFeatures.logits_predict` can be used to generate final slide-level logit predictions. - -Features across categories can be statistically compared using :func:`slideflow.DatasetFeatures.stats`, which will calculate and save statistics to a specified directory. - -.. code-block:: python - - features.stats('/outdir', method='mean') - -To compare layer features across outcome categories and find features which differ significantly across categories, use the :func:`slideflow.DatasetFeatures.box_plots` function. For example, to generate boxplots for the first 100 features: - -.. code-block:: python - - features.box_plots(range(100), '/outdir') - -.. image:: boxplot_example.png - -Many other functions are available, as described in the documentation, :class:`slideflow.DatasetFeatures`. - -Features --------- - -The :class:`slideflow.model.Features` class can be used to generate layer activations / features for a single batch of images. 
For example, to calculate features for a batch of images while looping through a dataset: - -.. code-block:: python - - from slideflow.model import Features - - features = Features(layer='postconv') - for img_batch in dataset: - postconv_features = features(img_batch) - -You can choose to return features from any combination of intermediate layers by passing layer name(s) to the argument ``layer``. The interface can also return logits, by passing ``include_logits=True``. - -To calculate layer features across an entire slide, the same interface can be called on a :class:`slideflow.WSI` object, generating a grid of activations of size ``(slide.grid.shape[0], slide.grid.shape[1], num_features)``: - -.. code-block:: python - - from slideflow import WSI - from slideflow.model import Features - - slide = WSI(...) - interface = Features('/model/path', layers='postconv') - feature_grid = interface(slide) - - -Mosaic maps -*********** - -To visualize the distribution of features across a dataset, a mosaic map can be created from a :class:`slideflow.DatasetFeatures` instance. Mosaic maps are generated by using features (layer activations) from a dataset, performing dimensionality reduction (UMAP) on the activations (via :class:`slideflow.SlideMap`), and overlaying tile images onto the UMAP (via :class:`slideflow.Mosaic`). By default, the post-convolutional ('postconv') layer is used when calculating features, but any combination of other layers can be also be used. The ``Project`` class has a function which can supervise these steps automatically and save the final figure to the project directory. - -.. code-block:: python - - features = P.generate_features('/path/to/trained_model') - mosaic = project.generate_mosaic(features) - mosaic.save('mosaic.png') - -.. autofunction:: slideflow.Project.generate_mosaic - :noindex: - -.. 
image:: mosaic_example.png - -To plot the underlying UMAP without overlaid images, the :class:`slideflow.SlideMap` used to create the mosaic map can be accesssed via ``slideflow.Mosaic.slide_map``. You can then use the :func:`slideflow.SlideMap.save` function to save the plot: - -.. code-block:: python - - mosaic = project.generate_mosaic(...) - mosiac.slide_map.save('umap.png') - -Tiles on the plot can be labeled using slide labels from the project annotations file, using the function :func:`slideflow.SlideMap.label_by_slide`. For example, the following will label the slide map according to the categorical outcome "HPV_status" in the project annotations file: - -.. code-block:: python - - # Get slide labels - dataset = project.dataset(tile_px=299, tile_um=302) - labels, unique_lables = dataset.labels('HPV_status') - - # Create the mosaic map and access the underlying SlideMap - mosaic = project.generate_mosaic(...) - - # Label the slide map with our outcome - mosiac.slide_map.label_by_slide(labels) - - # Save - mosiac.slide_map.save('umap_labeled.png') - -By default, all tiles in a dataset (which may be hundreds of thousands or millions of images) will be mapped onto the mosaic map. Instead of mapping all tiles within a slide, you can alternatively choose to map only a single tile per slide with the argument ``map_slide='centroid'``. This will calculate the tile nearest to centroid for each slide and display only this tile: - -.. code-block:: python - - # Create the mosaic map and access the underlying SlideMap - mosaic = project.generate_mosaic(..., map_slide='centroid') - -There are many additional arguments that can be provided to the :meth:`slideflow.Project.generate_mosaic()` function to customize the mosaic and UMAP plots, and many additional functions that can be applied to :class:`slideflow.Mosaic` and :class:`slideflow.SlideMap`. 
For example, it may be interesting to view a UMAP of tiles with an added third dimension, such as the activation value of a particular penultimate layer node. With this kind of plot, one can visualize how the activation of a particular node varies across the UMAP. To make such a plot, use the ``save_3d_plot`` function of the ``SlideMap``: - -.. code-block:: python - - mosaic = project.generate_mosaic(...) - mosiac.slide_map.save_3d_plot('3d_plot.png', feature=497) - -.. image:: 3d_umap.png diff --git a/docs-source/source/mil.rst b/docs-source/source/mil.rst new file mode 100644 index 000000000..74ef4d4e7 --- /dev/null +++ b/docs-source/source/mil.rst @@ -0,0 +1,332 @@ +.. _mil: + +Multiple-Instance Learning (MIL) +================================ + +In addition to standard tile-based neural networks, Slideflow also supports training multiple-instance learning (MIL) models. Several architectures are available, including `attention-based MIL `_ (``"Attention_MIL"``), `CLAM `_ (``"CLAM_SB",`` ``"CLAM_MB"``, ``"MIL_fc"``, ``"MIL_fc_mc"``), `TransMIL `_ (``"TransMIL"``), and `HistoBistro Transformer `_ (``"bistro.transformer"``). Custom architectures can also be trained. MIL training requires PyTorch. + +Skip to :ref:`tutorial8` for a complete example of MIL training. + +See :ref:`mil_api` for more information on the MIL API. + +Generating Features +******************* + +The first step in MIL model development is generating features from image tiles, as discussed in the :ref:`features` section. Features from whole-slide images are exported as "bags" of features, where each bag contains a set of features from a single slide. Each bag is a PyTorch tensor saved in ``*.pt`` format. Bags are saved in a directory, and the directory path is passed to the MIL model during training and evaluation. 
+ +Training +******** + +Model Configuration +------------------- + +To train an MIL model using exported features, first prepare an MIL configuration using :func:`slideflow.mil.mil_config`. + +The first argument to this function is the model architecture (which can be a name or a custom ``torch.nn.Module`` model), and the remaining arguments are used to configure the training process, such as learning rate and number of epochs. Training is executed using `FastAI `_ with `1cycle learning rate scheduling `_. + +.. code-block:: python + + import slideflow as sf + from slideflow.mil import mil_config + + config = mil_config('attention_mil', lr=1e-3) + +Available models out-of-the-box include `attention-based MIL `_ (``"Attention_MIL"``), `transformer MIL `_ (``"TransMIL"``), and `HistoBistro Transformer `_ (``"bistro.transformer"``). `CLAM `_ (``"CLAM_SB"``, ``"CLAM_MB"``, ``"MIL_fc"``, ``"MIL_fc_mc"``) models are available through ``slideflow-gpl``: + +.. code-block:: bash + + pip install slideflow-gpl + +Custom MIL models can also be trained with this API, as discussed :ref:`below `. + + +Classification & Regression +--------------------------- + +MIL models can be trained for both classification and regression tasks. The type of outcome is determined through the loss function, which defaults to ``"cross_entropy"``. To train a model for regression, set the loss function to one of the following regression losses, and ensure that your outcome labels are continuous. You can also train to multiple outcomes by passing a list of outcome names. + +- **"mse"** (``nn.MSELoss``): Mean squared error. +- **"mae"** (``nn.L1Loss``): Mean absolute error. +- **"huber"** (``nn.SmoothL1Loss``): Huber loss. + +..
code-block:: python + + # Prepare a regression-compatible MIL configuration + config = mil_config('attention_mil', lr=1e-3, loss='mse') + + # Train the model + project.train_mil( + config=config, + ..., + outcomes=['age', 'grade'] + ) + + +Training an MIL Model +--------------------- + +Next, prepare a :ref:`training and validation dataset ` and use :func:`slideflow.Project.train_mil` to start training. For example, to train a model using three-fold cross-validation to the outcome "HPV_status": + +.. code-block:: python + + ... + + # Prepare a project and dataset + P = sf.Project(...) + full_dataset = dataset = P.dataset(tile_px=299, tile_um=302) + + # Split the dataset using three-fold, site-preserved cross-validation + splits = full_dataset.kfold_split( + k=3, + labels='HPV_status', + preserved_site=True + ) + + # Train on each cross-fold + for train, val in splits: + P.train_mil( + config=config, + outcomes='HPV_status', + train_dataset=train, + val_dataset=val, + bags='/path/to/bag_directory' + ) + +Model training statistics, including validation performance (AUROC, AP) and predictions on the validation dataset, will be saved in an ``mil`` subfolder within the main project directory. + +If you are training an attention-based MIL model (``attention_mil``, ``clam_sb``, ``clam_mb``), heatmaps of attention can be generated for each slide in the validation dataset by using the argument ``attention_heatmaps=True``. You can customize these heatmaps with ``interpolation`` and ``cmap`` arguments to control the heatmap interpolation and colormap, respectively. + +.. code-block:: python + + # Generate attention heatmaps, + # using the 'magma' colormap and no interpolation. + P.train_mil( + attention_heatmaps=True, + cmap='magma', + interpolation=None + ) + +Hyperparameters, model configuration, and feature extractor information is logged to ``mil_params.json`` in the model directory. 
This file also contains information about the input and output shapes of the MIL network and outcome labels. An example file is shown below. + +.. code-block:: json + + { + "trainer": "fastai", + "params": { + + }, + "outcomes": "histology", + "outcome_labels": { + "0": "Adenocarcinoma", + "1": "Squamous" + }, + "bags": "/mnt/data/projects/example_project/bags/simclr-263510/", + "input_shape": 1024, + "output_shape": 2, + "bags_encoder": { + "extractor": { + "class": "slideflow.model.extractors.simclr.SimCLR_Features", + "kwargs": { + "center_crop": false, + "ckpt": "/mnt/data/projects/example_project/simclr/00001-EXAMPLE/ckpt-263510.ckpt" + } + }, + "normalizer": null, + "num_features": 1024, + "tile_px": 299, + "tile_um": 302 + } + } + +.. _multimag: + +Multi-Magnification MIL +----------------------- + +Slideflow 2.2 introduced a multi-magnification, multi-modal MIL model, ``MultiModal_Attention_MIL`` (``"mm_attention_mil"``). This late-fusion multimodal model is based on standard attention-based MIL, but accepts multiple input modalities (e.g., multiple magnifications) simultaneously. Each input modality is processed by a separate encoder network and a separate attention module. The attention-weighted features from each modality are then concatenated and passed to a fully-connected layer. + +Multimodal models are trained using the same API as standard MIL models. Modalities are specified using the ``bags`` argument to :func:`slideflow.Project.train_mil`, where the number of modes is determined by the number of bag directories provided. Within each bag directory, bags should be generated using the same feature extractor and at the same magnification, but feature extractors and magnifications can vary between bag directories. + +For example, to train a multimodal model using two magnifications, you would pass two bag paths to the model. 
In this case, the ``/path/to/bags_10x`` directory contains bags generated from a 10x feature extractor, and the ``/path/to/bags_40x`` directory contains bags generated from a 40x feature extractor. + +.. code-block:: python + + # Configure a multimodal MIL model. + config = mil_config('mm_attention_mil', lr=1e-4) + + # Set the bags paths for each modality. + bags_10x = '/path/to/bags_10x' + bags_40x = '/path/to/bags_40x' + + P.train_mil( + config=config, + outcomes='HPV_status', + train_dataset=train, + val_dataset=val, + bags=[bags_10x, bags_40x] + ) + +You can use any number of modalities, and the feature extractors for each modality can be different. For example, you could train a multimodal model using features from a custom SimCLR model at 5x and features from a pretrained CTransPath model at 20x. + +The feature extractors used for each modality, as specified in the ``bags_config.json`` files in the bag directories, will be logged in the final ``mil_params.json`` file. Multimodal MIL models can be interactively viewed in :ref:`Slideflow Studio `, allowing you to visualize the attention weights for each modality separately. + +.. _custom_mil: + +Custom Architectures +-------------------- + +Training custom MIL models is straightforward with Slideflow, particularly if your model can adhere to a few simple guidelines: + +- Initialized with ``(num_feats, num_outputs)`` (e.g., ``Attention_MIL(768, 2)``) +- Input is feature bags with shape ``(batch, num_tiles, num_feats)``. If the model needs a "lens" input, then the model attribute ``use_lens`` should be True. 
+- Has a ``relocate()`` function that moves the model to detected device/GPU +- Ability to get attention through one of two methods: + - ``forward()`` function includes an optional ``return_attention`` argument, which if True returns attention scores after model output + - Has a ``calculate_attention()`` function that returns attention scores + +If the above applies to your model, you can train it simply by passing it as the first argument to :func:`slideflow.mil.mil_config`. + +.. code-block:: python + + import slideflow as sf + from slideflow.mil import mil_config + from my_module import CustomMIL + + config = mil_config(CustomMIL, lr=1e-3) + + +For larger projects, or if you are designing a plugin/extension for Slideflow, custom models can be registered to facilitate easy creation. If your model adheres to the above guidelines, you can register it for use with the following: + +.. code-block:: python + + from slideflow.mil import register_model + + @register_model + def my_model(): + return MyModelClass + + +You can then use your model when creating an MIL configuration: + +.. code-block:: python + + config = sf.mil.mil_config('my_model', ...) + + +If the above guidelines do *not* apply to your model, or if you want to customize model logic or functionality, you can supply a custom MIL configuration class that will supervise model building and dataset preparation. Your custom configuration class should inherit ``slideflow.mil.MILModelConfig``, and methods in this class can be overloaded to provide additional functionality. For example, to create an MIL configuration that uses a custom loss and custom metrics: + +.. code-block:: python + + from slideflow.mil import MILModelConfig + + class MyModelConfig(MILModelConfig): + + @property + def loss_fn(self): + return my_custom_loss + + def get_metrics(self): + return [my_metric1, my_metric2] + + +When registering your model, you should specify that it should use your custom configuration: + +.. 
code-block:: python + + @register_model(config=MyModelConfig) + def my_model(): + return MyModelClass + + +For an example of how to utilize model registration and configuration customization, see our `CLAM implementation `__ available through ``slideflow-gpl``. + + +Evaluation +********** + +To evaluate a saved MIL model on an external dataset, first extract features from a dataset, then use :func:`slideflow.Project.evaluate_mil`, which displays evaluation metrics and returns predictions as a DataFrame. + +.. code-block:: python + + import slideflow as sf + + # Prepare a project and dataset + P = sf.Project(...) + dataset = P.dataset(tile_px=299, tile_um=302) + + # Generate features using CTransPath + ctranspath = sf.build_feature_extractor('ctranspath', resize=True) + features = sf.DatasetFeatures(ctranspath, dataset=dataset) + features.to_torch('/path/to/bag_directory') + + # Evaluate a saved MIL model + df = P.evaluate_mil( + '/path/to/saved_model', + outcomes='HPV_status', + dataset=dataset, + bags='/path/to/bag_directory', + ) + +As with training, attention heatmaps can be generated for attention-based MIL models with the argument ``attention_heatmaps=True``, and these can be customized using ``cmap`` and ``interpolation`` arguments. + +.. image:: att_heatmap.jpg + +Generating Predictions +********************** + +In addition to generating slide-level predictions during training and evaluation, you can also generate tile-level predictions and attention scores for a dataset using :func:`slideflow.mil.get_mil_tile_predictions`. This function returns a DataFrame containing tile-level predictions and attention. + +.. code-block:: python + + >>> from slideflow.mil import get_mil_tile_predictions + >>> df = get_mil_tile_predictions(model, dataset, bags) + >>> df + slide loc_x loc_y ... y_pred3 y_pred4 y_pred5 + 0 TCGA-4V-A9QI-01Z-0... 2210 7349 ... 0.181155 0.468446 0.070175 + 1 TCGA-4V-A9QI-01Z-0... 5795 1971 ... 0.243721 0.131991 0.009169 + 2 TCGA-4V-A9QI-01Z-0...
6273 5437 ... 0.096196 0.583367 0.090258 + 3 TCGA-4V-A9QI-01Z-0... 2330 3047 ... 0.056426 0.264386 0.300199 + 4 TCGA-4V-A9QI-01Z-0... 3644 3525 ... 0.134535 0.534353 0.013619 + ... ... ... ... ... ... ... ... + 391809 TCGA-4X-A9FA-01Z-0... 6034 3352 ... 0.004119 0.003636 0.005673 + 391810 TCGA-4X-A9FA-01Z-0... 6643 1401 ... 0.012790 0.010269 0.011726 + 391811 TCGA-4X-A9FA-01Z-0... 5546 2011 ... 0.009777 0.013556 0.025255 + 391812 TCGA-4X-A9FA-01Z-0... 6277 2864 ... 0.026638 0.018499 0.031061 + 391813 TCGA-4X-A9FA-01Z-0... 4083 4205 ... 0.009875 0.009582 0.022125 + + [391814 rows x 15 columns] + + +Single-Slide Inference +********************** + +Predictions can also be generated for individual slides, without requiring the user to manually generate feature bags. Use :func:`slideflow.mil.predict_slide` to generate predictions for a single slide. The first argument is the path to the saved MIL model (a directory containing ``mil_params.json``), and the second argument can either be a path to a slide or a loaded :class:`slideflow.WSI` object. + +.. code-block:: python + + import slideflow as sf + from slideflow.mil import predict_slide + from slideflow.slide import qc + + # Load a slide and apply Otsu thresholding + slide = '/path/to/slide.svs' + wsi = sf.WSI(slide, tile_px=299, tile_um=302) + wsi.qc(qc.Otsu()) + + # Calculate predictions and attention heatmap + model = '/path/to/mil_model' + y_pred, y_att = predict_slide(model, wsi) + + +The function will return a tuple of predictions and attention heatmaps. If the model is not attention-based, the attention heatmap will be ``None``. To calculate attention for a model, set ``attention=True``: + +.. code-block:: python + + y_pred, y_att = predict_slide(model, slide, attention=True) + +The returned attention values will be a masked ``numpy.ndarray`` with the same shape as the slide tile extraction grid. Unused tiles will have masked attention values.
+ + +Visualizing Predictions +*********************** + +Heatmaps of attention and tile-level predictions can be interactively visualized in Slideflow Studio by enabling the Multiple-Instance Learning extension (new in Slideflow 2.1.0). This extension is discussed in more detail in the :ref:`extensions` section. \ No newline at end of file diff --git a/docs-source/source/mil_module.rst b/docs-source/source/mil_module.rst new file mode 100644 index 000000000..a42d054d0 --- /dev/null +++ b/docs-source/source/mil_module.rst @@ -0,0 +1,102 @@ +.. _mil_api: + +.. currentmodule:: slideflow.mil + +slideflow.mil +============== + +This submodule contains tools for multiple-instance learning (MIL) model training and evaluation. See :ref:`mil` for more information. A summary of the API is given below. + +**Training:** + - :func:`train_mil()`: Train an MIL model, using an MIL configuration, Datasets, and a directory of bags. + - :func:`build_fastai_learner()`: Build and return the FastAI Learner, but do not execute training. Useful for customizing training. + - :func:`build_multimodal_learner()`: Build and return a FastAI Learner designed for multi-modal/multi-magnification input. + +**Evaluation/Inference:** + - :func:`eval_mil()`: Evaluate an MIL model using a path to a saved model, a Dataset, and path to bags. Generates metrics. + - :func:`predict_mil()`: Generate predictions from an MIL model and saved bags. Returns a pandas dataframe. + - :func:`predict_multimodal_mil()`: Generate predictions from a multimodal MIL model. Returns a dataframe. + - :func:`predict_slide()`: Generate MIL predictions for a single slide. Returns a 2D array of predictions and attention. + - :func:`predict_from_bags()`: Low-level interface for generating predictions from a loaded MIL model and pre-loaded bag Tensors. + - :func:`predict_from_multimodal_bags()`: Low-level interface for generating multimodal predictions from a loaded MIL model and bag Tensors. 
+ - :func:`get_mil_tile_predictions()`: Get tile-level predictions and attention from a saved MIL model for a given Dataset and saved bags. + - :func:`generate_attention_heatmaps()`: Generate and save attention heatmaps. + - :func:`generate_mil_features()`: Get last-layer activations from an MIL model. Returns an MILFeatures object. + + +Main functions +************** + +.. autofunction:: mil_config +.. autofunction:: train_mil +.. autofunction:: build_fastai_learner +.. autofunction:: build_multimodal_learner +.. autofunction:: eval_mil +.. autofunction:: predict_mil +.. autofunction:: predict_multimodal_mil +.. autofunction:: predict_from_bags +.. autofunction:: predict_from_multimodal_bags +.. autofunction:: predict_slide +.. autofunction:: get_mil_tile_predictions +.. autofunction:: generate_attention_heatmaps +.. autofunction:: generate_mil_features + +TrainerConfig +************* + +.. autoclass:: slideflow.mil.TrainerConfig +.. autosummary:: + + TrainerConfig.model_fn + TrainerConfig.loss_fn + TrainerConfig.is_multimodal + TrainerConfig.model_type + +.. autofunction:: slideflow.mil.TrainerConfig.to_dict +.. autofunction:: slideflow.mil.TrainerConfig.json_dump +.. autofunction:: slideflow.mil.TrainerConfig.is_classification +.. autofunction:: slideflow.mil.TrainerConfig.get_metrics +.. autofunction:: slideflow.mil.TrainerConfig.prepare_training +.. autofunction:: slideflow.mil.TrainerConfig.build_model +.. autofunction:: slideflow.mil.TrainerConfig.predict +.. autofunction:: slideflow.mil.TrainerConfig.batched_predict +.. autofunction:: slideflow.mil.TrainerConfig.train +.. autofunction:: slideflow.mil.TrainerConfig.eval +.. autofunction:: slideflow.mil.TrainerConfig.build_train_dataloader +.. autofunction:: slideflow.mil.TrainerConfig.build_val_dataloader +.. autofunction:: slideflow.mil.TrainerConfig.inspect_batch +.. autofunction:: slideflow.mil.TrainerConfig.run_metrics + +MILModelConfig +************** + +.. autoclass:: MILModelConfig +.. 
autosummary:: + + MILModelConfig.apply_softmax + MILModelConfig.loss_fn + MILModelConfig.model_fn + MILModelConfig.model_type + MILModelConfig.is_multimodal + +.. autofunction:: slideflow.mil.MILModelConfig.is_classification +.. autofunction:: slideflow.mil.MILModelConfig.to_dict +.. autofunction:: slideflow.mil.MILModelConfig.inspect_batch +.. autofunction:: slideflow.mil.MILModelConfig.build_model +.. autofunction:: slideflow.mil.MILModelConfig.predict +.. autofunction:: slideflow.mil.MILModelConfig.batched_predict +.. autofunction:: slideflow.mil.MILModelConfig.run_metrics + +CLAMModelConfig +*************** + +The CLAM model configuration class requires ``slideflow-gpl``, which can be installed with: + +.. code-block:: bash + + pip install slideflow-gpl + +Once installed, the class is available at ``slideflow.clam.CLAMModelConfig``. + +.. autoclass:: slideflow.clam.CLAMModelConfig + diff --git a/docs-source/source/model.rst b/docs-source/source/model.rst index ec987dacf..ce1302ac8 100644 --- a/docs-source/source/model.rst +++ b/docs-source/source/model.rst @@ -5,9 +5,9 @@ slideflow.model This module provides the :class:`ModelParams` class to organize model and training parameters/hyperparameters and assist with model building, as well as the :class:`Trainer` class that -executes model training and evaluation. :class:`LinearTrainer` and :class:`CPHTrainer` -are extensions of this class, supporting linear and Cox Proportional Hazards outcomes, respectively. The function -:func:`trainer_from_hp` can choose and return the correct model instance based on the provided +executes model training and evaluation. :class:`RegressionTrainer` and :class:`SurvivalTrainer` +are extensions of this class, supporting regression and Cox Proportional Hazards outcomes, respectively. The function +:func:`build_trainer` can choose and return the correct model instance based on the provided hyperparameters. .. note:: @@ -15,64 +15,39 @@ hyperparameters. 
:mod:`slideflow.model.tensorflow` or :mod:`slideflow.model.torch` according to the currently active backend, indicated by the environmental variable ``SF_BACKEND``. -Configuring and training models -******************************* - -:class:`slideflow.model.ModelParams` will build models according to a set of model parameters and a given set of -outcome labels. To change the core image convolutional model to another architecture, set the ``model`` parameter -to the custom model class. - -.. code-block:: python - - import CustomModel - from slideflow.model import ModelParams - - mp = ModelParams(model=CustomModel, ...) - -Working with layer activations -****************************** - -:class:`slideflow.model.Features` creates an interface to efficiently generate features/layer activations and logits -from either a batch of images (returning a batch of activations/logits) or a whole-slide image (returning a grid of -activations/logits). - -:class:`slideflow.DatasetFeatures` calculates features and logits for an entire dataset, storing -result arrays into a dictionary mapping slide names to the generated activations. This buffer of whole-dataset -activations can then be used for functions requiring analysis of whole-dataset activations, including -:class:`slideflow.SlideMap` and :class:`slideflow.mosiac.Mosaic`. - -.. automodule: slideflow.model - -ModelParams -*********** -.. autoclass:: ModelParams - :inherited-members: +See :ref:`training` for a detailed look at how to train models. Trainer -*********** +******* .. autoclass:: Trainer - :inherited-members: +.. autofunction:: slideflow.model.Trainer.load +.. autofunction:: slideflow.model.Trainer.evaluate +.. autofunction:: slideflow.model.Trainer.predict +.. autofunction:: slideflow.model.Trainer.train -LinearTrainer -************* -.. autoclass:: LinearTrainer - :inherited-members: +RegressionTrainer +***************** +.. autoclass:: RegressionTrainer -CPHTrainer -*********** -.. 
autoclass:: CPHTrainer - :inherited-members: - -trainer_from_hp +SurvivalTrainer *************** -.. autofunction:: trainer_from_hp +.. autoclass:: SurvivalTrainer Features -*********** +******** .. autoclass:: Features - :inherited-members: +.. autofunction:: slideflow.model.Features.from_model +.. autofunction:: slideflow.model.Features.__call__ -DatasetFeatures -**************** -.. autoclass:: DatasetFeatures - :inherited-members: \ No newline at end of file +Other functions +*************** +.. autofunction:: build_trainer +.. autofunction:: build_feature_extractor +.. autofunction:: list_extractors +.. autofunction:: load +.. autofunction:: is_tensorflow_model +.. autofunction:: is_tensorflow_tensor +.. autofunction:: is_torch_model +.. autofunction:: is_torch_tensor +.. autofunction:: read_hp_sweep +.. autofunction:: rebuild_extractor \ No newline at end of file diff --git a/docs-source/source/model_params.rst b/docs-source/source/model_params.rst new file mode 100644 index 000000000..041374bdd --- /dev/null +++ b/docs-source/source/model_params.rst @@ -0,0 +1,39 @@ +.. currentmodule:: slideflow + +.. _model_params: + +slideflow.ModelParams +===================== + +The :class:`ModelParams` class organizes model and training parameters/hyperparameters and assists with model building. + +See :ref:`training` for a detailed look at how to train models. + +ModelParams +*********** +.. autoclass:: ModelParams +.. autofunction:: slideflow.ModelParams.to_dict +.. autofunction:: slideflow.ModelParams.get_normalizer +.. autofunction:: slideflow.ModelParams.validate +.. autofunction:: slideflow.ModelParams.model_type + +Mini-batch balancing +******************** + +During training, mini-batch balancing can be customized to assist with increasing representation of sparse outcomes or small slides. Five mini-batch balancing methods are available when configuring :class:`slideflow.ModelParams`, set through the parameters ``training_balance`` and ``validation_balance``. 
These are ``'tile'``, ``'category'``, ``'patient'``, ``'slide'``, and ``'none'``. + +If **tile-level balancing** ("tile") is used, tiles will be selected randomly from the population of all extracted tiles. + +If **slide-based balancing** ("slide") is used, batches will contain equal representation of images from each slide. + +If **patient-based balancing** ("patient") is used, batches will balance image tiles across patients. The balancing is similar to slide-based balancing, except across patients (as each patient may have more than one slide). + +If **category-based balancing** ("category") is used, batches will contain equal representation from each outcome category. + +If **no balancing** is performed, batches will be assembled by randomly selecting from TFRecords. This is equivalent to slide-based balancing if each slide has its own TFRecord (default behavior). + +See :ref:`balancing` for more discussion on sampling and mini-batch balancing. + +.. note:: + + If you are :ref:`using a Trainer ` to train your models, you can further customize the mini-batch balancing strategy by using :meth:`slideflow.Dataset.balance` on your training and/or validation datasets. \ No newline at end of file diff --git a/docs-source/source/model_tensorflow.rst b/docs-source/source/model_tensorflow.rst new file mode 100644 index 000000000..cd53d4e62 --- /dev/null +++ b/docs-source/source/model_tensorflow.rst @@ -0,0 +1,11 @@ +.. currentmodule:: slideflow.model.tensorflow + +slideflow.model.tensorflow +========================== + +This submodule contains Tensorflow-specific utility functions when working in the Tensorflow backend. + +.. autofunction:: slideflow.model.tensorflow.flatten +.. autofunction:: slideflow.model.tensorflow.load +.. autofunction:: slideflow.model.tensorflow.log_manifest +.. 
autofunction:: slideflow.model.tensorflow.unwrap diff --git a/docs-source/source/model_torch.rst b/docs-source/source/model_torch.rst new file mode 100644 index 000000000..9b374a8d1 --- /dev/null +++ b/docs-source/source/model_torch.rst @@ -0,0 +1,10 @@ +.. currentmodule:: slideflow.model.torch + +slideflow.model.torch +========================== + +This submodule contains PyTorch-specific utility functions when working in the PyTorch backend. + +.. autofunction:: slideflow.model.torch.lazy_load_pretrained +.. autofunction:: slideflow.model.torch.load +.. autofunction:: slideflow.model.torch.log_manifest diff --git a/docs-source/source/mosaic.rst b/docs-source/source/mosaic.rst index 19feb91c9..1cd293503 100644 --- a/docs-source/source/mosaic.rst +++ b/docs-source/source/mosaic.rst @@ -1,10 +1,11 @@ -.. currentmodule:: slideflow.mosaic +.. currentmodule:: slideflow -slideflow.mosaic +.. _mosaic: + +slideflow.Mosaic ================ -This module provides the :class:`slideflow.Mosaic` class, which plots tile images onto a map of slides, -generating mosaic maps. +:class:`slideflow.Mosaic` plots tile images onto a map of slides, generating a mosaic map. The idea of a mosaic map is to visualize image feature variation across slides and among categories, in an attempt to better understand the kinds of image features discriminative models might be using to generate class predictions. @@ -17,10 +18,16 @@ An example of a mosaic map can be found in Figure 4 of `this paper `_, without the use of feature inversion. -.. automodule: slideflow.mosaic +See :ref:`mosaic_map` for an example of how a mosaic map can be used in the context of a project. + +.. autoclass:: Mosaic -Mosaic ------- +Methods +------- -.. autoclass:: slideflow.Mosaic - :inherited-members: \ No newline at end of file +.. autofunction:: slideflow.Mosaic.generate_grid +.. autofunction:: slideflow.Mosaic.plot +.. autofunction:: slideflow.Mosaic.points_at_grid_index +.. autofunction:: slideflow.Mosaic.save +.. 
autofunction:: slideflow.Mosaic.save_report +.. autofunction:: slideflow.Mosaic.view \ No newline at end of file diff --git a/docs-source/source/mosaic_example.png b/docs-source/source/mosaic_example.png index 84f0bb5ad..fa94695a7 100644 Binary files a/docs-source/source/mosaic_example.png and b/docs-source/source/mosaic_example.png differ diff --git a/docs-source/source/norm.rst b/docs-source/source/norm.rst new file mode 100644 index 000000000..cba2f1c2f --- /dev/null +++ b/docs-source/source/norm.rst @@ -0,0 +1,358 @@ +.. currentmodule:: slideflow.norm + +slideflow.norm +=============== + +The ``slideflow.norm`` submodule includes tools for H&E stain normalization and augmentation. + +Available stain normalization algorithms include: + +- **macenko**: `Original Macenko paper `_. +- **macenko_fast**: Modified Macenko algorithm with the brightness standardization step removed. +- **reinhard**: `Original Reinhard paper `_. +- **reinhard_fast**: Modified Reinhard algorithm with the brightness standardization step removed. +- **reinhard_mask**: Modified Reinhard algorithm, with background/whitespace removed. +- **reinhard_fast_mask**: Modified Reinhard-Fast algorithm, with background/whitespace removed. +- **vahadane**: `Original Vahadane paper `_. +- **augment**: HSV colorspace augmentation. +- **cyclegan**: CycleGAN-based stain normalization, as implemented by `Zingman et al `_ (PyTorch only) + +Overview +******** + +The main normalizer interface, :class:`slideflow.norm.StainNormalizer`, offers +efficient numpy implementations for the Macenko, Reinhard, and Vahadane H&E stain normalization algorithms, as well +as an HSV colorspace stain augmentation method. This normalizer can convert +images to and from Tensors, numpy arrays, and raw JPEG/PNG images. + +In addition to these numpy implementations, PyTorch-native and Tensorflow-native +implementations are also provided, which offer performance improvements, GPU acceleration, +and/or vectorized application. 
The native normalizers are found in +``slideflow.norm.tensorflow`` and ``slideflow.norm.torch``, respectively. + +The Vahadane normalizer has two numpy implementations available: SPAMS +(``vahadane_spams``) and sklearn (``vahadane_sklearn``). By default, +the SPAMS implementation will be used if unspecified (``method='vahadane'``). + +Use :func:`slideflow.norm.autoselect` to get the fastest available normalizer +for a given method and active backend (Tensorflow/PyTorch). + +How to use +********** + +There are four ways you can use stain normalizers: 1) on individual images, 2) during dataset iteration, 3) during tile extraction, or 4) on-the-fly during training. + +Individual images +----------------- + +Stain normalizers can be used directly on individual images or batches of images. The Tensorflow and PyTorch-native stain normalizers perform operations on Tensors, allowing you to incorporate stain normalization into an external preprocessing pipeline. + +Load a backend-native stain normalizer with ``autoselect``, then transform an image with ``StainNormalizer.transform()``. This function will auto-detect the source image type, perform the most efficient transformation possible, and return normalized images of the same type. + +.. code-block:: python + + import slideflow as sf + + macenko = sf.norm.autoselect('macenko') + image = macenko.transform(image) + +You can use :meth:`slideflow.norm.StainNormalizer.fit` to fit the normalizer to a custom reference image, or use one of our preset fits. + +Dataloader pre-processing +------------------------- + +You can apply stain normalization during dataloader preprocessing by passing the ``StainNormalizer`` object to the ``normalizer`` argument of either ``Dataset.tensorflow()`` or ``Dataset.torch()``. + +.. 
code-block:: python + + import slideflow as sf + + # Get a PyTorch-native Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Create a PyTorch dataloader that applies stain normalization + dataset = sf.Dataset(...) + dataloader = dataset.torch(..., normalizer=macenko) + +.. note:: + + GPU acceleration cannot be performed within a PyTorch dataloader. Stain normalizers have a ``.preprocess()`` function that stain-normalizes and standardizes a batch of images, so the workflow to normalize on GPU in a custom PyTorch training loop would be: + + - Get a Dataloader with ``dataset.torch(standardize=False, normalize=False)`` + - On an image batch, preprocess with ``normalizer.preprocess()``: + + .. code-block:: python + + # Slideflow dataset + dataset = Project.dataset(tile_px=..., tile_um=...) + + # Create PyTorch dataloader + dataloader = dataset.torch(..., standardize=False) + + # Get a stain normalizer + normalizer = sf.norm.autoselect('reinhard') + + # Iterate through the dataloader + for img_batch, labels in dataloader: + + # Stain normalize using GPU + img_batch = img_batch.to('cuda') + with torch.no_grad(): + proc_batch = normalizer.preprocess(img_batch) + + ... + + +During tile extraction +---------------------- + +Image tiles can be normalized during tile extraction by using the ``normalizer`` and ``normalizer_source`` arguments. ``normalizer`` is the name of the algorithm. The normalizer source - either a path to a reference image, or a ``str`` indicating one of our presets (e.g. ``'v1'``, ``'v2'``, ``'v3'``) - can also be set with ``normalizer_source``. + +.. code-block:: python + + P.extract_tiles( + tile_px=299, + tile_um=302, + normalizer='reinhard' + ) + +On-the-fly +---------- + +Performing stain normalization on-the-fly provides greater flexibility, as it allows you to change normalization strategies without re-extracting all of your image tiles. 
+ +Real-time normalization can be performed for most pipeline functions - such as model training or feature generation - by setting the ``normalizer`` and/or ``normalizer_source`` hyperparameters. + +.. code-block:: python + + from slideflow.model import ModelParams + hp = ModelParams(..., normalizer='reinhard') + +If a model was trained using a normalizer, the normalizer algorithm and fit information will be stored in the model metadata file, ``params.json``, in the saved model folder. Any Slideflow function that uses this model will automatically process images using the same normalization strategy. + +.. _normalizer_performance: + +Performance +*********** + +Slideflow has Tensorflow, PyTorch, and Numpy/OpenCV implementations of stain normalization algorithms. Performance benchmarks for these implementations +are given below: + +.. list-table:: **Performance Benchmarks** (299 x 299 images, Slideflow 2.0.0, benchmarked on 3960X and A100 40GB) + :header-rows: 1 + + * - + - Tensorflow backend + - PyTorch backend + * - macenko + - 929 img/s (**native**) + - 881 img/s (**native**) + * - macenko_fast + - 1,404 img/s (**native**) + - 1,088 img/s (**native**) + * - reinhard + - 1,136 img/s (**native**) + - 3,329 img/s (**native**) + * - reinhard_fast + - 4,226 img/s (**native**) + - 4,187 img/s (**native**) + * - reinhard_mask + - 1,136 img/s (**native**) + - 3,941 img/s (**native**) + * - reinhard_fast_mask + - 4,496 img/s (**native**) + - 4,058 img/s (**native**) + * - vahadane_spams + - 0.7 img/s + - 2.2 img/s + * - vahadane_sklearn + - 0.9 img/s + - 1.0 img/s + +.. _contextual_normalization: + +Contextual Normalization +************************ + +Contextual stain normalization allows you to stain normalize an image using the staining context of a separate image. When the context image is a thumbnail of the whole slide, this may provide slight improvements in normalization quality for areas of a slide that are predominantly eosin (e.g. necrosis or low cellularity). 
For the Macenko normalizer, this works by determining the maximum H&E concentrations from the context image rather than the image being transformed. For the Reinhard normalizer, channel means and standard deviations are calculated from the context image instead of the image being transformed. This normalization approach can result in poor quality images if the context image has pen marks or other artifacts, so we do not recommend using this approach without ROIs or effective slide-level filtering. + +Contextual normalization can be enabled during tile extraction by passing the argument ``context_normalize=True`` to :meth:`slideflow.Dataset.extract_tiles()`. + +You can use contextual normalization when manually using a ``StainNormalizer`` object by using the ``.context()`` function. The context can either be a slide (path or ``sf.WSI``) or an image (Tensor or np.ndarray). + +.. code-block:: python + + import slideflow as sf + + # Get a Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Use a given slide as context + slide = sf.WSI('slide.svs', ...) + + # Context normalize an image + with macenko.context(slide): + img = macenko.transform(img) + +You can also manually set or clear the normalizer context with ``.set_context()`` and ``.clear_context()``: + +.. code-block:: python + + # Set the normalizer context + macenko.set_context(slide) + + # Context normalize an image + img = macenko.transform(img) + + # Remove the normalizer context + macenko.clear_context() + +Contextual normalization is not supported with on-the-fly normalization during training or dataset iteration. + +.. _stain_augmentation: + +Stain Augmentation +****************** + +One of the benefits of on-the-fly stain normalization is the ability to perform dynamic stain augmentation with normalization. For Reinhard normalizers, this is performed by randomizing the channel means and channel standard deviations. 
For Macenko normalizers, stain augmentation is performed by randomizing the stain matrix target and the target concentrations. In all cases, randomization is performed by sampling from a normal distribution whose mean is the reference fit and whose standard deviation is a predefined value (in ``sf.norm.utils.augment_presets``). Of note, this strategy differs from the more commonly used strategy `described by Tellez `_, where augmentation is performed by randomly perturbing images in the stain matrix space without normalization. + +To enable stain augmentation, add the letter 'n' to the ``augment`` parameter when training a model. + +.. code-block:: python + + import slideflow as sf + + # Open a project + project = sf.Project(...) + + # Add stain augmentation to augmentation pipeline + params = sf.ModelParams(..., augment='xryjn') + + # Train a model + project.train(..., params=params) + +When using a StainNormalizer object, you can perform a combination of normalization and augmentation for an image by using the argument ``augment=True`` when calling :meth:`StainNormalizer.transform`: + +.. code-block:: python + + import slideflow as sf + + # Get a Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Perform combination of stain normalization and augmentation + img = macenko.transform(img, augment=True) + +To stain augment an image without normalization, use the method :meth:`StainNormalizer.augment`: + +.. code-block:: python + + import slideflow as sf + + # Get a Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Perform stain augmentation + img = macenko.augment(img) + + +StainNormalizer +*************** + +.. autoclass:: StainNormalizer +.. autofunction:: slideflow.norm.StainNormalizer.fit +.. autofunction:: slideflow.norm.StainNormalizer.get_fit +.. autofunction:: slideflow.norm.StainNormalizer.set_fit +.. autofunction:: slideflow.norm.StainNormalizer.augment +.. autofunction:: slideflow.norm.StainNormalizer.transform +.. 
autofunction:: slideflow.norm.StainNormalizer.jpeg_to_jpeg +.. autofunction:: slideflow.norm.StainNormalizer.jpeg_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.png_to_png +.. autofunction:: slideflow.norm.StainNormalizer.png_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.rgb_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.tf_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.tf_to_tf +.. autofunction:: slideflow.norm.StainNormalizer.torch_to_torch + +Example images +************** + +.. figure:: norm_compare/wsi_norm_compare.jpg + + Comparison of normalizers applied to a whole-slide image. + +.. figure:: norm_compare/tile_norm_compare.jpg + + Comparison of normalizers applied to an image tile. + +.. figure:: norm_compare/wsi_unnormalized.jpg + + Unnormalized whole-slide images. + +.. figure:: norm_compare/wsi_reinhard_v1.jpg + + Whole-slide images normalized with **Reinhard**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_reinhard_v2.jpg + + Whole-slide images normalized with **Reinhard**, fit to preset "v2" + +.. figure:: norm_compare/wsi_macenko_v1.jpg + + Whole-slide images normalized with **Macenko**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_macenko_v2.jpg + + Whole-slide images normalized with **Macenko**, fit to preset "v2" + +.. figure:: norm_compare/wsi_vahadane_v1.jpg + + Whole-slide images normalized with **Vahadane**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_vahadane_v2.jpg + + Whole-slide images normalized with **Vahadane**, fit to preset "v2" + +.. figure:: norm_compare/wsi_vahadane_spams_v1.jpg + + Whole-slide images normalized with **Vahadane (SPAMS)**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_vahadane_spams_v2.jpg + + Whole-slide images normalized with **Vahadane (SPAMS)**, fit to preset "v2" + +.. figure:: norm_compare/tile_unnormalized.jpg + + Unnormalized image tiles. + +.. 
figure:: norm_compare/tile_reinhard_v1.jpg + + Image tiles normalized with **Reinhard Mask**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_reinhard_v2.jpg + + Image tiles normalized with **Reinhard Mask**, fit to preset "v2" + +.. figure:: norm_compare/tile_macenko_v1.jpg + + Image tiles normalized with **Macenko**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_macenko_v2.jpg + + Image tiles normalized with **Macenko**, fit to preset "v2" + +.. figure:: norm_compare/tile_vahadane_v1.jpg + + Image tiles normalized with **Vahadane**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_vahadane_v2.jpg + + Image tiles normalized with **Vahadane**, fit to preset "v2" + +.. figure:: norm_compare/tile_vahadane_spams_v1.jpg + + Image tiles normalized with **Vahadane (SPAMS)**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_vahadane_spams_v2.jpg + + Image tiles normalized with **Vahadane (SPAMS)**, fit to preset "v2" \ No newline at end of file diff --git a/docs-source/source/norm_compare/.gitattributes b/docs-source/source/norm_compare/.gitattributes new file mode 100644 index 000000000..a014ef866 --- /dev/null +++ b/docs-source/source/norm_compare/.gitattributes @@ -0,0 +1,20 @@ +wsi_vahadane_v2.jpg filter=lfs diff=lfs merge=lfs -text +wsi_reinhard_v2.jpg filter=lfs diff=lfs merge=lfs -text +tile_macenko_v2.jpg filter=lfs diff=lfs merge=lfs -text +tile_norm_compare.jpg filter=lfs diff=lfs merge=lfs -text +tile_reinhard_v1.jpg filter=lfs diff=lfs merge=lfs -text +tile_reinhard_v2.jpg filter=lfs diff=lfs merge=lfs -text +tile_unnormalized.jpg filter=lfs diff=lfs merge=lfs -text +tile_vahadane_spams_v1.jpg filter=lfs diff=lfs merge=lfs -text +wsi_reinhard_v1.jpg filter=lfs diff=lfs merge=lfs -text +tile_macenko_v1.jpg filter=lfs diff=lfs merge=lfs -text +tile_vahadane_v1.jpg filter=lfs diff=lfs merge=lfs -text +tile_vahadane_v2.jpg filter=lfs diff=lfs merge=lfs -text +wsi_macenko_v1.jpg filter=lfs diff=lfs merge=lfs 
-text +wsi_vahadane_spams_v2.jpg filter=lfs diff=lfs merge=lfs -text +wsi_vahadane_v1.jpg filter=lfs diff=lfs merge=lfs -text +tile_vahadane_spams_v2.jpg filter=lfs diff=lfs merge=lfs -text +wsi_macenko_v2.jpg filter=lfs diff=lfs merge=lfs -text +wsi_norm_compare.jpg filter=lfs diff=lfs merge=lfs -text +wsi_unnormalized.jpg filter=lfs diff=lfs merge=lfs -text +wsi_vahadane_spams_v1.jpg filter=lfs diff=lfs merge=lfs -text diff --git a/docs-source/source/norm_compare/tile_macenko_v1.jpg b/docs-source/source/norm_compare/tile_macenko_v1.jpg new file mode 100644 index 000000000..4035e5b58 --- /dev/null +++ b/docs-source/source/norm_compare/tile_macenko_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:93fddb2cad886e3dfe40689bcb81888de5e6788fb5e40e5da540bd38ac15540e +size 449475 diff --git a/docs-source/source/norm_compare/tile_macenko_v2.jpg b/docs-source/source/norm_compare/tile_macenko_v2.jpg new file mode 100644 index 000000000..7f89a068f --- /dev/null +++ b/docs-source/source/norm_compare/tile_macenko_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d34072a00bcd184708e4c1e825f9e6a52609752edcb6354bf55baf1802078871 +size 482941 diff --git a/docs-source/source/norm_compare/tile_norm_compare.jpg b/docs-source/source/norm_compare/tile_norm_compare.jpg new file mode 100644 index 000000000..1ced2db28 --- /dev/null +++ b/docs-source/source/norm_compare/tile_norm_compare.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:485d5aaae58586967ab8ebc54cf6eb6989e39b57a114cae8aad4799c33a931b6 +size 202186 diff --git a/docs-source/source/norm_compare/tile_reinhard_v1.jpg b/docs-source/source/norm_compare/tile_reinhard_v1.jpg new file mode 100644 index 000000000..9ceac7a8e --- /dev/null +++ b/docs-source/source/norm_compare/tile_reinhard_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea1d40bb5f175dc52521867b65c21583e902dd65cbbd509dddec0c8852615939 +size 448746 diff 
--git a/docs-source/source/norm_compare/tile_reinhard_v2.jpg b/docs-source/source/norm_compare/tile_reinhard_v2.jpg new file mode 100644 index 000000000..84339ff78 --- /dev/null +++ b/docs-source/source/norm_compare/tile_reinhard_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb41b12616d12ff9d97354d52b5231d691bd89c238075cd744deebef30d37dc5 +size 471962 diff --git a/docs-source/source/norm_compare/tile_unnormalized.jpg b/docs-source/source/norm_compare/tile_unnormalized.jpg new file mode 100644 index 000000000..b0166875c --- /dev/null +++ b/docs-source/source/norm_compare/tile_unnormalized.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06f9806e355c0c7bf6c191bcf45d91bbd0d47f1b17a59e891bac0b4bb053f540 +size 472963 diff --git a/docs-source/source/norm_compare/tile_vahadane_spams_v1.jpg b/docs-source/source/norm_compare/tile_vahadane_spams_v1.jpg new file mode 100644 index 000000000..299f9d986 --- /dev/null +++ b/docs-source/source/norm_compare/tile_vahadane_spams_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b57fcbb3ad30100ff2a21220c755880a03891784bd874c76f219c7d336bea81 +size 553459 diff --git a/docs-source/source/norm_compare/tile_vahadane_spams_v2.jpg b/docs-source/source/norm_compare/tile_vahadane_spams_v2.jpg new file mode 100644 index 000000000..058c18652 --- /dev/null +++ b/docs-source/source/norm_compare/tile_vahadane_spams_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17f4f2eb346616f5aa1c6a9634e0dfe07fed4a642d55a2d3bfa2bf2f652e04f1 +size 487835 diff --git a/docs-source/source/norm_compare/tile_vahadane_v1.jpg b/docs-source/source/norm_compare/tile_vahadane_v1.jpg new file mode 100644 index 000000000..27728290c --- /dev/null +++ b/docs-source/source/norm_compare/tile_vahadane_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32d67b4e00b5175c68edb2024d9dfd25ecf03e485371a7b97bc92402bf7561a6 +size 532878 diff 
--git a/docs-source/source/norm_compare/tile_vahadane_v2.jpg b/docs-source/source/norm_compare/tile_vahadane_v2.jpg new file mode 100644 index 000000000..65a68325f --- /dev/null +++ b/docs-source/source/norm_compare/tile_vahadane_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b64694c8ea57abdbef3fbaff109cccb578f98b015664f3a64d770bc8f4401ce +size 532041 diff --git a/docs-source/source/norm_compare/wsi_macenko_v1.jpg b/docs-source/source/norm_compare/wsi_macenko_v1.jpg new file mode 100644 index 000000000..3fb6def9b --- /dev/null +++ b/docs-source/source/norm_compare/wsi_macenko_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2fdabf853080db1d387b821a45b5e031ddbdbd6610530f5ebab3b20b2495d62 +size 276965 diff --git a/docs-source/source/norm_compare/wsi_macenko_v2.jpg b/docs-source/source/norm_compare/wsi_macenko_v2.jpg new file mode 100644 index 000000000..f49d669ee --- /dev/null +++ b/docs-source/source/norm_compare/wsi_macenko_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de2d422b8759701918a1507050e1c9534a774d611bed04c356a04f0fe67e8736 +size 299939 diff --git a/docs-source/source/norm_compare/wsi_norm_compare.jpg b/docs-source/source/norm_compare/wsi_norm_compare.jpg new file mode 100644 index 000000000..474e570b8 --- /dev/null +++ b/docs-source/source/norm_compare/wsi_norm_compare.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f3c3db88e810348c7aa0b3c1a3f3dace2fbbb826eeaa99f9ff43afe79d00c80 +size 164826 diff --git a/docs-source/source/norm_compare/wsi_reinhard_v1.jpg b/docs-source/source/norm_compare/wsi_reinhard_v1.jpg new file mode 100644 index 000000000..df80f08f7 --- /dev/null +++ b/docs-source/source/norm_compare/wsi_reinhard_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:75f3b257bdb5220308c692f235467697f1e3293355622e08879ea87e82eac4f9 +size 252296 diff --git 
a/docs-source/source/norm_compare/wsi_reinhard_v2.jpg b/docs-source/source/norm_compare/wsi_reinhard_v2.jpg new file mode 100644 index 000000000..04feaefea --- /dev/null +++ b/docs-source/source/norm_compare/wsi_reinhard_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60dcb84c37fb480983536886d977740c4ff2bd5c999c1da18144ad4be44efbaf +size 264586 diff --git a/docs-source/source/norm_compare/wsi_unnormalized.jpg b/docs-source/source/norm_compare/wsi_unnormalized.jpg new file mode 100644 index 000000000..ed223021e --- /dev/null +++ b/docs-source/source/norm_compare/wsi_unnormalized.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:845a9560dbd48706c02ab868b5fa3605dd1e67eeb4f24f01047dbd80d4232319 +size 274875 diff --git a/docs-source/source/norm_compare/wsi_vahadane_spams_v1.jpg b/docs-source/source/norm_compare/wsi_vahadane_spams_v1.jpg new file mode 100644 index 000000000..a5b115899 --- /dev/null +++ b/docs-source/source/norm_compare/wsi_vahadane_spams_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26ae25e069ec461be09f2ac3bcdb6be2f099d852acacf2b2e6e786bcc91d6153 +size 274749 diff --git a/docs-source/source/norm_compare/wsi_vahadane_spams_v2.jpg b/docs-source/source/norm_compare/wsi_vahadane_spams_v2.jpg new file mode 100644 index 000000000..a5b115899 --- /dev/null +++ b/docs-source/source/norm_compare/wsi_vahadane_spams_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26ae25e069ec461be09f2ac3bcdb6be2f099d852acacf2b2e6e786bcc91d6153 +size 274749 diff --git a/docs-source/source/norm_compare/wsi_vahadane_v1.jpg b/docs-source/source/norm_compare/wsi_vahadane_v1.jpg new file mode 100644 index 000000000..f047af5cc --- /dev/null +++ b/docs-source/source/norm_compare/wsi_vahadane_v1.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bced8a0598fd2698f14846ce689016c519ff1a507fd499076b2f239b5e271f11 +size 276323 diff --git 
a/docs-source/source/norm_compare/wsi_vahadane_v2.jpg b/docs-source/source/norm_compare/wsi_vahadane_v2.jpg new file mode 100644 index 000000000..a5b115899 --- /dev/null +++ b/docs-source/source/norm_compare/wsi_vahadane_v2.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26ae25e069ec461be09f2ac3bcdb6be2f099d852acacf2b2e6e786bcc91d6153 +size 274749 diff --git a/docs-source/source/otsu.png b/docs-source/source/otsu.png new file mode 100644 index 000000000..114ecec88 --- /dev/null +++ b/docs-source/source/otsu.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e603cf318e2b012d6318d044a9fc42e4c16b6be75509af7062b2420a6c96918 +size 981740 diff --git a/docs-source/source/overview.png b/docs-source/source/overview.png index 23cee11be..16ba338d2 100644 Binary files a/docs-source/source/overview.png and b/docs-source/source/overview.png differ diff --git a/docs-source/source/overview.rst b/docs-source/source/overview.rst new file mode 100644 index 000000000..49b84f3aa --- /dev/null +++ b/docs-source/source/overview.rst @@ -0,0 +1,65 @@ +Overview +======== + +Slideflow provides tools for easily building and testing a variety of deep learning models for digital pathology. + +This section provides a high-level overview of the most common application: building and testing a weakly supervised predictive model. Slideflow supports many other tasks, including :ref:`multiple-instance learning (MIL) `, :ref:`self-supervised learning (SSL) `, :ref:`generative adversarial networks (GANs) `, :ref:`tissue ` and :ref:`cell ` segmentation, and :ref:`deployment & visualization `, which are discussed in subsequent sections. + +.. figure:: overview.png + + *High-level overview of model building.* + +The pipeline for a deep learning classification experiment is separated into three phases. + +1) **Tile extraction** - annotate slides with regions of interest (ROIs) [*optional*] and extract image tiles from whole-slide images. 
+ +2) **Model training** - determine model parameters, train a model, and evaluate the model on a held-out test set. + +3) **Explainability** - generate predictive heatmaps and analyze learned image features. + +| + +A brief introduction to the steps needed to execute a basic experiment is provided below. Each process will be described in more detail in the following sections. + +Step 1: Prepare a dataset +************************* + +- **Extract tiles**. :ref:`Tiles are extracted ` from slides at a given magnification size in microns (or a magnification layer, such as "10x"), and saved at a given resolution in pixels. The optimal extraction size in both microns and pixels will depend on your dataset and model architecture. Poor quality tiles - including background tiles or tiles with high whitespace content - can be discarded with quality control methods. Tiles will be stored as TFRecords, a binary file format used to improve dataset reading performance during training. Each slide will have its own TFRecord file containing its extracted tiles. + +- **Set aside final evaluation set**. :ref:`Split the dataset ` into a training/validation set and held-out test set. + +- **Determine validation plan**. By default, three-fold cross-validation will be performed during training. Many other validation strategies are also supported (:ref:`validation_planning`). + +Step 2: Train a model +********************* + +- **Choose model type**. Choose the endpoint (e.g. classification, regression, time-to-event) and type of model (tile-based or multiple-instance learning). + +- **Set hyperparameters**. Choose a model architecture (e.g. InceptionV3, VGG16, ResNet, etc.) and a set of hyperparameters (e.g. batch size, learning rate, etc.). This can be done manually, or :ref:`hyperparameters can be optimized ` via grid search or Bayesian optimization. + +- **Initiate training**. :ref:`Train your model `, taking note of training and validation performance (e.g. 
accuracy, AUROC, AP, R-squared, C-index). + +Step 3: Evaluate the model +************************** + +- **Evaluate on held-out set**: :ref:`Evaluate your final model ` on the held-out dataset. + +Step 4: Generate heatmaps +************************* + +- **Generate heatmaps**: :ref:`Generate heatmaps ` of predictions across slides in the held-out dataset to assist with interpretability. For MIL models, heatmaps of both predictions and attention can be generated. + +.. image:: heatmap_example.png + +Step 5: Make a Mosaic map +************************* + +- **Generate a mosaic map**: :ref:`Create a mosaic map `, which visually illustrates the latent space of your trained model and held-out dataset, to assist with interpretability. + +.. image:: mosaic_example.png + +Step 6: Live visualization +************************** +- **Deploy the model**: Finally, use a trained model to visualize predictions for whole-slide images with the interactive tool :ref:`Slideflow Studio `. This whole-slide image viewer includes deep learning tools enabling you to visualize model predictions on whole-slide images, standard JPG/PNG files, real-time camera feeds, and even Generative Adversarial Network (GAN)-generated images. + +.. image:: workbench_preview.png diff --git a/docs-source/source/performance.png b/docs-source/source/performance.png new file mode 100644 index 000000000..a26677a02 --- /dev/null +++ b/docs-source/source/performance.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1f462c4c51b5aa72275df10a15bb5632234b973ae61ac191e7b8e82e054fe34 +size 859069 diff --git a/docs-source/source/pipeline.rst b/docs-source/source/pipeline.rst deleted file mode 100644 index 0c23996a5..000000000 --- a/docs-source/source/pipeline.rst +++ /dev/null @@ -1,65 +0,0 @@ -Pipeline Overview -================= - -.. figure:: overview.png - - *High-level overview of main functions.* - -The overall pipeline for a deep learning experiment is separated into three phases.
- -1) **Tile extraction** - involves annotating slides with regions of interest (ROIs) (*optional*), setting up a project, and extracting image tiles from whole-slide images. - -2) **Model training** - includes performing a hyperparameter sweep [*optional*], training a model, and evaluating the trained model on a held-out test set. - -3) **Explainability** - involves generating predictive heatmaps and analyzing learned image features. - -| - -A high-level overview of each of these phases is provided below. We will examine execution of each step in more detail in the following sections. - -Step 1: ROI Annotation -********************** - -1) **Label ROIs** (optional). Using `QuPath `_, annotate whole-slide images with the Polygon tool. Then, click **Automate** -> **Show script editor**. In the box that comes up, click **File** -> **Open** and load the ``qupath_roi.groovy`` script (QuPath 0.2 or greater) or ``qupath_roi_legacy.groovy`` (QuPath 0.1.x). Click **Run** -> **Run** if using QuPath 0.2 or greater, or **Run** -> **Run for Project** if using QuPath 0.1.x. ROIs will be exported in CSV format in the QuPath project directory, in the subdirectory "ROI". - -.. note:: - This step may be skipped if you are performing analysis on whole-slide images, rather than annotated tumor regions. - -Step 2: Dataset preparation -*************************** - -2) **Extract tiles**. Once ROIs have been created, tiles will need to be extracted from the ROIs across all of your slides. Tiles will be extracted at a given magnification size in microns, and saved at a given resolution in pixels. The optimal extraction size in both microns and pixels will depend on your dataset and model architecture. Poor quality tiles - including background tiles or tiles with high whitespace content - will be automatically discarded. Tiles will be stored as TFRecords, a binary file format used to improve dataset reading performance during training. 
Each slide will have its own TFRecord file containing its extracted tiles. - -3) **Set aside final evaluation set**. Using the project annotations CSV file, designate which slides should be saved for final evaluation. - -4) **Establish training and validation dataset**. By default, three-fold cross-validation will be performed during training. Many other validation strategies are also supported (:ref:`validation_planning`). - -Step 3: Model training -********************** - -5) **Choose hyperparameters**. Before training can begin, you must choose both a model architecture (e.g. InceptionV3, VGG16, ResNet, etc.) and a set of hyperparameters (e.g. batch size, learning rate, etc.). This can be done explicitly one at a time, or an automatic hyperparameter sweep can be configured. - -6) **Initiate training**. Train your model across all desired hyperparameters and select the best-performing hyperparameter combination for final evaluation testing. - -Step 4: Model evaluation -************************ -Validation testing is performed both during training - at specified epochs - and after training has completed. Various metrics are recorded in the project directory at these intervals to assist with model performance assessment, including: - -- **Training and validation loss** -- **Training and validation accuracy** (for categorical outcomes) -- **Tile-level, slide-level, and patient-level AUROC and AP** (for categorical outcomes) -- **Tile-level, slide-level, and patient-level scatter plots with R-squared** (for continuous outcomes) -- **Tile-level, slide-level, and patient-level C-index** (for Cox Proportional Hazards models) -- **Histograms of predictions** (for continuous outcomes) - -Step 5: Heatmaps -**************** -In addition to the above metrics, performance of a trained model can be assessed by visualizing predictions for a set slides as heatmaps. - -.. 
image:: heatmap_example.png - -Step 6: Mosaic maps -******************* -Finally, learned image features can be visualized using dimensionality reduction on model layer activations. A set of image tiles is first provided to your trained model, which calculates activations at a specified intermediate layer. Tile-level activations are then plotted with dimensionality reduction (UMAP), and points on the plot are replaced with image tiles, generating a mosaic map. - -.. image:: mosaic_example.png diff --git a/docs-source/source/plugins.rst b/docs-source/source/plugins.rst new file mode 100644 index 000000000..118dcc51f --- /dev/null +++ b/docs-source/source/plugins.rst @@ -0,0 +1,95 @@ +.. _plugins: + +Creating a Slideflow Plugin +=========================== + +Slideflow has been designed to be extensible, and we encourage users to contribute their own plugins to the Slideflow ecosystem. Plugins can be used to add new functionality to Slideflow, such as new feature extractors or new model architectures. This page provides an overview of how to create and use plugins with Slideflow. + + +MIL Model Registration +---------------------- + +As discussed in :ref:`custom_mil`, Slideflow supports the registration of custom MIL models. This is done by using the ``register_model`` decorator to register a custom MIL model. + +For example, suppose you have a custom MIL model called ``MyMILModel`` that you want to register with Slideflow. You've already designed the model such that it meets Slideflow's MIL :ref:`requirements `. Now you want to make it available for use directly within Slideflow. You can accomplish this by using the ``register_model`` decorator: + +.. code-block:: python + + from slideflow.model.mil import register_model + + @register_model + def my_mil_model(**kwargs): + from . import MyMILModel + return MyMILModel(**kwargs) + +Once this code is run, the custom MIL model will be available for use with Slideflow: + +.. 
code-block:: python + + import slideflow as sf + + model = sf.build_mil_model("my_mil_model") + + +Feature Extractors +------------------ + +Similarly, Slideflow supports the integration of custom feature extractors via the ``register_torch`` and ``register_tf`` decorators. Please see our detailed :ref:`developer note ` for more information on how to create and register custom extractors. Briefly, you can register a custom feature extractor with Slideflow as follows: + +.. code-block:: python + + from slideflow.model.extractors import register_torch + + @register_torch + def my_foundation_model(**kwargs): + from . import MyFoundationModel + return MyFoundationModel(**kwargs) + + +Creating a Plugin +----------------- + +Once you have a custom MIL model or feature extractor that you want to integrate with Slideflow, you can create a plugin to make it available to other users. + +Slideflow supports external plugins via standard Python entry points, allowing you to publish your own package that integrates with Slideflow. + +In your package's ``setup.py`` file, use the "entry_points" key to connect with the Slideflow plugin interface: + +.. code-block:: python + + ..., + entry_points={ + 'slideflow.plugins': [ + 'extras = my_package:register_extras', + ], + }, + +Then, in your package's root ``__init__.py`` file, write a ``register_extras()`` function that does any preparation needed to initialize or import your model. + +(in ``my_package/__init__.py``) + +.. code-block:: python + + def register_extras(): + # Import the model, and do any other necessary preparation. + # If my_module contains the @register_model decorator, + # the model will be registered with Slideflow automatically. + from . import my_module + + print("Registered MyFoundationModel") + +You can then build and distribute your plugin, and once installed, the registration with Slideflow will happen automatically: + +.. code-block:: bash + + pip install my_package + + +.. 
code-block:: python + + import slideflow as sf + + model = sf.build_feature_extractor("my_foundation_model") + + +For a complete example, head over to our `Slideflow-GPL `_ and `Slideflow-NonCommercial `_ repositories, which have been built using the plugin system described above. \ No newline at end of file diff --git a/docs-source/source/posthoc.rst b/docs-source/source/posthoc.rst new file mode 100644 index 000000000..11f46acc7 --- /dev/null +++ b/docs-source/source/posthoc.rst @@ -0,0 +1,259 @@ +.. currentmodule:: slideflow.model + +.. _activations: + +Layer Activations +================= + +Investigating the latent space of a neural network can provide useful insights into the structure of your data and what models have learned during training. Slideflow provides several tools for post-hoc latent space analysis of trained neural networks, primarily by calculating activations at one or more neural network layers for all images in a dataset. In the next sections, we will take a look at how these layer activations can be calculated for downstream analysis and provide examples of analyses that can be performed. + +Calculating Layer Activations +***************************** + +Activations at one or more layers of a trained network can be calculated with :class:`slideflow.model.Features` and :class:`slideflow.DatasetFeatures`. The former provides an interface for calculating layer activations for a batch of images, and the latter supervises calculations across an entire dataset. + +Batch of images +--------------- + +:class:`Features` provides an interface for calculating layer activations and predictions on a batch of images. The following arguments are available: + +- ``path``: Path to model, from which layer activations are calculated. Required. +- ``layers``: Layer(s) at which to calculate activations. 
+- ``include_preds``: Also return the final network output (predictions). +- ``pooling``: Apply pooling to layer activations, to reduce dimensionality to one dimension. + +If ``layers`` is not supplied, activations at the post-convolutional layer will be calculated by default. + +Once initialized, the resulting object can be called on a batch of images and will return the layer activations for all images in the batch. For example, to calculate activations at the ``sep_conv_3`` layer of a model while looping through a dataset: + +.. code-block:: python + + import slideflow as sf + + sepconv3 = sf.model.Features('model/path', layers='sep_conv_3') + for img_batch in dataset: + sepconv3_activations = sepconv3(img_batch) + +If ``layers`` is a list of layer names, activations at each layer will be calculated and concatenated. If ``include_preds`` is ``True``, the interface will also return the final predictions: + +.. code-block:: python + + sepconv3_and_preds = sf.model.Features(..., include_preds=True) + layer_activations, preds = sepconv3_and_preds(img_batch) + +.. note:: + + :class:`Features` assumes that image batches already have any necessary preprocessing applied, including standardization and stain normalization. + +See the API documentation for :class:`Features` for more information. + +Single slide +------------ + +Layer activations can also be calculated across an entire slide using the same :class:`Features` interface. Calling the object on a :class:`slideflow.WSI` object will generate a grid of activations of size ``(slide.grid.shape[0], slide.grid.shape[1], num_features)``: + +.. code-block:: python + + import slideflow as sf + + slide = sf.WSI(...) + postconv = sf.model.Features('/model/path', layers='postconv') + feature_grid = postconv(slide) + print(feature_grid.shape) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + (50, 45, 2048) + +..
_dataset_features: + +Entire dataset +-------------- + +Finally, layer activations can also be calculated for an entire dataset using :class:`slideflow.DatasetFeatures`. Instancing the class supervises the calculation and caching of layer activations, which can then be used for downstream analysis. The project function :func:`slideflow.Project.generate_features` creates and returns an instance of this class. + +.. code-block:: python + + dts_ftrs = P.generate_features('/path/to/trained_model') + +Alternatively, you can create an instance of this class directly: + +.. code-block:: python + + import slideflow as sf + + dataset = P.dataset(tile_px=299, tile_um=302) + dts_ftrs = sf.DatasetFeatures( + model='/path/to/trained_model', + dataset=dataset, + ) + +Tile-level feature activations for each slide can be accessed directly from ``DatasetFeatures.activations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_features)``. Predictions are stored in ``DatasetFeatures.predictions``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_classes)``. Tile-level location data (coordinates from which the tiles were taken from their respective source slides) is stored in ``DatasetFeatures.locations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, 2)`` (``x``, ``y``). + +Activations can be exported to a Pandas DataFrame with :meth:`slideflow.DatasetFeatures.to_df` or exported into PyTorch format with :meth:`slideflow.DatasetFeatures.to_torch`. See :ref:`features` for more information about generating and exporting features for MIL models. + +Read the API documentation for :class:`slideflow.DatasetFeatures` for more information. + +.. _slidemap: + +Mapping Activations +******************* + +Layer activations across a dataset can be dimensionality reduced with UMAP and plotted for visualization using :meth:`slideflow.DatasetFeatures.map_activations`. 
This function returns an instance of :class:`slideflow.SlideMap`, a class that provides easy access to labeling and plotting. + +The below example calculates layer activations at the neural network layer ``sep_conv_3`` for an entire dataset, and then reduces the activations into two dimensions for easy visualization using UMAP. Any valid `UMAP parameters `_ can be passed via keyword argument. + +.. code-block:: python + + dts_ftrs = P.generate_features( + model='/path/to/trained_model', + layers='sep_conv_3' + ) + slide_map = dts_ftrs.map_activations( + n_neighbors=10, # UMAP parameter + min_dist=0.2 # UMAP parameter + ) + +We can then plot the activations with :meth:`slideflow.SlideMap.plot`. All keyword arguments are passed to the `matplotlib scatter `_ function. + +.. code-block:: python + + import matplotlib.pyplot as plt + + slide_map.plot(s=10) + plt.show() + +We can add labels to our plot by first passing a dictionary with slide labels to the function :meth:`slideflow.SlideMap.label_by_slide`. + +.. code-block:: python + + # Get a dictionary mapping slide names to category labels + dataset = P.dataset(tile_px=299, tile_um='10x') + labels, unique_labels = dataset.labels('subtype', format='name') + + # Assign the labels to the slide map, then plot + slide_map.label_by_slide(labels) + slide_map.plot() + +.. image:: umap_example.png + +| + +Finally, we can use :meth:`SlideMap.umap_transform` to project new data into two dimensions using the previously fit UMAP. + +.. code-block:: python + + import slideflow as sf + import numpy as np + + # Create a SlideMap using layer activations reduced with UMAP + dts_ftrs = P.generate_features( + model='/path/to/trained_model', + layers='sep_conv_3' + ) + slide_map = dts_ftrs.map_activations() + + # Load some dummy data. + # Second dimension must match size of activation vector. + dummy = np.random.random((100, 1024)) + + # Transform the data using the already-fit UMAP. 
+ transformed = slide_map.umap_transform(dummy) + print(transformed.shape) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + (100, 2) + +Read more about additional :class:`slideflow.SlideMap` functions, including saving, loading, and clustering, in the linked API documentation. + +.. _mosaic_map: + +Mosaic Maps +*********** + +Mosaic maps provide a tool for visualizing the distribution of histologic image features in a dataset through analysis of neural network layer activations. Similar to `activation atlases `_, a mosaic map is generated by first calculating layer activations for a dataset, dimensionality reducing these activations with `UMAP `_, and then overlaying corresponding images in a grid-wise fashion. + +.. image:: mosaic_example.png + +| + +In the previous sections, we reviewed how to calculate layer activations across a dataset, and then dimensionality reduce these activations into two dimensions using UMAP. :class:`slideflow.Mosaic` provides a tool for converting these activation maps into a grid of image tiles plotted according to their associated activation vectors. + +Quickstart +---------- + +The fastest way to build a mosaic map is using :meth:`slideflow.Project.generate_mosaic`, which requires a ``DatasetFeatures`` object as its only mandatory argument and returns an instance of :class:`slideflow.Mosaic`. + +.. code-block:: python + + dts_ftrs = P.generate_features('/path/to/trained_model', layers='postconv') + mosaic = P.generate_mosaic(dts_ftrs) + mosaic.save('mosaic.png') + +When created with this interface, the underlying :class:`slideflow.SlideMap` object used to create the mosaic map is accessible via ``slideflow.Mosaic.slide_map``. You could, for example, use :meth:`slideflow.SlideMap.save` to save the UMAP plot: + +.. code-block:: python + + mosaic.slide_map.save('umap.png') + +From a SlideMap +--------------- + +Any ``SlideMap`` can be converted to a mosaic map with :meth:`slideflow.SlideMap.generate_mosaic()`. + +..
code-block:: python + + ftrs = P.generate_features('/path/to/model') + slide_map = ftrs.map_activations() + mosaic = slide_map.generate_mosaic() + mosaic.save('mosaic.png') + +Manual creation +--------------- + +Mosaic maps can be flexibly created with :class:`slideflow.Mosaic`, requiring two components: a set of images and corresponding coordinates. Images and coordinates can either be manually provided, or the mosaic can dynamically read images from TFRecords (as is done with :meth:`Project.generate_mosaic()`). + +The first argument of :class:`slideflow.Mosaic` provides the images, and may be either of the following: + +- A list or array of images (np.ndarray, HxWxC) +- A list of tuples, containing ``(slide_name, tfrecord_index)`` + + +The second argument provides the coordinates: + +- A list or array of (x, y) coordinates for each image + + +For example, to create a mosaic map from a list of images and coordinates: + +.. code-block:: python + + # Example data (images are HxWxC, np.ndarray) + images = [np.ndarray(...), ...] + coords = [(0.2, 0.9), ...] + + # Generate the mosaic + mosaic = sf.Mosaic(images, coords) + mosaic.plot() + +You can also generate a mosaic map where the images are tuples of ``(tfrecord, tfrecord_index)``. In this case, the mosaic map will dynamically read images from TFRecords during plotting. + +.. code-block:: python + + # Example data + tfrecords = ['/path/to/tfrecord.tfrecords', ...] + idx = [253, 112, ...] + coords = [(0.2, 0.9), ...] + + # Generate mosaic map + mosaic = sf.Mosaic( + images=[(tfr, idx) for tfr, idx in zip(tfrecords, idx)], + coords=coords + ) + +There are several additional arguments that can be used to customize the mosaic map plotting. Read the linked API documentation for :class:`slideflow.Mosaic` for more information.
\ No newline at end of file diff --git a/docs-source/source/project.rst b/docs-source/source/project.rst index de0821f16..b423dad7a 100644 --- a/docs-source/source/project.rst +++ b/docs-source/source/project.rst @@ -1,11 +1,79 @@ .. currentmodule:: slideflow +.. _project: + slideflow.Project ================= -This class provides a high-level interface that simplifies execution of pipeline functions. Nearly all pipeline tasks -can be accomplished with the methods in this class, although directly interacting with the various objects in this -package will enable more granular control. - .. autoclass:: Project - :members: + +Attributes +---------- + +.. autosummary:: + + Project.annotations + Project.dataset_config + Project.eval_dir + Project.models_dir + Project.name + Project.neptune_api + Project.neptune_workspace + Project.sources + +Methods +------- + +.. autofunction:: slideflow.Project.add_source + +.. autofunction:: slideflow.Project.associate_slide_names + +.. autofunction:: slideflow.Project.cell_segmentation + +.. autofunction:: slideflow.Project.create_blank_annotations + +.. autofunction:: slideflow.Project.create_hp_sweep + +.. autofunction:: slideflow.Project.evaluate + +.. autofunction:: slideflow.Project.evaluate_mil + +.. autofunction:: slideflow.Project.extract_cells + +.. autofunction:: slideflow.Project.extract_tiles + +.. autofunction:: slideflow.Project.gan_train + +.. autofunction:: slideflow.Project.gan_generate + +.. autofunction:: slideflow.Project.generate_features + +.. autofunction:: slideflow.Project.generate_feature_bags + +.. autofunction:: slideflow.Project.generate_heatmaps + +.. autofunction:: slideflow.Project.generate_mosaic + +.. autofunction:: slideflow.Project.generate_mosaic_from_annotations + +.. autofunction:: slideflow.Project.generate_tfrecord_heatmap + +.. autofunction:: slideflow.Project.dataset + +.. autofunction:: slideflow.Project.predict + +.. autofunction:: slideflow.Project.predict_ensemble + +.. 
autofunction:: slideflow.Project.predict_wsi + +.. autofunction:: slideflow.Project.save + +.. autofunction:: slideflow.Project.smac_search + +.. autofunction:: slideflow.Project.train + +.. autofunction:: slideflow.Project.train_ensemble + +.. autofunction:: slideflow.Project.train_mil + +.. autofunction:: slideflow.Project.train_simclr diff --git a/docs-source/source/project_setup.rst b/docs-source/source/project_setup.rst index 0027830bd..54d8dfd7d 100644 --- a/docs-source/source/project_setup.rst +++ b/docs-source/source/project_setup.rst @@ -1,22 +1,23 @@ +.. _project_setup: + Setting up a Project ==================== -The easiest way to use ``slideflow`` is through the bundled project management class, :class:`slideflow.Project`, which supports unified datasets, annotations, and project directory structure for all pipeline functions. +Slideflow :ref:`Projects ` organize datasets, annotations, and results into a unified directory and provide a high-level API for common tasks. -To initialize a new project, pass keyword arguments to :class:`slideflow.Project` with project settings: +Use :func:`slideflow.create_project` to create a new project, supplying an annotations file (with patient labels) and a path to slides. A new dataset source (collection of slides and tfrecords) will be configured. Additional keyword arguments can be used to specify the location of tfrecords and saved models. .. code-block:: python import slideflow as sf - P = sf.Project( - '/path/to/project/directory', - name="MyProject", + P = sf.create_project( + root='project_path', annotations="./annotations.csv" - ... + slides='/path/to/slides/' ) -A project will then be initialized at the given directory, with settings saved in a ``settings.json`` file. Any project settings not provided via keyword arguments will use defaults. Each project will have the following settings: +Project settings are saved in a ``settings.json`` file in the root project directory.
Each project will have the following settings: +-------------------------------+-------------------------------------------------------+ | **name** | Project name. | @@ -29,7 +30,7 @@ A project will then be initialized at the given directory, with settings saved i | **dataset_config** | Path to JSON file containing dataset configuration. | | | Defaults to "./datasets.json" | +-------------------------------+-------------------------------------------------------+ -| **sources** | Names of dataset(s) to include in the project. | +| **sources** | Names of dataset source(s) to include in the project. | | | Defaults to an empty list. | +-------------------------------+-------------------------------------------------------+ | **models_dir** | Path, where model files and results are saved. | @@ -44,29 +45,17 @@ Once a project has been initialized at a directory, you may then load the projec .. code-block:: python import slideflow as sf - P = sf.Project('/path/to/project/directory') + P = sf.load_project('/path/to/project/directory') -Pipeline functions are then called on the project object ``P``. +.. _dataset_sources: -Alternatively, you can use the bundled ``run_project.py`` script to execute project functions stored in ``actions.py`` files in project directories. When ``run_project.py`` is run, it initializes a ``Project`` object at a given directory, then looks for and loads an ``actions.py`` file in this directory, executing functions contained therein. +Dataset Sources +*************** -To create a new project with this script, or execute functions on an existing project, use the following syntax: +A :ref:`dataset source ` is a collection of slides, Regions of Interest (ROI) annotations (if available), and extracted tiles. Sources are defined in the project dataset configuration file, which can be shared and used across multiple projects or saved locally within a project directory. These configuration files have the following format: .. 
code-block:: bash - $ python3 run_project.py -p /path/to/project/directory - -where the -p flag is used to designate the path to your project directory. Other available flags can be seen by running ``python3 run_project.py --help``. - -Configuring Datasets -******************** - -Once initial project settings are established, you will need to either create or load a dataset configuration, which will specify directory locations for slides, ROIs, tiles, and TFRecords for each group of slides. - -Dataset configurations are saved in a JSON file with the below syntax. Dataset configuration files can be shared and used across multiple projects, or saved locally within a project directory. - -.. code-block:: json - { "SOURCE": { @@ -77,20 +66,24 @@ Dataset configurations are saved in a JSON file with the below syntax. Dataset c } } -Add a new dataset source to a project with ``Project.add_dataset()``, which will save the dataset in JSON format to the project dataset configuration file. +When a project is created with :func:`slideflow.create_project`, a dataset source is automatically created. You can change where slides and extracted tiles are stored by editing the project's dataset configuration file. + +It is possible for a project to have multiple dataset sources - for example, you may choose to organize data from multiple institutions into separate sources. You can add a new dataset source to a project with :meth:`Project.add_source`, which will update the project dataset configuration file accordingly. .. code-block:: python P.add_source( - name="NAME", + name="SOURCE_NAME", slides="/slides/directory", roi="/roi/directory", tiles="/tiles/directory", tfrecords="/tfrecords/directory" ) -Setting up annotations -********************** +Read more about :ref:`working with datasets `. + +Annotations +*********** Your annotations file is used to label patients and slides with clinical data and/or other outcome variables that will be used for training. 
Each line in the annotations file should correspond to a unique slide. Patients may have more than one slide. @@ -119,37 +112,4 @@ An example annotations file is generated each time a new project is initialized. P.create_blank_annotations() -The ``slide`` column may not need to be explicitly set in the annotations file by the user. Rather, once a dataset has been set up, slideflow will search through the linked slide directories and attempt to match slides to entries in the annotations file using **patient**. Entries that are blank in the **slide** column will be auto-populated with any detected and matching slides, if available. - -.. _execute: - -Executing commands -****************** - -If you plan to use the ``run_project.py`` script for your projects, open the ``actions.py`` file located in the project directory. It should look something like this: - -.. code-block:: python - - def main(P): - #P.extract_tiles(tile_px=299, tile_um=302) - - #P.train( - # "category", - # filters = { - # 'category': ['NEG', 'POS'], - # 'dataset': 'train' - # }, - #) - - #model = '/path_to_model/' - #P.evaluate(model, outcomes="category", filters={'dataset': 'eval'}) - #P.generate_heatmaps(model_to_evaluate) - pass - -The ``main()`` function contains several example functions. These serve as examples to help remind you of functions and arguments you can use on projects. - -To execute the commands you have prepared in this file, execute the ``run_project.py`` script pointing to your project directory. - -.. code-block:: bash - - $ python3 run_project.py -p /path/to/project/directory \ No newline at end of file +The ``slide`` column may not need to be explicitly set in the annotations file by the user. Rather, once a dataset has been set up, slideflow will search through the linked slide directories and attempt to match slides to entries in the annotations file using **patient**. 
Entries that are blank in the **slide** column will be auto-populated with any detected and matching slides, if available. \ No newline at end of file diff --git a/docs-source/source/quickstart.rst b/docs-source/source/quickstart.rst new file mode 100644 index 000000000..880eb0fb9 --- /dev/null +++ b/docs-source/source/quickstart.rst @@ -0,0 +1,168 @@ +Quickstart +========== + +This section provides an example of using Slideflow to build a deep learning classifier from digital pathology slides. Follow the links in each section for more information. + +Preparing a project +******************* + +Slideflow experiments are organized using :class:`slideflow.Project`, which supervises storage of data, saved models, and results. The ``slideflow.project`` module has three preconfigured projects with associated slides and clinical annotations: ``LungAdenoSquam``, ``ThyroidBRS``, and ``BreastER``. + +For this example, we will use the ``LungAdenoSquam`` project to train a classifier to predict lung adenocarcinoma (Adeno) vs. squamous cell carcinoma (Squam). + +.. code-block:: python + + import slideflow as sf + + # Download preconfigured project, with slides and annotations. + project = sf.create_project( + root='data', + cfg=sf.project.LungAdenoSquam(), + download=True + ) + +Read more about :ref:`setting up a project on your own data `. + +Data preparation +**************** + +The core imaging data used in Slideflow are image tiles :ref:`extracted from slides ` at a specific magnification and pixel resolution. Tile extraction and downstream image processing are handled through the primitive :ref:`slideflow.Dataset `. We can request a ``Dataset`` at a given tile size from our project using :meth:`slideflow.Project.dataset`. Tile magnification can be specified in microns (as an ``int``) or as optical magnification (e.g. ``'40x'``). + +.. code-block:: python + + # Prepare a dataset of image tiles. + dataset = project.dataset( + tile_px=299, # Tile size, in pixels.
+ tile_um='10x' # Tile size, in microns or magnification. + ) + dataset.summary() + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + Overview: + ╒===============================================╕ + │ Configuration file: │ /mnt/data/datasets.json │ + │ Tile size (px): │ 299 │ + │ Tile size (um): │ 10x │ + │ Slides: │ 941 │ + │ Patients: │ 941 │ + │ Slides with ROIs: │ 941 │ + │ Patients with ROIs: │ 941 │ + ╘===============================================╛ + + Filters: + ╒====================╕ + │ Filters: │ {} │ + ├--------------------┤ + │ Filter Blank: │ [] │ + ├--------------------┤ + │ Min Tiles: │ 0 │ + ╘====================╛ + + Sources: + + TCGA_LUNG + ╒==============================================╕ + │ slides │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ roi │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ tiles │ /mnt/rocket/tiles/TCGA_LUNG │ + │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │ + │ label │ 299px_10x │ + ╘==============================================╛ + + Number of tiles in TFRecords: 0 + Annotation columns: + Index(['patient', 'subtype', 'site', 'slide'], + dtype='object') + +Tile extraction +--------------- + +We prepare imaging data for training by extracting tiles from slides. Background areas of slides will be filtered out with Otsu's thresholding. + +.. code-block:: python + + # Extract tiles from all slides in the dataset. + dataset.extract_tiles(qc='otsu') + +Read more about tile extraction and :ref:`slide processing in Slideflow `. + +Held-out test sets +------------------ + +Now that we have our dataset and we've completed the initial tile image processing, we'll split the dataset into a training cohort and a held-out test cohort with :meth:`slideflow.Dataset.split`. We'll split while balancing the outcome ``'subtype'`` equally in the training and test dataset, with 30% of the data retained in the held-out set. + +.. code-block:: python + + # Split our dataset into a training and held-out test set. 
+ train_dataset, test_dataset = dataset.split( + model_type='classification', + labels='subtype', + val_fraction=0.3 + ) + +Read more about :ref:`Dataset management `. + +Configuring models +****************** + +Neural network models are prepared for training with :class:`slideflow.ModelParams`, through which we define the model architecture, loss, and hyperparameters. Dozens of architectures are available in both the Tensorflow and PyTorch backends, and both neural network :ref:`architectures ` and :ref:`loss ` functions can be customized. In this example, we will use the included Xception network. + +.. code-block:: python + + # Prepare a model and hyperparameters. + params = sf.ModelParams( + tile_px=299, + tile_um='10x', + model='xception', + batch_size=64, + learning_rate=0.0001 + ) + +Read more about :ref:`hyperparameter optimization in Slideflow `. + +Training a model +**************** + +Models can be trained from these hyperparameter configurations using :meth:`Project.train`. Models can be trained to categorical, multi-categorical, continuous, or time-series outcomes, and the training process is :ref:`highly configurable `. In this case, we are training a binary categorization model to predict the outcome ``'subtype'``, and we will distribute training across multiple GPUs. + +By default, Slideflow will train/validate on the full dataset using k-fold cross-validation, but validation settings :ref:`can be customized `. If you would like to restrict training to only a subset of your data - for example, to leave a held-out test set untouched - you can manually specify a dataset for training. In this case, we will train on ``train_dataset``, and allow Slideflow to further split this into training and validation using three-fold cross-validation. + +.. code-block:: python + + # Train a model from a set of hyperparameters. 
+ results = project.train( + 'subtype', + dataset=train_dataset, + params=params, + val_strategy='k-fold', + val_k_fold=3, + multi_gpu=True, + ) + +Models and training results will be saved in the project ``models/`` folder. + +Read more about :ref:`training a model `. + +Evaluating a trained model +************************** + +After training, you can test model performance on a held-out test dataset with :meth:`Project.evaluate`, or generate predictions without evaluation (when ground-truth labels are not available) with :meth:`Project.predict`. As with :meth:`Project.train`, we can specify a :class:`slideflow.Dataset` to evaluate. + +.. code-block:: python + + # Evaluate the trained model on the held-out test set. + test_results = project.evaluate( + model='/path/to/trained_model_epoch1', + outcomes='subtype', + dataset=test_dataset + ) + +Read more about :ref:`model evaluation `. + +Post-hoc analysis +***************** + +Slideflow includes a number of analytical tools for working with trained models. Read more about :ref:`heatmaps `, :ref:`model explainability `, :ref:`analysis of layer activations `, and real-time inference in an interactive :ref:`whole-slide image reader `. \ No newline at end of file diff --git a/docs-source/source/roi_annotation.jpg b/docs-source/source/roi_annotation.jpg new file mode 100644 index 000000000..ea27b0d67 Binary files /dev/null and b/docs-source/source/roi_annotation.jpg differ diff --git a/docs-source/source/roi_filter.jpg b/docs-source/source/roi_filter.jpg new file mode 100644 index 000000000..de7e15545 Binary files /dev/null and b/docs-source/source/roi_filter.jpg differ diff --git a/docs-source/source/saliency.rst b/docs-source/source/saliency.rst new file mode 100644 index 000000000..114f26ba8 --- /dev/null +++ b/docs-source/source/saliency.rst @@ -0,0 +1,137 @@ +.. _saliency: + +Saliency Maps +============= + +Slideflow provides an API for calculating gradient-based pixel attribution (saliency maps), as implemented by `PAIR `_.
Saliency maps can be calculated manually (as described below), or interactively in :ref:`Slideflow Studio `. + +:class:`slideflow.grad.SaliencyMap` provides an interface for preparing a saliency map generator from a loaded model (Tensorflow or PyTorch) and calculating maps from preprocessed images. Supported methods include: + +- Vanilla gradients +- Integrated gradients +- Guided integrated gradients +- Blur integrated gradients +- XRAI +- Grad-CAM + +Generating a Saliency Map +------------------------- + +Creating a saliency map with :class:`slideflow.grad.SaliencyMap` requires two components: a loaded model and a preprocessed image. Trained models can be loaded from disk with :func:`slideflow.model.load`, and the model's preprocessing function can be prepared with :func:`slideflow.util.get_preprocess_fn`. + +.. code-block:: python + + import slideflow as sf + from slideflow.grad import SaliencyMap + + # Load a trained model and preprocessing function. + model = sf.model.load('../saved_model') + preprocess = sf.util.get_preprocess_fn('../saved_model') + + # Prepare a SaliencyMap + sal_map = SaliencyMap(model, class_idx=0) + + +There are several ways you might acquire an image to use for a saliency map. To load an image tile from a whole-slide image, you can index a :class:`slideflow.WSI` object: + +.. code-block:: python + + import slideflow as sf + + # Load a whole-slide image. + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + + # Extract a tile using grid indexing. + image = wsi[10, 25] + +.. image:: saliency_source.jpg + :width: 299px + +| + +Alternatively, if you know the coordinates for an image tile and want to extract it from TFRecords, you can use :meth:`slideflow.Dataset.read_tfrecord_by_location`: + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset. + P = sf.Project(...)
+ dataset = P.dataset(tile_px=299, tile_um=302) + + # Get the tile from slide "12345" at location (2000, 2000) + slide, image = dataset.read_tfrecord_by_location( + slide='12345', + loc=(2000, 2000) + ) + +Once you have an image and a loaded ``SaliencyMap`` object, you can calculate a saliency map from the preprocessed image: + +.. code-block:: python + + mask = sal_map.integrated_gradients(preprocess(image)) + + +Plotting a Saliency Map +----------------------- + +Once a saliency map has been created, you can plot the image as a heatmap or as an overlay. The ``slideflow.grad`` submodule includes several utility functions to assist with plotting. For example, to plot a basic heatmap using the ``inferno`` matplotlib colormap, use :func:`slideflow.grad.plot_utils.inferno`: + +.. code-block:: python + + from PIL import Image + from slideflow.grad.plot_utils import inferno + + pil_image = Image.fromarray(inferno(mask)) + pil_image.show() + +.. image:: saliency_heatmap.jpg + :width: 299px + +| + +To plot this saliency map as an overlay, use :func:`slideflow.grad.plot_utils.overlay`, passing in both the unprocessed image and the saliency map: + +.. code-block:: python + + from PIL import Image + from slideflow.grad.plot_utils import overlay + + overlay_img = overlay(image.numpy(), mask) + pil_image = Image.fromarray(overlay_img) + pil_image.show() + +.. image:: saliency_overlay.jpg + :width: 299px + +| + +Complete Example +---------------- + +The following is a complete example for how to calculate and plot a saliency map for an image tile taken from a whole-slide image. + + +.. code-block:: python + + import slideflow as sf + from slideflow.grad import SaliencyMap + from slideflow.grad.plot_utils import overlay + from PIL import Image + + # Load a slide and find the desired image tile. + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + image = wsi[20, 20] + + # Load a model and preprocessing function. 
+ model = sf.model.load('../saved_model') + preprocess = sf.util.get_preprocess_fn('../saved_model') + + # Prepare the saliency map + sal_map = SaliencyMap(model, class_idx=0) + + # Calculate saliency map using integrated gradients. + ig_map = sal_map.integrated_gradients(preprocess(image)) + + # Display the saliency map as an overlay. + overlay_img = overlay(image, ig_map) + Image.fromarray(overlay_img).show() diff --git a/docs-source/source/saliency_heatmap.jpg b/docs-source/source/saliency_heatmap.jpg new file mode 100644 index 000000000..09e8290e5 Binary files /dev/null and b/docs-source/source/saliency_heatmap.jpg differ diff --git a/docs-source/source/saliency_overlay.jpg b/docs-source/source/saliency_overlay.jpg new file mode 100644 index 000000000..c7963636e Binary files /dev/null and b/docs-source/source/saliency_overlay.jpg differ diff --git a/docs-source/source/saliency_source.jpg b/docs-source/source/saliency_source.jpg new file mode 100644 index 000000000..e5b324295 Binary files /dev/null and b/docs-source/source/saliency_source.jpg differ diff --git a/docs-source/source/segmentation.rst b/docs-source/source/segmentation.rst new file mode 100644 index 000000000..11aac6600 --- /dev/null +++ b/docs-source/source/segmentation.rst @@ -0,0 +1,292 @@ +.. currentmodule:: slideflow.segmentation + +.. _segmentation: + +Tissue Segmentation +=================== + +In addition to classification tasks, Slideflow also supports training and deploying whole-slide tissue segmentation models. Segmentation models identify and label regions of interest in a slide, and can be used for tasks such as tumor identification, tissue labeling, or quality control. Once trained, these models can be used for :ref:`slide QC `, generating :ref:`regions of interest `, or live deployment in :ref:`Slideflow Studio `. + +.. note:: + + Tissue segmentation requires PyTorch. Dependencies can be installed with ``pip install slideflow[torch]``.
+ +Segmentation Modes +------------------ + +Tissue segmentation is performed at the whole-slide level, trained on randomly cropped sections of the slide thumbnail at a specified resolution. Slideflow supports three segmentation modes: + +- ``'binary'``: For binary segmentation, the goal is to differentiate a single tissue type from background. +- ``'multiclass'``: For multiclass segmentation, the goal is twofold: differentiate tissue from background, and assign a class label to each identified region. This is useful in instances where regions have non-overlapping labels. +- ``'multilabel'``: For multilabel segmentation, the goal is to assign each tissue type to a class, but regions may have overlapping labels. + +Generating Data +--------------- + +.. note:: + Segmentation thumbnails and masks do not need to be explicitly exported prior to training. They will be generated automatically during training if they do not exist. However, exporting them beforehand can be useful for data visualization, troubleshooting, and computational efficiency. + + +Segmentation models in Slideflow are trained on regions of interest, which can be generated as discussed in :ref:`regions_of_interest` and :ref:`studio_roi`. Once ROIs have been generated and (optionally) labeled, whole-slide thumbnails and ROI masks can be exported using ``segment.export_thumbs_and_masks()``. The ``mpp`` argument specifies the resolution of the exported images in microns-per-pixel. We recommend ``mpp=20`` for a good balance between image size and memory requirements, or ``mpp=10`` for tasks needing higher resolution. + +.. code-block:: python + + import slideflow as sf + from slideflow import segment + + # Load a project and dataset + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Export thumbnails and masks + segment.export_thumbs_and_masks( + dataset, + mpp=20, # Microns-per-pixel resolution + dest='path/to/output' + ) + +By default, ROIs are exported as binary masks.
To export multidimensional masks for multiclass or multilabel applications, use the ``mode`` and ``labels`` arguments. When ``mode`` is ``'multiclass'`` or ``'multilabel'``, masks will be exported in (N, W, H) format, where N is the number of unique ROI labels. The ``labels`` argument should be a list of strings corresponding to the ROI labels in the dataset that should be included. + +.. code-block:: python + + ... + + # Export thumbnails and masks + segment.export_thumbs_and_masks( + dataset, + mpp=20, # Microns-per-pixel resolution + dest='path/to/output', + mode='multiclass', + labels=['tumor', 'stroma', 'necrosis'] + ) + + +Training a Model +---------------- + +Segmentation models are configured using a :class:`segment.SegmentConfig` object. This object specifies the model architecture, image resolution (MPP), training parameters, and other settings. For example, to configure a model for multiclass segmentation with a resolution of 20 MPP, use: + +.. code-block:: python + + from slideflow import segment + + # Create a config object + config = segment.SegmentConfig( + mpp=20, # Microns-per-pixel resolution + size=1024, # Size of cropped/rotated images during training + mode='multiclass', + labels=['tumor', 'stroma', 'necrosis'], + arch='Unet', + encoder_name='resnet34', + train_batch_size=16, + epochs=10, + lr=1e-4, + ) + +Slideflow uses the `segmentation_models_pytorch `_ library to implement segmentation models. The ``arch`` argument specifies the model architecture, and the ``encoder_name`` argument specifies the encoder backbone. See available models and encoders in the `segmentation_models_pytorch documentation `_. + +The segmentation model can then be trained using the :func:`segment.train` function. This function takes a :class:`segment.SegmentConfig` object and a :class:`slideflow.Dataset` object as arguments. 
During training, segmentation thumbnails and masks are randomly cropped to the specified ``size``, and images/masks then undergo augmentation with random flipping/rotating. + +For example, to train a model for binary segmentation with a resolution of 20 MPP, use: + +.. code-block:: python + + from slideflow import segment + + # Create a config object + config = segment.SegmentConfig(mpp=20, mode='binary', arch='FPN') + + # Train the model + segment.train(config, dataset, dest='path/to/output') + +To use thumbnails and masks previously exported with :func:`segment.export_thumbs_and_masks`, specify the path to the exported data using the ``data_source`` argument. This is more computationally efficient than generating data on-the-fly during training. For example: + +.. code-block:: python + + from slideflow import segment + + # Export thumbnails and masks + segment.export_thumbs_and_masks(dataset, mpp=20, dest='masks/') + + # Create a config object + config = segment.SegmentConfig(mpp=20, mode='binary', arch='FPN') + + # Train the model + segment.train(config, dataset, data_source='masks/', dest='path/to/output') + +After training, the model will be saved as a ``model.pth`` file in the destination directory specified by ``dest``, and the model configuration will be saved as a ``segment_config.json`` file. + +Model Inference +--------------- + +After training, models can be loaded using :func:`segment.load_model_and_config`. This function takes a path to a model file as an argument, and returns a tuple containing the model and configuration object. For example: + +.. code-block:: python + + from slideflow import segment + + # Load the model and config + model, config = segment.load_model_and_config('path/to/model.pth') + +To run inference on a slide, use the :meth:`segment.SegmentModel.run_slide_inference` method. This method takes a :class:`slideflow.WSI` object or str (path to slide) as an argument, and returns an array of pixel-level predictions. 
For binary models, the output shape will be ``(H, W)``. For multiclass models, the output shape will be ``(N+1, H, W)`` (the first channel is predicted background), and for multilabel models, the output shape will be ``(N, H, W)``, where ``N`` is the number of labels. + +.. code-block:: python + + from slideflow import segment + + # Load the model and config + model, config = segment.load_model_and_config('path/to/model.pth') + + # Run inference, returning an np.ndarray + pred = model.run_slide_inference('/path/to/slide') + +You can also run inference directly on an arbitrary image using the :meth:`segment.SegmentModel.run_tiled_inference` method. This method takes an image array (np.ndarray, in W, H, C format) as an argument, and returns an array of pixel-level predictions. Predictions are generated in tiles and merged. The output shape will be ``(H, W)`` for binary models, ``(N+1, H, W)`` for multiclass models, and ``(N, H, W)`` for multilabel models. + +Generating QC Masks +------------------- + +The :class:`slideflow.slide.qc.Segment` class provides an easy interface for generating QC masks from a segmentation model. This class takes a path to a trained segmentation model as an argument, and can be used for QC :ref:`as previously described `. For example: + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Load a project and dataset + project = sf.load_project('path/to/project') + dataset = project.dataset(299, 302) + + # Create a QC mask + segmenter = qc.Segment('/path/to/model.pth') + + # Extract tiles with this QC + dataset.extract_tiles(..., qc=segmenter) + +You can also use this interface for applying QC to a single slide: + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Load the slide + wsi = sf.WSI('/path/to/slide', ...) 
+ + # Create the QC algorithm + segmenter = qc.Segment('/path/to/model.pth') + + # Apply QC + applied_mask = wsi.qc(segmenter) + +For binary models, the QC mask will filter out tiles that are predicted to be background. + +For multiclass models, the QC mask will filter out tiles predicted to be background (class index 0). This can be customized by setting ``class_idx`` to another value. For example, to create a QC algorithm that filters out tiles predicted to be tumor (class index 1), use: + +.. code-block:: python + + segmenter = qc.Segment('/path/to/model.pth', class_idx=1) + +For multilabel models, the QC mask will filter out tiles predicted to be background for all class labels. This can be customized to filter out tiles based only on a specific class label by setting ``class_idx``. For example, to create a QC algorithm that filters out tiles that are not predicted to be tumor (class index 1) while ignoring predictions for necrosis (class index 2), use: + +.. code-block:: python + + segmenter = qc.Segment('/path/to/model.pth', class_idx=1) + +In all cases, the thresholding direction can be reversed by setting ``threshold_direction='greater'``. This might be useful, for example, if the segmentation model was trained to identify pen marks or artifacts, and you want to filter out areas predicted to be artifacts. + +.. code-block:: python + + segmenter = qc.Segment('/path/to/model.pth', threshold_direction='greater') + +Generating ROIs +--------------- + +The :class:`slideflow.slide.qc.Segment` class also provides an easy interface for generating regions of interest (ROIs). Use the :meth:`slideflow.slide.qc.Segment.generate_rois` method to generate and apply ROIs to a slide. If the segmentation model is multiclass or multilabel, generated ROIs will be labeled. For example: + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Load a slide + wsi = sf.WSI('/path/to/slide', ...)
+ + # Create a QC mask + segmenter = qc.Segment('/path/to/model.pth') + + # Generate and apply ROIs to a slide + roi_outlines = segmenter.generate_rois(wsi) + +By default, this will apply generated ROIs directly to the :class:`slideflow.WSI` object. If you wish to calculate ROI outlines without applying them to the slide, use the argument ``apply=False``. + +In addition to generating ROIs for a single slide, you can also generate ROIs for an entire dataset using :meth:`slideflow.Dataset.generate_rois`. For example: + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset. + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Generate ROIs for all slides in the dataset. + dataset.generate_rois('path/to/model.pth') + +ROIs will be saved in the ROIs directory as configured in the dataset settings. Alternatively, ROIs can be exported to a user-defined directory using the ``dest`` argument. + +By default, ROIs will be generated for all slides in the dataset, skipping slides with existing ROIs. To overwrite any existing ROIs, use the ``overwrite=True`` argument. + + +Deployment in Studio +-------------------- + +.. video:: tissue_seg.mp4 + :autoplay: + +| + +Segmentation models can be deployed in :ref:`Slideflow Studio ` for live segmentation and QC. To do this, start by training a segmentation model as described above. Then, see the :ref:`studio_segmentation` documentation for instructions on how to deploy the model for live QC and/or ROI generation. + + +Complete Example +---------------- + +1. Label ROIs +************* + +Create labeled ROIs as described in :ref:`studio_roi`. + +2. Train a model +**************** + +.. 
code-block:: python + + import slideflow as sf + from slideflow import segment + + # Load a project and dataset + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Train a binary segmentation model + config = segment.SegmentConfig(mpp=20, mode='binary', arch='FPN') + segment.train(config, dataset, dest='path/to/output') + +3. Generate ROIs (optional) +*************************** + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset. + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Generate ROIs for all slides in the dataset. + dataset.generate_rois('path/to/model.pth') + +4. Deploy in Studio +******************* + +Use the model for either QC or ROI generation in Slideflow Studio, as described in :ref:`studio_segmentation`. + diff --git a/docs-source/source/simclr.rst b/docs-source/source/simclr.rst new file mode 100644 index 000000000..20fd6d6d1 --- /dev/null +++ b/docs-source/source/simclr.rst @@ -0,0 +1,16 @@ +.. currentmodule:: slideflow.simclr + +slideflow.simclr +================ + +This module contains utility functions for training a SimCLR model. Please see +:ref:`simclr_ssl` for more information on the high-level API and recommended use. + +.. autofunction:: slideflow.simclr.get_args +.. autofunction:: slideflow.simclr.load +.. autofunction:: slideflow.simclr.load_model_args +.. autofunction:: slideflow.simclr.run_simclr + +.. autoclass:: slideflow.simclr.SimCLR +.. autoclass:: slideflow.simclr.SimCLR_Args +.. autoclass:: slideflow.simclr.DatasetBuilder diff --git a/docs-source/source/slide.rst b/docs-source/source/slide.rst index 12978469d..a1e97548e 100644 --- a/docs-source/source/slide.rst +++ b/docs-source/source/slide.rst @@ -1,19 +1,68 @@ .. currentmodule:: slideflow.slide slideflow.slide -===================== +=============== This module contains classes to load slides and extract tiles. 
For optimal performance, tile extraction should generally not be performed by instancing these classes directly, but by calling either :func:`slideflow.Project.extract_tiles` or :func:`slideflow.Dataset.extract_tiles`, which include performance optimizations and additional functionality. -WSI -*** +slideflow.WSI +************* + .. autoclass:: WSI - :inherited-members: -TMA -*** -.. autoclass:: TMA - :inherited-members: \ No newline at end of file +Attributes +---------- + +.. autosummary:: + + WSI.dimensions + WSI.qc_mask + WSI.levels + WSI.level_dimensions + WSI.level_downsamples + WSI.level_mpp + WSI.properties + WSI.slide + WSI.vendor + +Methods +------- + +.. autofunction:: slideflow.WSI.align_to +.. autofunction:: slideflow.WSI.align_tiles_to +.. autofunction:: slideflow.WSI.apply_qc_mask +.. autofunction:: slideflow.WSI.apply_segmentation +.. autofunction:: slideflow.WSI.area +.. autofunction:: slideflow.WSI.build_generator +.. autofunction:: slideflow.WSI.dim_to_mpp +.. autofunction:: slideflow.WSI.get_tile_mask +.. autofunction:: slideflow.WSI.get_tile_dataframe +.. autofunction:: slideflow.WSI.extract_cells +.. autofunction:: slideflow.WSI.extract_tiles +.. autofunction:: slideflow.WSI.export_rois +.. autofunction:: slideflow.WSI.has_rois +.. autofunction:: slideflow.WSI.load_csv_roi +.. autofunction:: slideflow.WSI.load_json_roi +.. autofunction:: slideflow.WSI.load_roi_array +.. autofunction:: slideflow.WSI.mpp_to_dim +.. autofunction:: slideflow.WSI.predict +.. autofunction:: slideflow.WSI.preview +.. autofunction:: slideflow.WSI.process_rois +.. autofunction:: slideflow.WSI.show_alignment +.. autofunction:: slideflow.WSI.square_thumb +.. autofunction:: slideflow.WSI.qc +.. autofunction:: slideflow.WSI.remove_qc +.. autofunction:: slideflow.WSI.remove_roi +.. autofunction:: slideflow.WSI.tensorflow +.. autofunction:: slideflow.WSI.torch +.. autofunction:: slideflow.WSI.thumb +.. autofunction:: slideflow.WSI.verify_alignment +.. 
autofunction:: slideflow.WSI.view + +Other functions +*************** + +.. autofunction:: slideflow.slide.predict \ No newline at end of file diff --git a/docs-source/source/slide_filter.jpg b/docs-source/source/slide_filter.jpg new file mode 100644 index 000000000..f33468f6b Binary files /dev/null and b/docs-source/source/slide_filter.jpg differ diff --git a/docs-source/source/slide_processing.rst b/docs-source/source/slide_processing.rst new file mode 100644 index 000000000..6b15b6275 --- /dev/null +++ b/docs-source/source/slide_processing.rst @@ -0,0 +1,324 @@ +.. _filtering: + +Slide Processing +================ + +.. image:: tile_extraction_overview.png + +| + +Whole-slide histopathological images present many challenges for machine learning researchers, as these large gigapixel images may contain out-of-focus regions, pen marks, uneven staining, or varying optical resolutions. Slideflow provides tools for both flexible and computationally efficient slide processing in order to build datasets ready for machine learning applications. + +Most tools in Slideflow work with image tiles - extracted sub-regions of a whole-slide image - as the primary data source. For efficiency, image tiles are first buffered into :ref:`TFRecords ` , a binary file format that greatly improves IO throughput. Although training can be performed without using TFRecords (see :ref:`from_wsi`), we recommend tile extraction as the first step for most projects. + +Tile extraction +*************** + +Image tiles are extracted from whole-slide images using either :meth:`slideflow.Project.extract_tiles` or :meth:`slideflow.Dataset.extract_tiles`. When using the Project interface, the only arguments required are ``tile_px`` and ``tile_um``, which determine the size of the extracted image tiles in pixels and microns: + +.. code-block:: python + + P.extract_tiles(tile_px=299, tile_um=302) + +and when using a :class:`slideflow.Dataset`, no arguments are required. + +.. 
code-block:: python + + dataset.extract_tiles() + +Tiles will be extracted at the specified pixel and micron size and stored in TFRecord format. Loose image tiles (\*.jpg or \*.png format) can also be saved with the argument ``save_tiles=True``. + +See the :meth:`slideflow.Dataset.extract_tiles` API documentation for customization options. + +.. note:: + + Slide scanners may have differing microns-per-pixel (MPP) resolutions, so "10X" magnification from one scanner may be slightly different than "10X" on another scanner. Specifying a fixed ``tile_um`` ensures all image tiles have both the same pixel size and micron size. This MPP-harmonization step uses the `Libvips resize `_ function on extracted images. To disable this step and instead extract tiles at a given `downsample layer `_, set ``tile_um`` equal to a magnification level rather than micron size: + + .. code-block:: python + + P.extract_tiles(tile_px=299, tile_um="10x") + +Cell segmentation +***************** + +An alternative to extracting tiles in a grid across whole-slide images is extracting tiles at detected cell centroids. This is discussed separately in :ref:`cellseg`. + +.. _regions_of_interest: + +Regions of Interest +******************* + +Tile extraction can be optionally restricted based on pathologist-annotated Regions of Interest (ROI), allowing you to enrich your dataset by only using relevant sections of a slide. + +We offer two methods for annotating ROIs - :ref:`Slideflow Studio ` and `QuPath `_. Please see the Slideflow Studio section for instructions on generating ROI annotations using the Slideflow interface. + +If you are using QuPath, annotate whole-slide images using the Polygon tool. Then, click **Automate** -> **Show script editor**. In the box that comes up, click **File** -> **Open** and load the ``qupath_roi.groovy`` script (QuPath 0.2 or greater) or ``qupath_roi_legacy.groovy`` (QuPath 0.1.x), scripts `available on GitHub `_. 
Click **Run** -> **Run** if using QuPath 0.2 or greater, or **Run** -> **Run for Project** if using QuPath 0.1.x. ROIs will be exported in CSV format in the QuPath project directory, in the subdirectory "ROI". + +Once ROI CSV files are generated, ensure they are placed in the folder expected by your :ref:`Project ` or :ref:`Dataset ` based on their respective configurations. + +The ``roi_method`` argument to the ``extract_tiles()`` functions allows you to control how ROIs are used. Options include: + +- ``'auto'``: Default behavior. For slides with a valid ROI, extract tiles from within ROIs only. For slides without ROIs, extract from the whole-slide image. +- ``'inside'``: Extract from within ROIs, and skip any slides missing ROIs. +- ``'outside'``: Extract from outside ROIs, and skip any slides missing ROIs. +- ``'ignore'``: Ignore all ROIs, extracting from whole-slide images. + +.. note:: + + Nested ROIs will be rendered as holes. + +By default, ROIs filter tiles based on the center point of the tile. Alternatively, you can filter tiles based on the proportion of the tile inside an ROI by using the argument ``roi_filter_method``. If ``roi_filter_method`` is set to a float (0-1), this value will be interpreted as a proportion threshold. If the proportion of a tile inside an ROI is greater than this number, the tile is included. For example, if ``roi_filter_method=0.7``, a tile that is 80% inside of an ROI will be included, but a tile that is only 60% inside of an ROI will be excluded. + +.. image:: roi_filter.jpg + +| + +.. _roi_labels: + +ROIs can optionally be assigned a label. Labels can be added or changed using :ref:`Slideflow Studio `, or by adding a "label" column in the ROI CSV file. Labels can be used to train strongly supervised models, where each tile is assigned a label based on the ROI it is extracted from, rather than inheriting the label of the whole-slide image. See the developer note :ref:`tile_labels` for more information.
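The proportion rule for ``roi_filter_method`` can be sketched in plain Python. This is only an illustration of the documented threshold logic, not Slideflow's implementation: the helper ``tile_is_included`` and its arguments are hypothetical, and representing the default center-point behavior with the string ``'center'`` is an assumption.

.. code-block:: python

    from typing import Union


    def tile_is_included(
        center_in_roi: bool,
        fraction_in_roi: float,
        roi_filter_method: Union[str, float] = 'center',
    ) -> bool:
        """Decide whether a tile passes ROI filtering (hypothetical helper)."""
        # A float is interpreted as a proportion threshold between 0 and 1.
        if isinstance(roi_filter_method, float):
            if not 0 <= roi_filter_method <= 1:
                raise ValueError('Proportion threshold must be between 0 and 1.')
            # Keep the tile only if more than this share of it lies in an ROI.
            return fraction_in_roi > roi_filter_method
        # Otherwise use the center-point rule (the 'center' name is assumed).
        return center_in_roi


    # With a threshold of 0.7, a tile 80% inside an ROI is kept,
    # while a tile only 60% inside an ROI is dropped.
    print(tile_is_included(center_in_roi=True, fraction_in_roi=0.8, roi_filter_method=0.7))
    print(tile_is_included(center_in_roi=True, fraction_in_roi=0.6, roi_filter_method=0.7))

Note that under the proportion rule the tile center no longer matters: a tile whose center falls inside an ROI is still excluded if too little of its area overlaps the ROI.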
+ +To retrieve the ROI name (and label, if present) for all tiles in a slide, use :meth:`slideflow.WSI.get_tile_dataframe`. This will return a Pandas DataFrame with the following columns: + + - **loc_x**: X-coordinate of tile center + - **loc_y**: Y-coordinate of tile center + - **grid_x**: X grid index of the tile + - **grid_y**: Y grid index of the tile + - **roi_name**: Name of the ROI if tile is in an ROI, else None + - **roi_desc**: Description of the ROI if tile is in an ROI, else None + - **label**: ROI label, if present. + +The **loc_x** and **loc_y** columns contain the same tile location information :ref:`stored in TFRecords `. + +You can also retrieve this information for all slides in a dataset by using :meth:`slideflow.Dataset.get_tile_dataframe`, which will return a DataFrame with the same columns as above, plus a ``slide`` column. + + +Masking & Filtering +******************* + +Slideflow provides two approaches for refining where image tiles should be extracted from whole-slide images: **slide-level masking** and **tile-level filtering**. In these next sections, we'll review options for both approaches. + +Otsu's thresholding +------------------- + +.. image:: otsu.png + +| + +Otsu's thresholding is a **slide-based method** that distinguishes foreground (tissue) from background (empty slide). Otsu's thresholding is performed in the HSV colorspace and yields similar results to grayspace filtering, a tile-level filtering method described below. + +To apply Otsu's thresholding to slides before tile extraction, use the ``qc`` argument of the ``.extract_tiles()`` functions. + +.. code-block:: python + + from slideflow.slide import qc + + # Use this QC during tile extraction + P.extract_tiles(qc=qc.Otsu()) + + +You can also apply Otsu's thresholding to a single slide with the :meth:`slideflow.WSI.qc` method. See :class:`the WSI API documentation ` for more information on working with individual slides. + +..
code-block:: python + + # Apply Otsu's thresholding to a WSI object + wsi = sf.WSI(...) + wsi.qc(qc.Otsu()).show() + + +Gaussian blur filtering +----------------------- + +.. image:: blur.png + +| + +Gaussian blur masking is another **slide-based method** that can detect pen marks and out-of-focus areas, and is particularly useful for datasets lacking annotated Regions of Interest (ROIs). Gaussian blur masking is applied similarly, using the ``qc`` argument. + +Two versions of Gaussian blur masking are available: ``qc.Gaussian`` and ``qc.GaussianV2`` (new in Slideflow 2.1.0). The latter is the default and recommended version, as it is more computationally efficient. The former is provided for backwards compatibility. + +.. code-block:: python + + from slideflow.slide import qc + + # Use this QC during tile extraction + P.extract_tiles(qc=qc.GaussianV2()) + +By default, Gaussian blur masking is calculated at 4 times lower magnification than the tile extraction MPP (e.g., when extracting tiles at 10X effective magnification, Gaussian filtering would be calculated at 2.5X). This is to reduce computation time. You can change this behavior by manually setting the ``mpp`` argument to a specific microns-per-pixel value. + +Gaussian blur masking is performed on grayscale images. The ``sigma`` argument controls the standard deviation of the Gaussian blur kernel. The default value of 3 is recommended, but you may need to adjust this value for your dataset. A higher value will result in more areas being masked, while a lower value will result in fewer areas being masked. + +.. code-block:: python + + from slideflow.slide import qc + + # Customize the Gaussian filter, + # using a sigma of 2 and an MPP of 1 (10X magnification) + gaussian = qc.GaussianV2(mpp=1, sigma=2) + +You can also use multiple slide-level masking methods by providing a list to ``qc``. + +..
code-block:: python + + from slideflow.slide import qc + + qc = [ + qc.Otsu(), + qc.Gaussian() + ] + P.extract_tiles(qc=qc) + +If both Otsu's thresholding and blur detection are being used, Slideflow will calculate Blur Burden, a metric used to assess the degree to which non-background tiles are either out-of-focus or contain artifact. In the tile extraction PDF report that is generated (see next section), the distribution of blur burden for slides in the dataset will be plotted on the first page. The report will contain the number of slides meeting criteria for warning, when the blur burden exceeds 5% for a given slide. A text file containing names of slides with high blur burden will be saved in the exported TFRecords directory. These slides should be manually reviewed to ensure they are of high enough quality to include in the dataset. + +DeepFocus +--------- + +Slideflow also provides an interface for using `DeepFocus `_ to identify in-focus regions. DeepFocus is a lightweight neural network that predicts whether a section of a slide is in- or out-of-focus. When used as a slide-level masking method, DeepFocus will filter out-of-focus tiles from a slide. By default, DeepFocus is applied to slides at 40X magnification, although this can be customized with the ``tile_um`` argument. + +.. code-block:: python + + from slideflow.slide import qc + + deepfocus = qc.DeepFocus(tile_um='20x') + slide.qc(deepfocus) + +Alternatively, you can also retrieve raw predictions from the DeepFocus model for a slide by calling the deepfocus object on a :class:`slideflow.WSI` object, passing the argument threshold=False: + +.. code-block:: python + + preds = deepfocus(slide, threshold=False) + +Custom deep learning QC +----------------------- + +You can also create your own deep learning slide filters. To create a custom deep learning QC method like DeepFocus, create a custom slide filter that inherits :class:`slideflow.slide.qc.StridedDL`. 
For example, to manually recreate the above DeepFocus model, first clone the `TF2 fork on GitHub `_, which contains the DeepFocus architecture and model weights, and create a custom class as below: + +.. code-block:: python + + from slideflow.slide.qc import strided_dl + from deepfocus.keras_model import load_checkpoint, deepfocus_v3 + + class CustomDeepFocus(strided_dl.StridedDL): + + def __init__(self): + model = deepfocus_v3() + checkpoint = '/path/to/deepfocus/checkpoints/ver5' + load_checkpoint(model, checkpoint) + super().__init__( + model=model, + pred_idx=1, + tile_px=64, + tile_um='40x' + ) + +Then, supply this class to the ``qc`` argument as above. + +.. code-block:: python + + P.extract_tiles(qc=CustomDeepFocus()) + + +See :ref:`qc` for more information on the API for further QC customization. + +Segmentation Models (U-Net) +--------------------------- + +Slideflow also provides an interface for both training and using segmentation models (e.g. U-Net, FPN, DeepLabV3) for slide-level masking. This is discussed separately in :ref:`segmentation`. + +Grayspace filtering +-------------------- + +Grayspace filtering is a **tile-based method** that detects the amount of grayspace in a given image tile and discards the tile if the content exceeds a set threshold. RGB image tiles are converted to the HSV spectrum, and the fraction of pixels with saturation below a certain threshold is calculated. This filtering is performed separately for each tile as it is being extracted. Relevant arguments for grayspace filtering include: + + +- ``grayspace_threshold``: Saturation value, below which a pixel is considered gray. Range 0-1. Defaults to 0.05. +- ``grayspace_fraction``: Image tiles with grayspace above this fraction will be discarded. Defaults to 0.6. + +Grayspace filtering is enabled by default, and can be disabled by passing ``grayspace_fraction=1`` to the ``.extract_tiles()`` functions. 
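The grayspace rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, assuming a tile is given as a flat list of 8-bit RGB tuples; the function names ``gray_fraction`` and ``keep_tile`` are hypothetical and this is not Slideflow's actual implementation.

```python
import colorsys


def gray_fraction(pixels, grayspace_threshold=0.05):
    """Fraction of pixels whose HSV saturation falls below the threshold."""
    gray = 0
    for r, g, b in pixels:
        # colorsys expects channel values in the range 0-1
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if saturation < grayspace_threshold:
            gray += 1
    return gray / len(pixels)


def keep_tile(pixels, grayspace_threshold=0.05, grayspace_fraction=0.6):
    """Discard the tile when its gray fraction is above ``grayspace_fraction``."""
    return gray_fraction(pixels, grayspace_threshold) <= grayspace_fraction


# Toy "tile": 7 gray background pixels and 3 saturated tissue-like pixels
tile = [(200, 200, 200)] * 7 + [(200, 50, 50)] * 3
print(gray_fraction(tile), keep_tile(tile), keep_tile(tile, grayspace_fraction=1))
```

Because a fraction can never exceed 1, setting ``grayspace_fraction=1`` keeps every tile in this sketch, which mirrors how the filter is disabled.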
+ +Grayspace filtering is similar to Otsu's thresholding, with both operating in the HSV colorspace. Otsu's thresholding is ~30% faster than grayspace filtering for slides with accessible downsample layers, but if downsample layers are not stored in a given slide or are inaccessible (e.g. ``enable_downsample=False``), grayspace filtering may be faster. Grayspace filtering is more reliable than Otsu's thresholding for slides with abundant pen marks or other artifacts, which can present issues for Otsu's thresholding algorithm. + +Whitespace filtering +-------------------- + +Whitespace filtering is performed similarly to grayspace filtering. Whitespace is calculated by measuring the overall brightness of each pixel, then counting the fraction of pixels with a brightness above some threshold. As with grayspace filtering, there are two relevant arguments: + + +- ``whitespace_threshold``: Brightness value, above which a pixel is considered white. Range 0-255. Defaults to 230. +- ``whitespace_fraction``: Image tiles with whitespace above this fraction will be discarded. Defaults to 1.0 (disabled). + +Whitespace filtering is disabled by default. + +Stain normalization +******************* + +.. image:: norm_compare/wsi_norm_compare.jpg + +Image tiles can undergo digital Hematoxylin and Eosin (H&E) stain normalization either during tile extraction or in real-time during training. Real-time normalization adds CPU overhead during training and inference but offers greater flexibility, allowing you to test different normalization strategies without re-extracting tiles from your entire dataset. + +Available stain normalization algorithms include: + +- **macenko**: `Original Macenko paper `_. +- **macenko_fast**: Modified Macenko algorithm with the brightness standardization step removed. +- **reinhard**: `Original Reinhard paper `_. +- **reinhard_fast**: Modified Reinhard algorithm with the brightness standardization step removed.
+- **reinhard_mask**: Modified Reinhard algorithm, with background/whitespace removed. +- **reinhard_fast_mask**: Modified Reinhard-Fast algorithm, with background/whitespace removed. +- **vahadane**: `Original Vahadane paper `_. +- **augment**: HSV colorspace augmentation. +- **cyclegan**: CycleGAN-based stain normalization, as implemented by `Zingman et al `_ (PyTorch only) + +The Macenko and Reinhard stain normalizers are highly efficient, with native Tensorflow, PyTorch, and Numpy/OpenCV implementations, and support GPU acceleration (see :ref:`performance benchmarks `). + +During tile extraction +---------------------- + +Image tiles can be normalized during tile extraction by using the ``normalizer`` and ``normalizer_source`` arguments. ``normalizer`` is the name of the algorithm. The normalizer source - either a path to a reference image, or a ``str`` indicating one of our presets (e.g. ``'v1'``, ``'v2'``, ``'v3'``) - can also be set with ``normalizer_source``. + +.. code-block:: python + + P.extract_tiles( + tile_px=299, + tile_um=302, + normalizer='reinhard' + ) + +:ref:`Contextual stain normalization ` is supported when normalizing during tile extraction. + +On-the-fly +---------- + +The stain normalization implementations in Slideflow are fast and efficient, with separate Tensorflow-native, PyTorch-native, and Numpy/OpenCV implementations. In most instances, we recommend performing stain normalization on-the-fly as a part of image pre-processing, as this provides flexibility for changing normalization strategies without re-extracting all of your image tiles. + +Real-time normalization can be performed by setting the ``normalizer`` and/or ``normalizer_source`` hyperparameters. + +.. 
code-block:: python + + from slideflow.model import ModelParams + hp = ModelParams(..., normalizer='reinhard') + +If a model was trained using a normalizer, the normalizer algorithm and fit information will be stored in the model metadata file, ``params.json``, in the saved model folder. Any Slideflow function that uses this model will automatically process images using the same normalization strategy. + +When stain normalizing on-the-fly, stain augmentation becomes available as a training augmentation technique. Read more about :ref:`stain augmentation `. + +The normalizer interfaces can also be accessed directly through :class:`slideflow.norm.StainNormalizer`. See :py:mod:`slideflow.norm` for examples and more information. + +Performance optimization +************************ + +As tile extraction is heavily reliant on random access reading, significant performance gains can be achieved by either 1) moving all slides to an SSD, or 2) utilizing an SSD or ramdisk buffer (to which slides will be copied prior to extraction). The use of a ramdisk buffer can improve tile extraction speed by 10-fold or greater! To maximize performance, pass the buffer path to the argument ``buffer``. + +Extraction reports +****************** + +Once tiles have been extracted, a PDF report will be generated with a summary and sample of tiles extracted from their corresponding slides. An example of such a report is given below. Reviewing this report may enable you to identify data corruption, artifacts with stain normalization, or suboptimal background filtering. The report is saved in the TFRecords directory. + +.. image:: example_report_small.jpg + +In addition to viewing reports after tile extraction, you may generate new reports on existing tfrecords with :func:`slideflow.Dataset.tfrecord_report`, by calling this function on a given dataset. For example: + +..
code-block:: python + + dataset = P.dataset(tile_px=299, tile_um=302) + dataset.tfrecord_report("/path/to/dest") + +You can also generate reports for slides that have not yet been extracted by passing ``dry_run=True`` to :meth:`slideflow.Dataset.extract_tiles`. diff --git a/docs-source/source/slide_qc.rst b/docs-source/source/slide_qc.rst new file mode 100644 index 000000000..d38e0be04 --- /dev/null +++ b/docs-source/source/slide_qc.rst @@ -0,0 +1,36 @@ +.. currentmodule:: slideflow.slide.qc + +.. _qc: + +slideflow.slide.qc +================== + +This module contains functions for slide-level quality control, including Otsu's thresholding and Gaussian blur filtering. Quality control methods are used by passing a list of callables to the ``qc`` argument of ``.extract_tiles()``. They can also be directly applied to a slide with :meth:`slideflow.WSI.qc`. + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Define custom QC options + qc = [ + qc.Otsu(), + qc.Gaussian(sigma=2) + ] + + # Use this QC during tile extraction + P.extract_tiles(qc=qc) + + # Alternatively, you can use the same QC directly on a WSI object + wsi = sf.WSI(...) + wsi.qc(qc).show() + +.. autoclass:: Otsu + +.. autoclass:: Gaussian + +.. autoclass:: Save + +.. autoclass:: Load + +.. autoclass:: StridedDL \ No newline at end of file diff --git a/docs-source/source/slideflow.rst b/docs-source/source/slideflow.rst new file mode 100644 index 000000000..edec27e5d --- /dev/null +++ b/docs-source/source/slideflow.rst @@ -0,0 +1,11 @@ +.. currentmodule:: slideflow + +slideflow +========= + +.. autofunction:: slideflow.about +.. autofunction:: slideflow.build_feature_extractor +.. autofunction:: slideflow.create_project +.. autofunction:: slideflow.load_project +.. autofunction:: slideflow.getLoggingLevel +.. 
autofunction:: slideflow.setLoggingLevel diff --git a/docs-source/source/slideflow_cellseg.rst b/docs-source/source/slideflow_cellseg.rst new file mode 100644 index 000000000..b30e22dbe --- /dev/null +++ b/docs-source/source/slideflow_cellseg.rst @@ -0,0 +1,23 @@ +.. currentmodule:: slideflow.cellseg + +slideflow.cellseg +================= + +This module contains utility functions for performing whole-slide image cell segmentation with Cellpose. + +See :ref:`cellseg` for more information. + +.. autofunction:: segment_slide + +Segmentation +************ +.. autoclass:: Segmentation +.. autofunction:: slideflow.cellseg.Segmentation.apply_rois +.. autofunction:: slideflow.cellseg.Segmentation.calculate_centroids +.. autofunction:: slideflow.cellseg.Segmentation.calculate_outlines +.. autofunction:: slideflow.cellseg.Segmentation.centroids +.. autofunction:: slideflow.cellseg.Segmentation.centroid_to_image +.. autofunction:: slideflow.cellseg.Segmentation.extract_centroids +.. autofunction:: slideflow.cellseg.Segmentation.mask_to_image +.. autofunction:: slideflow.cellseg.Segmentation.outline_to_image +.. autofunction:: slideflow.cellseg.Segmentation.save \ No newline at end of file diff --git a/docs-source/source/slidemap.rst b/docs-source/source/slidemap.rst new file mode 100644 index 000000000..7dc67546b --- /dev/null +++ b/docs-source/source/slidemap.rst @@ -0,0 +1,61 @@ +.. currentmodule:: slideflow + +slideflow.SlideMap +================== + +:class:`slideflow.SlideMap` assists with visualizing tiles and slides in two-dimensional space. + +Once a model has been trained, tile-level predictions and intermediate layer activations can be calculated +across an entire dataset with :class:`slideflow.DatasetFeatures`. +The :class:`slideflow.SlideMap` class can then perform dimensionality reduction on these dataset-wide +activations, plotting tiles and slides in two-dimensional space. 
Visualizing the distribution and clustering +of tile-level and slide-level layer activations can help reveal underlying structures in the dataset and shared +visual features among classes. + +The primary method of use is first generating a :class:`slideflow.DatasetFeatures` from a trained +model, then using :meth:`slideflow.DatasetFeatures.map_activations`, which returns an instance of +:class:`slideflow.SlideMap`. + +.. code-block:: python + + ftrs = sf.DatasetFeatures(model='/path/', ...) + slide_map = ftrs.map_activations() + +Alternatively, if you would like to map slides from a dataset in two-dimensional space using pre-calculated *x* and *y* +coordinates, you can use the :meth:`slideflow.SlideMap.from_xy` class method. In addition to X and Y, this method +requires supplying tile-level metadata in the form of a list of dicts. Each dict must contain the name of the origin +slide and the tile index in the slide TFRecord. + +.. code-block:: python + + x = np.array(...) + y = np.array(...) + slides = ['slide1', 'slide1', 'slide5', ...] + slide_map = sf.SlideMap.from_xy(x=x, y=y, slides=slides) + +.. autoclass:: SlideMap + +Methods +------- + +.. autofunction:: slideflow.SlideMap.activations +.. autofunction:: slideflow.SlideMap.build_mosaic +.. autofunction:: slideflow.SlideMap.cluster +.. autofunction:: slideflow.SlideMap.neighbors +.. autofunction:: slideflow.SlideMap.filter +.. autofunction:: slideflow.SlideMap.umap_transform +.. autofunction:: slideflow.SlideMap.label +.. autofunction:: slideflow.SlideMap.label_by_preds +.. autofunction:: slideflow.SlideMap.label_by_slide +.. autofunction:: slideflow.SlideMap.label_by_uncertainty +.. autofunction:: slideflow.SlideMap.load +.. autofunction:: slideflow.SlideMap.load_coordinates +.. autofunction:: slideflow.SlideMap.load_umap +.. autofunction:: slideflow.SlideMap.plot +.. autofunction:: slideflow.SlideMap.plot_3d +.. autofunction:: slideflow.SlideMap.save +.. autofunction:: slideflow.SlideMap.save_3d +..
autofunction:: slideflow.SlideMap.save_plot +.. autofunction:: slideflow.SlideMap.save_coordinates +.. autofunction:: slideflow.SlideMap.save_umap +.. autofunction:: slideflow.SlideMap.save_encoder diff --git a/docs-source/source/ssl.rst b/docs-source/source/ssl.rst new file mode 100644 index 000000000..7b7d0c661 --- /dev/null +++ b/docs-source/source/ssl.rst @@ -0,0 +1,152 @@ +.. currentmodule:: slideflow.simclr + +.. _simclr_ssl: + +Self-Supervised Learning (SSL) +============================== + +Slideflow provides easy access to training the self-supervised, contrastive learning framework `SimCLR `_. Self-supervised learning provides an avenue for learning useful visual representations in your dataset without requiring ground-truth labels. These visual representations can be exported as feature vectors and used for downstream analyses such as :ref:`dimensionality reduction ` or :ref:`multi-instance learning `. + +The ``slideflow.simclr`` module contains a `forked Tensorflow implementation `_ minimally modified to interface with Slideflow. SimCLR models can be trained with :meth:`slideflow.Project.train_simclr`, and SimCLR features can be calculated as with other models using :meth:`slideflow.Project.generate_features`. + +Training SimCLR +*************** + +First, determine the SimCLR training parameters with :func:`slideflow.simclr.get_args`. This function accepts parameters via keyword arguments, such as ``learning_rate`` and ``temperature``, and returns a configured :class:`slideflow.simclr.SimCLR_Args`. + +.. code-block:: python + + from slideflow import simclr + + args = simclr.get_args( + temperature=0.1, + learning_rate=0.3, + train_epochs=100, + image_size=299 + ) + +Next, assemble a training and (optionally) a validation dataset. The validation dataset is used to assess contrastive loss during training, but is not required. + +.. 
code-block:: python + + import slideflow as sf + + # Load a project and dataset + P = sf.load_project('path') + dataset = P.dataset(tile_px=299, tile_um=302) + + # Split dataset into training/validation + train_dts, val_dts = dataset.split( + val_fraction=0.3, + model_type='classification', + labels='subtype') + +Finally, SimCLR can be trained with :meth:`slideflow.Project.train_simclr`. You can train with a single dataset: + +.. code-block:: python + + P.train_simclr(args, dataset) + +You can train with an optional validation dataset: + +.. code-block:: python + + P.train_simclr( + args, + train_dataset=train_dts, + val_dataset=val_dts + ) + +You can also optionally provide labels for training the supervised head. To train a supervised head, you'll also need to set the SimCLR argument ``lineareval_while_pretraining=True``. + +.. code-block:: python + + # SimCLR args + args = simclr.get_args( + ..., + lineareval_while_pretraining=True + ) + + # Train with validation & supervised head + P.train_simclr( + args, + train_dataset=train_dts, + val_dataset=val_dts, + outcomes='subtype' + ) + +The SimCLR model checkpoints and final saved model will be stored in the ``simclr/`` folder within the project root directory. + +.. _dinov2: + +Training DINOv2 +*************** + +A lightly modified version of `DINOv2 `__ with Slideflow integration is available on `GitHub `_. This version facilitates training DINOv2 with Slideflow datasets and adds stain augmentation to the training pipeline. + +To train DINOv2, first install the package: + +.. code-block:: bash + + pip install git+https://github.com/jamesdolezal/dinov2.git + +Next, configure the training parameters and datasets by providing a configuration YAML file. This configuration file should contain a ``slideflow`` section, which specifies the Slideflow project and dataset to use for training. An example YAML file is shown below: + +..
code-block:: yaml + + train: + dataset_path: slideflow + batch_size_per_gpu: 32 + slideflow: + project: "/mnt/data/projects/TCGA_THCA_BRAF" + dataset: + tile_px: 299 + tile_um: 302 + filters: + brs_class: + - "Braf-like" + - "Ras-like" + seed: 42 + outcome_labels: "brs_class" + normalizer: "reinhard_mask" + interleave_kwargs: null + +See the `DINOv2 README `_ for more details on the configuration file format. + +Finally, train DINOv2 using the same command-line interface as the original DINOv2 implementation. For example, to train DINOv2 on 4 GPUs on a single node: + +.. code-block:: bash + + torchrun --nproc_per_node=4 -m "dinov2.train.train" \ + --config-file /path/to/config.yaml \ + --output-dir /path/to/output_dir + +The teacher weights will be saved in ``outdir/eval/.../teacher_checkpoint.pth``, and the final configuration YAML will be saved in ``outdir/config.yaml``. + +Generating features +******************* + +Generating features from a trained SSL model is straightforward: use the same :meth:`slideflow.Project.generate_features` and :class:`slideflow.DatasetFeatures` interfaces as :ref:`previously described `, providing a path to a saved SimCLR model or checkpoint. + +.. code-block:: python + + import slideflow as sf + + # Create the SimCLR feature extractor + simclr = sf.build_feature_extractor( + 'simclr', + ckpt='/path/to/simclr.ckpt' + ) + + # Calculate SimCLR features for a dataset + features = P.generate_features(simclr, ...) + +For DINOv2 models, use ``'dinov2'`` as the first argument, and pass the model configuration YAML file to ``cfg`` and the teacher checkpoint weights to ``weights``. + +..
code-block:: python + + dinov2 = sf.build_feature_extractor( + 'dinov2', + weights='/path/to/teacher_checkpoint.pth', + cfg='/path/to/config.yaml' + ) \ No newline at end of file diff --git a/docs-source/source/stats.rst b/docs-source/source/stats.rst index 33459dcfb..fbc9dfd28 100644 --- a/docs-source/source/stats.rst +++ b/docs-source/source/stats.rst @@ -3,131 +3,20 @@ slideflow.stats =============== -In addition to containing functions used during model training and evaluation, this module provides -the :class:`slideflow.SlideMap` class designed to assist with visualizing tiles and slides -in two-dimensional space. +This module contains internal utility functions for generating and evaluating model predictions and metrics. -Once a model has been trained, tile-level predictions and intermediate layer activations can be calculated -across an entire dataset with :class:`slideflow.DatasetFeatures`. -The :class:`slideflow.SlideMap` class can then perform dimensionality reduction on these dataset-wide -activations, plotting tiles and slides in two-dimensional space. Visualizing the distribution and clustering -of tile-level and slide-level layer activations can help reveal underlying structures in the dataset and shared -visual features among classes. +.. autofunction:: df_from_pred -The primary method of use is first generating an :class:`slideflow.DatasetFeatures` from a trained -model, then creating an instance of a :class:`slideflow.SlideMap` by using the ``from_features`` class -method: +.. autofunction:: eval_dataset -.. code-block:: python +.. autofunction:: group_reduce - df = sf.DatasetFeatures(model='/path/', ...) - slide_map = sf.SlideMap.from_features(df) - -Alternatively, if you would like to map slides from a dataset in two-dimensional space using pre-calculated *x* and *y* -coordinates, you can use the ``from_precalculated`` class method. In addition to X and Y, this method requires supplying -tile-level metadata in the form of a list of dicts.
Each dict must contain the name of the origin slide and the tile -index in the slide TFRecord. - -.. code-block:: python - - dataset = project.dataset(tile_px=299, tile_um=302) - slides = dataset.slides() - x = np.array(...) - y = np.array(...) - meta = [{'slide': ..., 'index': ...} for i in range(len(x))] - slide_map = sf.SlideMap.from_precalculated(slides, x, y, meta) - -.. automodule: slideflow.stats - :imported-members: - -SlideMap --------- - -.. autoclass:: slideflow.SlideMap - :inherited-members: - -basic_metrics ----------------------- -.. autofunction:: basic_metrics - -calculate_centroid ------------------- -.. autofunction:: calculate_centroid - -concordance_index ----------------------- -.. autofunction:: concordance_index - -filtered_prediction -------------------- -.. autofunction:: filtered_prediction - -generate_combined_roc ----------------------- -.. autofunction:: generate_combined_roc - -generate_roc ----------------------- -.. autofunction:: generate_roc - -generate_scatter ----------------------- -.. autofunction:: generate_scatter - -gen_umap --------- -.. autofunction:: gen_umap - -get_centroid_index ------------------- -.. autofunction:: get_centroid_index - -metrics_from_dataset ---------------------- .. autofunction:: metrics_from_dataset -metrics_from_pred ------------------ -.. autofunction:: metrics_from_pred - -normalize_layout ----------------- -.. autofunction:: normalize_layout - -read_predictions ----------------- -.. autofunction:: read_predictions - -predict_from_tensorflow ------------------------- -.. autofunction:: predict_from_tensorflow - -predict_from_torch ----------------------- -.. autofunction:: predict_from_torch - -save_histogram ----------------------- -.. autofunction:: save_histogram - -pred_to_df -------------------------- -.. autofunction:: pred_to_df - -to_onehot ----------------------- -.. autofunction:: to_onehot - - - - - - - - - - - +.. autofunction:: name_columns +.. autofunction:: predict_dataset +.. 
autofunction:: calculate_centroid +.. autofunction:: get_centroid_index \ No newline at end of file diff --git a/docs-source/source/studio.rst b/docs-source/source/studio.rst new file mode 100644 index 000000000..1507a25ef --- /dev/null +++ b/docs-source/source/studio.rst @@ -0,0 +1,350 @@ +.. _studio: + +Slideflow Studio: Live Visualization +==================================== + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/studio_preview.webm + :autoplay: + +| + +Slideflow Studio provides powerful tools for interactive visualization of whole-slide images, model predictions, and GAN-generated images. It's also fast - with an OpenGL renderer and highly optimized whole-slide image viewer, you'll get a smooth experience that can even run on a Raspberry Pi. + +If you have installed Slideflow via pip, you can run Studio from a terminal with: + +.. code-block:: bash + + slideflow-studio + +If you are running from source, you can start Studio using the following script in the GitHub repository: + +.. code-block:: bash + + python slideflow-studio.py + +If you encounter any issues with the initialization scripts, you can also start Studio by executing the submodule: + +.. code-block:: bash + + python -m slideflow.studio + +If you are using a Docker image, additional arguments are required to launch Studio. Start your Docker container using the arguments ``-e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix``. For example: + +.. code-block:: bash + + docker run -it --rm \ + -e DISPLAY=$DISPLAY \ + -v /tmp/.X11-unix:/tmp/.X11-unix \ + slideflow/slideflow:latest-tf + +A path to a whole-slide image can optionally be provided as the first argument. Use the ``--help`` flag to see a list of available arguments. + +You can also launch Studio by using the ``.view()`` method of :class:`slideflow.WSI`, :class:`slideflow.Heatmap`, and :class:`slideflow.Mosaic` objects. + +..
code-block:: python + + import slideflow as sf + + wsi = sf.WSI('/path/to/slide.svs', tile_px=299, tile_um=302) + wsi.view() + + +Layout & design +*************** + +.. image:: studio_section_labels.jpg + +| + +The Slideflow Studio window has three primary areas: the main view, a tile preview, and the control panel. Fullscreen mode can be toggled with View -> Fullscreen or by pressing Alt+Enter. + +Main view +----------- +The main view is an interactive display for whole-slide images. Zoom in on a slide using the mouse wheel, and navigate around the slide by clicking and dragging. When a model is loaded, right clicking on the main view sets the prediction location, drawing a red box outlining the location from which a tile was extracted and displaying the prediction underneath. + +Tile preview +------------ +When a model is loaded, right clicking on the main view will establish the location for a focal tile prediction. A tile will be extracted from this location of the whole-slide image at the pixel & micron size appropriate for the loaded model. The tile preview window shows the extracted image tile taken from this location. If the loaded model uses stain normalization, a post-normalization image is also shown on the right. The tile preview window can be hidden by clicking the X in the top right corner, or toggled via the menu item View -> Show -> Tile Preview. + +Control panel +------------- +The control panel shows relevant active widgets which contain information and controls for whole-slide images, loaded models, heatmaps, and loaded GANs. :ref:`Enabling an extension ` will add an additional icon and associated functionality. + +Projects +******** + + +A Slideflow :ref:`Project ` can be loaded to make it easier to find and load both slides and models. Load a project with either File -> Open Project, or click and drag a project folder onto the main view. Click the Project icon to view project information and browse both slides and models. + +..
video:: https://github.com/user-attachments/assets/e55339a9-69ce-4fa6-a3de-66a4a5244704 + :autoplay: + +| + +All slides associated with the project will be listed under the "Slides" subheader. Clicking a slide name will open the slide. Similarly, all trained models associated with the project are listed under the "Models" subheader and can be loaded by clicking a model name. Both Tensorflow and PyTorch models can be loaded, regardless of the active backend. + +.. _studio_wsi: + +Whole-slide images +****************** + +.. image:: studio_slide.jpg + +| + +Whole-slide images can be loaded directly with File -> Open Slide. You can also load a slide by dragging and dropping a file onto the main view or by using the Project interface. Use the mouse wheel to zoom, and click-and-drag to move. Slides can be closed with File -> Close Slide. + +The Slide section of the control panel shows slide properties, including dimensions, highest scanned magnification, slide scanner vendor, and how many annotated regions-of-interest (ROIs) are loaded for the slide. ROIs are loaded automatically if a Project is loaded and ROIs are available for the slide. + +A thumbnail of the loaded slide is shown in the upper right corner of the main view, and can be hidden with View -> Show -> Thumbnail. A magnification scale is shown in the bottom-left corner of the main view, and can be hidden with View -> Show -> Scale. + +.. _studio_roi: + +ROI Annotations +--------------- + +.. image:: studio_rois.jpg + +| + +Regions-of-Interest (ROIs) can be used to guide tile extraction. If a Slideflow project has been loaded (File -> Open Project), ROIs will be automatically loaded. You can use Studio to add, label, or remove ROIs with the annotation tool, under the subheader "ROIs". + +Click the plus (Add) icon to draw new ROIs with a lasso tool; right click and drag to create a new ROI. 
The pencil (Edit) icon allows you to edit any existing ROIs; right click an ROI while editing to delete the ROI or change its label. Once finished, ROIs can be exported in CSV format by clicking the floppy disk icon (Save). You can manually load an existing ROI file by clicking the folder icon (Load). + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/roi_label.mp4 + :autoplay: + +| + +Labels can be optionally supplied for each ROI. Labels can be set after creating an ROI and changed by right clicking an ROI while editing. Hover over an existing ROI to see its name and label. Labels are exported when saving ROIs. + +Slideflow 3.0 added a new polygon tool for drawing ROIs. Click the polygon icon to draw a polygon ROI. Right click to add points, and press Enter to close the polygon. The polygon tool can be used to draw complex shapes, and can be used in conjunction with the lasso tool. + +.. video:: https://github.com/user-attachments/assets/edf7c377-af40-4f8e-a4cb-f84024988e91 + :autoplay: + +When in Edit mode, click on an ROI to select it. Holding down the Control key will show the ROI vertices, which can then be selected and moved. Hold Shift and drag the mouse to select multiple vertices. Vertices can be moved by dragging them and deleted by pressing the Delete key. Click outside the ROI or press Esc to deselect. + +Slideflow can also import ROIs generated from external applications such as QuPath and ImageScope; see :ref:`regions_of_interest` for more information. + +Tile filtering +-------------- + +.. image:: tile_filter.jpg + +| + +A tile filtering strategy can be applied by checking "Tile filter" in the "Slide Processing" subsection. Click the ellipsis button to change grayspace fraction/threshold and whitespace fraction/threshold, to see how tuning these parameters alters tile-level filtering. If enabled, tile filtering will be performed when generating predictions from the slide. 
Once enabled, the tile filter can be previewed by checking the box "Show tile-level filter" in the "Display" subsection. + +Slide filtering +--------------- + +.. image:: slide_filter.jpg + +| + +Similarly, slide filtering can be enabled by checking "Slide filter". Available slide filtering / QC options include blur filtering, Otsu's thresholding, or both. If "Tile filter" and "Slide filter" are both selected, tiles will be filtered with both. The QC mask can be previewed by checking the box "QC Mask" in the "Display" subsection. + +.. _studio_segmentation: + +Tissue segmentation +------------------- + +.. video:: https://github.com/user-attachments/assets/6f0da6be-da47-443e-b08e-1bab978fb345 + :autoplay: + +| + +New in version 3.0, :ref:`segmentation models ` can be both trained and deployed directly within Studio using the new Segmentation widget. + +The Segmentation widget can be accessed by clicking the "Segmentation" icon in the left-hand toolbar. The widget allows you to load a segmentation model and apply it to the loaded slide, generating labeled ROIs. Trained models can also be loaded by dragging and dropping a model folder onto the main view. + +The Segmentation widget also contains a section for training models. In order to train models, a project must be loaded (File -> Open Project). The "Data Source" dropdown is used to select which slides in the project will be used for training. The "Data Processing" section is used to customize the model, including the tile size, magnification, stride, and margin. The "filter" option - which can be either "roi" or "otsu" - determines which tiles are used for training (either all tiles or only those within ROIs). The "Arch & Params" section is used to select the model architecture, hyperparameters, segmentation model type (binary, multiclass, or multilabel), and ROI classes that will be included in training. The "Train" button will begin training the model. 
Once training is complete, the "Export" button can be used to save the trained model to disk. "Generate ROIs" can then be used to apply the trained model to any loaded slide. + +Preview slide normalization +--------------------------- + +Stain normalization strategies can be quickly previewed by checking "Normalize", which will apply the associated normalization strategy to the main view. If a model is loaded, the model's normalizer will be used by default. The normalizer can be changed with the corresponding dropdown menu, allowing you to preview any normalization method. All normalizer methods shown except for the model normalizer will use the "v3" fit (see :py:mod:`slideflow.norm` for more information). Regardless of what is being previewed, the appropriate model normalizer will be used when generating predictions from the slide. + +Preview tile extraction +----------------------- + +.. image:: https://github-production-user-asset-6210df.s3.amazonaws.com/48372806/257349240-a4911b16-9b5a-4289-9d46-41c95f31acda.png + +| + +The "Display" subsection of the slide widget allows users to preview tile extraction, displaying outlines around tiles. Model predictions generated from the slide will only utilize the shown tiles. + +Models & predictions +******************** + +Slideflow models can be loaded with File -> Open Model, by clicking and dragging a model onto the main view, or by clicking the "Load a Model" button of the model widget. Both Tensorflow and PyTorch models are supported. Multiple-instance learning (MIL) models require the MIL extension, :ref:`as discussed below `. Models can be closed with File -> Close Model. + +A summary of the loaded model is shown on the left side of the model widget, containing information about the model outcomes, tile size, image format (PNG/JPG), backend (Tensorflow/PyTorch), and the version of Slideflow used to train the model. Click the "HP" button to show a list of all hyperparameters used during model training. 
+ +A model will be enabled by default once loaded, but can be disabled by clicking the gear icon in the Model section of the control panel, and then clicking "Close model". Similarly, to disable uncertainty quantification (UQ) for models trained with UQ, open the same gear menu and deselect "Enable UQ". + +Tile predictions +---------------- + +.. image:: studio_tile_preds.jpg + +| + +Once a model is loaded, right-click anywhere on the main view to set the tile extraction location for the tile preview. A tile will be extracted at this location matching the pixel and micron size of the loaded model. The extracted tile will be shown before and after stain normalization (if applicable) in the tile preview window. Right click and drag to slide the preview window. The model prediction at this location will be shown underneath the red box in the main view, and in histogram format in the control panel, along with the class label for classification models. + +Saliency +-------- + +.. image:: studio_saliency.jpg + +| + +Saliency maps for the given model and image tile can be previewed in real-time by selecting the checkbox under the "Saliency" subheader. The saliency map will replace the extracted image tile in the tile preview window. Alternatively, saliency can be viewed as an overlay on top of the extracted image tile by checking the box "Overlay". The dropdown menu below in this section can be used to change the saliency method. + + +Slide predictions +----------------- + +.. image:: studio_slide_preds.jpg + +| + +Click the "Predict Slide" button to generate a prediction for the whole-slide image. By default, this will show predictions across the slide as a heatmap in the main display, and the final prediction for the slide will be shown under the "Slide Prediction" subheader of the control panel. Histograms of predictions for each model outcome, as well as uncertainty (if applicable), will be shown in this same section of the control panel. 
Click the + and - buttons in this section to cycle through histograms for each outcome category. + + +.. _studio_mil: + +Multiple-Instance Learning +************************** + +Slideflow Studio includes support for multiple-instance learning (MIL) models with the MIL extension. In addition to generating predictions from MIL models, Studio can also be used to visualize associated attention heatmaps. Please see :ref:`mil` for more information. + +Start by opening the MIL widget in the sidebar. Models are loaded by either clicking the "Load MIL model" button, selecting "File -> Load MIL Model...", or by dragging-and-dropping an MIL model folder onto the window. + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/mil_attention.mp4 + :autoplay: + +| + +Information about the feature extractor and MIL model will be shown in the left-hand toolbar. MIL model architecture and hyperparameters can be viewed by clicking the "HP" button. Click "Predict Slide" to generate a whole-slide prediction. If applicable, attention will be displayed as a heatmap. The heatmap color and display can be customized in the Heatmap widget. + +Right-clicking for a focal prediction when an MIL model is loaded will display the tile-level attention along with the tile prediction. Tile-level attention can be displayed as a scaled colorbar, as shown in the video above, by specifying an attention range and thresholds in the MIL ``mil_params.json`` file. + +.. code-block:: python + + { + ... + "thresholds": { + "attention": { + "low": 0.3, + "high": 0.5, + "range": [0, 1] + } + }, + ... + } + + +Heatmaps +******** + +.. image:: studio_heatmap.jpg + +| + +The heatmap section of the control panel can be used to generate and customize whole-slide heatmaps. Heatmaps are generated using the settings configured in the Slide section of the control panel (including stride, tile filter, and slide filter). Click "Generate" in the heatmap widget to create the heatmap. 
The color scheme can be changed with the dropdown menu of the "Display" subheader, as can the alpha and gain. You can switch which outcome is being displayed as a heatmap by cycling through the available predictions. If the model was trained with uncertainty quantification (UQ), click the radio button next to UQ to show uncertainty as a heatmap. Press the left ALT key while hovering over the heatmap to show the raw heatmap values. + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/heatmap.mp4 + :autoplay: + +| + +By default, heatmaps are calculated with multiprocessing pools, which may increase memory utilization. To decrease memory utilization at the cost of slower heatmap calculation, switch to low memory mode in the Settings section (described below), or by using the launch flag ``--low_memory``. + +Heatmaps can be saved in PNG format with File -> Export -> Heatmap (PNG). Heatmaps can also be exported in numpy format (NPZ) with File -> Export -> Heatmap (NPZ). The heatmap of predictions will be saved in the exported NPZ file under the key ``'logit'``, with the shape ``(y_dim, x_dim, num_classes)``. If the model was trained with uncertainty, the uncertainty heatmap will be saved under the key ``'uncertainty'``. + +Performance & Capture +********************* + +.. image:: studio_performance.jpg + +| + +Performance can be monitored in the Performance section of the control panel (lightning icon). This section shows frametimes for GUI display, image rendering, normalization, and model prediction. + +Export contents of the main view to a PNG file with File -> Export -> Main view. Similarly, the extracted image tile shown in the tile preview window can be exported with File -> Export -> Tile view. A screenshot of the entire window interface can be saved with File -> Export -> GUI view. 
+ +Settings +******** + +Studio can be customized in the Settings section, which provides the ability to set an FPS limit (defaults to 60), enable vertical sync (enabled by default), and customize the theme. This section also includes an option to enter "Low memory mode". In low memory mode, heatmaps are calculated with threadpools rather than multiprocessing pools, decreasing memory utilization at the cost of slower heatmap generation. + +.. _extensions: + +Extensions +********** + +.. image:: studio_extensions.jpg + +| + +Slideflow Studio includes an Extensions section for expanding functionality and adding additional features. Extensions may require additional software dependencies or have different licenses. The Extensions section can be accessed by clicking the puzzle icon in the bottom-left section of the control panel. + +Four official extensions are included and described below, adding support for cell segmentation with Cellpose, generative adversarial networks (StyleGAN), mosaic maps, and multiple-instance learning. Development is underway to add support for community extensions that can be shared and downloaded. Please reach out to us `on GitHub `_ if you are interested in building and deploying an extension based on your research. + +Cell segmentation +----------------- + +The Cell Segmentation extension adds support for interactive cell segmentation with Cellpose. Please see :ref:`cellseg` for more information. + +StyleGAN +-------- + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/stylegan.webm + :autoplay: + +| + +The StyleGAN extension adds support for visualizing trained StyleGAN2 or StyleGAN3 networks. Once enabled, GAN ``*.pkl`` files can be loaded with File -> Load GAN, or with drag-and-drop. Generated images are shown in the tile preview window. Model predictions on GAN images operate similarly to predictions on whole-slide images. 
Predictions on GAN images are generated in real-time, and you can watch the predictions change in the control panel. + +By default, Studio will generate predictions on the full GAN image (after resizing to match the model's ``tile_px`` value). If a ``training_options.json`` file is found in the same directory as the GAN .pkl, the tile size used to train the GAN will be read from this file (slideflow_kwargs/tile_px and ../tile_um). If the GAN was trained on images with a different ``tile_um`` value, the GAN image will be cropped to match the model's ``tile_um`` before resizing. The cropped/resized (and stain normalized) image will be shown to the right of the raw GAN image in the tile preview window. + +The StyleGAN widget can be used to travel the GAN latent space, similar to the implementation in the official `NVIDIA StyleGAN3 repository `_. Set a specific seed in the input field next to "Seed", or click and drag the "Drag" button. If the model was trained with class conditioning, manually set the class with the "Class" field (the default value of -1 selects a random class). Press left or right on your keyboard to quickly move through seeds. + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/gan_seeds.mp4 + :autoplay: + +| + +The style mixing section can be used to mix styles between seeds, styles between classes, or both. You can control the degree of mixing with the mixing slider. You can finetune which GAN layers are used during the mixing by clicking the ellipsis button and selecting which layers should be traversed during style mixing. + +Save the current seed by clicking the "Save" button; all saved seeds will be listed in the "Saved Seeds" subsection. Click any seed to load it. Once any seed has been saved, options will appear to export a list of saved seeds in CSV format. Previously exported seeds can be loaded by clicking "Load Seeds". + +StyleGAN requires the ``slideflow-noncommercial`` package: + +.. 
code-block:: bash + + pip install slideflow-noncommercial + +Mosaic maps +----------- + +The Mosaic Maps extension, which is enabled by default, adds support for interactively viewing mosaic maps. You can use the :meth:`slideflow.Mosaic.view` function to launch Studio and load the mosaic. + +.. code-block:: python + + import slideflow as sf + + mosaic = sf.Mosaic(...) + mosaic.view() + +Alternatively, a mosaic map can be saved to disk with :meth:`slideflow.Mosaic.export`, and then loaded into Studio with File -> Load Mosaic. + +.. image:: studio_mosaic.jpg + +| + +Once loaded, the mosaic map can be navigated using the same controls as WSI navigation - click and drag to pan, and use the mouse wheel to zoom. The UMAP used to generate the mosaic map will be shown in a window in the bottom-right corner, with a red box indicating the section of the UMAP currently in view. If a Project is loaded, hovering over an image tile will reveal a popup containing a larger corresponding section from the associated whole-slide image. This popup also contains the name of the slide and tile location coordinates. + +Use the control panel to increase or decrease the mosaic grid size, or to change the background color. diff --git a/docs-source/source/studio_extensions.jpg b/docs-source/source/studio_extensions.jpg new file mode 100644 index 000000000..be2a8ba34 Binary files /dev/null and b/docs-source/source/studio_extensions.jpg differ diff --git a/docs-source/source/studio_heatmap.jpg b/docs-source/source/studio_heatmap.jpg new file mode 100644 index 000000000..2f5feb8e7 Binary files /dev/null and b/docs-source/source/studio_heatmap.jpg differ diff --git a/docs-source/source/studio_module.rst b/docs-source/source/studio_module.rst new file mode 100644 index 000000000..e2daaa635 --- /dev/null +++ b/docs-source/source/studio_module.rst @@ -0,0 +1,9 @@ +.. 
currentmodule:: slideflow.studio + +slideflow.studio +================ + +This module contains the Slideflow Studio visualization tool. See :ref:`studio` for more information. + +.. automodule:: slideflow.studio + :members: diff --git a/docs-source/source/studio_mosaic.jpg b/docs-source/source/studio_mosaic.jpg new file mode 100644 index 000000000..416c78ed1 Binary files /dev/null and b/docs-source/source/studio_mosaic.jpg differ diff --git a/docs-source/source/studio_performance.jpg b/docs-source/source/studio_performance.jpg new file mode 100644 index 000000000..fcd4bfc10 Binary files /dev/null and b/docs-source/source/studio_performance.jpg differ diff --git a/docs-source/source/studio_preview.webm b/docs-source/source/studio_preview.webm new file mode 100644 index 000000000..aca049394 --- /dev/null +++ b/docs-source/source/studio_preview.webm @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09f488b4a91f6d37810d79099ba1116349ddaf420ac1e933b623fd6f7f865ee0 +size 3427049 diff --git a/docs-source/source/studio_projects.jpg b/docs-source/source/studio_projects.jpg new file mode 100644 index 000000000..96d727572 Binary files /dev/null and b/docs-source/source/studio_projects.jpg differ diff --git a/docs-source/source/studio_rois.jpg b/docs-source/source/studio_rois.jpg new file mode 100644 index 000000000..8e03aa9df Binary files /dev/null and b/docs-source/source/studio_rois.jpg differ diff --git a/docs-source/source/studio_saliency.jpg b/docs-source/source/studio_saliency.jpg new file mode 100644 index 000000000..ce82e8ef3 Binary files /dev/null and b/docs-source/source/studio_saliency.jpg differ diff --git a/docs-source/source/studio_section_labels.jpg b/docs-source/source/studio_section_labels.jpg new file mode 100644 index 000000000..0ebbdae69 Binary files /dev/null and b/docs-source/source/studio_section_labels.jpg differ diff --git a/docs-source/source/studio_slide.jpg b/docs-source/source/studio_slide.jpg new file mode 100644 index 
000000000..b123e6ce5 Binary files /dev/null and b/docs-source/source/studio_slide.jpg differ diff --git a/docs-source/source/studio_slide_preds.jpg b/docs-source/source/studio_slide_preds.jpg new file mode 100644 index 000000000..a7ffb0dc1 Binary files /dev/null and b/docs-source/source/studio_slide_preds.jpg differ diff --git a/docs-source/source/studio_tile_preds.jpg b/docs-source/source/studio_tile_preds.jpg new file mode 100644 index 000000000..acbc22d20 Binary files /dev/null and b/docs-source/source/studio_tile_preds.jpg differ diff --git a/docs-source/source/stylegan.png b/docs-source/source/stylegan.png new file mode 100644 index 000000000..42ad36da4 --- /dev/null +++ b/docs-source/source/stylegan.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9904c0800d554fed98c24d276b9fabf24434b9fab1c034251581dd456601e6b +size 951809 diff --git a/docs-source/source/stylegan.rst b/docs-source/source/stylegan.rst new file mode 100644 index 000000000..8fe9cb2ef --- /dev/null +++ b/docs-source/source/stylegan.rst @@ -0,0 +1,179 @@ +.. currentmodule:: slideflow.gan + +.. _stylegan: + +Generative Networks (GANs) +========================== + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/stylegan.webm + :autoplay: + +| + +Slideflow includes tools to easily interface with the PyTorch implementations of `StyleGAN2 `_ and `StyleGAN3 `_, allowing you to train these Generative Adversarial Networks (GANs). Slideflow additionally includes tools to assist with image generation, interpolation between class labels, and interactively visualize GAN-generated images and their predictions. See our manuscript on the use of GANs to `generate synthetic histology `_ for an example of how these networks might be used. + + +.. note:: + + StyleGAN requires PyTorch <0.13 and Slideflow-NonCommercial, which can be installed with: + + .. 
code-block:: bash + + pip install slideflow-noncommercial + + +Training StyleGAN +***************** + +The easiest way to train StyleGAN2/StyleGAN3 is with :meth:`slideflow.Project.gan_train`. Both standard and class-conditional GANs are +supported. To train a GAN, pass a :class:`slideflow.Dataset`, experiment label, +and StyleGAN keyword arguments to this function: + +.. code-block:: python + + import slideflow as sf + + P = sf.Project('/project/path') + dataset = P.dataset(tile_px=512, tile_um=400) + + P.gan_train( + dataset=dataset, + model='stylegan3', + cfg='stylegan3-r', + exp_label="ExperimentLabel", + gpus=4, + batch=32, + ... + ) + +The trained networks will be saved in the ``gan/`` subfolder in the project directory. + +StyleGAN2/3 can only be trained on images with sizes that are powers of 2. You can crop and/or resize images from a Dataset to match this requirement by using the ``crop`` and/or ``resize`` arguments: + +.. code-block:: python + + dataset = P.dataset(tile_px=299, ...) + + # Train a GAN on images resized to 256x256 + P.gan_train( + ..., + resize=256, + ) + +See the :meth:`slideflow.Project.gan_train` documentation for additional +keyword arguments to customize training. + +Class conditioning +------------------ + +GANs can also be trained with class conditioning. To train a class-conditional GAN, simply provide a list of categorical +outcome labels to the ``outcomes`` argument of :meth:`slideflow.Project.gan_train`. For example, to train a GAN with class conditioning on ER status: + +.. code-block:: python + + P.gan_train( + ..., + outcomes='er_status' + ) + +Tile-level labels +----------------- + +In addition to class conditioning with slide-level labels, StyleGAN2/StyleGAN3 can be trained with tile-level class conditioning. Tile-level labels can be generated through ROI annotations, as described in :ref:`tile_labels`. 
+ +Prepare a pandas dataframe, indexed with the format ``{slide}-{x}-{y}``, where ``slide`` is the name of the slide (without extension), ``x`` is the corresponding tile x-coordinate, and ``y`` is the tile y-coordinate. The dataframe should have a single column, ``label``, containing onehot-encoded category labels. For example: + +.. code-block:: python + + import pandas as pd + + df = pd.DataFrame( + index=[ + 'slide1-251-425', + 'slide1-560-241', + 'slide1-321-502', + ... + ], + data={ + 'label': [ + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + ... + ] + } + ) + +This dataframe can be generated, as described in :ref:`tile_labels`, through the :meth:`slideflow.Dataset.get_tile_dataframe` function. For GAN conditioning, the ``label`` column should be onehot-encoded. + +Once the dataframe is complete, save it in parquet format: + +.. code-block:: python + + df.to_parquet('tile_labels.parquet') + +And supply this file to the ``tile_labels`` argument of :meth:`slideflow.Project.gan_train`: + +.. code-block:: python + + P.gan_train( + ..., + tile_labels='tile_labels.parquet' + ) + +Generating images +***************** + +Images can be generated from a trained GAN and exported either as loose images +in PNG or JPG format, or alternatively stored in TFRecords. Images are generated from a list +of seeds (list of int). Use the :meth:`slideflow.Project.gan_generate` function +to generate images, with ``out`` set to a directory path if exporting loose images, +or ``out`` set to a filename ending in ``.tfrecords`` if saving images in +TFRecord format: + +.. code-block:: python + + network_pkl = '/path/to/trained/gan.pkl' + P.gan_generate( + network_pkl, + out='target.tfrecords', + seeds=range(100), + ... + ) + +The image format is set with the ``format`` argument: + +.. code-block:: python + + P.gan_generate( + ..., + format='jpg', + ) + +Class index (for class-conditional GANs) is set with ``class_idx``: + +.. 
code-block:: python + + P.gan_generate( + ..., + class_idx=1, + ) + +Finally, images can be resized after generation to match a target tile size: + +.. code-block:: python + + P.gan_generate( + ..., + gan_px=512, + gan_um=400, + target_px=299, + target_um=302, + ) + +Interactive visualization +------------------------- + +Slideflow Studio can be used to interactively visualize GAN-generated images (see :ref:`studio`). Images can be directly exported from this interface. This tool also enables you to visualize real-time predictions for GAN-generated images when used as inputs to a trained classifier. + +For more examples of using Slideflow to work with GAN-generated images, see `our GitHub repository `_ for code accompanying the previously referenced manuscript. \ No newline at end of file diff --git a/docs-source/source/stylegan.webm b/docs-source/source/stylegan.webm new file mode 100644 index 000000000..3520892a8 --- /dev/null +++ b/docs-source/source/stylegan.webm @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef12101adcbafe99343be081752234f795d8963cc64ee34bbd8205cbb50ae80a +size 8878938 diff --git a/docs-source/source/tfrecords.rst b/docs-source/source/tfrecords.rst new file mode 100644 index 000000000..5f16f02e6 --- /dev/null +++ b/docs-source/source/tfrecords.rst @@ -0,0 +1,296 @@ +.. _tfrecords: + +TFRecords: Reading and Writing +============================== + +TFRecords are binary files designed for storing large amounts of data. In Slideflow, TFRecords are used to store compressed image tiles extracted from whole-slide images. TFRecords are used instead of loose image files (such as ``*.jpg`` or ``*.png``) because they are compact, more easily distributed, and significantly improve data reading efficiency during model training. TFRecords were originally designed for Tensorflow, but they can also be used with PyTorch. 
+ +The following sections describe the TFRecord data format and provide examples of how to create, read, and manipulate TFRecords using Slideflow. + +TFRecord Format +*************** + +TFRecords are binary files that contain a sequence of records, where each record represents an individual image tile. Each record contains a serialized `protocol buffer `_ with a list of named features. Each feature can be a list of bytes, floats, or integers. TFRecords are expected to have the following features: + +- **"image_raw"**: Bytes containing the image data (either JPG or PNG). +- **"slide"**: Bytes containing the slide name (in UTF-8 format). +- **"loc_x"**: Integer containing the x-coordinate of the tile (optional). +- **"loc_y"**: Integer containing the y-coordinate of the tile (optional). + +Slideflow expects each TFRecord to contain images from only a single slide, with the TFRecord name matching the slide name. The ``loc_x`` and ``loc_y`` features are optional, but are required for some operations (such as generating TFRecord heatmaps). + +.. note:: + + When reading TFRecords with Tensorflow, records are internally decoded using ``tf.train.Example``. When Tensorflow is not being used (such as when using the PyTorch backend), tfrecords are decoded using ``sf.util.example_pb2.Example``, providing an alternative decoder that does not require Tensorflow. Tensorflow's ``tf.train.Example`` and Slideflow's ``sf.util.example_pb2.Example`` are identical, except that ``sf.util.example_pb2.Example`` does not require Tensorflow and supports ``protobuf`` version 4. + + +TFRecord Indices +**************** + +Slideflow uses TFRecord index files to keep track of the internal structure of each TFRecord, improving efficiency of data reading. These index files are automatically built and stored in the same directory as the TFRecords upon first use. A TFRecord index is an ``*.npz`` file with the same name as the TFRecord, but with the ``*.index.npz`` extension. 
A TFRecord index contains the following fields: + +- **"arr_0"**: An array of shape ``(n_tiles, 2)`` containing the starting bytes and length of each record. +- **"locations"**: An array of shape ``(n_tiles, 2)`` containing the x- and y-coordinates of each tile. + +Index files for an entire dataset can be rebuilt using :meth:`slideflow.Dataset.rebuild_index()`. You can manually create an index file for a single TFRecord using :func:`sf.util.tfrecord2idx.create_index()`. + +Creating TFRecords +****************** + +From a Dataset +-------------- + +The typical way to create TFRecords is to use the :meth:`slideflow.Dataset.extract_tiles` function, as described in :ref:`filtering`. TFRecords will be exported to the destination configured in the :class:`slideflow.Dataset` object (see: :ref:`datasets_and_validation`). + +From a slide +------------ + +A TFRecord file for a single slide can be manually created using the :meth:`slideflow.WSI.extract_tiles()` function. The first argument of this function is the TFRecord destination folder. + +From a directory of images +-------------------------- + +A directory of loose image files can be assembled into a TFRecord using :func:`slideflow.io.write_tfrecords_single()`: + +.. code-block:: python + + sf.io.write_tfrecords_single( + '/path/to/images', + '/path/to/destination', + filename='filename', + slide='slide', + ) + +A nested directory of loose image tiles, organized into subdirectories by slide name, can be simultaneously assembled into multiple TFRecords (one for each slide) using :func:`slideflow.io.write_tfrecords_multi()`. Slide names are determined from the subdirectory names: + +.. code-block:: python + + sf.io.write_tfrecords_multi( + '/path/to/nested_images', + '/path/to/destination' + ) + +Inspecting TFRecords +******************** + +Individual TFRecords +-------------------- + +The quickest way to inspect a TFRecord is to use :class:`slideflow.TFRecord`: + +.. 
code-block:: python + + >>> import slideflow as sf + >>> tfr = sf.TFRecord('/path/to/tfrecord') + +An index file will be automatically created if one is not found. To disable automatic index creation, set ``create_index=False``. + +The TFRecord object has several useful attributes: + + >>> tfr.fields + ['image_raw', 'slide', 'loc_x', 'loc_y'] + >>> tfr.img_format + 'jpeg' + >>> tfr.length + 1000 + >>> tfr.locations + [(768, 256), (768, 512), ...] + +The ``fields`` attribute is a list of the fields in the TFRecord. + +The ``img_format`` attribute is the image format of the TFRecord (either ``"jpeg"`` or ``"png"``). + +The ``length`` attribute is the number of tiles in the TFRecord. + +The ``locations`` attribute is a list of the x- and y-coordinates of each tile center if available, and ``None`` otherwise. + +Inspecting Datasets +------------------- + +The :class:`slideflow.Dataset` object provides several methods for inspecting the TFRecords in a dataset generated through :meth:`slideflow.Dataset.extract_tiles`. + +The :meth:`slideflow.Dataset.summary()` method provides a summary of the dataset, including the location where TFRecords are stored and the total number of tiles across all TFRecords in the dataset. + +.. code-block:: python + + # Prepare a dataset of image tiles. + dataset = project.dataset( + tile_px=299, # Tile size, in pixels. + tile_um='10x' # Tile size, in microns or magnification. + ) + dataset.summary() + + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + Overview: + ╒===============================================╕ + │ Configuration file: │ /mnt/data/datasets.json │ + │ Tile size (px): │ 299 │ + │ Tile size (um): │ 10x │ + │ Slides: │ 941 │ + │ Patients: │ 941 │ + │ Slides with ROIs: │ 941 │ + │ Patients with ROIs: │ 941 │ + ╘===============================================╛ + + Filters: + ╒====================╕ + │ Filters: │ {} │ + ├--------------------┤ + │ Filter Blank: │ [] │ + ├--------------------┤ + │ Min Tiles: │ 0 │ + ╘====================╛ + + Sources: + + TCGA_LUNG + ╒==============================================╕ + │ slides │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ roi │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ tiles │ /mnt/rocket/tiles/TCGA_LUNG │ + │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │ + │ label │ 299px_10x │ + ╘==============================================╛ + + Number of tiles in TFRecords: 284114 + Annotation columns: + Index(['patient', 'subtype', 'site', 'slide'], + dtype='object') + +The :meth:`slideflow.Dataset.tfrecords()` method returns a list of paths to TFRecords. + +.. code-block:: python + + >>> tfrecords = dataset.tfrecords() + >>> len(tfrecords) + 941 + >>> tfrecords[0] + '/path/to/tfrecords1' + +The ``slideflow.Dataset.num_tiles`` attribute returns the total number of tiles across all TFRecords in the dataset. + +.. code-block:: python + + >>> dataset.num_tiles + 284114 + +Finally, the :meth:`slideflow.Dataset.manifest()` method returns a dictionary mapping TFRecord paths to the number of tiles in each TFRecord. Each value in the dictionary is a nested dictionary with two keys: ``"total"``, which is the total number of tiles in the TFRecord, and ``"clipped"``, which is the number of tiles that will be taken from the TFRecord as a result of :ref:`clipping/undersampling `. + +.. 
code-block:: python + + >>> dataset.manifest() + {'/path/to/tfrecords1': {'total': 1000, 'clipped': 512}, + '/path/to/tfrecords2': {'total': 2000, 'clipped': 512}, + ...} + +Reading TFRecords +***************** + +Slideflow provides several tools for reading and parsing TFRecords. These tools are intended for debugging and development, and are not recommended for model training. Higher-level dataloaders, which supervise sampling, shuffling, sharding, batching, labeling, and augmenting, are discussed in :ref:`dataloaders`. + +Reading a single image tile +--------------------------- + +To get a single parsed record according to its index, use :meth:`slideflow.TFRecord.__getitem__()`, which returns a dictionary of the parsed record: + +.. code-block:: python + + >>> import slideflow as sf + >>> tfr = sf.TFRecord('/path/to/tfrecord') + >>> tfr[0] + {'image_raw': b'...', 'slide': 'SLIDE_NAME', 'loc_x': 0, 'loc_y': 0} + +The ``'image_raw'`` field contains raw image bytes, in either JPG or PNG format. + +To get a single parsed record according to its location, use :meth:`slideflow.TFRecord.get_record_by_xy()`, which returns the slide name and image bytes: + +.. code-block:: python + + >>> tfr.get_record_by_xy(768, 256) + ('SLIDE_NAME', b'...') + +Image bytes can be decoded into Tensors (according to the active backend) using :func:`slideflow.io.decode_image()`: + +.. code-block:: python + + >>> import slideflow as sf + >>> slide, image = tfr.get_record_by_xy(768, 256) + >>> print(type(image)) + + >>> sf.io.decode_image(image) + >> import slideflow as sf + >>> tfr = '/path/to/tfrecords' + >>> sf.io.tfrecord2idx.create_index(tfr) + >>> index = sf.io.tfrecord2idx.load_index(tfr) + +Then, use :func:`slideflow.tfrecord_loader()` to create a generator that yields parsed records from the TFRecord: + +.. 
code-block:: python + + >>> loader = sf.tfrecord.tfrecord_loader(tfr, index) + >>> record = next(iter(loader)) + {'image_raw': , 'slide': , 'loc_x': [0], 'loc_y': [0]} + +Both ``"image_raw"`` and ``"slide"`` fields are returned as bytes in numpy arrays. The ``"loc_x"`` and ``"loc_y"`` fields are returned as integers. The image and slide name can be decoded using :func:`slideflow.io.decode_image()` and ``.decode('utf-8')``, respectively: + +.. code-block:: python + + >>> image = sf.io.decode_image(bytes(record['image_raw'])) + >>> slide = bytes(record['slide']).decode('utf-8') + +This iterator can be used to read all images from a TFRecord in sequence: + +.. code-block:: python + + >>> for record in loader: + ... image = sf.io.decode_image(bytes(record['image_raw'])) + ... slide = bytes(record['slide']).decode('utf-8') + +The iterator can be split into separate shards (data partitions) with the ``shard`` argument, a tuple of ``(shard_id, n_shards)``. This is useful for parallelizing data reading across multiple processes, threads, or compute nodes: + +.. code-block:: python + + >>> loader = sf.tfrecord.tfrecord_loader(tfr, index, shard=(0, 2)) + +Data sharding ensures that each shard reads a unique subset of the data, and that each record is read exactly once. + +An index file is recommended for improving efficiency of data reading, and required if using data sharding. + +Interleaving multiple TFRecords +------------------------------- + +You can also interleave multiple TFRecords using :func:`slideflow.multi_tfrecord_loader()`. This function takes a list of TFRecord paths and a list of corresponding TFRecord indices, and returns a generator that randomly samples from TFRecords and parses the records: + +.. 
code-block:: python + + >>> import slideflow as sf + >>> tfrs = ['/path/to/tfrecord1', '/path/to/tfrecord2'] + >>> indices = [sf.io.tfrecord2idx.load_index(tfr) for tfr in tfrs] + >>> loader = sf.tfrecord.multi_tfrecord_loader(tfrs, indices) + >>> record = next(iter(loader)) + {'image_raw': , 'slide': , 'loc_x': [0], 'loc_y': [0]} + +By default, records are sampled from TFRecords with equal probability (i.e. uniform sampling). You can also specify a list of weights to sample from TFRecords with different probabilities (i.e. weighted sampling) via the ``weights`` argument. The weights should be a list of floats, one for each TFRecord, that sum to 1.0: + +.. code-block:: python + + >>> loader = sf.tfrecord.multi_tfrecord_loader(tfrs, indices, weights=[0.5, 0.5]) + +Records will be sampled infinitely by default. To disable infinite sampling, set ``infinite=False``. + +TFRecord sharding is also supported for ``multi_tfrecord_loader()`` via the ``shard`` argument. + diff --git a/docs-source/source/tile_extraction.png b/docs-source/source/tile_extraction.png index 100db0687..e6e452032 100644 Binary files a/docs-source/source/tile_extraction.png and b/docs-source/source/tile_extraction.png differ diff --git a/docs-source/source/tile_extraction_overview.png b/docs-source/source/tile_extraction_overview.png new file mode 100644 index 000000000..2aeee0e9c --- /dev/null +++ b/docs-source/source/tile_extraction_overview.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ec007daa2078147ed77be82802172cfece47e6b7d9a8391adfa731aa4fa8565 +size 157161 diff --git a/docs-source/source/tile_filter.jpg b/docs-source/source/tile_filter.jpg new file mode 100644 index 000000000..1f084fdb0 Binary files /dev/null and b/docs-source/source/tile_filter.jpg differ diff --git a/docs-source/source/tile_labels.rst b/docs-source/source/tile_labels.rst new file mode 100644 index 000000000..d3aff16c0 --- /dev/null +++ b/docs-source/source/tile_labels.rst @@ -0,0 +1,129 @@ +.. 
_tile_labels: + +Strong Supervision with Tile Labels +==================================== + +Pathology deep learning models are commonly trained with weak supervision, where the labels for individual image tiles are inherited from the parent slide. The end goal for such models is to predict the label for the entire slide, rather than individual tiles. + +However, it is also possible to train models with strong supervision, where the labels for individual +image tiles are determined through :ref:`Region of Interest (ROI) ` labels. This note describes the process by which such labels are generated, and how they can be used to train a model. Training models with strong supervision requires PyTorch and is not supported in TensorFlow. + +Labeling ROIs +************* + +The first step is to create regions of interest (ROIs). The fastest way to create labeled ROIs is with :ref:`Slideflow Studio `, which includes integrated tools for quickly assigning labels to both new and existing ROIs. However, it is also possible to create ROIs with other tools, such as QuPath or ImageScope (as described :ref:`here `), and then modify the generated ROI CSV file to add labels. + +ROI CSV files are formatted with three required columns: "roi_name", "x_base", and "y_base". Each row is a single point in an ROI, with the "x_base" and "y_base" columns specifying the X/Y coordinates in the slide's lowest (base) dimension. Individual ROIs are grouped by the "roi_name" column, with each ROI having a unique name. An optional fourth column, "label", can be used to assign a label to each ROI. For example: + +.. code-block:: csv + + roi_name,x_base,y_base,label + 1,100,100,tumor + 1,104,165,tumor + 1,532,133,tumor + 1,101,101,tumor + 2,200,200,stroma + 2,200,235,stroma + 2,222,267,stroma + 2,202,201,stroma + +When ROIs are saved in Slideflow Studio, they are exported in this file format and saved in either the current working directory or, if a project is loaded, in the configured project directory.
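+
+As a rough illustration of the CSV layout described above, the following sketch groups ROI points by name using only the Python standard library. It is not part of the Slideflow API: the ``read_roi_points()`` helper is hypothetical, and it assumes only the column names listed above.
+
+.. code-block:: python
+
+    import csv
+    import io
+    from collections import defaultdict
+
+    def read_roi_points(handle):
+        """Group ROI outline points by ROI name (hypothetical helper)."""
+        rois = defaultdict(lambda: {"label": None, "points": []})
+        for row in csv.DictReader(handle):
+            roi = rois[row["roi_name"]]
+            # "label" is optional: use None if the column is absent or empty.
+            roi["label"] = row.get("label") or None
+            roi["points"].append((float(row["x_base"]), float(row["y_base"])))
+        return dict(rois)
+
+    # A small in-memory file in the format shown above.
+    example = io.StringIO(
+        "roi_name,x_base,y_base,label\n"
+        "1,100,100,tumor\n"
+        "1,104,165,tumor\n"
+        "2,200,200,stroma\n"
+    )
+    rois = read_roi_points(example)
+    for name, roi in rois.items():
+        print(name, roi["label"], len(roi["points"]))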
+ +Building tile labels +******************** + +Once ROIs have been generated, labeled, and saved in CSV format, the next step is to build a dataframe of tile labels. If not already done, start by :ref:`configuring a project ` and ensuring that ROIs are in the correct directory. You can verify that the ROIs are in the right place by confirming that :meth:`slideflow.Dataset.rois` returns the number of slides with ROIs: + +.. code-block:: python + + >>> import slideflow as sf + >>> P = sf.load_project('/path/to/project') + >>> dataset = P.dataset(tile_px=256, tile_um=256) + >>> len(dataset.rois()) + 941 + +Next, build a dataframe of tile labels with :meth:`slideflow.Dataset.get_tile_dataframe`. This will return a dataframe with tile coordinates (X/Y of tile center, in base dimension), slide grid index, and associated ROI name/label if the tile is in an ROI. For example: + +.. code-block:: python + + >>> df = dataset.get_tile_dataframe() + >>> df.head() + loc_x loc_y grid_x grid_y roi_name roi_desc label slide + slide1-608-608 608 608 0 0 ROI_0 None tumor slide1 + slide1-608-864 608 864 0 1 ROI_0 None tumor slide1 + slide1-608-1120 608 1120 0 2 ROI_0 None tumor slide1 + ... + +The index for this dataframe is the tile ID, a unique identifier built from a combination of the slide name and tile coordinates. + +When training with supervised labels, we'll want to exclude tiles that are either not in an ROI or are in an unlabeled ROI. This can be done by filtering the dataframe to only include rows where the "label" column is not None: + +.. code-block:: python + + >>> df = df.loc[df.label.notnull()] + +Finally, we'll only need the "label" column and tile ID for training, so all other columns can be dropped. This step is optional but may reduce memory usage. + +.. code-block:: python + + >>> df = df[['label']] + >>> df.head() + label + slide1-608-608 tumor + slide1-608-864 tumor + slide1-608-1120 tumor + ... 
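+
+The tile IDs that index this dataframe appear, from the output above, to follow a ``slide-x-y`` pattern. Under that assumption, an ID can be split back into its slide name and coordinates for inspection. The ``split_tile_id()`` helper below is hypothetical and is not a Slideflow function:
+
+.. code-block:: python
+
+    def split_tile_id(tile_id):
+        """Split a 'slide-x-y' tile ID into (slide, x, y)."""
+        # Split from the right so hyphens inside the slide name are preserved.
+        slide, x, y = tile_id.rsplit("-", 2)
+        return slide, int(x), int(y)
+
+    print(split_tile_id("slide1-608-864"))
+
+Training only needs the tile IDs as the dataframe index, so this parsing is purely a debugging aid.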
+ +This dataframe can now be used to train a model with strong supervision. + +Training a model +**************** + +Training a model with strong supervision requires using a :class:`slideflow.model.Trainer`, as described in :ref:`tutorial2`. The only difference when training with strong supervision is that the trainer should be initialized with the tile dataframe for the labels: + +.. code-block:: python + + >>> trainer = sf.model.build_trainer(..., labels=df) + >>> trainer.train(...) + +Once training has finished, the saved model can be used interchangeably with models trained with weak supervision for evaluation, inference, feature generation, etc. + +Complete example +**************** + +Below is a complete example of training a model with strong supervision. This example assumes that a project has already been configured, tiles have been extracted, and ROIs have been generated and labeled. + +.. code-block:: python + + import slideflow as sf + + # Load project and dataset + P = sf.load_project('/path/to/project') + dataset = P.dataset(tile_px=256, tile_um=256) + + # Build tile label dataframe, and filter + # to only include tiles in an ROI. + df = dataset.get_tile_dataframe() + df = df.loc[df.label.notnull()] + + # Subsample our dataset to only include slides with ROI labels. + dataset = dataset.filter({'slide': list(df.slide.unique())}) + + # Split the dataset into training and validation. + train, val = dataset.split(val_fraction=0.3) + + # Build model hyperparameters + hp = sf.ModelParams( + tile_px=256, + tile_um=256, + model='xception', + batch_size=32 + ) + + # Train model + trainer = sf.model.build_trainer( + hp=hp, + outdir='/path/to/outdir', + labels=df + ) + trainer.train(train, val) diff --git a/docs-source/source/training.rst b/docs-source/source/training.rst index 25854b7ad..6e954d400 100644 --- a/docs-source/source/training.rst +++ b/docs-source/source/training.rst @@ -1,10 +1,23 @@ +.. 
_training: + Training ======== +Slideflow offers tools for training many types of neural networks, including: + +- **Weakly supervised, tile-based models**: Models trained on image tiles, with labels inherited from the parent slide. +- **Weakly supervised, multi-instance learning**: Models trained on feature vectors, with labels inherited from the parent slide. +- **Strongly supervised models**: Models trained on image tiles, with labels assigned by ROI. +- **Self-supervised pretraining**: Contrastive pretraining with or without labels (e.g. `SimCLR `_). +- **Generative adversarial networks**: Models trained to generate synthetic images (e.g. `StyleGAN2/3 `_). +- **Segmentation models**: Models trained to identify and classify tissue regions (e.g. `U-Net `_). + +In this section, we will walk through the process of training a weakly supervised tile-based model. :ref:`Strong supervision `, :ref:`Multi-instance learning (MIL) `, :ref:`self-supervised pretraining (SSL) `, :ref:`generative adversarial networks (GAN) `, and :ref:`segmentation` are described in other sections. + Prepare hyperparameters *********************** -The first step of model training is configuring a set of model parameters / training hyperparameters. There are two methods for configuring model parameters. If you intend to train a model using a single combination of hyperparameters, use the ``ModelParams`` class: +The first step of training a weakly-supervised model is configuring model parameters and hyperparameters with :class:`slideflow.ModelParams`. ``ModelParams`` determines the model architecture, loss, preprocessing augmentations, and training hyperparameters. .. code-block:: python @@ -18,116 +31,352 @@ The first step of model training is configuring a set of model parameters / trai ... ) -Alternatively, if you intend to perform a sweep across multiple hyperparameter combinations, use the ``Project.create_hp_sweep()`` function to automatically save a sweep to a JSON file. 
For example, the following would set up a batch_train file with two combinations; the first with a learning rate of 0.01, and the second with a learning rate of 0.001: +See the :class:`slideflow.ModelParams` API documentation for a list of available hyperparameters. + +.. note:: + + If you are using a continuous variable as an outcome measure, be sure to use a regression loss function. Regression loss functions can be viewed in ``slideflow.ModelParams.RegressionLossDict``, and all available loss functions are in ``slideflow.ModelParams.AllLossDict``. + +Training a model +**************** + +Slideflow provides two methods for training models: with the high-level :meth:`slideflow.Project.train` function or with the lower-level :class:`slideflow.model.Trainer`. The former provides an easier interface for executing complex training tasks with a single function call, while the latter provides lower-level access for greater customizability. + +.. _training_with_project: + +Training with a Project +----------------------- + +:meth:`slideflow.Project.train` provides an easy API for executing complex training plans and organizing results in the project directory. This is the recommended way to train models in Slideflow. There are two required arguments for this function: + +- ``outcomes``: Name (or list of names) of annotation header columns, from which to determine slide labels. +- ``params``: Model parameters. + +The default validation plan is three-fold cross-validation, but the validation strategy can be customized via keyword arguments (``val_strategy``, ``val_k_fold``, etc) as described in the API documentation. If crossfold validation is used, each model in the crossfold will be trained sequentially. Read more about :ref:`validation strategies `. + +By default, all slides in the project will be used for training. 
You can restrict your training/validation data to only a subset of slides in the project in one of two ways: by providing ``filters``, or by providing a filtered :class:`slideflow.Dataset`. + +For example, you can use the ``filters`` argument to train/validate only using slides labeled as "train_and_val" in the "dataset" column with the following syntax: + +.. code-block:: python + + results = P.train( + outcomes="tumor_type", + params=sf.ModelParams(...), + filters={"dataset": ["train_and_val"]} + ) + +Alternatively, you can restrict the training/validation dataset by providing a :class:`slideflow.Dataset` to the ``dataset`` argument: + +.. code-block:: python + + dataset = P.dataset(tile_px=299, tile_um=302) + dataset = dataset.filter({"dataset": ["train_and_val"]}) + + results = P.train( + outcomes="tumor_type", + params=sf.ModelParams(...), + dataset=dataset + ) + +In both cases, slides will be further split into training and validation sets using the specified validation settings (defaulting to three-fold cross-validation). + +For more granular control over the validation dataset used, you can supply a :class:`slideflow.Dataset` to the ``val_dataset`` argument. Doing so will cause the rest of the validation keyword arguments to be ignored. + +.. code-block:: python + + dataset = P.dataset(tile_px=299, tile_um=302) + train_dataset = dataset.filter({"dataset": ["train"]}) + val_dataset = dataset.filter({"dataset": ["val"]}) + + results = P.train( + outcomes="tumor_type", + params=sf.ModelParams(...), + dataset=train_dataset, + val_dataset=val_dataset + ) + +Performance metrics - including accuracy, loss, etc. - are returned as a dictionary and saved in ``results_log.csv`` in both the project directory and model directory. Additional data, including ROCs and scatter plots, are saved in the model directories. Pandas DataFrames containing tile-, slide-, and patient-level predictions are also saved in the model directory.
+ +At each designated epoch, models are saved in their own folders. Each model directory will include a copy of its hyperparameters in a ``params.json`` file, and a copy of its training/validation slide manifest in ``slide.log``. + +.. _training_with_trainer: + +Using a Trainer +--------------- + +You can also train models outside the context of a project by using :class:`slideflow.model.Trainer`. This lower-level interface provides greater flexibility for customization and allows models to be trained without requiring a Project to be set up. It lacks several convenience features afforded by using :meth:`slideflow.Project.train`, however, such as cross-validation, logging, and label preparation for easy multi-outcome support. + +For this training approach, start by building a trainer with :func:`slideflow.model.build_trainer`, which requires: + +- ``hp``: :class:`slideflow.ModelParams` object. +- ``outdir``: Directory in which to save models and checkpoints. +- ``labels``: Dictionary mapping slide names to outcome labels. + +:class:`slideflow.Dataset` provides a ``.labels()`` function that can generate this required labels dictionary. + +.. code-block:: python + + # Prepare dataset and labels + dataset = P.dataset(tile_px=299, tile_um=302) + labels, unique_labels = dataset.labels('tumor_type') + + # Split into training/validation + train_dataset = dataset.filter({"dataset": ["train"]}) + val_dataset = dataset.filter({"dataset": ["val"]}) + + # Determine model parameters + hp = sf.ModelParams( + tile_px=299, + tile_um=302, + batch_size=32, + ... + ) + + # Prepare a Trainer + trainer = sf.model.build_trainer( + hp=hp, + outdir='path', + labels=labels + ) + +Use :meth:`slideflow.model.Trainer.train` to train a model using your specified training and validation datasets. + +.. code-block:: python + + # Train a model + trainer.train(train_dataset, val_dataset) + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + { + "epochs": { + "epoch3": { + "train_metrics": { + "loss": 0.497 + "accuracy": 0.806 + "val_loss": 0.719 + "val_accuracy": 0.778 + }, + "val_metrics": { + "loss": 0.727 + "accuracy": 0.770 + }, + "tile": { + "Outcome 0": [ + 0.580 + 0.580 + ] + }, + "slide": { + "Outcome 0": [ + 0.658 + 0.658 + ] + }, + "patient": { + "Outcome 0": [ + 0.657 + 0.657 + ] + } + } + } + } + +Read more about the ``Trainer`` class and available keyword arguments in the :class:`API documentation `. + +Multiple outcomes +***************** + +Slideflow supports both classification and regression, as well as training to single or multiple outcomes at once. To train with multiple outcomes simultaneously, simply pass multiple annotation headers to the ``outcomes`` argument of :meth:`slideflow.Project.train`. + +Time-to-event / survival outcomes +********************************* + +Models can also be trained to a time series outcome using Cox Proportional Hazards (CPH) and negative log likelihood loss. For time-to-event / survival models, use ``'negative_log_likelihood'`` loss and set ``outcomes`` equal to the annotation column indicating event *time*. Specify the event *type* (0 or 1) by passing the event type annotation column to the argument ``input_header``. If you are using multiple clinical inputs, the first header passed to ``input_header`` must be event type. Survival models are not compatible with multiple outcomes. + +.. note:: + Survival models are currently only available with the Tensorflow backend. PyTorch support for survival outcomes is in development. + +Multimodal models +***************** + +In addition to training using image data, clinical data can also be provided as model input by passing annotation column headers to the variable ``input_header``. This input is concatenated at the post-convolutional layer, prior to any configured hidden layers. 
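+
+Conceptually, this amounts to appending the clinical values to the image feature vector before it reaches the hidden layers. The sketch below illustrates the idea with plain Python lists. It is a simplified illustration with made-up values and a hypothetical ``concat_inputs()`` helper, not Slideflow's actual implementation:
+
+.. code-block:: python
+
+    def concat_inputs(image_features, clinical_inputs):
+        """Append clinical inputs to post-convolutional image features."""
+        return list(image_features) + list(clinical_inputs)
+
+    image_features = [0.12, 0.80, 0.33]  # made-up post-convolutional features
+    clinical_inputs = [1.0, 0.0]         # made-up encoded annotation values
+    merged = concat_inputs(image_features, clinical_inputs)
+
+    # The merged vector is what the hidden layers would receive.
+    print(len(merged))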
+ +If desired, models can also be trained with clinical input data alone, without images, by using the hyperparameter argument ``drop_images=True``. + +.. _hyperparameter_optimization: + +Hyperparameter optimization +*************************** + +Slideflow includes several tools for assisting with hyperparameter optimization, as described in the next sections. + +Testing multiple combinations +----------------------------- + +You can easily test a series of hyperparameter combinations by passing a list of ``ModelParams`` objects to the ``params`` argument of :meth:`slideflow.Project.train`. + +.. code-block:: python + + hp1 = sf.ModelParams(..., batch_size=32) + hp2 = sf.ModelParams(..., batch_size=64) + + P.train( + ..., + params=[hp1, hp2] + ) + +Grid-search sweep +----------------- + +You can also prepare a grid-search sweep, testing every combination across a series of hyperparameter ranges. Use :meth:`slideflow.Project.create_hp_sweep`, which will calculate and save the sweep configuration to a JSON file. For example, the following would configure a sweep with only two combinations: the first with a learning rate of 0.001, and the second with a learning rate of 0.0001: .. code-block:: python P.create_hp_sweep( - epochs=[5], - toplayer_epochs=0, + filename='sweep.json', model=['xception'], - pooling=['avg'], loss='sparse_categorical_crossentropy', learning_rate=[0.001, 0.0001], batch_size=64, - hidden_layers=[1], - optimizer='Adam', - augment='xyrj' ) -Available hyperparameters include: - -- **augment** - Image augmentations to perform, including flipping/rotating and random JPEG compression. Please see :class:`slideflow.model.ModelParams` for more details. -- **batch_size** - Batch size for training. -- **dropout** - Adds dropout layers after each fully-connected layer. -- **early_stop** - Stop training early if validation loss/accuracy is not decreasing. -- **early_stop_patience** - Number of epochs to wait before allowing early stopping.
-- **early_stop_method** - mMtric to use for early stopping. Includes 'loss', 'accuracy', or 'manual'. -- **epochs** - Number of epochs to spend training the full model. -- **include_top** - Include the default, preconfigured, fully connected top layers of the specified model. -- **hidden_layers** - Number of fully-connected final hidden layers before softmax prediction. -- **hidden_layer_width** - Width of hidden layers. -- **l1** - Adds L1 regularization to all convolutional layers with this weight. -- **l1_dense** - Adds L1 regularization to all fully-conected Dense layers with this weight. -- **l2** - Adds L2 regularization to all convolutional layers with this weight. -- **l2_dense** - Adds L2 regularization to all fully-conected Dense layers with this weight. -- **learning_rate** - Learning rate for training. -- **learning_rate_decay** - lLarning rate decay during training. -- **learning_rate_decay_steps** - Number of steps after which to decay learning rate -- **loss** - loss function; please see `Keras loss documentation `_ for all options. -- **manual_early_stop_epoch** - Manually trigger early stopping at this epoch/batch. -- **manual_early_stop_batch** - Manually trigger early stopping at this epoch/batch. -- **model** - Model architecture; please see `Keras application documentation `_ for all options. -- **normalizer** - Normalization method to use on images. -- **normalizer_source** - Optional path to normalization image to use as the source. -- **optimizer** - Training optimizer; please see `Keras opt documentation `_ for all options. -- **pooling** - Pooling strategy to use before final fully-connected layers; either 'max', 'avg', or 'none'. -- **tile_px** - Size of extracted tiles in pixels. -- **tile_um** - Size of extracted tiles in microns. -- **toplayer_epochs** - Number of epochs to spend training just the final layer, with all convolutional layers "locked" (sometimes used for transfer learning). 
-- **trainable_layers** - Number of layers available for training, other layers will be frozen. If 0, all layers are trained. -- **training_balance** - Training input balancing strategy; please see :ref:`balancing` for more details. -- **uq** - Enable uncertainty quantification (UQ) during inference. Requires dropout to be non-zero. -- **validation_balance** - Validation input balancing strategy; please see :ref:`balancing` for more details. - -If you are using a continuous variable as an outcome measure, be sure to use a linear loss function. Linear loss functions can be viewed in ``slideflow.model.ModelParams.LinearLossDict``, and all available loss functions are in ``slideflow.model.ModelParams.AllLossDict``. - -Begin training -************** - -Once your hyperparameter settings have been chosen you may begin training using the ``train`` function. Documentation of the function is given below: - -.. autofunction:: slideflow.Project.train - :noindex: - -If you used the ``ModelParams`` class to configure a single combination of parameters, pass this object via the ``params`` argument. If you configured a hyperparameter sweep, set this argument to the name of your hyperparameter sweep file (saved by default to 'sweep.json'). - -Your outcome variable(s) are specified with the ``outcomes`` argument. You may filter slides for training using the ``filter`` argument, as previously described. - -For example, to train using only slides labeled as "train" in the "dataset" column, with the outcome variable defined by the column "category", use the following syntax: - -.. code-block:: python - - P.train( - outcomes="category", - filters={"dataset": ["train"]}, - params='sweep.json' +The sweep is then executed by passing the JSON path to the ``params`` argument of :meth:`slideflow.Project.train()`: + +.. code-block:: python + + P.train(params='sweep.json', ...) + +.. 
_bayesian_optimization: + +Bayesian optimization +--------------------- + +You can also perform Bayesian hyperparameter optimization using `SMAC3 `_, which uses a `configuration space `_ to determine the types and ranges of hyperparameters to search. + +Slideflow provides several functions to assist with building these configuration spaces. :func:`slideflow.util.create_search_space` allows you to define a range to search for each hyperparameter via keyword arguments: + +.. code-block:: python + + import slideflow as sf + + config_space = sf.util.create_search_space( + normalizer=['macenko', 'reinhard', 'none'], + dropout=(0.1, 0.5), + learning_rate=(1e-4, 1e-5) ) -If you would like to use a different validation plan than the default, pass the relevant keyword arguments to the training function. +:func:`slideflow.util.broad_search_space` and :func:`slideflow.util.shallow_search_space` provide preconfigured search spaces that will search a broad and narrow range of hyperparameters, respectively. You can also customize a preconfigured search space using keyword arguments. For example, to do a broad search but disable L1 searching: -Once training has finished, performance metrics - including accuracy, loss, etc. - can be found in the ``results_log.csv`` file in the project directory. Additional data, including ROCs and scatter plots, are saved in the model directories. +.. code-block:: python -At each designated epoch, models are saved in their own folders. Each model directory will include a copy of its hyperparameters in a ``params.json`` file, and a copy of its training/validation slide manifest in ``slide.log``. + import slideflow as sf -Multiple outcomes -***************** + config_space = sf.util.broad_search_space(l1=None) + +See the linked API documentation for each function for more details about the respective search spaces. -Slideflow supports both categorical and continuous outcomes, as well as training to single or multiple outcomes at once. 
To use multiple outcomes simultaneously, simply pass multiple annotation headers to the ``outcomes`` argument. +Once the search space is determined, you can perform the hyperparameter optimization by simply replacing :meth:`slideflow.Project.train` with :meth:`slideflow.Project.smac_search`, providing the configuration space to the argument ``smac_configspace``. By default, SMAC3 will optimize the tile-level AUROC, but the optimization metric can be customized with the keyword argument ``smac_metric``. -Multiple input variables -************************ +.. code-block:: python -In addition to training using image data, clinical data can also be provided as model input by passing annotation column headers to the variable ''input_header''. This input is merged at the post-convolutional layer, prior to any configured hidden layers. + # Base hyperparameters + hp = sf.ModelParams(tile_px=299, ...) -If desired, models can also be trained with clinical input data alone, without images, by using the hyperparameter argument ``drop_images=True``. + # Configuration space to optimize + config_space = sf.util.shallow_search_space() -Cox Proportional Hazards (CPH) models -************************************* + # Run the Bayesian optimization + best_config, history = P.smac_search( + outcomes='tumor_type', + params=hp, + smac_configspace=config_space, + smac_metric='tile_auc', + ... + ) + print(history) -Models can also be trained to a time series outcome using CPH and negative log likelihood loss. For CPH models, use `'negative_log_likelihood'` loss and set ``outcomes`` equal to the annotation column indicating event *time*. Specify the event *type* (0 or 1) by passing the event type annotation column to the argument ``input_header``. If you are using multiple clinical inputs, the first header passed to ``input_header`` must be event type. CPH models are not compatible with multiple outcomes. +.. rst-class:: sphx-glr-script-out -.. 
note:: - CPH models are currently unavailable with the PyTorch backend. PyTorch support for CPH outcomes is in development. + .. code-block:: none + + dropout l1 l2 metric + 0 0.126269 0.306857 0.183902 0.271778 + 1 0.315987 0.014661 0.413443 0.283289 + 2 0.123149 0.311893 0.184439 0.250339 + 3 0.250000 0.250000 0.250000 0.247641 + 4 0.208070 0.018481 0.121243 0.257633 + +:meth:`slideflow.Project.smac_search` returns the best configuration and a history of models trained during the search. This history is a Pandas DataFrame with hyperparameters for columns, and a "metric" column with the optimization metric result for each trained model. The run history is also saved in CSV format in the associated model folder. -Distributed training across GPUs -******************************** +See the API documentation for available customization via keyword arguments. -If multiple GPUs are available, training can be distributed by passing the argument ``multi_gpu=True``. If provided, slideflow will use all available (and visible) GPUs for training. +.. _custom_loss: + +Customizing model or loss +************************* + +Slideflow supports dozens of model architectures, but you can also train with a custom architecture, as demonstrated in :ref:`tutorial3`. + +Similarly, you can also train with a custom loss function by supplying a dictionary to the ``loss`` argument in ``ModelParams``, with the keys ``type`` (which must be either ``'classification'``, ``'regression'``, or ``'survival'``) and ``fn`` (a callable loss function). + +For Tensorflow/Keras, the loss function must accept arguments ``y_true, y_pred``. For regression losses, ``y_true`` may need to be cast to ``tf.float32``. An example custom regression loss is given below: + +.. 
code-block:: python + + # Custom Tensorflow loss + def custom_regression_loss(y_true, y_pred): + y_true = tf.cast(y_true, tf.float32) + squared_difference = tf.square(y_true - y_pred) + return tf.reduce_mean(squared_difference, axis=-1) + + +For PyTorch, the loss function must return a nested loss function with arguments ``output, target``. An example regression loss is given below: + +.. code-block:: python + + # Custom PyTorch loss + def custom_regression_loss(): + def loss_fn(output, target): + return torch.mean((target - output) ** 2) + return loss_fn + + +In both cases, the loss function is applied as follows: + +.. code-block:: python + + hp = sf.ModelParams(..., loss={'type': 'regression', 'fn': custom_regression_loss}) + + +Using multiple GPUs +******************* + +Slideflow can perform distributed training if multiple GPUs are available. Enable distributed training by passing the argument ``multi_gpu=True``, which will allow Slideflow to use all available (and visible) GPUs. + +.. _from_wsi: + +Training without TFRecords +************************** + +It is also possible to train deep learning models directly from slides, without first generating TFRecords. This may be advantageous for rapidly prototyping models on a large dataset, or when tuning the tile size for a dataset. + +Use the argument ``from_wsi=True`` in either the :meth:`slideflow.Project.train` or :meth:`slideflow.model.Trainer.train` functions. Image tiles will be dynamically extracted from slides during training, and background will be automatically removed via Otsu's thresholding. + +.. note:: + + Using the :ref:`cuCIM backend ` will greatly improve performance when training without TFRecords. Monitoring performance ********************** +Tensorboard +----------- + During training, progress can be monitored using Tensorflow's bundled ``Tensorboard`` package by passing the argument ``use_tensorboard=True``. This functionality was disabled by default due to a recent bug in Tensorflow. 
To use tensorboard to monitor training, execute: .. code-block:: bash @@ -135,3 +384,14 @@ During training, progress can be monitored using Tensorflow's bundled ``Tensorbo $ tensorboard --logdir=/path/to/model/directory ... and open http://localhost:6006 in your web browser. + +Neptune.ai +---------- + +Experiments can be automatically logged with `Neptune.ai `_. To enable logging, first locate your Neptune API token and workspace ID, and configure the environmental variables ``NEPTUNE_API_TOKEN`` and ``NEPTUNE_WORKSPACE``. + +With the environmental variables set, Neptune logs are enabled by passing ``use_neptune=True`` to ``sf.load_project``. + +.. code-block:: python + + P = sf.load_project('/project/path', use_neptune=True) \ No newline at end of file diff --git a/docs-source/source/troubleshooting.rst b/docs-source/source/troubleshooting.rst index 08dd04de5..26429f222 100644 --- a/docs-source/source/troubleshooting.rst +++ b/docs-source/source/troubleshooting.rst @@ -8,7 +8,15 @@ To check for errors in your environment or installation, you can also use the te Testing ******* -To test all pipeline functions, use the ``test.py`` script, providing a path to a directory containing slides to use for testing: +To troubleshoot environment or installation issues, start by running unit tests, +which do not require any sample slides. Use the ``test.py`` script without any +arguments: + +.. code-block:: bash + + $ python3 test.py + +For a more comprehensive test of all pipeline functions, provide a path to a directory containing sample slides via ``--slides``, setting ``--all=True`` to run all tests: .. code-block:: bash @@ -25,4 +33,21 @@ To view a list of all tests that will be run (and thus can be skipped), pass the Issue Reporting *************** -If the issue is still unclear, please submit an Issue on the `project Github page `_. \ No newline at end of file +If the issue is still unclear, please submit an Issue on the `project Github page `_. 
Be sure to include the following information: + +* The version of Slideflow you are using, which can be displayed with ``sf.about()``: + +.. code-block:: bash + + $ python3 -c "import slideflow; slideflow.about()" + ╭=======================╮ + │ Slideflow │ + │ Version: 2.1.0 │ + │ Backend: tensorflow │ + │ Slide Backend: cucim │ + │ https://slideflow.dev │ + ╰=======================╯ + +* The active deep learning backend (``sf.backend()``) and slide backend (``sf.slide_backend()``) +* The version of Python you are using (``python3 --version``) +* The operating system you are using (``uname -a``) diff --git a/docs-source/source/tutorial1.rst b/docs-source/source/tutorial1.rst index b01f590d1..25d8f2b5a 100644 --- a/docs-source/source/tutorial1.rst +++ b/docs-source/source/tutorial1.rst @@ -3,15 +3,14 @@ Tutorial 1: Model training (simple) ===================================== -In this first tutorial, we will walk through the steps needed to take an example project from start to finish, using -the bundled ``run_project.py`` script to execute pipeline functions. As with all of these tutorials, we will use +In this first tutorial, we will walk through the steps needed to take an example project from start to finish. As with all of these tutorials, we will use publicly available data from `The Cancer Genome Atlas (TCGA) `_. In this first tutorial, we will train a model to predict ER status from breast cancer slides. Examples will be given assuming project files are in the directory ``/home/er_project`` and slides are in ``/home/brca_slides``, although you will need to customize these paths according to your needs. -Project Planning +Create a Project **************** First, download slides and annotations for the TCGA-BRCA project using the `legacy GDC portal @@ -19,45 +18,22 @@ First, download slides and annotations for the TCGA-BRCA project using the `lega patients. 
Our outcome of interest is "er_status_by_ihc", of which 1011 have a documented result (either "Positive" or "Negative"), giving us our final patient count of 1011. -To create a new project, use the ``run_project.py`` script: +Create a new project, and pass the path to the downloaded slides to the argument ``slides``. -.. code-block:: bash +.. code-block:: python + + import slideflow as sf + + P = sf.create_project( + root='/home/er_project', + slides='/path/to/slides' + ) + +After the project is created, we can load the project with: + +.. code-block:: python - $ python3 run_project.py -p /home/er_project - -We will then be taken through an interactive prompt asking for project settings. When prompted, use the -following settings (mostly defaults): - -+-------------------------------+-------------------------------------------------------+ -| **name** | Breast_ER | -+-------------------------------+-------------------------------------------------------+ -| **annotations** | ./annotations.csv (default) | -+-------------------------------+-------------------------------------------------------+ -| **dataset_config** | ./datasets.json (default) | -+-------------------------------+-------------------------------------------------------+ -| **sources** | BRCA | -+-------------------------------+-------------------------------------------------------+ -| **models_dir** | ./models (default) | -+-------------------------------+-------------------------------------------------------+ -| **eval_dir** | ./eval | -+-------------------------------+-------------------------------------------------------+ - -After a blank datasets.json file is created, we will be prompted to add a new dataset source. 
Use the following -configuration for the added dataset source: - -+-------------------------------+-------------------------------------------------------+ -| **source** | BRCA | -+-------------------------------+-------------------------------------------------------+ -| **slides** | /home/brca_slides | -+-------------------------------+-------------------------------------------------------+ -| **roi** | /home/brca_slides | -+-------------------------------+-------------------------------------------------------+ -| **tiles** | /home/er_project/tiles | -+-------------------------------+-------------------------------------------------------+ -| **tfrecords** | /home/er_project/tfrecords | -+-------------------------------+-------------------------------------------------------+ - -For simplicity, we will not be using annotated tumor regions of interest (ROI), instead training on whole-slide images. + P = sf.load_project('/home/er_project') Setting up annotations ********************** @@ -66,9 +42,9 @@ With our project initialized, we can set up our annotations file. Use the downlo CSV file, with a column "patient" indicating patient name (in the case of TCGA, these are in the format TCGA-SS-XXXX, where SS indicates site of origin and XXXX is the patient identifier), and a column "er_status_by_ihc" containing our outcome of interest. Add a third column "slide" containing the name of the slide associated with the -patient. If there are multiple slides per patient, list each slide on a separate row. Finally, add a column "dataset" -to indicate whether the slide should be used for training or evaluation. Set aside somewhere around 10-30% of the -dataset for evaluation. +patient (without the file extension). If there are multiple slides per patient, list each slide on a separate row. +Finally, add a column "dataset" to indicate whether the slide should be used for training or evaluation. Set aside +somewhere around 10-30% of the dataset for evaluation. .. 
note:: @@ -91,21 +67,18 @@ Your annotations file should look something like: | ... | ... | ... | ... | +-----------------------+--------------------+-----------+-----------------------------------+ +Save this CSV file in your project folder with the name ``annotations.csv``. Tile extraction *************** -The next step is to extract tiles from our slides. Find the sample ``actions.py`` file in the project folder, which we -will modify and use to execute our pipeline functions. Delete the commented-out examples in this file. - -For this example, we will use a 256px x 256px tile size, at 0.5 µm/pixel (128 um). Add the following -to the project ``actions.py`` file: +The next step is to extract tiles from our slides. For this example, we will use a 256px x 256px tile size, +at 0.5 µm/pixel (128 um). .. code-block:: python - def main(P): - # Extract tiles at 256 pixels, 0.5 um/px - P.extract_tiles(tile_px=256, tile_um=128) + # Extract tiles at 256 pixels, 0.5 um/px + P.extract_tiles(tile_px=256, tile_um=128) .. hint:: Tile extraction speed is greatly improved when slides are on an SSD or ramdisk; slides can be automatically @@ -119,22 +92,18 @@ Training ******** After tiles are extracted, the dataset will be ready for training. We will train with a single set of manually defined -hyperparameters, which we can configure with :class:`slideflow.model.ModelParams`. We will use the +hyperparameters, which we can configure with :class:`slideflow.ModelParams`. We will use the `Xception `_ model with a batch size of 32, otherwise keeping defaults. .. code-block:: python - def main(P): - from slideflow.model import ModelParams - ... - - hp = ModelParams( - tile_px=256, - tile_um=128, - model='xception', - batch_size=32, - epochs=[3] - ) + hp = sf.ModelParams( + tile_px=256, + tile_um=128, + model='xception', + batch_size=32, + epochs=[3] + ) For training, we will use 5-fold cross-validation on the training dataset. 
To set up training, invoke the :meth:`slideflow.Project.train` function with the outcome of interest, our hyperparameters, and our validation plan. @@ -143,17 +112,14 @@ to only include patients with documented ER status (otherwise a blank "" would b .. code-block:: python - def main(P): - ... - - # Train with 5-fold cross-validation - P.train( - 'ER_status', - params=hp, - val_k_fold=5, - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) + # Train with 5-fold cross-validation + P.train( + 'er_status_by_ihc', + params=hp, + val_k_fold=5, + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) After cross validation is complete, we will want to have a model trained across the entire dataset, so we can assess performance on our held-out evaluation set. To train a model across the entire training dataset without validation, @@ -161,59 +127,56 @@ we will set ``val_strategy`` to ``None``: .. code-block:: python - def main(P): - ... - - # Train across the entire training dataset - P.train( - 'ER_status', - params=hp, - val_strategy='none', - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) + # Train across the entire training dataset + P.train( + 'er_status_by_ihc', + params=hp, + val_strategy='none', + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) -Now, it's time to start our pipeline. To review, our ``actions.py`` file at this point should look like: +Now, it's time to start our pipeline. To review, our complete script should look like: .. 
code-block:: python - def main(P): - from slideflow.model import ModelParams - - # Extract tiles at 256 pixels, 0.5 um/px - P.extract_tiles(tile_px=256, tile_um=128) - - hp = ModelParams( - tile_px=256, - tile_um=128, - model='xception', - batch_size=32, - epochs=[3, 5, 10] - ) - - # Train with 5-fold cross-validation - P.train( - 'ER_status', - params=hp, - val_k_fold=5, - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) - - # Train across the entire training dataset - P.train( - 'ER_status', - params=hp, - val_strategy='none', - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) - -To execute these functions, use the ``run_project.py`` script, passing the project directory with the ``-p`` flag. + import slideflow as sf + + # Create a new project + P = sf.create_project( + root='/home/er_project', + slides='/path/to/slides' + ) + + # Extract tiles at 256 pixels, 0.5 um/px + P.extract_tiles(tile_px=256, tile_um=128) + + hp = sf.ModelParams( + tile_px=256, + tile_um=128, + model='xception', + batch_size=32, + epochs=[3, 5, 10] + ) + + # Train with 5-fold cross-validation + P.train( + 'er_status_by_ihc', + params=hp, + val_k_fold=5, + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) + + # Train across the entire training dataset + P.train( + 'er_status_by_ihc', + params=hp, + val_strategy='none', + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) -.. code-block:: bash - - $ python3 run_project.py -p /home/er_project The final training results should show an average AUROC of around 0.87, with average AP around 0.83. 
Tile, slide, and patient-level receiver operating characteristic curves are saved in the model folder, along with precision-recall curves (not shown): @@ -239,20 +202,3 @@ Tensorboard-formatted training and validation logs are saved in the model directory $ tensorboard --logdir=/project_path/models/00001-outcome-HP0 Tensorboard can then be accessed by navigating to ``http://localhost:6006`` in a browser. - -Monitoring with Neptune -*********************** - -Experiments can be automatically logged with `Neptune.ai `_. To enable logging, first locate your Neptune API token and workspace ID, and configure the environmental variables ``NEPTUNE_API_TOKEN`` and ``NEPTUNE_WORKSPACE``. - -With the environmental variables set, Neptune logs are enabled either by passing a ``-n`` flag to the ``run_project.py`` script: - -.. code-block:: bash - - $ python3 run_project.py -n -p /project_path/ - -or by passing ``use_neptune=True`` to the ``slideflow.Project`` class: - -.. code-block:: python - - P = sf.Project('/project/path', use_neptune=True) \ No newline at end of file diff --git a/docs-source/source/tutorial2.rst b/docs-source/source/tutorial2.rst index 799087b97..17d6b051b 100644 --- a/docs-source/source/tutorial2.rst +++ b/docs-source/source/tutorial2.rst @@ -1,3 +1,5 @@ +.. _tutorial2: + Tutorial 2: Model training (advanced) ======================================= @@ -76,12 +78,12 @@ We can use the dataset to get our ER status labels. The :meth:`slideflow.Dataset We can see the slideflow logs showing us that 234 slides with the outcome label "Negative" were assigned to the numerical outcome "0", and 842 "Positive" slides were assigned "1". -Next, we'll need to split this dataset into a training and validation set. We'll start by training on the first of 3 k-folds for cross-validated training. To split a dataset, use the :meth:`slideflow.Dataset.train_val_split` method. We'll need to provide our labels to ensure that the outcome categories are balanced in the training and validation sets. 
+Next, we'll need to split this dataset into a training and validation set. We'll start by training on the first of 3 k-folds for cross-validated training. To split a dataset, use the :meth:`slideflow.Dataset.split` method. We'll need to provide our labels to ensure that the outcome categories are balanced in the training and validation sets. .. code-block:: python - >>> train_dts, val_dts = dataset.train_val_split( - ... model_type='categorical', + >>> train_dts, val_dts = dataset.split( + ... model_type='classification', ... labels=labels, ... val_strategy='k-fold', ... val_k_fold=3, @@ -107,12 +109,11 @@ At this point, we can also add categorical balancing to our dataset (see :ref:`b Training ******** -Now that our dataset is prepared, we can begin setting up our model and trainer. Our model training parameters are configured with :class:`slideflow.model.ModelParams`. +Now that our dataset is prepared, we can begin setting up our model and trainer. Our model training parameters are configured with :class:`slideflow.ModelParams`. .. code-block:: python - >>> from slideflow.model import ModelParams, Trainer - >>> hp = ModelParams( + >>> hp = sf.ModelParams( ... tile_px=256, ... tile_um=128, ... model='xception', @@ -124,14 +125,13 @@ In addition to the above model parameters, our trainer will need the outcome lab .. code-block:: python - >>> trainer = Trainer( + >>> trainer = sf.model.build_trainer( ... hp=hp, ... outdir='/some/directory', ... labels=labels, - ... patients=dataset.patients() ... ) -Finally, we can start training. Pass the training and validation datasets to the :meth:`slideflow.model.Trainer.train` method of our trainer, assinging the output to a new variable ``results`` +Now we can start training. Pass the training and validation datasets to the :meth:`slideflow.model.Trainer.train` method of our trainer, assigning the output to a new variable ``results`` .. 
code-block:: python @@ -176,4 +176,4 @@ You'll see logs recording model structure, training progress across epochs, and } } -Training results are separated with nested dictionaries according to epoch. The raw training metrics and validation metrics are stored with the keys ``"train_metrics"`` and ``"val_metrics"``, and tile-, slide-, and patient-level metrics (AUC for categorical data, R-squared for linear outcomes, and concordance index for CPH models) is reported under the ``"tile"``, ``"slide"``, and ``"patient"`` keys for each outcome, respectively. \ No newline at end of file +Training results are separated with nested dictionaries according to epoch. The raw training metrics and validation metrics are stored with the keys ``"train_metrics"`` and ``"val_metrics"``, and tile-, slide-, and patient-level metrics (AUROC for classification, R-squared for regression outcomes, and concordance index for survival models) are reported under the ``"tile"``, ``"slide"``, and ``"patient"`` keys for each outcome, respectively. \ No newline at end of file diff --git a/docs-source/source/tutorial3.rst b/docs-source/source/tutorial3.rst index 28d2406a1..c6ecda42f 100644 --- a/docs-source/source/tutorial3.rst +++ b/docs-source/source/tutorial3.rst @@ -1,3 +1,5 @@ +.. _tutorial3: + Tutorial 3: Using a custom architecture ======================================= diff --git a/docs-source/source/tutorial4.rst b/docs-source/source/tutorial4.rst index 65bb3912a..316d2e5fa 100644 --- a/docs-source/source/tutorial4.rst +++ b/docs-source/source/tutorial4.rst @@ -84,7 +84,7 @@ If the referenced model was trained with digital stain normalization, this will The ``resolution`` parameter indicates the stride at which tiles should be extracted from slides to generate predictions. ``"low"`` resolution yields predictions on non-overlapping tiles (stride_div=1). 
``"medium"`` resolution uses tiles with 50% overlap (stride_div=2), and ``"high"`` resolution uses tiles with 75% overlap (stride_div=4). -Heatmaps are colored and scaled in a manner optimized for categorical outcomes, with the colorscale 0 (blue) -> 0.5 (white) -> 1.0 (red). To change this colorscaling (particularly important for linear outcomes), set ``vmin``, ``vcenter``, and ``vmax`` accordingly. +Heatmaps are colored and scaled in a manner optimized for categorical outcomes, with the colorscale 0 (blue) -> 0.5 (white) -> 1.0 (red). To change this colorscaling (particularly important for regression outcomes), set ``vmin``, ``vcenter``, and ``vmax`` accordingly. Heatmaps are displayed without any color interpolation by default. To generate a smoothed heatmap, interpolate colors with any strategy supported by matplotlib (including, for example, "bicubic", "nearest", "bilinear", and many more) with the argument ``interpolation``. diff --git a/docs-source/source/tutorial5.rst b/docs-source/source/tutorial5.rst index 1e2dd39b4..719338c4f 100644 --- a/docs-source/source/tutorial5.rst +++ b/docs-source/source/tutorial5.rst @@ -115,7 +115,7 @@ Layer activations calculated on very large datasets may result in high memory us max_tiles=100 ) -This function will return an instance of :class:`slideflow.DatasetFeatures`, which contains tile-level predictions (in ``DatasetFeatures.logits``), tile X,Y locations from their respective slides (in ``DatasetFeatures.locations``), layer activations (in ``DatasetFeatures.activations``), and uncertainty (if applicable, in ``DatasetFeatures.uncertainty``). +This function will return an instance of :class:`slideflow.DatasetFeatures`, which contains tile-level predictions (in ``DatasetFeatures.predictions``), tile X,Y locations from their respective slides (in ``DatasetFeatures.locations``), layer activations (in ``DatasetFeatures.activations``), and uncertainty (if applicable, in ``DatasetFeatures.uncertainty``). 
 Create the mosaic map @@ -147,14 +147,14 @@ Save corresponding UMAPs Now that we have the mosaic generated, we need to create corresponding labeled UMAP plots to aid in interpretability. UMAP plots are stored in :class:`slideflow.SlideMap` objects. A mosaic's underlying ``SlideMap`` can be accessed via ``mosaic.slide_map``. -The :class:`slideflow.SlideMap` class provides several functions useful for labeling. To start, we will label the umap according to the raw logits for each tile image. As this is a binary categorical outcome, there will be two logits. We will label the UMAP according to the second logit (id=1), and then save the image to disc. +The :class:`slideflow.SlideMap` class provides several functions useful for labeling. To start, we will label the UMAP according to the raw predictions for each tile image. As this is a binary categorical outcome, there will be two post-softmax predictions. We will label the UMAP according to the second prediction (id=1), and then save the image to disk. .. code-block:: python - # Label by raw logits + # Label by raw predictions umap = mosaic.slide_map - umap.label_by_logits(1) - umap.save('umap_logits.png') + umap.label_by_preds(1) + umap.save('umap_preds.png') .. image:: https://i.imgur.com/FT7nH90.png @@ -162,7 +162,7 @@ Next, we will discretize the predictions, showing the final prediction as a cate .. code-block:: python - # Label by raw logits + # Label by discretized prediction umap.label_by_meta('prediction') umap.save('umap_predictions.png') @@ -175,7 +175,7 @@ For reference, let's see the ground truth categorical labels. For this, we will # Get slide labels labels, unique = P.dataset().labels('cohort') - # Label by raw logits + # Label with slide labels umap.label_by_slide(labels) umap.save('umap_labels.png') @@ -185,7 +185,7 @@ Finally, if we are using a model that was trained with uncertainty quantificat .. 
code-block:: python - # Label by raw logits + # Label by uncertainty umap.label_by_uncertainty() umap.save('umap_uncertainty.png') @@ -195,7 +195,6 @@ In all cases, the UMAP plots can be customized by passing keyword arguments acce .. code-block:: python - # Label by raw logits umap.save( 'umap_uncertainty.png', # Save path title='Uncertainty', # Title for plot diff --git a/docs-source/source/tutorial6.rst b/docs-source/source/tutorial6.rst new file mode 100644 index 000000000..6a077503c --- /dev/null +++ b/docs-source/source/tutorial6.rst @@ -0,0 +1,50 @@ +.. currentmodule:: slideflow.slide + +.. _tutorial6: + +Tutorial 6: Custom slide filtering +================================== + +In this brief tutorial, we'll take a look at how you can implement and preview bespoke slide-level filtering methods. + +The slide-level filtering (QC) methods Slideflow currently supports include Otsu's thresholding and Gaussian blur filtering, which can be applied to a :class:`WSI` object with :meth:`WSI.qc`. If you have a custom filtering algorithm you would like to apply to a slide, you can now use :meth:`WSI.apply_qc_mask()` to apply a boolean mask to filter a slide. + +For the purposes of this tutorial, we will generate a boolean mask using the already-available Otsu's thresholding algorithm, but you can replace this with whatever masking algorithm you like. + +First, we'll load a slide: + +.. code-block:: python + + import numpy as np + import slideflow as sf + + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + +Next, we'll apply Otsu's thresholding to get the boolean mask we'll use in subsequent steps, then remove the QC once we have the mask: + +.. code-block:: python + + wsi.qc('otsu') + qc_mask = np.copy(wsi.qc_mask) + wsi.remove_qc() + +Our mask should have two dimensions (y, x) and have a dtype of bool: + +.. code-block:: bash + + >>> qc_mask.shape + (1010, 2847) + >>> qc_mask.dtype + dtype('bool') + +Our :class:`WSI` object now has no QC applied. 
We can manually apply this boolean mask with :meth:`WSI.apply_qc_mask()`: + +.. code-block:: python + + wsi.apply_qc_mask(qc_mask) + +And that's it! We can preview how our mask affects tile filtering by using :meth:`WSI.preview()`: + +.. code-block:: python + + wsi.preview().show() diff --git a/docs-source/source/tutorial7.rst b/docs-source/source/tutorial7.rst new file mode 100644 index 000000000..8ac1697f4 --- /dev/null +++ b/docs-source/source/tutorial7.rst @@ -0,0 +1,89 @@ +.. _tutorial7: + +Tutorial 7: Training with custom augmentations +============================================== + +In this tutorial, we'll take a look at how you can use custom image augmentations when training a model with Slideflow. This tutorial builds off of :ref:`tutorial2`, so if you haven't already, you should read that tutorial first. + +Our goal will be to train a model on a sparse outcome, such as ER status (roughly 4:1 positive:negative), with a custom augmentation that will oversample the minority class. This tutorial will use PyTorch, but the same principles apply when using Tensorflow. + +.. code-block:: python + + >>> import os + >>> os.environ['SF_BACKEND'] = 'torch' + +First, we'll start by loading a project and preparing our datasets, just like in :ref:`tutorial2`: + +.. code-block:: python + + >>> import slideflow as sf + >>> P = sf.load_project('/home/er_project') + >>> full_dataset = P.dataset( + ... tile_px=256, + ... tile_um=128, + ... filters={ + ... 'er_status_by_ihc': ['Positive', 'Negative'] + ... }) + >>> labels, _ = full_dataset.labels('er_status_by_ihc') + >>> train, val = full_dataset.split( + ... labels='er_status_by_ihc', + ... val_strategy='k-fold', + ... val_k_fold=3, + ... k_fold_iter=1 + ... ) + +If tiles have not yet been extracted from slides, do that now. + +.. 
code-block:: python + + >>> full_dataset.extract_tiles(qc='otsu') + +By default, Slideflow will equally sample from all slides / TFRecords during training, resulting in oversampling of slides with fewer tiles. In this case, we want to oversample the minority class (ER negative), so we'll use category-level balancing. Sampling strategies are discussed in detail in the :ref:`Developer Notes `. + +.. code-block:: python + + >>> train = train.balance('er_status_by_ihc', strategy='category') + +Next, we'll set up our model hyperparameters, using the same parameters as in :ref:`tutorial2`. We still want to use Slideflow's default augmentation (random flip/rotation and JPEG compression), so we'll use the hyperparameter ``augment=True``. Our custom augmentation will be applied after the default augmentation. + +.. code-block:: python + + >>> hp = sf.ModelParams( + ... tile_px=256, + ... tile_um=128, + ... model='xception', + ... batch_size=32, + ... epochs=[3], + ... augment=True + ... ) + +Now, we'll define our custom augmentation. Augmentations are functions that take a single Tensor (:class:`tf.Tensor` or :class:`torch.Tensor`) as input and return a single Tensor as output. Our training augmentation will include a random color jitter, random Gaussian blur, and random auto-contrast. + +.. code-block:: python + + >>> import torch + >>> from torchvision import transforms + >>> augment = transforms.Compose([ + ... transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5), + ... transforms.RandomAutocontrast(), + ... transforms.GaussianBlur(3) + ... ]) + +Transformations can be applied to training or validation data by passing a dictionary - with the keys 'train' and/or 'val' - to the ``transform`` argument of :class:`slideflow.Trainer`. If a transformation should be applied to both training and validation, it can be passed directly to the ``transform`` argument. In this case, we'll apply our custom augmentation to the training dataset only. + +.. 
code-block:: python + + >>> trainer = sf.model.build_trainer( + ... hp=hp, + ... outdir='/some/directory', + ... labels=labels, + ... transform={'train': augment}, + ... ) + +Now we can start training. Pass the training and validation datasets to the :meth:`slideflow.model.Trainer.train` method of our trainer, assigning the output to a new variable ``results``. + +.. code-block:: python + + >>> results = trainer.train(train, val) + +And that's it! You've trained a model with a custom augmentation. You can now use the model to generate predictions on new data or on the validation dataset. \ No newline at end of file diff --git a/docs-source/source/tutorial8.rst b/docs-source/source/tutorial8.rst new file mode 100644 index 000000000..b5f51d329 --- /dev/null +++ b/docs-source/source/tutorial8.rst @@ -0,0 +1,165 @@ +.. _tutorial8: + +Tutorial 8: Multiple-Instance Learning +====================================== + +In contrast with tutorials 1-4, which focused on training and evaluating traditional tile-based models, this tutorial provides an example of training a multiple-instance learning (MIL) model. MIL models are particularly useful for heterogeneous tumors, when only parts of a whole-slide image may carry a distinctive histological signature. In this tutorial, we'll train an MIL model to predict the ER status of breast cancer patients from whole-slide images. Note: MIL models require PyTorch. + +We'll start the same way as :ref:`tutorial1`, loading a project and preparing a dataset. + +.. code-block:: python + + >>> import slideflow as sf + >>> P = sf.load_project('/home/er_project') + >>> dataset = P.dataset( + ... tile_px=256, + ... tile_um=128, + ... filters={ + ... 'er_status_by_ihc': ['Positive', 'Negative'] + ... }) + +If tiles have not yet been :ref:`extracted ` for this dataset, do that now. + +..
code-block:: python + + >>> dataset.extract_tiles(qc='otsu') + +Once a dataset has been prepared, the next step in training an MIL model is :ref:`converting images into features `. For this example, we'll use the pretrained `Virchow `_ feature extractor, a vision transformer pretrained on 1.5M whole-slide images. Virchow has an input size of 224x224, so our images will be resized to match. + +.. code-block:: python + + >>> virchow = sf.build_feature_extractor('virchow', center_crop=True) + >>> virchow.cite() + @misc{vorontsov2024virchowmillionslidedigitalpathology, + title={Virchow: A Million-Slide Digital Pathology Foundation Model}, + author={Eugene Vorontsov and Alican Bozkurt and Adam Casson and George Shaikovski and Michal Zelechowski and Siqi Liu and Kristen Severson and Eric Zimmermann and James Hall and Neil Tenenholtz and Nicolo Fusi and Philippe Mathieu and Alexander van Eck and Donghun Lee and Julian Viret and Eric Robert and Yi Kan Wang and Jeremy D. Kunz and Matthew C. H. Lee and Jan Bernhard and Ran A. Godrich and Gerard Oakley and Ewan Millar and Matthew Hanna and Juan Retamero and William A. Moye and Razik Yousfi and Christopher Kanan and David Klimstra and Brandon Rothrock and Thomas J. Fuchs}, + year={2024}, + eprint={2309.07778}, + archivePrefix={arXiv}, + primaryClass={eess.IV}, + url={https://arxiv.org/abs/2309.07778}, + } + >>> virchow.num_features + 2560 + +The Virchow feature extractor produces a 2560-dimensional vector for each tile. We can generate and export :ref:`bags ` of these features for all slides in our dataset using :func:`slideflow.Project.generate_feature_bags`. + +.. code-block:: python + + >>> P.generate_feature_bags( + ... virchow, + ... dataset, + ... outdir='/bags/path' + ... ) + +The output directory, ``/bags/path``, should look like: + +.. code-block:: bash + + /bags/path + ├── slide1.pt + ├── slide1.index.npz + ├── slide2.pt + ├── slide2.index.npz + ├── ...
+ └── bags_config.json + +The ``*.pt`` files contain the feature vectors for tiles in each slide, and the ``*.index.npz`` files contain the corresponding X, Y coordinates for each tile. The ``bags_config.json`` file contains the feature extractor configuration. + +The next step is to create an MIL model configuration using :func:`slideflow.mil.mil_config`, specifying the architecture and relevant hyperparameters. For the architecture, we'll use :class:`slideflow.mil.models.Attention_MIL`. For the hyperparameters, we'll use a learning rate of 1e-4, a batch size of 32, 1cycle learning rate scheduling, and train for 10 epochs. + +.. code-block:: python + + >>> from slideflow.mil import mil_config + >>> config = mil_config( + ... model='attention_mil', + ... lr=1e-4, + ... batch_size=32, + ... epochs=10, + ... fit_one_cycle=True + ... ) + +Finally, we can train the model using :func:`slideflow.mil.train_mil`. We'll split our dataset into 70% training and 30% validation, training to the outcome "er_status_by_ihc" and saving the model to ``/model/path``. + +.. code-block:: python + + >>> from slideflow.mil import train_mil + >>> train, val = dataset.split(labels='er_status_by_ihc', val_fraction=0.3) + >>> train_mil( + ... config, + ... train_dataset=train, + ... val_dataset=val, + ... outcomes='er_status_by_ihc', + ... bags='/bags/path', + ... outdir='/model/path' + ... ) + +During training, you'll see the training/validation loss and validation AUROC for each epoch. At the end of training, you'll see the validation metrics for each outcome. + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + [18:51:01] INFO Training FastAI MIL model with config: + INFO TrainerConfigFastAI( + aggregation_level='slide' + lr=0.0001 + wd=1e-05 + bag_size=512 + fit_one_cycle=True + epochs=10 + batch_size=32 + model='attention_mil' + apply_softmax=True + model_kwargs=None + use_lens=True + ) + [18:51:02] INFO Training dataset: 272 merged bags (from 272 possible slides) + INFO Validation dataset: 116 merged bags (from 116 possible slides) + [18:51:04] INFO Training model Attention_MIL (in=1024, out=2, loss=CrossEntropyLoss) + epoch train_loss valid_loss roc_auc_score time + 0 0.328032 0.285096 0.580233 00:01 + Better model found at epoch 0 with valid_loss value: 0.2850962281227112. + 1 0.319219 0.266496 0.733721 00:01 + Better model found at epoch 1 with valid_loss value: 0.266496479511261. + 2 0.293969 0.230561 0.859690 00:01 + Better model found at epoch 2 with valid_loss value: 0.23056122660636902. + 3 0.266627 0.190546 0.927519 00:01 + Better model found at epoch 3 with valid_loss value: 0.1905461698770523. + 4 0.236985 0.165320 0.939147 00:01 + Better model found at epoch 4 with valid_loss value: 0.16532012820243835. + 5 0.215019 0.153572 0.946512 00:01 + Better model found at epoch 5 with valid_loss value: 0.153572216629982. + 6 0.199093 0.144464 0.948837 00:01 + Better model found at epoch 6 with valid_loss value: 0.1444639265537262. + 7 0.185597 0.141776 0.952326 00:01 + Better model found at epoch 7 with valid_loss value: 0.14177580177783966. + 8 0.173794 0.141409 0.951938 00:01 + Better model found at epoch 8 with valid_loss value: 0.14140936732292175. + 9 0.167547 0.140791 0.952713 00:01 + Better model found at epoch 9 with valid_loss value: 0.14079126715660095. + [18:51:18] INFO Predictions saved to {...}/predictions.parquet + INFO Validation metrics for outcome brs_class: + [18:51:18] INFO slide-level AUC (cat # 0): 0.953 AP: 0.984 (opt. threshold: 0.544) + INFO slide-level AUC (cat # 1): 0.953 AP: 0.874 (opt. 
threshold: 0.458) + INFO Category 0 acc: 88.4% (76/86) + INFO Category 1 acc: 83.3% (25/30) + +After training has completed, the output directory, ``/model/path``, should look like: + +.. code-block:: bash + + /model/path + ├── attention + │ ├── slide1_att.npz + │ └── ... + ├── models + │ └── best_valid.pth + ├── history.csv + ├── mil_params.json + ├── predictions.parquet + └── slide_manifest.csv + +The final model weights are saved in ``models/best_valid.pth``. Validation dataset predictions are saved in the "predictions.parquet" file. A manifest of training/validation data is saved in the "slide_manifest.csv" file, and training history is saved in the "history.csv" file. Attention values for all tiles in each slide are saved in the ``attention/`` directory. + +The final saved model can be used for evaluation (:class:`slideflow.mil.eval_mil`) or inference (:class:`slideflow.mil.predict_slide` or :ref:`Slideflow Studio `). The saved model path should be referenced by the parent directory (in this case, "/model/path") rather than the model file itself. For more information on MIL models, see :ref:`mil`. \ No newline at end of file diff --git a/docs-source/source/umap_example.png b/docs-source/source/umap_example.png index 211b1e719..cf6222c9b 100644 Binary files a/docs-source/source/umap_example.png and b/docs-source/source/umap_example.png differ diff --git a/docs-source/source/umap_example_centroid.png b/docs-source/source/umap_example_centroid.png index 728bb7db2..e37824511 100644 Binary files a/docs-source/source/umap_example_centroid.png and b/docs-source/source/umap_example_centroid.png differ diff --git a/docs-source/source/uq.rst b/docs-source/source/uq.rst index 2d2fad31d..fa334d7a7 100644 --- a/docs-source/source/uq.rst +++ b/docs-source/source/uq.rst @@ -1,9 +1,11 @@ -Uncertainty quantification +.. 
_uncertainty: + +Uncertainty Quantification ========================== Several uncertainty quantification (UQ) methods have been developed for deep learning models and tested in digital histopathology, including MC Dropout, deep ensembles, hyper-deep ensembles, and test-time augmentation. -In verison 1.1, we implemented a dropout-based method of uncertainty estimation (`arXiv paper `_). MC dropout UQ methods exploit the observation that neural networks with dropout approximate sampling of the Bayesian posterior. Images undergo multiple forward passes in a dropout-enabled network during inference, which results in a distribution of predictions. The standard deviation of such a distribution represents the uncertainty estimate. +Slideflow includes a dropout-based method of uncertainty estimation. MC dropout UQ methods exploit the observation that neural networks with dropout approximate sampling of the Bayesian posterior. Images undergo multiple forward passes in a dropout-enabled network during inference, which results in a distribution of predictions. The standard deviation of such a distribution represents the uncertainty estimate. Training with UQ **************** @@ -34,7 +36,134 @@ Uncertainty heatmaps If a model was trained with UQ enabled, the :meth:`slideflow.Project.generate_heatmaps()` function will automatically create uncertainty heatmaps alongside the prediction heatmaps. -Slide-level confidence & uncertainty thresholding -************************************************* +Uncertainty thresholding +************************ + +Uncertainty information can be exploited to separate slide- and patient-level predictions into low- and high-confidence. We developed an uncertainty thresholding algorithm (`BISCUIT `_) to accomplish this task, which is available in :mod:`slideflow.biscuit`. Algorithmic details and validation studies can be found in our `manuscript `_ detailing the method. 
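As a rough illustration of the idea described above (not the BISCUIT implementation itself), the sketch below treats the standard deviation across repeated dropout-enabled forward passes as the uncertainty, then splits predictions into low- and high-confidence groups. The array ``mc_preds``, the simulated noise, and the cutoff ``uq_threshold`` are hypothetical placeholders; in practice the passes would come from a trained model and the threshold from nested cross-validation.

```python
import numpy as np

# Hypothetical stand-in for 30 dropout-enabled forward passes over 5 tiles.
# A real model would produce these predictions; here they are simulated.
rng = np.random.default_rng(0)
n_passes, n_tiles = 30, 5
base = np.array([0.05, 0.30, 0.50, 0.70, 0.95])          # per-tile mean prediction
noise_scale = np.array([0.01, 0.10, 0.15, 0.10, 0.01])   # per-tile spread across passes
mc_preds = np.clip(
    base + rng.normal(0.0, noise_scale, size=(n_passes, n_tiles)), 0.0, 1.0
)

# Final prediction is the mean over passes; uncertainty is the standard deviation.
y_pred = mc_preds.mean(axis=0)
uncertainty = mc_preds.std(axis=0)

# Hypothetical cutoff: predictions below it are treated as high-confidence.
uq_threshold = 0.05
high_confidence = uncertainty < uq_threshold

for i in range(n_tiles):
    tag = "high" if high_confidence[i] else "low"
    print(f"tile {i}: pred={y_pred[i]:.3f} uncertainty={uncertainty[i]:.3f} ({tag}-confidence)")
```

In this toy setup, only the tiles whose predictions are stable across passes are kept as high-confidence; the remaining tiles would be excluded before aggregating to slide- or patient-level predictions.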
+ +Here, we will run through an example of how to apply this UQ thresholding strategy for a weakly-supervised classification model. At present, ``biscuit`` only supports uncertainty estimation and confidence thresholding for binary classification. + +Prepare an Experiment +--------------------- + +Start by creating a Slideflow project and then initializing a ``biscuit`` experiment, including the outcome target and the two classes. We will be training models to predict ``"HPV_status"``, with the two classes ``"positive"`` and ``"negative"``. + +.. code-block:: python + + import slideflow as sf + from slideflow import biscuit + + # Create a Slideflow project + P = sf.Project(...) + + # Initialize a biscuit experiment + experiment = biscuit.Experiment( + train_project=P, + outcome='HPV_status', + outcome1='negative', + outcome2='positive' + ) + +Next, prepare the model hyperparameters. Here, we will use the hyperparameters used in the original manuscript. + +.. code-block:: python + + hp = biscuit.hp.nature2022() + +Train with cross-validation +--------------------------- + +We'll start by training models in cross-validation on the full dataset. We'll use the default three-fold cross-validation strategy. We need to supply a label for experiment model tracking, which will be used for the rest of our experiments. + +.. code-block:: python + + # Train outer cross-validation models. + experiment.train(hp=hp, label='HPV') + +Models will be saved in the project model folder. + +Train inner cross-validation +---------------------------- + +Next, for each of the three cross-validation models trained, we will perform 5-fold nested cross-validation. Uncertainty thresholds are determined from nested cross-validation results. + +.. code-block:: python + + # Train inner, nested cross-validation models. + experiment.train_nested_cv(hp=hp, label='HPV') + +Models will again be saved in the project model directory. 
We can view a summary of the results from these cross-validation studies using the :func:`biscuit.find_cv()` and :func:`biscuit.get_model_results()` functions. + +.. code-block:: python + + from slideflow.biscuit import find_cv, get_model_results + + # Print results from outer cross-validation + cv_models = find_cv( + project=P, + label='HPV', + outcome='HPV_status' + ) + for m in cv_models: + results = get_model_results(m, outcome='HPV_status', epoch=1) + print(m, results['pt_auc']) + +Uncertainty thresholds are calculated using results from the inner cross-validation studies. :func:`biscuit.Experiment.thresholds_from_nested_cv` will calculate and return uncertainty and prediction thresholds. + +.. code-block:: python + + # Calculate uncertainty thresholds + df, thresh = experiment.thresholds_from_nested_cv(label='HPV') + print(thresh) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + {'tile_uq': 0.02726791, + 'slide_uq': 0.0147878695, + 'tile_pred': 0.41621968, + 'slide_pred': 0.4756707} + + +Apply thresholds to test set +---------------------------- + +Finally, we can apply these thresholds to a held-out test set. First, generate predictions for the test set as described in :ref:`evaluation`. Locate the parquet file containing the saved tile-level predictions and load it into a DataFrame. Rename the columns in the dataframe so that ground-truth is ``y_true``, predictions are ``y_pred``, and uncertainty is ``uncertainty``. + +.. code-block:: python + + import pandas as pd + + # Load tile-level predictions from a test set evaluation + df = pd.read_parquet('/path/to/tile_predictions.parquet.gzip') + + # Rename the columns to y_true, y_pred, and uncertainty + # (rename() returns a new DataFrame, so reassign the result) + df = df.rename(columns={ + 'HPV_status-y_true': 'y_true', + 'HPV_status-y_pred1': 'y_pred', + 'HPV_status-uncertainty1': 'uncertainty' + }) + +Use :func:`biscuit.threshold.apply` to apply the previously-determined thresholds to these predictions.
This will return classifier metrics (AUROC, accuracy, sensitivity, specificity) for high-confidence predictions and a dataframe of slide-level high-confidence predictions. Slides with low-confidence predictions will be omitted. The percentage of slides with high-confidence predictions will be reported as ``'percent_incl'``. + +.. code-block:: python + + # Calculate high-confidence slide-level predictions + metrics, high_conf_df = biscuit.threshold.apply( + df, # Dataframe of tile-level predictions + **thresh, # Uncertainty thresholds + level='slide' # We want slide-level predictions + ) + print(metrics) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none -Uncertainty information can be exploited to separate slide- and patient-level predictions into low- and high-confidence. We developed an uncertainty thresholding algorithm (`BISCUIT `_) to accomplish this task. Further details about slide-level confidence estimation and uncertainty thresholding can be found in our manuscript `detailing the method `_. 
\ No newline at end of file + {'auc': 0.9703296703296704, + 'percent_incl': 0.907051282051282, + 'acc': 0.9222614840989399, + 'sensitivity': 0.9230769230769231, + 'specificity': 0.9214285714285714} \ No newline at end of file diff --git a/docs-source/source/val_er_roc_patient.png b/docs-source/source/val_er_roc_patient.png index f1896fad7..d4d754d3d 100644 Binary files a/docs-source/source/val_er_roc_patient.png and b/docs-source/source/val_er_roc_patient.png differ diff --git a/docs-source/source/val_er_roc_tile.png b/docs-source/source/val_er_roc_tile.png index 68a606f9e..cf01f8651 100644 Binary files a/docs-source/source/val_er_roc_tile.png and b/docs-source/source/val_er_roc_tile.png differ diff --git a/docs-source/source/validation.png b/docs-source/source/validation.png new file mode 100644 index 000000000..4ddc8d20a --- /dev/null +++ b/docs-source/source/validation.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc400f9157bd25befd6257321db0b57fd453d54bbe1a1c5724b3016645bf6e0b +size 46139 diff --git a/docs-source/source/validation.rst b/docs-source/source/validation.rst deleted file mode 100644 index 4c62b509c..000000000 --- a/docs-source/source/validation.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. _validation_planning: - -Validation Planning -=================== - -An important first step in creating a new project is to determine the validation plan. Three groups of data are required: - -1) **Training data** - data used for learning during training -2) **Validation data** - data used for testing during training, and early stopping (if applicable) -3) **Evaluation data** - data used for final evaluation once training has completed. Preferably an external cohort. - -Validation data is used to assess model performance and generalizability during training. Once the model and parameters have been tuned with training/validation, the final model's performance is assessed on the held-out evaluation set. 
- -Configuring a validation plan -***************************** - -There are several ways you can plan to validate your data. The validation settings available include: - -- **strategy**: *'bootstrap'*, *'k-fold'*, *k-fold-manual'*, *'k-fold-preserved-site'*, *'fixed'*, *'none'* -- **fraction**: (float between 0-1) [not used for k-fold validation] -- **k_fold**: int - -The default strategy is 'k-fold', with k=3. - -Validation strategy -^^^^^^^^^^^^^^^^^^^ - -The ``strategy`` option determines how the validation data is selected. - -If **fixed**, a certain percentage of your training data is set aside for testing (determined by ``fraction``). The chosen validation subset is saved to a log file and will be re-used for all training iterations. - -If **bootstrap**, validation data will be selected at random (percentage determined by ``fraction``), and all training iterations will be repeated a number of times equal to ``k_fold``. The saved and reported model training metrics will be an average of all bootstrap iterations. - -If **k-fold**, training data will be automatically separated into *k* number of groups (where *k* is equal to ``k_fold``), and all training iterations will be repeated *k* number of times using k-fold cross validation. The saved and reported model training metrics will be an average of all k-fold iterations. - -If you would like to manually separate your data into k-folds, you may do so with the **k-fold-manual** strategy. Assign each slide to a k-fold cohort in the annotations file, and designate the appropriate column header with ``k_fold_header`` - -The **k-fold-preserved-site** strategy is a cross-validation strategy that ensures site is preserved across the training/validation sets, in order to reduce bias from batch effect as described by `Howard, et al `_. This strategy is recommended when using data from The Cancer Genome Atlas (`TCGA `_). - -.. note:: - Preserved-site cross-validation requires `CPLEX `_. 
The original implementation of the preserved-site cross-validation algorithm described by Howard et al can be found `on GitHub `_. - -If **none**, no validation testing will be performed. - -Selecting an evaluation cohort -****************************** - -Designating an evaluation cohort is done using the project annotations file, with a column indicating whether a slide is set aside for evaluation. -The training and evaluation functions include a ``filter`` argument which will allow you to restrict your training or evaluation according to these annotations. This will be discussed in greater detail in subsequent sections. diff --git a/docs-source/source/workbench_preview.png b/docs-source/source/workbench_preview.png new file mode 100644 index 000000000..b79ccafca --- /dev/null +++ b/docs-source/source/workbench_preview.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8eb475122a6ad36a07d55bed7439e7d8483b0c9480b3a3afe1f725ae3e2c01d +size 4182433 diff --git a/docs-source/versions.html b/docs-source/versions.html new file mode 100644 index 000000000..049a2ea12 --- /dev/null +++ b/docs-source/versions.html @@ -0,0 +1,30 @@ + + + + + + + + + + + + + +
+
+

Slideflow Documentation

+
+

Pick a version

+ +
+
+ + diff --git a/docs/.gitattributes b/docs/.gitattributes new file mode 100644 index 000000000..09a5c9e3b --- /dev/null +++ b/docs/.gitattributes @@ -0,0 +1,7 @@ +stylegan.webm filter=lfs diff=lfs merge=lfs -text +studio_preview.webm filter=lfs diff=lfs merge=lfs -text +gan_seeds.mp4 filter=lfs diff=lfs merge=lfs -text +heatmap.mp4 filter=lfs diff=lfs merge=lfs -text +mil_attention.mp4 filter=lfs diff=lfs merge=lfs -text +roi_label.mp4 filter=lfs diff=lfs merge=lfs -text +tissue_seg.mp4 filter=lfs diff=lfs merge=lfs -text diff --git a/docs/_images/3d_umap.png b/docs/_images/3d_umap.png deleted file mode 100644 index c9c01aef3..000000000 Binary files a/docs/_images/3d_umap.png and /dev/null differ diff --git a/docs/_images/att_heatmap.jpg b/docs/_images/att_heatmap.jpg new file mode 100644 index 000000000..37abb737d Binary files /dev/null and b/docs/_images/att_heatmap.jpg differ diff --git a/docs/_images/blur.png b/docs/_images/blur.png new file mode 100644 index 000000000..2d91ce2a8 Binary files /dev/null and b/docs/_images/blur.png differ diff --git a/docs/_images/boxplot_example.png b/docs/_images/boxplot_example.png deleted file mode 100644 index fa8eb3f19..000000000 Binary files a/docs/_images/boxplot_example.png and /dev/null differ diff --git a/docs/_images/cell_masked.png b/docs/_images/cell_masked.png new file mode 100644 index 000000000..ac6c24457 Binary files /dev/null and b/docs/_images/cell_masked.png differ diff --git a/docs/_images/cell_segmentation.png b/docs/_images/cell_segmentation.png new file mode 100644 index 000000000..75959b7bb Binary files /dev/null and b/docs/_images/cell_segmentation.png differ diff --git a/docs/_images/cell_unmasked.png b/docs/_images/cell_unmasked.png new file mode 100644 index 000000000..987fd20c0 Binary files /dev/null and b/docs/_images/cell_unmasked.png differ diff --git a/docs/_images/cellseg_workbench_advanced.png b/docs/_images/cellseg_workbench_advanced.png new file mode 100644 index 000000000..fffb5ce8d Binary 
files /dev/null and b/docs/_images/cellseg_workbench_advanced.png differ diff --git a/docs/_images/cellseg_workbench_flows.png b/docs/_images/cellseg_workbench_flows.png new file mode 100644 index 000000000..67b3f087e Binary files /dev/null and b/docs/_images/cellseg_workbench_flows.png differ diff --git a/docs/_images/cellseg_workbench_masks.png b/docs/_images/cellseg_workbench_masks.png new file mode 100644 index 000000000..b52384cd5 Binary files /dev/null and b/docs/_images/cellseg_workbench_masks.png differ diff --git a/docs/_images/cellseg_workbench_outlines.png b/docs/_images/cellseg_workbench_outlines.png new file mode 100644 index 000000000..b18c04ae3 Binary files /dev/null and b/docs/_images/cellseg_workbench_outlines.png differ diff --git a/docs/_images/cellseg_workbench_panel.png b/docs/_images/cellseg_workbench_panel.png new file mode 100644 index 000000000..11d225901 Binary files /dev/null and b/docs/_images/cellseg_workbench_panel.png differ diff --git a/docs/_images/example_report_small.jpg b/docs/_images/example_report_small.jpg new file mode 100644 index 000000000..d2c026a43 Binary files /dev/null and b/docs/_images/example_report_small.jpg differ diff --git a/docs/_images/heatmap_inset.jpg b/docs/_images/heatmap_inset.jpg new file mode 100644 index 000000000..b1e0bbc13 Binary files /dev/null and b/docs/_images/heatmap_inset.jpg differ diff --git a/docs/_images/otsu.png b/docs/_images/otsu.png new file mode 100644 index 000000000..5ec4a9919 Binary files /dev/null and b/docs/_images/otsu.png differ diff --git a/docs/_images/roi_filter.jpg b/docs/_images/roi_filter.jpg new file mode 100644 index 000000000..de7e15545 Binary files /dev/null and b/docs/_images/roi_filter.jpg differ diff --git a/docs/_images/saliency_heatmap.jpg b/docs/_images/saliency_heatmap.jpg new file mode 100644 index 000000000..09e8290e5 Binary files /dev/null and b/docs/_images/saliency_heatmap.jpg differ diff --git a/docs/_images/saliency_overlay.jpg 
b/docs/_images/saliency_overlay.jpg new file mode 100644 index 000000000..c7963636e Binary files /dev/null and b/docs/_images/saliency_overlay.jpg differ diff --git a/docs/_images/saliency_source.jpg b/docs/_images/saliency_source.jpg new file mode 100644 index 000000000..e5b324295 Binary files /dev/null and b/docs/_images/saliency_source.jpg differ diff --git a/docs/_images/slide_filter.jpg b/docs/_images/slide_filter.jpg new file mode 100644 index 000000000..f33468f6b Binary files /dev/null and b/docs/_images/slide_filter.jpg differ diff --git a/docs/_images/studio_extensions.jpg b/docs/_images/studio_extensions.jpg new file mode 100644 index 000000000..be2a8ba34 Binary files /dev/null and b/docs/_images/studio_extensions.jpg differ diff --git a/docs/_images/studio_heatmap.jpg b/docs/_images/studio_heatmap.jpg new file mode 100644 index 000000000..2f5feb8e7 Binary files /dev/null and b/docs/_images/studio_heatmap.jpg differ diff --git a/docs/_images/studio_mosaic.jpg b/docs/_images/studio_mosaic.jpg new file mode 100644 index 000000000..416c78ed1 Binary files /dev/null and b/docs/_images/studio_mosaic.jpg differ diff --git a/docs/_images/studio_performance.jpg b/docs/_images/studio_performance.jpg new file mode 100644 index 000000000..fcd4bfc10 Binary files /dev/null and b/docs/_images/studio_performance.jpg differ diff --git a/docs/_images/studio_projects.jpg b/docs/_images/studio_projects.jpg new file mode 100644 index 000000000..96d727572 Binary files /dev/null and b/docs/_images/studio_projects.jpg differ diff --git a/docs/_images/studio_rois.jpg b/docs/_images/studio_rois.jpg new file mode 100644 index 000000000..8e03aa9df Binary files /dev/null and b/docs/_images/studio_rois.jpg differ diff --git a/docs/_images/studio_saliency.jpg b/docs/_images/studio_saliency.jpg new file mode 100644 index 000000000..ce82e8ef3 Binary files /dev/null and b/docs/_images/studio_saliency.jpg differ diff --git a/docs/_images/studio_section_labels.jpg 
b/docs/_images/studio_section_labels.jpg new file mode 100644 index 000000000..0ebbdae69 Binary files /dev/null and b/docs/_images/studio_section_labels.jpg differ diff --git a/docs/_images/studio_slide.jpg b/docs/_images/studio_slide.jpg new file mode 100644 index 000000000..b123e6ce5 Binary files /dev/null and b/docs/_images/studio_slide.jpg differ diff --git a/docs/_images/studio_slide_preds.jpg b/docs/_images/studio_slide_preds.jpg new file mode 100644 index 000000000..a7ffb0dc1 Binary files /dev/null and b/docs/_images/studio_slide_preds.jpg differ diff --git a/docs/_images/studio_tile_preds.jpg b/docs/_images/studio_tile_preds.jpg new file mode 100644 index 000000000..acbc22d20 Binary files /dev/null and b/docs/_images/studio_tile_preds.jpg differ diff --git a/docs/_images/tile_extraction_overview.png b/docs/_images/tile_extraction_overview.png new file mode 100644 index 000000000..d69dcc824 Binary files /dev/null and b/docs/_images/tile_extraction_overview.png differ diff --git a/docs/_images/tile_filter.jpg b/docs/_images/tile_filter.jpg new file mode 100644 index 000000000..1f084fdb0 Binary files /dev/null and b/docs/_images/tile_filter.jpg differ diff --git a/docs/_images/tile_macenko_v1.jpg b/docs/_images/tile_macenko_v1.jpg new file mode 100644 index 000000000..53e0f31f6 Binary files /dev/null and b/docs/_images/tile_macenko_v1.jpg differ diff --git a/docs/_images/tile_macenko_v2.jpg b/docs/_images/tile_macenko_v2.jpg new file mode 100644 index 000000000..ce326bd35 Binary files /dev/null and b/docs/_images/tile_macenko_v2.jpg differ diff --git a/docs/_images/tile_norm_compare.jpg b/docs/_images/tile_norm_compare.jpg new file mode 100644 index 000000000..29ad22b62 Binary files /dev/null and b/docs/_images/tile_norm_compare.jpg differ diff --git a/docs/_images/tile_reinhard_v1.jpg b/docs/_images/tile_reinhard_v1.jpg new file mode 100644 index 000000000..8403e846f Binary files /dev/null and b/docs/_images/tile_reinhard_v1.jpg differ diff --git 
a/docs/_images/tile_reinhard_v2.jpg b/docs/_images/tile_reinhard_v2.jpg
new file mode 100644
index 000000000..aed77cf7c
Binary files /dev/null and b/docs/_images/tile_reinhard_v2.jpg differ
diff --git a/docs/_images/tile_unnormalized.jpg b/docs/_images/tile_unnormalized.jpg
new file mode 100644
index 000000000..df7ace34a
Binary files /dev/null and b/docs/_images/tile_unnormalized.jpg differ
diff --git a/docs/_images/tile_vahadane_spams_v1.jpg b/docs/_images/tile_vahadane_spams_v1.jpg
new file mode 100644
index 000000000..924babfd5
Binary files /dev/null and b/docs/_images/tile_vahadane_spams_v1.jpg differ
diff --git a/docs/_images/tile_vahadane_spams_v2.jpg b/docs/_images/tile_vahadane_spams_v2.jpg
new file mode 100644
index 000000000..144096254
Binary files /dev/null and b/docs/_images/tile_vahadane_spams_v2.jpg differ
diff --git a/docs/_images/tile_vahadane_v1.jpg b/docs/_images/tile_vahadane_v1.jpg
new file mode 100644
index 000000000..12868127a
Binary files /dev/null and b/docs/_images/tile_vahadane_v1.jpg differ
diff --git a/docs/_images/tile_vahadane_v2.jpg b/docs/_images/tile_vahadane_v2.jpg
new file mode 100644
index 000000000..73e9cbc3b
Binary files /dev/null and b/docs/_images/tile_vahadane_v2.jpg differ
diff --git a/docs/_images/umap_example.png b/docs/_images/umap_example.png
new file mode 100644
index 000000000..211b1e719
Binary files /dev/null and b/docs/_images/umap_example.png differ
diff --git a/docs/_images/validation.png b/docs/_images/validation.png
new file mode 100644
index 000000000..74d76ea3e
Binary files /dev/null and b/docs/_images/validation.png differ
diff --git a/docs/_images/workbench_preview.png b/docs/_images/workbench_preview.png
new file mode 100644
index 000000000..4e2bbe227
Binary files /dev/null and b/docs/_images/workbench_preview.png differ
diff --git a/docs/_images/wsi_macenko_v1.jpg b/docs/_images/wsi_macenko_v1.jpg
new file mode 100644
index 000000000..fe83a4963
Binary files /dev/null and b/docs/_images/wsi_macenko_v1.jpg differ
diff --git a/docs/_images/wsi_macenko_v2.jpg b/docs/_images/wsi_macenko_v2.jpg
new file mode 100644
index 000000000..a6acf26e8
Binary files /dev/null and b/docs/_images/wsi_macenko_v2.jpg differ
diff --git a/docs/_images/wsi_norm_compare.jpg b/docs/_images/wsi_norm_compare.jpg
new file mode 100644
index 000000000..e15265f97
Binary files /dev/null and b/docs/_images/wsi_norm_compare.jpg differ
diff --git a/docs/_images/wsi_reinhard_v1.jpg b/docs/_images/wsi_reinhard_v1.jpg
new file mode 100644
index 000000000..e99014945
Binary files /dev/null and b/docs/_images/wsi_reinhard_v1.jpg differ
diff --git a/docs/_images/wsi_reinhard_v2.jpg b/docs/_images/wsi_reinhard_v2.jpg
new file mode 100644
index 000000000..cbd4e7257
Binary files /dev/null and b/docs/_images/wsi_reinhard_v2.jpg differ
diff --git a/docs/_images/wsi_unnormalized.jpg b/docs/_images/wsi_unnormalized.jpg
new file mode 100644
index 000000000..b2124eb0f
Binary files /dev/null and b/docs/_images/wsi_unnormalized.jpg differ
diff --git a/docs/_images/wsi_vahadane_spams_v1.jpg b/docs/_images/wsi_vahadane_spams_v1.jpg
new file mode 100644
index 000000000..604a1655e
Binary files /dev/null and b/docs/_images/wsi_vahadane_spams_v1.jpg differ
diff --git a/docs/_images/wsi_vahadane_spams_v2.jpg b/docs/_images/wsi_vahadane_spams_v2.jpg
new file mode 100644
index 000000000..604a1655e
Binary files /dev/null and b/docs/_images/wsi_vahadane_spams_v2.jpg differ
diff --git a/docs/_images/wsi_vahadane_v1.jpg b/docs/_images/wsi_vahadane_v1.jpg
new file mode 100644
index 000000000..b44bb7939
Binary files /dev/null and b/docs/_images/wsi_vahadane_v1.jpg differ
diff --git a/docs/_images/wsi_vahadane_v2.jpg b/docs/_images/wsi_vahadane_v2.jpg
new file mode 100644
index 000000000..604a1655e
Binary files /dev/null and b/docs/_images/wsi_vahadane_v2.jpg differ
diff --git a/docs/_modules/index.html b/docs/_modules/index.html
new file mode 100644
index 000000000..e289c077b
--- /dev/null
+++ b/docs/_modules/index.html
@@ -0,0 +1,454 @@
+ Overview: module code — slideflow 3.0.0 documentation
\ No newline at end of file
diff --git a/docs/_modules/slideflow/biscuit/delong/index.html b/docs/_modules/slideflow/biscuit/delong/index.html
new file mode 100644
index 000000000..ab07cae44
--- /dev/null
+++ b/docs/_modules/slideflow/biscuit/delong/index.html
@@ -0,0 +1,524 @@
+ slideflow.biscuit.delong — slideflow 3.0.0 documentation

Source code for slideflow.biscuit.delong

+import numpy as np
+import scipy.stats
+
+# AUC comparison adapted from
+# https://github.com/Netflix/vmaf/
+def compute_midrank(x):
+    """Computes midranks.
+
+    Args:
+       x - a 1D numpy array
+
+    Returns:
+       array of midranks
+
+    """
+    J = np.argsort(x)
+    Z = x[J]
+    N = len(x)
+    T = np.zeros(N, dtype=np.float32)
+    i = 0
+    while i < N:
+        j = i
+        while j < N and Z[j] == Z[i]:
+            j += 1
+        T[i:j] = 0.5*(i + j - 1)
+        i = j
+    T2 = np.empty(N, dtype=np.float32)
+    # Note(kazeevn) +1 is due to Python using 0-based indexing
+    # instead of 1-based in the AUC formula in the paper
+    T2[J] = T + 1
+    return T2
+
+
+
+def fastDeLong(predictions_sorted_transposed, label_1_count):
+    """The fast version of DeLong's method for computing the covariance of
+    unadjusted AUC.
+
+    Args:
+       predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples]
+          sorted such that the examples with label "1" are first
+       label_1_count: number of examples with label "1"
+
+    Returns:
+       (AUC value, DeLong covariance)
+
+    """
+    # Short variables are named as they are in the paper
+    m = label_1_count
+    n = predictions_sorted_transposed.shape[1] - m
+    positive_examples = predictions_sorted_transposed[:, :m]
+    negative_examples = predictions_sorted_transposed[:, m:]
+    k = predictions_sorted_transposed.shape[0]
+
+    tx = np.empty([k, m], dtype=np.float32)
+    ty = np.empty([k, n], dtype=np.float32)
+    tz = np.empty([k, m + n], dtype=np.float32)
+    for r in range(k):
+        tx[r, :] = compute_midrank(positive_examples[r, :])
+        ty[r, :] = compute_midrank(negative_examples[r, :])
+        tz[r, :] = compute_midrank(predictions_sorted_transposed[r, :])
+    aucs = tz[:, :m].sum(axis=1) / m / n - float(m + 1.0) / 2.0 / n
+    v01 = (tz[:, :m] - tx[:, :]) / n
+    v10 = 1.0 - (tz[:, m:] - ty[:, :]) / m
+    sx = np.cov(v01)
+    sy = np.cov(v10)
+    delongcov = sx / m + sy / n
+    return aucs, delongcov
+
+
+def calc_pvalue(aucs, sigma):
+    """Computes log10 of p-values.
+
+    Args:
+       aucs: 1D array of AUCs
+       sigma: AUC DeLong covariances
+
+    Returns:
+       log10(pvalue)
+    """
+    l = np.array([[1, -1]])
+    z = np.abs(np.diff(aucs)) / np.sqrt(np.dot(np.dot(l, sigma), l.T))
+    return np.log10(2) + scipy.stats.norm.logsf(z, loc=0, scale=1) / np.log(10)
+
+
+def compute_ground_truth_statistics(ground_truth):
+    assert np.array_equal(np.unique(ground_truth), [0, 1])
+    order = (-ground_truth).argsort()
+    label_1_count = int(ground_truth.sum())
+    return order, label_1_count
+
+
+def delong_roc_variance(ground_truth, predictions):
+    """Computes ROC AUC variance for a single set of predictions
+
+    Args:
+       ground_truth: np.array of 0 and 1
+       predictions: np.array of floats of the probability of being class 1
+    """
+    order, label_1_count = compute_ground_truth_statistics(ground_truth)
+    predictions_sorted_transposed = predictions[np.newaxis, order]
+    aucs, delongcov = fastDeLong(predictions_sorted_transposed, label_1_count)
+    assert len(aucs) == 1, "There is a bug in the code, please forward this to the developers"
+    return aucs[0], delongcov
+
+
+def delong_roc_test(ground_truth, predictions_one, predictions_two):
+    """Computes log10(p-value) for hypothesis that two ROC AUCs are different
+
+    Args:
+       ground_truth: np.array of 0 and 1
+       predictions_one: predictions of the first model,
+          np.array of floats of the probability of being class 1
+       predictions_two: predictions of the second model,
+          np.array of floats of the probability of being class 1
+    """
+    order, label_1_count = compute_ground_truth_statistics(ground_truth)
+    predictions_sorted_transposed = np.vstack((predictions_one, predictions_two))[:, order]
+    aucs, delongcov = fastDeLong(predictions_sorted_transposed, label_1_count)
+    return calc_pvalue(aucs, delongcov)
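As a hedged illustration of the listing above: `delong_roc_test` returns log10 of the p-value, so the p-value itself is 10 raised to the returned value, and the `aucs` line in `fastDeLong` is a rank-sum identity over midranks. The sketch below re-implements those two ideas in pure Python so it runs without NumPy. The helper names `midranks`, `auc_from_midranks`, and `log10_to_pvalue` are hypothetical and are not part of slideflow.

```python
def midranks(values):
    """Return 1-based midranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with sorted position i.
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        # Sorted positions i..j-1 (0-based) share the midrank 0.5*(i+j-1) + 1.
        for pos in range(i, j):
            ranks[order[pos]] = 0.5 * (i + j - 1) + 1
        i = j
    return ranks


def auc_from_midranks(positives, negatives):
    """AUC from the midranks of the pooled scores (positives listed first)."""
    m, n = len(positives), len(negatives)
    pooled = midranks(list(positives) + list(negatives))
    # Same expression as the `aucs` line in fastDeLong, for one classifier.
    return sum(pooled[:m]) / m / n - (m + 1.0) / 2.0 / n


def log10_to_pvalue(log10_p):
    """Convert a log10(p-value), as returned by delong_roc_test, to a p-value."""
    return 10.0 ** log10_p


ranks = midranks([0.1, 0.4, 0.4, 0.8])
auc = auc_from_midranks([0.9, 0.8, 0.4], [0.4, 0.3, 0.1])
print(ranks, auc, log10_to_pvalue(-2.0))
```

In the second call, one positive score ties with one negative score, which counts as half a concordant pair. The pairwise count is therefore 8.5 of 9, and the rank-sum identity gives the same AUC.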

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
\ No newline at end of file
diff --git a/docs/_modules/slideflow/biscuit/experiment/index.html b/docs/_modules/slideflow/biscuit/experiment/index.html
new file mode 100644
index 000000000..b2c09c640
--- /dev/null
+++ b/docs/_modules/slideflow/biscuit/experiment/index.html
@@ -0,0 +1,1509 @@
+ slideflow.biscuit.experiment — slideflow 2.3.0 documentation

Source code for slideflow.biscuit.experiment

+import shutil
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+import matplotlib.ticker as plticker
+import numpy as np
+from skmisc.loess import loess
+from scipy import stats
+from tqdm import tqdm
+from statistics import mean
+from os.path import join, exists
+
+import slideflow as sf
+from slideflow.util import log
+from . import utils, threshold
+from . import hp as biscuit_hp
+from .errors import MatchError, ModelNotFoundError, ThresholdError
+
+# -----------------------------------------------------------------------------
+
+ALL_EXP = {
+    'AA': 'full',
+    'U': 800,
+    'T': 700,
+    'S': 600,
+    'R': 500,
+    'A': 400,
+    'L': 350,
+    'M': 300,
+    'N': 250,
+    'D': 200,
+    'O': 176,
+    'P': 150,
+    'Q': 126,
+    'G': 100,
+    'V': 90,
+    'W': 80,
+    'X': 70,
+    'Y': 60,
+    'Z': 50,
+    'ZA': 40,
+    'ZB': 30,
+    'ZC': 20,
+    'ZD': 10
+}
+
+# -----------------------------------------------------------------------------
+
+
+class Experiment:
+    def __init__(
+        self,
+        train_project,
+        eval_projects=None,
+        outcome='cohort',
+        outcome1='LUAD',
+        outcome2='LUSC',
+        outdir='results'
+    ):
+        """Supervises uncertainty thresholding experiments."""
+
+        if eval_projects is None:
+            eval_projects = []
+
+        if isinstance(train_project, str):
+            self.train_project = sf.Project(train_project)
+        elif isinstance(train_project, sf.Project):
+            self.train_project = train_project
+        else:
+            raise ValueError(f"Unrecognized value for train_project: {train_project}")
+
+        self.eval_projects = []
+        for ep in eval_projects:
+            if isinstance(ep, str):
+                self.eval_projects += [sf.Project(ep)]
+            elif isinstance(ep, sf.Project):
+                self.eval_projects += [ep]
+            else:
+                raise ValueError(f"Unrecognized value for eval_project: {eval_projects}")
+
+        self.outcome = outcome
+        self.outcome1 = outcome1
+        self.outcome2 = outcome2
+        self.outdir = outdir
+
+    def add(self, path, label, out1, out2, order='f', order_col='order', gan=0):
+        """Adds a sample size experiment to the given project annotations file.
+
+        Args:
+            path (str): Path to project annotations file.
+            label (str): Experimental label.
+            out1 (int): Number of lung adenocarcinomas (LUAD) to include in the
+                experiment.
+            out2 (int): Number of lung squamous cell carcinomas (LUSC) to include
+                in the experiment.
+            order (str, optional): 'f' (forward) or 'r' (reverse). Indicates which
+                direction to follow when sequentially adding slides.
+                Defaults to 'f'.
+            order_col (str, optional): Annotation header column to use when
+                sequentially adding slides. Defaults to 'order'.
+            gan (float, optional): Fraction of GAN slides to include in the
+                experiment, relative to out1 and out2. Must satisfy
+                0 <= gan < 1. Defaults to 0.
+
+        Returns:
+            None
+        """
+
+        assert isinstance(out1, int)
+        assert isinstance(out2, int)
+        assert isinstance(gan, (int, float)) and 0 <= gan < 1
+        assert order in ('f', 'r')
+
+        ann = pd.read_csv(path, dtype=str)
+        print(f"Setting up exp. {label} with order {order} (sort by {order_col})")
+        ann[order_col] = pd.to_numeric(ann[order_col])
+        ann.sort_values(
+            ['gan', self.outcome, order_col],
+            ascending=[True, True, (order != 'r')],
+            inplace=True
+        )
+        gan_out1 = round(gan * out1)
+        gan_out2 = round(gan * out2)
+        out1_indices = np.where((ann['site'].to_numpy() != 'GAN')
+                                & (ann[self.outcome] == self.outcome1))[0]
+        out2_indices = np.where((ann['site'].to_numpy() != 'GAN')
+                                & (ann[self.outcome] == self.outcome2))[0]
+        gan_out1_indices = np.where((ann['site'].to_numpy() == 'GAN')
+                                    & (ann[self.outcome] == self.outcome1))[0]
+        gan_out2_indices = np.where((ann['site'].to_numpy() == 'GAN')
+                                    & (ann[self.outcome] == self.outcome2))[0]
+
+        assert out1 <= out1_indices.shape[0]
+        assert out2 <= out2_indices.shape[0]
+        assert gan_out1 <= gan_out1_indices.shape[0]
+        assert gan_out2 <= gan_out2_indices.shape[0]
+
+        include = np.array(['exclude' for _ in range(len(ann))])
+        include[out1_indices[:out1]] = 'include'
+        include[out2_indices[:out2]] = 'include'
+        include[gan_out1_indices[:gan_out1]] = 'include'
+        include[gan_out2_indices[:gan_out2]] = 'include'
+        ann[f'include_{label}'] = include
+        ann.to_csv(path, index=False)
+
+    @staticmethod
+    def config(name_pattern, subset, ratio, **kwargs):
+        """Configures a set of experiments.
+
+        Args:
+            name_pattern (str): String pattern for experiment naming.
+            subset (list(str)): List of experiment ID/labels.
+            ratio (float): Ratio of n_out1 / n_out2 (or n_out2 / n_out1).
+                Must be >= 1.
+        """
+
+        if not isinstance(ratio, (int, float)) or ratio < 1:
+            raise ValueError("Invalid ratio; must be float >= 1")
+        config = {}
+        for exp in ALL_EXP:
+            if exp not in subset:
+                continue
+            if exp == 'AA' and ratio != 1:
+                raise ValueError("Cannot create full dataset exp. with ratio != 1")
+
+            exp_name = name_pattern.format(exp)
+            if ratio != 1:
+                n1 = round(ALL_EXP[exp] / (1 + (1/ratio)))
+                n2 = ALL_EXP[exp] - n1
+
+                config.update({
+                    exp_name: {'out1': n1, 'out2': n2, **kwargs},
+                    exp_name+'i': {'out1': n2, 'out2': n1, **kwargs}
+                })
+            else:
+                if ALL_EXP[exp] == 'full':
+                    n_out1 = 467
+                    n_out2 = 474
+                else:
+                    n_out1 = n_out2 = int(ALL_EXP[exp] / 2)
+                config.update({
+                    exp_name: {'out1': n_out1, 'out2': n_out2, **kwargs},
+                })
+        return config
+
+    def display(self, df, eval_dfs, hue='uq', palette='tab10', relplot_uq_compare=True,
+                boxplot_uq_compare=True, ttest_uq_groups=['all', 'include'],
+                prefix=''):
+        """Creates plots from assembled results, exports results to CSV.
+
+        Args:
+            df (pandas.DataFrame): Cross-validation results metrics, as generated
+                by results()
+            eval_dfs (dict(pandas.DataFrame)): Dict of external eval dataset names
+                (keys) mapped to pandas DataFrame of result metrics (values).
+            hue (str, optional): Comparison to show with different hue on plots.
+                Defaults to 'uq'.
+            palette (str, optional): Seaborn color palette. Defaults to 'tab10'.
+            relplot_uq_compare (bool, optional): For the Relplot display, ensure
+                non-UQ and UQ results are generated from the same models/preds.
+            boxplot_uq_compare (bool, optional): For the boxplot display, ensure
+                non-UQ and UQ results are generated from the same models/preds.
+            ttest_uq_groups (list(str)): UQ groups to compare via t-test. Defaults
+                to ['all', 'include'].
+            prefix (str, optional): Prefix to use when saving figures.
+                Defaults to empty string.
+ + Returns: + None + """ + + if not len(df): + log.error("No results to display") + return + + # Filter out UQ results if n_slides < 100 + df = df.loc[~ ((df['n_slides'] < 100) + & (df['uq'].isin(['include', 'exclude'])))] + + # --- Paired t-tests --------------------------------------------------- + if ttest_uq_groups and len(ttest_uq_groups) != 2: + raise ValueError("Length of ttest_uq_groups must be exactly 2") + ttest_df = df.loc[df['uq'].isin(ttest_uq_groups)].copy() + ttest_df = ttest_df.sort_values(['id', 'fold']) + + def perform_paired_testing(level): + print(f"Paired t-tests ({level}-level):") + for n in sorted(ttest_df['n_slides'].unique()): + exp_df = ttest_df[ttest_df['n_slides'] == n] + try: + ttest_result = stats.ttest_rel( + exp_df.loc[exp_df['uq'] == ttest_uq_groups[0]][f'{level}_auc'], + exp_df.loc[exp_df['uq'] == ttest_uq_groups[1]][f'{level}_auc'], + alternative='less') + print(n, '\t', 'p =', ttest_result.pvalue) + except ValueError: + print(n, '\t', 'p = (error)') + + perform_paired_testing('patient') + perform_paired_testing('slide') + + # --- Cross-validation plots ------------------------------------------- + + if len(df): + # AUC (relplot) + if relplot_uq_compare: + rel_df = df.loc[df['uq'] != 'none'] + else: + rel_df = df + sns.relplot( + x='n_slides', + y='slide_auc', + data=rel_df, + hue=hue, + marker='o', + kind='line', + palette=palette + ) + plt.title('Cross-val AUC') + ax = plt.gca() + ax.set_ylim([0.5, 1]) + ax.grid(visible=True, which='both', axis='both', color='white') + ax.set_facecolor('#EAEAF2') + ax.xaxis.set_minor_locator(plticker.MultipleLocator(100)) + plt.subplots_adjust(top=0.9) + plt.savefig(join(self.outdir, f'{prefix}relplot.svg')) + + f, axes = plt.subplots(1, 3) + f.set_size_inches(18, 6) + + # AUC boxplot + if boxplot_uq_compare: + box_df = df.loc[df['uq'] != 'none'] + else: + box_df = df + sns.boxplot( + x='n_slides', + y='slide_auc', + hue=hue, + data=box_df, + ax=axes[0], + palette=palette + ) + 
axes[0].title.set_text('Cross-val AUC') + axes[0].set_ylabel('') + axes[0].tick_params(labelrotation=90) + + # AUC scatter - LOESS & standard error + df = df.sort_values(by=['n_slides']) + x = df['n_slides'].to_numpy().astype(np.float32) + y = df['slide_auc'].to_numpy() + lo = loess(x, y) + try: + lo.fit() + pred = lo.predict(x, stderror=True) + conf = pred.confidence() + z = pred.values + ll = conf.lower + ul = conf.upper + axes[1].plot(x, y, '+', ms=6) + axes[1].plot(x, z) + axes[1].fill_between(x, ll, ul, alpha=.33) + except ValueError: + pass + + axes[1].xaxis.set_minor_locator(plticker.MultipleLocator(20)) + axes[1].spines['bottom'].set_linewidth(0.5) + axes[1].spines['bottom'].set_color('black') + axes[1].tick_params(axis='x', colors='black') + axes[1].grid(visible=True, which='both', axis='both', color='white') + axes[1].set_facecolor('#EAEAF2') + axes[1].set_xscale('log') + axes[1].title.set_text('Cross-val AUC') + + # % slides included + sns.lineplot( + x='n_slides', + y='patient_uq_perc', + data=df, + marker='o', + ax=axes[2], + zorder=3 + ) + axes[2].set_ylabel('') + axes[2].title.set_text('% Patients Included with UQ (cross-val)') + axes[2].xaxis.set_minor_locator(plticker.MultipleLocator(100)) + axes[2].tick_params(labelrotation=90) + axes[2].grid(visible=True, which='both', axis='both', color='white', zorder=0) + axes[2].set_facecolor('#EAEAF2') + axes[2].set_xlim(100) + axes[2].scatter(x=df.groupby('n_slides', as_index=False).median().n_slides.values, y=df.groupby('n_slides').median().patient_uq_perc.values, marker='x', zorder=5) + + plt.subplots_adjust(bottom=0.2) + plt.savefig(join(self.outdir, f'{prefix}crossval.svg')) + + # --- Evaluation plots ---------------------------------------------------- + + if eval_dfs: + for eval_name, eval_df in eval_dfs.items(): + if not len(eval_df): + continue + has_uq = len(eval_df.loc[eval_df['uq'].isin(['include', 'exclude'])]) + + # Prepare figure + sns.set(rc={"xtick.bottom": True, "ytick.left": True}) + f, 
axes = plt.subplots(1, (4 if has_uq else 3)) + f.suptitle(f'{eval_name} Evaluation Dataset') + f.set_size_inches(16, 4) + + # AUC + if not len(eval_df): + continue + eval_df = eval_df.loc[~ ((eval_df['n_slides'] < 100) + & (eval_df['uq'].isin(['include', 'exclude'])))] + sns.lineplot( + x='n_slides', + y='patient_auc', + hue=hue, + data=eval_df, + marker="o", + ax=axes[0] + ) + sns.scatterplot( + x='n_slides', + y='slide_auc', + hue=hue, + data=eval_df, + marker="x", + ax=axes[0] + ) + axes[0].get_legend().remove() + axes[0].title.set_text('AUC') + + # Accuracy + sns.lineplot( + x='n_slides', + y='patient_acc', + hue=hue, + data=eval_df, + marker="o", + ax=axes[1] + ) + sns.scatterplot( + x='n_slides', + y='slide_acc', + hue=hue, + data=eval_df, + marker="x", + ax=axes[1] + ) + axes[1].get_legend().remove() + axes[1].title.set_text('Accuracy') + + # Youden's index + sns.lineplot( + x='n_slides', + y='patient_youden', + hue=hue, + data=eval_df, + marker="o", + ax=axes[2] + ) + sns.scatterplot( + x='n_slides', + y='slide_youden', + hue=hue, + data=eval_df, + marker="x", + ax=axes[2] + ) + axes[2].title.set_text("Youden's J") + axes[2].get_legend().remove() + + # % slides included + if has_uq: + sns.lineplot( + x='n_slides', + y='patient_incl', + data=eval_df.loc[eval_df['uq'] == 'include'], + marker='o' + ) + sns.scatterplot( + x='n_slides', + y='slide_incl', + data=eval_df.loc[eval_df['uq'] == 'include'], + marker='x' + ) + axes[3].title.set_text('% Included') + for ax in axes: + ax.set_ylabel('') + ax.xaxis.set_major_locator(plticker.MultipleLocator(base=100)) + ax.tick_params(labelrotation=90) + plt.subplots_adjust(top=0.8) + plt.subplots_adjust(bottom=0.2) + plt.savefig(join(self.outdir, f'{prefix}eval.svg')) + + def plot_uq_calibration(self, label, tile_uq, slide_uq, slide_pred, epoch=1): + """Plots a graph of predictions vs. uncertainty. + + Args: + label (str): Experiment label. + kfold (int): Validation k-fold. 
+ tile_uq (float): Tile-level uncertainty threshold. + slide_uq (float): Slide-level uncertainty threshold. + slide_pred (float): Slide-level prediction threshold. + + Returns: + None + """ + + val_dfs = [ + pd.read_csv( + join( + utils.find_model(self.train_project, label, kfold=k, outcome=self.outcome), + f'tile_predictions_val_epoch{epoch}.csv'), + dtype={'slide': str}) + for k in range(1, 4) + ] + for v in range(len(val_dfs)): + utils.rename_cols(val_dfs[v], outcome=self.outcome) + _df = val_dfs[0] + _df = pd.concat([_df, val_dfs[1]], axis=0, join='outer', ignore_index=True) + _df = pd.concat([_df, val_dfs[2]], axis=0, join='outer', ignore_index=True) + + # Plot tile-level uncertainty + patients = self.train_project.dataset().patients() + _df, _ = threshold.process_tile_predictions(_df, patients=patients) + threshold.plot_uncertainty( + _df, + kind='tile', + threshold=tile_uq, + title=f'CV UQ Calibration: {label}' + ) + # Plot slide-level uncertainty + _df = _df[_df['uncertainty'] < tile_uq] + _s_df, _ = threshold.process_group_predictions( + _df, + pred_thresh=slide_pred, + level='slide' + ) + threshold.plot_uncertainty( + _s_df, + kind='slide', + threshold=slide_uq, + title=f'CV UQ Calibration: {label}' + ) + + def results(self, exp_to_run, uq=True, eval=True, plot=False): + """Assembles results from experiments, applies UQ thresholding, + and returns pandas dataframes with metrics. + + Args: + exp_to_run (list): List of experiment IDs to search for results. + uq (bool, optional): Apply UQ thresholds. Defaults to True. + eval (bool, optional): Calculate results of external evaluation models. + Defaults to True. + plot (bool, optional): Show plots. Defaults to False. 
+
+        Returns:
+            pandas.DataFrame: Cross-val results,
+            pandas.DataFrame: External eval results
+        """
+
+        # === Initialize projects & prepare experiments ===========================
+
+        P = self.train_project
+        eval_Ps = self.eval_projects
+        df = pd.DataFrame()
+        eval_dfs = {val_P.name: pd.DataFrame() for val_P in eval_Ps}
+        prediction_thresholds = {}
+        slide_uq_thresholds = {}
+        tile_uq_thresholds = {}
+        pred_uq_thresholds = {}
+
+        # === Show results from designated epoch ==================================
+        for exp in exp_to_run:
+            try:
+                models = utils.find_cv(P, f'EXP_{exp}', outcome=self.outcome)
+            except MatchError:
+                log.debug(f"Unable to find cross-val results for {exp}; skipping")
+                continue
+            for i, m in enumerate(models):
+                try:
+                    results = utils.get_model_results(m, outcome=self.outcome, epoch=1)
+                except FileNotFoundError:
+                    print(f"Unable to open cross-val results for {exp}; skipping")
+                    continue
+                m_slides = sf.util.get_slides_from_model_manifest(m, dataset=None)
+                df = pd.concat([df, pd.DataFrame([{
+                    'id': exp,
+                    'n_slides': len(m_slides),
+                    'fold': i+1,
+                    'uq': 'none',
+                    'patient_auc': results['pt_auc'],
+                    'patient_ap': results['pt_ap'],
+                    'slide_auc': results['slide_auc'],
+                    'slide_ap': results['slide_ap'],
+                    'tile_auc': results['tile_auc'],
+                    'tile_ap': results['tile_ap'],
+                }])], axis=0, join='outer', ignore_index=True)
+
+        # === Add UQ Crossval results (non-thresholded) ===========================
+        for exp in exp_to_run:
+            try:
+                skip = False
+                models = utils.find_cv(P, f'EXP_{exp}_UQ', outcome=self.outcome)
+            except MatchError:
+                continue
+            all_pred_thresh = []
+            for i, m in enumerate(models):
+                try:
+                    results = utils.get_model_results(m, outcome=self.outcome, epoch=1)
+                    all_pred_thresh += [results['opt_thresh']]
+                    df = pd.concat([df, pd.DataFrame([{
+                        'id': exp,
+                        'n_slides': len(sf.util.get_slides_from_model_manifest(m, dataset=None)),
+                        'fold': i+1,
+                        'uq': 'all',
+                        'patient_auc': results['pt_auc'],
+                        'patient_ap': results['pt_ap'],
+
'slide_auc': results['slide_auc'], + 'slide_ap': results['slide_ap'], + 'tile_auc': results['tile_auc'], + 'tile_ap': results['tile_ap'], + }])], axis=0, join='outer', ignore_index=True) + except FileNotFoundError: + log.debug(f"Skipping UQ crossval (non-thresholded) results for {exp}; not found") + skip = True + break + if not skip: + prediction_thresholds[exp] = mean(all_pred_thresh) + + # === Get & Apply Nested UQ Threshold ===================================== + if uq: + pb = tqdm(exp_to_run) + for exp in pb: + # Skip UQ for experiments with n_slides < 100 + if exp in ('V', 'W', 'X', 'Y', 'Z', 'ZA', 'ZB', 'ZC', 'ZD'): + continue + pb.set_description(f"Calculating thresholds (exp {exp})...") + try: + _df, thresh = self.thresholds_from_nested_cv( + f'EXP_{exp}_UQ', id=exp + ) + df = pd.concat([df, _df], axis=0, join='outer', ignore_index=True) + except (MatchError, FileNotFoundError, ModelNotFoundError) as e: + log.debug(str(e)) + log.debug(f"Skipping UQ crossval results for {exp}; not found") + continue + except ThresholdError as e: + log.debug(str(e)) + log.debug(f'Skipping UQ crossval results for {exp}; could not find thresholds in cross-validation') + continue + + tile_uq_thresholds[exp] = thresh['tile_uq'] + slide_uq_thresholds[exp] = thresh['slide_uq'] + pred_uq_thresholds[exp] = thresh['slide_pred'] + # Show CV uncertainty calibration + if plot and exp == 'AA': + print("Plotting UQ calibration for cross-validation (exp. 
AA)") + self.plot_uq_calibration( + label=f'EXP_{exp}_UQ', + **thresh + ) + plt.show() + + # === Show external validation results ==================================== + if eval: + # --- Step 7A: Show non-UQ external validation results ---------------- + for val_P in eval_Ps: + name = val_P.name + pb = tqdm(exp_to_run, ncols=80) + for exp in pb: + pb.set_description(f'Working on {name} eval (EXP {exp})...') + + # Read and prepare model results + try: + eval_dir = utils.find_eval(val_P, f'EXP_{exp}_FULL', outcome=self.outcome) + results = utils.get_eval_results(eval_dir, outcome=self.outcome) + except (FileNotFoundError, MatchError): + log.debug(f"Skipping eval for exp {exp}; eval not found") + continue + if not utils.model_exists(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1): + log.debug(f'Skipping eval for exp {exp}; trained model not found') + continue + if exp not in prediction_thresholds: + log.warn(f"No predictions threshold for experiment {exp}; using slide-level pred threshold of 0.5") + pred_thresh = 0.5 + else: + pred_thresh = prediction_thresholds[exp] + + # Patient-level and slide-level predictions & metrics + patient_yt, patient_yp = utils.read_group_predictions( + join( + eval_dir, + f'patient_predictions_{self.outcome}_eval.csv' + ) + ) + patient_metrics = utils.prediction_metrics( + patient_yt, + patient_yp, + threshold=pred_thresh + ) + patient_metrics = { + f'patient_{m}': patient_metrics[m] + for m in patient_metrics + } + slide_yt, slide_yp = utils.read_group_predictions( + join( + eval_dir, + f'patient_predictions_{self.outcome}_eval.csv' + ) + ) + slide_metrics = utils.prediction_metrics( + slide_yt, + slide_yp, + threshold=pred_thresh + ) + slide_metrics = { + f'slide_{m}': slide_metrics[m] + for m in slide_metrics + } + model = utils.find_model(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1) + n_slides = len(sf.util.get_slides_from_model_manifest(model, dataset=None)) + eval_dfs[name] = pd.concat([eval_dfs[name], pd.DataFrame([{ + 
'id': exp, + 'n_slides': n_slides, + 'uq': 'none', + 'incl': 1, + 'patient_auc': results['pt_auc'], + 'patient_ap': results['pt_ap'], + 'slide_auc': results['slide_auc'], + 'slide_ap': results['slide_ap'], + **patient_metrics, + **slide_metrics, + }])], axis=0, join='outer', ignore_index=True) + + # --- [end patient-level predictions] ------------------------- + + if exp not in prediction_thresholds: + log.debug(f"Unable to calculate eval UQ performance; no prediction thresholds found for exp {exp}") + continue + + # --- Step 7B: Show UQ external validation results ------------ + if uq: + if exp in tile_uq_thresholds: + for keep in ('high_confidence', 'low_confidence'): + tile_pred_df = pd.read_csv( + join( + eval_dir, + 'tile_predictions_eval.csv' + ), dtype={'slide': str} + ) + new_cols = { + f'{self.outcome}_y_pred1': 'y_pred', + f'{self.outcome}_y_true0': 'y_true', + f'{self.outcome}_uncertainty1': 'uncertainty' + } + tile_pred_df.rename(columns=new_cols, inplace=True) + thresh_tile = tile_uq_thresholds[exp] + thresh_slide = slide_uq_thresholds[exp] + + val_patients = val_P.dataset(verification=None).patients() + + def get_metrics_by_level(level): + return threshold.apply( + tile_pred_df, + tile_uq=thresh_tile, + slide_uq=thresh_slide, + tile_pred=0.5, + slide_pred=pred_uq_thresholds[exp], + plot=(plot and level == 'slide' and keep == 'high_confidence' and exp == 'AA'), + title=f'{name}: Exp. 
{exp} Uncertainty', + keep=keep, # Keeps only LOW or HIGH-confidence slide predictions + patients=val_patients, + level=level + ) + + s_results, _ = get_metrics_by_level('slide') + p_results, _ = get_metrics_by_level('patient') + if (plot and keep == 'high_confidence' and exp == 'AA'): + plt.savefig(join(self.outdir, f'{name}_uncertainty_v_preds.svg')) + + full_model = utils.find_model(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1) + n_slides = len(sf.util.get_slides_from_model_manifest(full_model, dataset=None)) + eval_dfs[name] = pd.concat([eval_dfs[name], pd.DataFrame([{ + 'id': exp, + 'n_slides': n_slides, + 'uq': ('include' if keep == 'high_confidence' else 'exclude'), + 'slide_incl': s_results['percent_incl'], + 'slide_auc': s_results['auc'], + 'slide_acc': s_results['acc'], + 'slide_sens': s_results['sensitivity'], + 'slide_spec': s_results['specificity'], + 'slide_youden': s_results['sensitivity'] + s_results['specificity'] - 1, + 'patient_incl': p_results['percent_incl'], + 'patient_auc': p_results['auc'], + 'patient_acc': p_results['acc'], + 'patient_sens': p_results['sensitivity'], + 'patient_spec': p_results['specificity'], + 'patient_youden': p_results['sensitivity'] + p_results['specificity'] - 1, + }])], axis=0, join='outer', ignore_index=True) + for eval_name in eval_dfs: + eval_dfs[eval_name].to_csv( + join(self.outdir, f'{eval_name}_results.csv'), + index=False + ) + else: + eval_dfs = None + df.to_csv(join(self.outdir, 'crossval_results.csv'), index=False) + return df, eval_dfs + + def run(self, exp_to_run, steps=None, hp='nature2022'): + """Trains the designated experiments. + + Args: + exp_to_run (dict): Dict containing experiment configuration, + as provided by config(). + steps (list(int)): Steps to run. Defaults to all steps, 1-6. + hp (slideflow.ModelParams, optional): Hyperparameters object. + Defaults to hyperparameters used for publication. 
+
+        Returns:
+            None
+        """
+
+        # === Initialize projects & prepare experiments ===========================
+        print(sf.util.bold("Initializing experiments..."))
+        P = self.train_project
+        eval_Ps = self.eval_projects
+        exp_annotations = join(P.root, 'experiments.csv')
+        if P.annotations != exp_annotations:
+            if not exists(exp_annotations):
+                shutil.copy(P.annotations, exp_annotations)
+            P.annotations = exp_annotations
+        exp_to_add = [
+            e for e in exp_to_run
+            if f'include_{e}' not in pd.read_csv(exp_annotations).columns.tolist()
+        ]
+        for exp in exp_to_add:
+            self.add(exp_annotations, label=exp, **exp_to_run[exp])
+
+        full_epoch_exp = [e for e in exp_to_run if e in ('AA', 'A', 'D', 'G')]
+
+        if hp == 'nature2022':
+            exp_hp = biscuit_hp.nature2022()
+        else:
+            exp_hp = hp
+
+        # Configure steps to run
+        if steps is None:
+            steps = range(7)
+
+        # === Step 1: Initialize full-epochs experiments ==========================
+        if 1 in steps:
+            print(sf.util.bold("[Step 1] Running full-epoch experiments..."))
+            exp_hp.epochs = [1, 3, 5, 10]
+            for exp in full_epoch_exp:
+                val_k = [
+                    k for k in range(1, 4)
+                    if not utils.model_exists(P, f'EXP_{exp}', outcome=self.outcome, kfold=k)
+                ]
+                if not len(val_k):
+                    print(f'Skipping Step 1 for experiment {exp}; already done.')
+                    continue
+                elif val_k != list(range(1, 4)):
+                    print(f'[Step 1] Some k-folds done; running {val_k} for {exp}')
+                self.train(
+                    hp=exp_hp,
+                    label=f'EXP_{exp}',
+                    filters={f'include_{exp}': ['include']},
+                    splits=f'splits_{exp}.json',
+                    val_k=val_k,
+                    val_strategy='k-fold',
+                    save_model=False
+                )
+
+        # === Step 2: Run the rest of the experiments at the designated epoch =====
+        if 2 in steps:
+            print(sf.util.bold("[Step 2] Running experiments at target epoch..."))
+            exp_hp.epochs = [1]
+            for exp in exp_to_run:
+                if exp in full_epoch_exp:
+                    continue  # Already done in Step 1
+                val_k = [
+                    k for k in range(1, 4)
+                    if not utils.model_exists(P, f'EXP_{exp}', outcome=self.outcome, kfold=k)
+                ]
+                if not len(val_k):
+
print(f'Skipping Step 2 for experiment {exp}; already done.') + continue + elif val_k != list(range(1, 4)): + print(f'[Step 2] Some k-folds done; running {val_k} for {exp}') + self.train( + hp=exp_hp, + label=f'EXP_{exp}', + filters={f'include_{exp}': ['include']}, + save_predictions=True, + splits=f'splits_{exp}.json', + val_k=val_k, + val_strategy='k-fold', + save_model=False + ) + + # === Step 3: Run experiments with UQ & save predictions ================== + if 3 in steps: + print(sf.util.bold("[Step 3] Running experiments with UQ...")) + exp_hp.epochs = [1] + exp_hp.uq = True + for exp in exp_to_run: + val_k = [ + k for k in range(1, 4) + if not utils.model_exists(P, f'EXP_{exp}_UQ', outcome=self.outcome, kfold=k) + ] + if not len(val_k): + print(f'Skipping Step 3 for experiment {exp}; already done.') + continue + elif val_k != list(range(1, 4)): + print(f'[Step 3] Some k-folds done; running {val_k} for {exp}') + self.train( + hp=exp_hp, + label=f'EXP_{exp}_UQ', + filters={f'include_{exp}': ['include']}, + save_predictions=True, + splits=f'splits_{exp}.json', + val_k=val_k, + val_strategy='k-fold', + save_model=False + ) + + # === Step 4: Run nested UQ cross-validation ============================== + if 4 in steps: + print(sf.util.bold("[Step 4] Running nested UQ experiments...")) + exp_hp.epochs = [1] + exp_hp.uq = True + for exp in exp_to_run: + total_slides = exp_to_run[exp]['out2'] + exp_to_run[exp]['out1'] + if total_slides >= 50: + self.train_nested_cv( + hp=exp_hp, + label=f'EXP_{exp}_UQ', + val_strategy='k-fold' + ) + else: + print(f"[Step 4] Skipping UQ for {exp}, need >=50 slides") + + # === Step 5: Train models across full datasets =========================== + if 5 in steps: + print(sf.util.bold("[Step 5] Training across full datasets...")) + exp_hp.epochs = [1] + exp_hp.uq = True + for exp in exp_to_run: + if utils.model_exists(P, f'EXP_{exp}_FULL', outcome=self.outcome): + print(f'Skipping Step 5 for experiment {exp}; already done.') + else: + 
stop_batch = utils.find_cv_early_stop(P, f'EXP_{exp}', outcome=self.outcome, k=3) + print(f"Using detected early stop batch {stop_batch}") + self.train( + hp=exp_hp, + label=f'EXP_{exp}_FULL', + filters={f'include_{exp}': ['include']}, + save_model=True, + val_strategy='none', + steps_per_epoch_override=stop_batch + ) + + # === Step 6: External validation ======================================== + if 6 in steps: + for val_P in eval_Ps: + print(sf.util.bold(f"[Step 6] Running eval ({val_P.name})...")) + for exp in exp_to_run: + full_model = utils.find_model(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1) + if utils.eval_exists(val_P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1): + print(f'Skipping eval for experiment {exp}; already done.') + else: + filters = {self.outcome: [self.outcome1, self.outcome2]} + val_P.evaluate( + full_model, + self.outcome, + filters=filters, + save_predictions=True, + ) + + def thresholds_from_nested_cv(self, label, outer_k=3, inner_k=5, id=None, + threshold_params=None, epoch=1, + tile_filename='tile_predictions_val_epoch1.csv', + y_true=None, y_pred=None, uncertainty=None): + """Detects tile- and slide-level UQ thresholds and slide-level prediction + thresholds from nested cross-validation.""" + + if id is None: + id = label + patients = self.train_project.dataset(verification=None).patients() + if threshold_params is None: + threshold_params = { + 'tile_pred': 'detect', + 'slide_pred': 'detect', + 'plot': False, + 'patients': patients + } + all_tile_uq_thresh = [] + all_slide_uq_thresh = [] + all_slide_pred_thresh = [] + df = pd.DataFrame() + for k in range(1, outer_k+1): + + try: + dfs = utils.df_from_cv( + self.train_project, + f'{label}-k{k}', + outcome=self.outcome, + k=inner_k, + y_true=y_true, + y_pred=y_pred, + uncertainty=uncertainty) + except ModelNotFoundError: + log.warn(f"Could not find {label} k-fold {k}; skipping") + continue + + val_path = join( + utils.find_model(self.train_project, f'{label}', kfold=k, 
outcome=self.outcome), + tile_filename + ) + if not exists(val_path): + log.warn(f"Could not find {label} k-fold {k}; skipping") + continue + tile_uq = threshold.from_cv( + dfs, + tile_uq='detect', + slide_uq=None, + **threshold_params + )['tile_uq'] + thresholds = threshold.from_cv( + dfs, + tile_uq=tile_uq, + slide_uq='detect', + **threshold_params + ) + all_tile_uq_thresh += [tile_uq] + all_slide_uq_thresh += [thresholds['slide_uq']] + all_slide_pred_thresh += [thresholds['slide_pred']] + if sf.util.path_to_ext(val_path).lower() == 'csv': + tile_pred_df = pd.read_csv(val_path, dtype={'slide': str}) + elif sf.util.path_to_ext(val_path).lower() in ('parquet', 'gzip'): + tile_pred_df = pd.read_parquet(val_path) + else: + raise OSError(f"Unrecognized prediction filetype {val_path}") + utils.rename_cols(tile_pred_df, self.outcome, y_true=y_true, y_pred=y_pred, uncertainty=uncertainty) + + def uq_auc_by_level(level): + results, _ = threshold.apply( + tile_pred_df, + plot=False, + patients=patients, + level=level, + **thresholds + ) + return results['auc'], results['percent_incl'] + + pt_auc, pt_perc = uq_auc_by_level('patient') + slide_auc, slide_perc = uq_auc_by_level('slide') + model = utils.find_model( + self.train_project, + f'{label}', + kfold=k, + epoch=1, + outcome=self.outcome + ) + m_slides = sf.util.get_slides_from_model_manifest(model, dataset=None) + df = pd.concat([df, pd.DataFrame([{ + 'id': id, + 'n_slides': len(m_slides), + 'fold': k, + 'uq': 'include', + 'patient_auc': pt_auc, + 'patient_uq_perc': pt_perc, + 'slide_auc': slide_auc, + 'slide_uq_perc': slide_perc + }])], axis=0, join='outer', ignore_index=True) + + thresholds = { + 'tile_uq': None if not all_tile_uq_thresh else mean(all_tile_uq_thresh), + 'slide_uq': None if not all_slide_uq_thresh else mean(all_slide_uq_thresh), + 'slide_pred': None if not all_slide_pred_thresh else mean(all_slide_pred_thresh), + } + return df, thresholds + + def train(self, hp, label, filters=None, 
save_predictions='csv', + validate_on_batch=32, validation_steps=32, **kwargs): + r"""Train outer cross-validation models. + + Args: + hp (:class:`slideflow.ModelParams`): Hyperparameters object. + label (str): Experimental label. + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + save_predictions (bool, optional): Save validation predictions to + model folder. Defaults to 'csv'. + + Keyword args: + validate_on_batch (int): Frequency of validation checks during + training, in steps. Defaults to 32. + validation_steps (int): Number of validation steps to perform + during each mid-training evaluation check. Defaults to 32. + **kwargs: All remaining keyword arguments are passed to + :meth:`slideflow.Project.train`. + + Returns: + None + """ + self.train_project.train( + self.outcome, + exp_label=label, + filters=filters, + params=hp, + save_predictions=save_predictions, + validate_on_batch=validate_on_batch, + validation_steps=validation_steps, + **kwargs + ) + + def train_nested_cv(self, hp, label, outer_k=3, inner_k=5, **kwargs): + r"""Train models using nested cross-validation (outer_k=3, inner_k=5), + skipping already-generated models. + + Args: + hp (slideflow.ModelParams): Hyperparameters object. + label (str): Experimental label. + + Keyword args: + outer_k (int): Number of outer cross-folds. Defaults to 3. + inner_k (int): Number of inner cross-folds. Defaults to 5. + **kwargs: All remaining keyword arguments are passed to + :meth:`slideflow.Project.train`. 
+ + Returns: + None + """ + k_models = utils.find_cv(self.train_project, label, k=outer_k, outcome=self.outcome) + for ki, k_model in enumerate(k_models): + inner_k_to_run = [ + k for k in range(1, inner_k+1) + if not utils.model_exists(self.train_project, f'{label}-k{ki+1}', outcome=self.outcome, kfold=k) + ] + if not len(inner_k_to_run): + print(f'Skipping nested cross-val (inner k{ki+1} for experiment ' + f'{label}; already done.') + else: + if inner_k_to_run != list(range(1, inner_k+1)): + print(f'Only running k-folds {inner_k_to_run} for nested ' + f'cross-val k{ki+1} in experiment {label}; ' + 'some k-folds already done.') + train_slides = sf.util.get_slides_from_model_manifest( + k_model, dataset='training' + ) + self.train( + hp=hp, + label=f"{label}-k{ki+1}", + filters={'slide': train_slides}, + val_k_fold=inner_k, + val_k=inner_k_to_run, + save_predictions=True, + save_model=False, + **kwargs + )
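The training steps above are resumable: each one builds a list of k-folds that do not yet have a saved model and trains only those. A minimal sketch of that pattern follows, using only the standard library. The names `pending_folds`, `describe_run`, and the `is_done` callback are hypothetical stand-ins for the `utils.model_exists(...)` checks and are not part of Slideflow.

```python
def pending_folds(total_k, is_done):
    """Return the 1-indexed folds for which is_done(k) is false."""
    return [k for k in range(1, total_k + 1) if not is_done(k)]


def describe_run(total_k, is_done):
    """Summarize what a resumable training loop would do."""
    todo = pending_folds(total_k, is_done)
    if not todo:
        return "skip"  # every fold already has a saved model
    if todo != list(range(1, total_k + 1)):
        return f"resume {todo}"  # some folds finished in an earlier run
    return f"run all {todo}"


finished = {1, 3}  # hypothetical: folds 1 and 3 completed earlier
print(describe_run(3, finished.__contains__))
```

Passing the "already done" check in as a callback keeps the fold-selection logic independent of how models are stored on disk.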

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/biscuit/hp/index.html b/docs/_modules/slideflow/biscuit/hp/index.html new file mode 100644 index 000000000..b5219dca4 --- /dev/null +++ b/docs/_modules/slideflow/biscuit/hp/index.html @@ -0,0 +1,449 @@ + + + + + + + + + + + + slideflow.biscuit.hp — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.biscuit.hp

+"""Hyperparameters associated with the published manuscript."""
+
+import slideflow as sf
+
+
+
+def nature2022():
+    """Hyperparameters used in the associated manuscript.
+
+    Dolezal, J.M., Srisuwananukorn, A., Karpeyev, D. et al.
+    Uncertainty-informed deep learning models enable high-confidence
+    predictions for digital histopathology. Nat Commun 13, 6572 (2022).
+    https://doi.org/10.1038/s41467-022-34025-x
+
+    Returns:
+        ``sf.ModelParams``
+
+    """
+    if sf.backend() == 'tensorflow':
+        loss = 'sparse_categorical_crossentropy'
+    else:
+        loss = 'CrossEntropy'
+    return sf.ModelParams(
+        model='xception',
+        tile_px=299,
+        tile_um=302,
+        batch_size=128,
+        epochs=[1],  # epochs 1, 3, 5, 10 used for initial sweep
+        early_stop=True,
+        early_stop_method='accuracy',
+        dropout=0.1,
+        uq=False,  # to be enabled in separate sub-experiments
+        hidden_layer_width=1024,
+        optimizer='Adam',
+        learning_rate=0.0001,
+        learning_rate_decay_steps=512,
+        learning_rate_decay=0.98,
+        loss=loss,
+        normalizer='reinhard_fast',
+        include_top=False,
+        hidden_layers=2,
+        pooling='avg',
+        augment='xyrjb'
+    )
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/biscuit/threshold/index.html b/docs/_modules/slideflow/biscuit/threshold/index.html new file mode 100644 index 000000000..99c2dcf26 --- /dev/null +++ b/docs/_modules/slideflow/biscuit/threshold/index.html @@ -0,0 +1,967 @@ + + + + + + + + + + + + slideflow.biscuit.threshold — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.biscuit.threshold

+import warnings
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+import seaborn as sns
+
+from sklearn import metrics
+from sklearn.exceptions import UndefinedMetricWarning
+from slideflow.util import log
+
+from . import errors, utils
+
+
+
[docs]def plot_uncertainty(df, kind, threshold=None, title=None): + """Plots figure of tile or slide-level predictions vs. uncertainty. + + Args: + df (pandas.DataFrame): Processed dataframe containing columns + 'uncertainty', 'correct', 'y_pred'. + kind (str): Kind of plot. If 'tile', subsample to only 1000 points. + Included in title. + threshold (float, optional): Uncertainty threshold. + Defaults to None. + title (str, optional): Title for plots. Defaults to None. + + Returns: + None + """ + try: + from skmisc.loess import loess + except ImportError: + raise ImportError( + "Uncertainty plots with loess estimation require scikit-misc, " + "which is not installed." + ) + + # Subsample tile-level predictions + if kind == 'tile': + df = df.sample(n=1000) + + f, axes = plt.subplots(1, 3) + f.set_size_inches(15, 5) + palette = sns.color_palette("Set2") + tf_pal = {True: palette[0], False: palette[1]} + + # Left figure - KDE ------------------------------------------------------- + kde = sns.kdeplot( + x='uncertainty', + hue='correct', + data=df, + fill=True, + palette=tf_pal, + ax=axes[0] + ) + kde.set(xlabel='Uncertainty') + axes[0].title.set_text(f'Uncertainty density ({kind}-level)') + + # Middle figure - Scatter ------------------------------------------------- + + # - Above threshold + if threshold is not None: + axes[1].axhline(y=threshold, color='r', linestyle='--') + at_df = df.loc[(df['uncertainty'] >= threshold)] + c_a_df = at_df.loc[at_df['correct']] + ic_a_df = at_df.loc[~at_df['correct']] + axes[1].scatter( + x=c_a_df['y_pred'], + y=c_a_df['uncertainty'], + marker='o', + s=10, + color='gray' + ) + axes[1].scatter( + x=ic_a_df['y_pred'], + y=ic_a_df['uncertainty'], + marker='x', + color='#FC6D77' + ) + # - Below threshold + if threshold is not None: + bt_df = df.loc[(df['uncertainty'] < threshold)] + else: + bt_df = df + c_df = bt_df.loc[bt_df['correct']] + ic_df = bt_df.loc[~bt_df['correct']] + axes[1].scatter( + x=c_df['y_pred'], + 
y=c_df['uncertainty'], + marker='o', + s=10 + ) + axes[1].scatter( + x=ic_df['y_pred'], + y=ic_df['uncertainty'], + marker='x', + color='red' + ) + if title is not None: + axes[1].title.set_text(title) + + # Right figure - probability calibration ---------------------------------- + l_df = df[['uncertainty', 'correct']].sort_values(by=['uncertainty']) + x = l_df['uncertainty'].to_numpy() + y = l_df['correct'].astype(float).to_numpy() + ol = loess(x, y) + ol.fit() + pred = ol.predict(x, stderror=True) + conf = pred.confidence() + z = pred.values + ll = conf.lower + ul = conf.upper + axes[2].plot(x, y, '+', ms=6) + axes[2].plot(x, z) + axes[2].fill_between(x, ll, ul, alpha=.2) + axes[2].tick_params(labelrotation=90) + axes[2].set_ylim(-0.1, 1.1) + if threshold is not None: + axes[2].axvline(x=threshold, color='r', linestyle='--') + + # - Figure style + for ax in (axes[1], axes[2]): + ax.spines['bottom'].set_linewidth(0.5) + ax.spines['bottom'].set_color('black') + ax.tick_params(axis='x', colors='black') + ax.grid(visible=True, which='both', axis='both', color='white') + ax.set_facecolor('#EAEAF2')
+ + +
+def process_tile_predictions(df, pred_thresh=0.5, patients=None):
+    '''Process a DataFrame of tile-level predictions (e.g. as read from CSV).
+
+    Args:
+        df (pandas.DataFrame): Unprocessed DataFrame from reading tile-level
+            predictions. Must contain columns 'y_true' and 'y_pred', plus
+            'slide' if ``patients`` is provided.
+        pred_thresh (float or str, optional): Tile-level prediction threshold.
+            If 'detect', will auto-detect via Youden's J. Defaults to 0.5.
+        patients (dict, optional): Dict mapping slides to patients, used for
+            patient-level thresholding. Defaults to None.
+
+    Returns:
+        pandas.DataFrame, tile prediction threshold
+    '''
+
+    # Tile-level AUC
+    if np.isnan(df['y_pred'].to_numpy()).sum():
+        raise errors.PredsContainNaNError
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", category=UndefinedMetricWarning)
+        fpr, tpr, thresh = metrics.roc_curve(
+            df['y_true'].to_numpy(),
+            df['y_pred'].to_numpy()
+        )
+    tile_auc = metrics.auc(fpr, tpr)
+    try:
+        # Youden's J: choose the threshold maximizing TPR - FPR
+        max_j = max(zip(tpr, fpr), key=lambda x: x[0]-x[1])
+        opt_pred = thresh[list(zip(tpr, fpr)).index(max_j)]
+    except ValueError:
+        log.debug("Unable to calculate tile prediction threshold; using 0.5")
+        opt_pred = 0.5
+
+    if pred_thresh == 'detect':
+        log.debug(f"Auto-detected tile prediction threshold: {opt_pred:.4f}")
+        pred_thresh = opt_pred
+    else:
+        log.debug(f"Using tile prediction threshold: {pred_thresh:.4f}")
+
+    if patients is not None:
+        df['patient'] = df['slide'].map(patients)
+    else:
+        log.warn('Patients not provided; assuming 1:1 slide:patient mapping')
+
+    log.debug(f'Tile AUC: {tile_auc:.4f}')
+    # Calculate tile-level prediction accuracy
+    df['error'] = abs(df['y_true'] - df['y_pred'])
+    df['correct'] = (
+        ((df['y_pred'] < pred_thresh) & (df['y_true'] == 0))
+        | ((df['y_pred'] >= pred_thresh) & (df['y_true'] == 1))
+    )
+    df['incorrect'] = (~df['correct']).astype(int)
+    df['y_pred_bin'] = (df['y_pred'] >= pred_thresh).astype(int)
+    return df, pred_thresh
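The 'detect' option above picks the prediction threshold that maximizes Youden's J statistic (true positive rate minus false positive rate). The following is a minimal standard-library sketch of that selection, not the library's implementation, which relies on `sklearn.metrics.roc_curve`. The function name `youden_threshold` and the sample data are hypothetical.

```python
def youden_threshold(y_true, y_score):
    """Return (cutoff, J) for the cutoff maximizing J = TPR - FPR.

    A sample is called positive when its score is >= the cutoff.
    """
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Need at least one positive and one negative label")
    best_j, best_cutoff = float("-inf"), None
    # Candidate cutoffs: every distinct score, highest first.
    for cutoff in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= cutoff and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= cutoff and t == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # ties keep the first (highest) cutoff found
            best_j, best_cutoff = j, cutoff
    return best_cutoff, best_j


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
cutoff, j = youden_threshold(y_true, y_score)
print(cutoff, j)
```

This brute-force scan is quadratic in the number of samples. It is meant to make the criterion explicit, not to replace an ROC routine on large tile-level datasets.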
+ + +
[docs]def process_group_predictions(df, pred_thresh, level): + '''From a given dataframe of tile-level predictions, calculate group-level + predictions and uncertainty.''' + + if any(c not in df.columns for c in ('y_true', 'y_pred', 'uncertainty')): + raise ValueError('Missing columns. Expected y_true, y_pred, uncertainty.' + f'Got: {df.columns}') + + # Calculate group-level predictions + log.debug(f'Calculating {level}-level means from {len(df)} predictions') + levels = [l for l in pd.unique(df[level]) if l is not np.nan] + reduced_df = df[[level, 'y_pred', 'y_true', 'uncertainty']] + grouped = reduced_df.groupby(level, as_index=False).mean() + yp = np.array([ + grouped.loc[grouped[level] == lev]['y_pred'].to_numpy()[0] + for lev in levels + ]) + yt = np.array([ + grouped.loc[grouped[level] == lev]['y_true'].to_numpy()[0] + for lev in levels + ], dtype=np.uint8) + u = np.array([ + grouped.loc[grouped[level] == lev]['uncertainty'].to_numpy()[0] + for lev in levels + ]) + if not len(yt): + raise errors.ROCFailedError("Unable to generate ROC; preds are empty.") + + # Slide-level AUC + log.debug(f'Calculating {level}-level ROC') + with warnings.catch_warnings(): + warnings.simplefilter("ignore", category=UndefinedMetricWarning) + l_fpr, l_tpr, l_thresh = metrics.roc_curve(yt, yp) + log.debug('Calculating AUC') + level_auc = metrics.auc(l_fpr, l_tpr) + log.debug('Calculating optimal threshold') + + if pred_thresh == 'detect': + try: + max_j = max(zip(l_tpr, l_fpr), key=lambda x: x[0]-x[1]) + pred_thresh = l_thresh[list(zip(l_tpr, l_fpr)).index(max_j)] + except ValueError: + raise errors.ROCFailedError(f"Unable to generate {level}-level ROC") + log.debug(f"Using detected prediction threshold: {pred_thresh:.4f}") + else: + log.debug(f"Using {level} prediction threshold: {pred_thresh:.4f}") + log.debug(f'{level} AUC: {level_auc:.4f}') + + correct = (((yp < pred_thresh) & (yt == 0)) + | ((yp >= pred_thresh) & (yt == 1))) + incorrect = pd.Series( + ((yp < pred_thresh) & (yt 
== 1)) + | ((yp >= pred_thresh) & (yt == 0)) + ).astype(int) + + l_df = pd.DataFrame({ + level: pd.Series(levels), + 'error': pd.Series(abs(yt - yp)), + 'uncertainty': pd.Series(u), + 'correct': correct, + 'incorrect': incorrect, + 'y_true': pd.Series(yt), + 'y_pred': pd.Series(yp), + 'y_pred_bin': pd.Series(yp >= pred_thresh).astype(int) + }) + return l_df, pred_thresh
+ + +
[docs]def apply(df, tile_uq, slide_uq, tile_pred=0.5, + slide_pred=0.5, plot=False, keep='high_confidence', + title=None, patients=None, level='slide'): + + '''Apply pre-calculcated tile- and group-level uncertainty thresholds. + + Args: + df (pandas.DataFrame): Must contain columns 'y_true', 'y_pred', + and 'uncertainty'. + tile_uq (float): Tile-level uncertainty threshold. + slide_uq (float): Slide-level uncertainty threshold. + tile_pred (float, optional): Tile-level prediction threshold. + Defaults to 0.5. + slide_pred (float, optional): Slide-level prediction threshold. + Defaults to 0.5. + plot (bool, optional): Plot slide-level uncertainty. Defaults to False. + keep (str, optional): Either 'high_confidence' or 'low_confidence'. + Cohort to keep after thresholding. Defaults to 'high_confidence'. + title (str, optional): Title for uncertainty plot. Defaults to None. + patients (dict, optional): Dictionary mapping slides to patients. Adds + a 'patient' column in the tile prediction dataframe, enabling + patient-level thresholding. Defaults to None. + level (str, optional): Either 'slide' or 'patient'. Level at which to + apply threshold. If 'patient', requires patient dict be supplied. + Defaults to 'slide'. 
+ + Returns: + Dictionary of results, with keys auc, percent_incl, accuracy, + sensitivity, and specificity + + DataFrame of thresholded group-level predictions + ''' + + assert keep in ('high_confidence', 'low_confidence') + assert not (level == 'patient' and patients is None) + + log.debug(f"Applying tile UQ threshold of {tile_uq:.5f}") + if patients: + df['patient'] = df['slide'].map(patients) + log.debug(f"Number of {level}s before tile UQ filter: {pd.unique(df[level]).shape[0]}") + log.debug(f"Number of tiles before tile-level filter: {len(df)}") + + df, _ = process_tile_predictions( + df, + pred_thresh=tile_pred, + patients=patients + ) + num_pre_filter = pd.unique(df[level]).shape[0] + + if tile_uq: + df = df[df['uncertainty'] < tile_uq] + + log.debug(f"Number of {level}s after tile-level filter: {pd.unique(df[level]).shape[0]}") + log.debug(f"Number of tiles after tile-level filter: {len(df)}") + + # Build group-level predictions + try: + s_df, _ = process_group_predictions( + df, + pred_thresh=slide_pred, + level=level + ) + except errors.ROCFailedError: + log.error("Unable to process slide predictions") + empty_results = {k: None for k in ['auc', + 'percent_incl', + 'acc', + 'sensitivity', + 'specificity']} + return empty_results, None + + if plot: + plot_uncertainty(s_df, threshold=slide_uq, kind=level, title=title) + + # Apply slide-level thresholds + if slide_uq: + log.debug(f"Using {level} uncertainty threshold of {slide_uq:.5f}") + if keep == 'high_confidence': + s_df = s_df.loc[s_df['uncertainty'] < slide_uq] + elif keep == 'low_confidence': + s_df = s_df.loc[s_df['uncertainty'] >= slide_uq] + else: + raise Exception(f"Unknown keep option {keep}") + + # Show post-filtering group-level predictions and AUC + auc = utils.auc(s_df['y_true'].to_numpy(), s_df['y_pred'].to_numpy()) + num_post_filter = len(s_df) + percent_incl = num_post_filter / num_pre_filter + log.debug(f"Percent {level} included: {percent_incl*100:.2f}%") + + # Calculate 
post-thresholded sensitivity/specificity + y_true = s_df['y_true'].to_numpy().astype(bool) + y_pred = s_df['y_pred'].to_numpy() > slide_pred + + tp = np.logical_and(y_true, y_pred).sum() + fp = np.logical_and(np.logical_not(y_true), y_pred).sum() + tn = np.logical_and(np.logical_not(y_true), np.logical_not(y_pred)).sum() + fn = np.logical_and(y_true, np.logical_not(y_pred)).sum() + acc = (tp + tn) / (tp + tn + fp + fn) + sensitivity = tp / (tp + fn) + specificity = tn / (tn + fp) + + log.debug(f"Accuracy: {acc:.4f}") + log.debug(f"Sensitivity: {sensitivity:.4f}") + log.debug(f"Specificity: {specificity:.4f}") + + results = { + 'auc': auc, + 'percent_incl': percent_incl, + 'acc': acc, + 'sensitivity': sensitivity, + 'specificity': specificity + } + return results, s_df
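The accuracy, sensitivity, and specificity reported by `apply` come from a standard confusion-matrix count over the slides (or patients) that survive uncertainty filtering. A small sketch with plain Python lists follows. The helper `binary_metrics` is hypothetical, and its handling of empty denominators (returning None) is an assumption of this sketch, not the behavior of the code above.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from paired booleans."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num, den):
        # Assumption of this sketch: report None instead of dividing by zero.
        return num / den if den else None

    return {
        'acc': ratio(tp + tn, tp + tn + fp + fn),
        'sensitivity': ratio(tp, tp + fn),
        'specificity': ratio(tn, tn + fp),
    }


slide_pred = 0.5  # prediction threshold
y_true = [True, True, False, False, True]
scores = [0.9, 0.2, 0.1, 0.7, 0.6]
# Binarize with a strict comparison, as `apply` does for its final metrics.
y_pred = [s > slide_pred for s in scores]
result = binary_metrics(y_true, y_pred)
print(result)
```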
+ + +
[docs]def detect(df, tile_uq='detect', slide_uq='detect', tile_pred='detect', + slide_pred='detect', plot=False, patients=None): + '''Detect optimal tile- and slide-level uncertainty thresholds. + + Args: + df (pandas.DataFrame): Tile-level predictions. Must contain columns + 'y_true', 'y_pred', and 'uncertainty'. + tile_uq (str or float): Either 'detect' or float. If 'detect', + will detect tile-level uncertainty threshold. If float, will use + the specified tile-level uncertainty threshold. + slide_uq (str or float): Either 'detect' or float. If 'detect', + will detect slide-level uncertainty threshold. If float, will use + the specified slide-level uncertainty threshold. + tile_pred (str or float): Either 'detect' or float. If 'detect', + will detect tile-level prediction threshold. If float, will use the + specified tile-level prediction threshold. + slide_pred (str or float): Either 'detect' or float. If 'detect' + will detect slide-level prediction threshold. If float, will use + the specified slide-level prediction threshold. + plot (bool, optional): Plot slide-level uncertainty. Defaults to False. + patients (dict, optional): Dict mapping slides to patients. Required + for patient-level thresholding. 
+ + Returns: + Dictionary with tile- and slide-level UQ and prediction threhsolds, + with keys: 'tile_uq', 'tile_pred', 'slide_uq', 'slide_pred' + + Float: Slide-level AUROC + ''' + + log.debug("Detecting thresholds...") + empty_thresh = {k: None + for k in ['tile_uq', 'slide_uq', 'tile_pred', 'slide_pred']} + try: + df, detected_tile_pred = process_tile_predictions( + df, + pred_thresh=tile_pred, + patients=patients + ) + except errors.PredsContainNaNError: + log.error("Tile-level predictions contain NaNs; unable to process.") + return empty_thresh, None + + if tile_pred == 'detect': + tile_pred = detected_tile_pred + + # Tile-level ROC and Youden's J + if isinstance(tile_uq, (float, np.float16, np.float32, np.float64)): + df = df[df['uncertainty'] < tile_uq] + elif tile_uq != 'detect': + log.debug("Not performing tile-level uncertainty thresholding.") + tile_uq = None + else: + with warnings.catch_warnings(): + warnings.simplefilter("ignore", category=UndefinedMetricWarning) + t_fpr, t_tpr, t_thresh = metrics.roc_curve( + df['incorrect'].to_numpy(), + df['uncertainty'].to_numpy() + ) + max_j = max(zip(t_tpr, t_fpr), key=lambda x: x[0] - x[1]) + tile_uq = t_thresh[list(zip(t_tpr, t_fpr)).index(max_j)] + log.debug(f"Tile-level optimal UQ threshold: {tile_uq:.4f}") + df = df[df['uncertainty'] < tile_uq] + + slides = list(set(df['slide'])) + log.debug(f"Number of slides after filter: {len(slides)}") + log.debug(f"Number of tiles after filter: {len(df)}") + + # Build slide-level predictions + try: + s_df, slide_pred = process_group_predictions( + df, + pred_thresh=slide_pred, + level='slide' + ) + except errors.ROCFailedError: + log.error("Unable to process slide predictions") + return empty_thresh, None + + # Slide-level thresholding + if slide_uq == 'detect': + if not s_df['incorrect'].to_numpy().sum(): + log.debug("Unable to calculate slide UQ threshold; no incorrect predictions made") + slide_uq = None + else: + with warnings.catch_warnings(): + 
warnings.simplefilter("ignore", category=UndefinedMetricWarning) + s_fpr, s_tpr, s_thresh = metrics.roc_curve( + s_df['incorrect'], + s_df['uncertainty'].to_numpy() + ) + max_j = max(zip(s_tpr, s_fpr), key=lambda x: x[0]-x[1]) + slide_uq = s_thresh[list(zip(s_tpr, s_fpr)).index(max_j)] + log.debug(f"Slide-level optimal UQ threshold: {slide_uq:.4f}") + if plot: + plot_uncertainty(s_df, threshold=slide_uq, kind='slide') + s_df = s_df[s_df['uncertainty'] < slide_uq] + else: + log.debug("Not performing slide-level uncertainty thresholding.") + slide_uq = 0.5 + if plot: + plot_uncertainty(s_df, threshold=slide_uq, kind='slide') + + # Show post-filtering slide predictions and AUC + auc = utils.auc(s_df['y_true'].to_numpy(), s_df['y_pred'].to_numpy()) + thresholds = { + 'tile_uq': tile_uq, + 'slide_uq': slide_uq, + 'tile_pred': tile_pred, + 'slide_pred': slide_pred + } + return thresholds, auc
+ + +
+def from_cv(dfs, **kwargs):
+    '''Finds the optimal tile and slide-level thresholds from a set of nested
+    cross-validation experiments.
+
+    Args:
+        dfs (list(DataFrame)): List of DataFrames with tile predictions,
+            containing headers 'y_true', 'y_pred', 'uncertainty', 'slide',
+            and 'patient'.
+
+    Keyword args:
+        tile_uq (str or float): Either 'detect' or float. If 'detect',
+            will detect tile-level uncertainty threshold. If float, will use
+            the specified tile-level uncertainty threshold.
+        slide_uq (str or float): Either 'detect' or float. If 'detect',
+            will detect slide-level uncertainty threshold. If float, will use
+            the specified slide-level uncertainty threshold.
+        tile_pred (str or float): Either 'detect' or float. If 'detect',
+            will detect tile-level prediction threshold. If float, will use the
+            specified tile-level prediction threshold.
+        slide_pred (str or float): Either 'detect' or float. If 'detect',
+            will detect slide-level prediction threshold. If float, will use
+            the specified slide-level prediction threshold.
+        plot (bool, optional): Plot slide-level uncertainty. Defaults to False.
+        patients (dict, optional): Dict mapping slides to patients. Required
+            for patient-level thresholding.
+
+    Returns:
+        Dictionary with tile- and slide-level UQ and prediction thresholds,
+        with keys: 'tile_uq', 'tile_pred', 'slide_uq', 'slide_pred'
+    '''
+
+    required_cols = ('y_true', 'y_pred', 'uncertainty', 'slide', 'patient')
+    k_tile_thresh, k_slide_thresh = [], []
+    k_tile_pred_thresh, k_slide_pred_thresh = [], []
+    k_auc = []
+    skip_tile = ('tile_uq_thresh' in kwargs
+                 and kwargs['tile_uq_thresh'] is None)
+    skip_slide = ('slide_uq_thresh' in kwargs
+                  and kwargs['slide_uq_thresh'] is None)
+
+    for idx, df in enumerate(dfs):
+        log.debug(f"Detecting thresholds from fold {idx}")
+        if not all(col in df.columns for col in required_cols):
+            raise ValueError(
+                f"DataFrame missing columns, expected {required_cols}, got: "
+                f"{', '.join(df.columns.tolist())}"
+            )
+        thresholds, auc = detect(df, **kwargs)
+        if thresholds['tile_uq'] is None or thresholds['slide_uq'] is None:
+            log.debug(f"Skipping CV #{idx}, unable to detect threshold")
+            continue
+
+        # Keep tile- and slide-level prediction thresholds in matching lists
+        k_tile_pred_thresh += [thresholds['tile_pred']]
+        k_slide_pred_thresh += [thresholds['slide_pred']]
+        k_auc += [auc]
+
+        if not skip_tile:
+            k_tile_thresh += [thresholds['tile_uq']]
+        if not skip_slide:
+            k_slide_thresh += [thresholds['slide_uq']]
+
+    if not skip_tile and not len(k_tile_thresh):
+        raise errors.ThresholdError('Unable to detect tile UQ threshold.')
+    if not skip_slide and not len(k_slide_thresh):
+        raise errors.ThresholdError('Unable to detect slide UQ threshold.')
+
+    # Prediction thresholds are averaged across folds
+    k_slide_pred_thresh = np.mean(k_slide_pred_thresh)
+    k_tile_pred_thresh = np.mean(k_tile_pred_thresh)
+
+    # UQ thresholds: minimum across folds for tiles, maximum for slides
+    if not skip_tile:
+        k_tile_thresh = np.min(k_tile_thresh)
+    if not skip_slide:
+        k_slide_thresh = np.max(k_slide_thresh)
+
+    return {
+        'tile_uq': k_tile_thresh,
+        'slide_uq': k_slide_thresh,
+        'tile_pred': k_tile_pred_thresh,
+        'slide_pred': k_slide_pred_thresh
+    }
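`from_cv` reduces per-fold thresholds to a single set: prediction thresholds are averaged, the tile-level UQ threshold takes the minimum across folds, and the slide-level UQ threshold takes the maximum. A simplified sketch of that reduction over plain dictionaries follows. The function `aggregate_thresholds` and the fold values are hypothetical, and the sketch leaves out the `skip_tile` and `skip_slide` handling.

```python
from statistics import mean


def aggregate_thresholds(folds):
    """Combine per-fold threshold dicts into one set of thresholds.

    Folds where either UQ threshold is None are skipped, mirroring the
    "unable to detect threshold" case.
    """
    usable = [
        f for f in folds
        if f['tile_uq'] is not None and f['slide_uq'] is not None
    ]
    if not usable:
        raise ValueError("Unable to detect UQ thresholds in any fold")
    return {
        'tile_uq': min(f['tile_uq'] for f in usable),
        'slide_uq': max(f['slide_uq'] for f in usable),
        'tile_pred': mean(f['tile_pred'] for f in usable),
        'slide_pred': mean(f['slide_pred'] for f in usable),
    }


folds = [
    {'tile_uq': 0.02, 'slide_uq': 0.01, 'tile_pred': 0.5, 'slide_pred': 0.25},
    {'tile_uq': None, 'slide_uq': None, 'tile_pred': 0.5, 'slide_pred': 0.5},
    {'tile_uq': 0.03, 'slide_uq': 0.015, 'tile_pred': 0.25, 'slide_pred': 0.75},
]
combined = aggregate_thresholds(folds)
print(combined)
```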
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/biscuit/utils/index.html b/docs/_modules/slideflow/biscuit/utils/index.html new file mode 100644 index 000000000..be9c4318b --- /dev/null +++ b/docs/_modules/slideflow/biscuit/utils/index.html @@ -0,0 +1,910 @@ + + + + + + + + + + + + slideflow.biscuit.utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.biscuit.utils

+import os
+from os.path import join
+from statistics import mean, variance
+
+import warnings
+import matplotlib.colors as colors
+import numpy as np
+import pandas as pd
+import slideflow as sf
+from scipy import stats
+from sklearn import metrics
+from sklearn.exceptions import UndefinedMetricWarning
+
+from .delong import delong_roc_variance
+from .errors import ModelNotFoundError, MultipleModelsFoundError
+
+# -----------------------------------------------------------------------------
+
+def uncertainty_header(outcome, underscore=False):
+    return str(outcome) + ('_' if underscore else '-') + 'uncertainty1'
+
+
+def y_true_header(outcome, underscore=False):
+    return str(outcome) + ('_' if underscore else '-') + 'y_true0'
+
+
+def y_pred_header(outcome, underscore=False):
+    return str(outcome) + ('_' if underscore else '-') + 'y_pred1'
+
+
+def rename_cols(df, outcome, *, y_true=None, y_pred=None, uncertainty=None):
+    """Renames columns of dataframe, in place."""
+    # Support for using underscore or dashes
+    if y_true is None:
+        y_true = y_true_header(
+            outcome,
+            underscore=(y_true_header(outcome, underscore=True) in df.columns))
+        if y_true not in df.columns:
+            y_true = str(outcome) + '-y_true'
+    if y_pred is None:
+        y_pred = y_pred_header(
+            outcome,
+            underscore=(y_pred_header(outcome, underscore=True) in df.columns))
+    if uncertainty is None:
+        uncertainty = uncertainty_header(
+            outcome,
+            underscore=(uncertainty_header(outcome, underscore=True) in df.columns))
+    new_cols = {
+        y_true: 'y_true',
+        y_pred: 'y_pred',
+        uncertainty: 'uncertainty'
+    }
+    df.rename(columns=new_cols, inplace=True)
+
+# --- General utility functions -----------------------------------------------
+
+
+def truncate_colormap(cmap, minval=0.0, maxval=1.0, n=100):
+    """Truncates matplotlib colormap."""
+
+    new_cmap = colors.LinearSegmentedColormap.from_list(
+        'trunc({n},{a:.2f},{b:.2f})'.format(n=cmap.name, a=minval, b=maxval),
+        cmap(np.linspace(minval, maxval, n)))
+    return new_cmap
+ + +
[docs]def get_model_results(path, epoch, outcome): + """Reads results/metrics from a trained model. + + Args: + path (str): Path to model. + outcome (str): Outcome name. + + Returns: + Dict of results with the keys: pt_auc, pt_ap, slide_auc, slide_ap, + tile_auc, tile_ap, opt_thresh + """ + csv = pd.read_csv(join(path, 'results_log.csv')) + result_rows = {} + for i, row in csv.iterrows(): + try: + row_epoch = int(row['model_name'].split('epoch')[-1]) + except ValueError: + continue + result_rows.update({ + row_epoch: row + }) + if epoch not in result_rows: + raise ModelNotFoundError(f"Unable to find results for epoch {epoch}") + model_res = result_rows[epoch] + pt_ap = mean(eval(model_res['patient_ap'])[outcome]) + pt_auc = eval(model_res['patient_auc'])[outcome][0] + slide_ap = mean(eval(model_res['slide_ap'])[outcome]) + slide_auc = eval(model_res['slide_auc'])[outcome][0] + tile_ap = mean(eval(model_res['tile_ap'])[outcome]) + tile_auc = eval(model_res['tile_auc'])[outcome][0] + + pred_path = join( + path, + f'patient_predictions_{outcome}_val_epoch{epoch}.csv' + ) + if os.path.exists(pred_path): + _, opt_thresh = auc_and_threshold(*read_group_predictions(pred_path)) + else: + try: + parquet_path = join(path, 'patient_predictions_val_epoch1.parquet.gzip') + _, opt_thresh = auc_and_threshold(*read_group_predictions(parquet_path)) + except OSError: + opt_thresh = None + return { + 'pt_auc': pt_auc, + 'pt_ap': pt_ap, + 'slide_auc': slide_auc, + 'slide_ap': slide_ap, + 'tile_auc': tile_auc, + 'tile_ap': tile_ap, + 'opt_thresh': opt_thresh + }
+ + +
[docs]def get_eval_results(path, outcome): + """Reads results/metrics from a trained model. + + Args: + path (str): Path to model. + outcome (str): Outcome name. + + Returns: + Dict of results with the keys: pt_auc, pt_ap, slide_auc, slide_ap, + tile_auc, tile_ap, opt_thresh + """ + csv = pd.read_csv(join(path, 'results_log.csv')) + for i, row in csv.iterrows(): + model_res = row + pt_ap = mean(eval(model_res['patient_ap'])[outcome]) + pt_auc = eval(model_res['patient_auc'])[outcome][0] + slide_ap = mean(eval(model_res['slide_ap'])[outcome]) + slide_auc = eval(model_res['slide_auc'])[outcome][0] + tile_ap = mean(eval(model_res['tile_ap'])[outcome]) + tile_auc = eval(model_res['tile_auc'])[outcome][0] + + pred_path = join( + path, + f'patient_predictions_{outcome}_eval.csv' + ) + if os.path.exists(pred_path): + _, opt_thresh = auc_and_threshold(*read_group_predictions(pred_path)) + else: + try: + parquet_path = join(path, 'patient_predictions_eval.parquet.gzip') + _, opt_thresh = auc_and_threshold(*read_group_predictions(parquet_path)) + except OSError: + opt_thresh = None + return { + 'pt_auc': pt_auc, + 'pt_ap': pt_ap, + 'slide_auc': slide_auc, + 'slide_ap': slide_ap, + 'tile_auc': tile_auc, + 'tile_ap': tile_ap, + 'opt_thresh': opt_thresh + }
+ + +
[docs]def find_cv_early_stop(project, label, outcome, k=3): + """Detects early stop batch from cross-val trained models. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + k (int, optional): Number of k-fold iterations. Defaults to 3. + outcome (str): Outcome name. + + Returns: + int: Early stop batch. + """ + cv_folders = find_cv(project, label, k=k, outcome=outcome) + early_stop_batch = [] + for cv_folder in cv_folders: + csv = pd.read_csv(join(cv_folder, 'results_log.csv')) + model_res = next(csv.iterrows())[1] + if 'early_stop_batch' in model_res: + early_stop_batch += [model_res['early_stop_batch']] + if len(early_stop_batch) == len(cv_folders): + # Only returns early stop if it was triggered in all crossfolds + return round(mean(early_stop_batch)) + else: + return None
+ + +
[docs]def df_from_cv(project, label, outcome, epoch=None, k=3, y_true=None, + y_pred=None, uncertainty=None): + """Loads tile predictions from cross-fold models & renames columns. + + Args: + project (sf.Project): Slideflow project. + label (str): Experimental label. + epoch (int, optional): Epoch number of saved model. Defaults to None. + k (int, optional): K-fold iteration. Defaults to 3. + outcome (str, optional): Outcome name. + y_true (str, optional): Column name for ground truth labels. + Defaults to {outcome}_y_true0. + y_pred (str, optional): Column name for predictions. + Defaults to {outcome}_y_pred1. + uncertainty (str, optional): Column name for uncertainty. + Defaults to {outcome}_y_uncertainty1. + + Returns: + list(DataFrame): DataFrame for each k-fold. + """ + dfs = [] + model_folders = find_cv(project, label, epoch=epoch, k=k, outcome=outcome) + patients = project.dataset().patients() + e = '' if epoch is None else '../' + + for folder in model_folders: + csv_path = join(folder, f'{e}tile_predictions_val_epoch1.csv') + parquet_path = join(folder, f'{e}tile_predictions_val_epoch1.parquet.gzip') + if os.path.exists(csv_path): + df = pd.read_csv(csv_path) + elif os.path.exists(parquet_path): + df = pd.read_parquet(parquet_path) + else: + raise OSError(f"Could not find tile predictions file at {folder}") + rename_cols(df, outcome, y_true=y_true, y_pred=y_pred, uncertainty=uncertainty) + if 'patient' not in df.columns: + df['patient'] = df['slide'].map(patients) + dfs += [df] + return dfs
+ + +# --- Utility functions for finding experiment models ------------------------- + +
[docs]def find_model(project, label, outcome, epoch=None, kfold=None): + """Searches for a model in a project model directory. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + outcome (str): Outcome name. + epoch (int, optional): Epoch to search for. If not None, returns + path to the saved model. If None, returns path to parent model + folder. Defaults to None. + kfold (int, optional): K-fold iteration. Defaults to None. + + + Raises: + MultipleModelsFoundError: If multiple potential matches are found. + ModelNotFoundError: If no matching model is found. + + Returns: + str: Path to matching model. + """ + tail = '' if kfold is None else f'-kfold{kfold}' + model_name = f'{outcome}-{label}-HP0{tail}' + matching = [ + o for o in os.listdir(project.models_dir) + if o[6:] == model_name + ] + if len(matching) > 1: + raise MultipleModelsFoundError("Multiple matching models found " + f"matching {model_name}") + elif not len(matching): + raise ModelNotFoundError("No matching model found matching " + f"{model_name}.") + elif epoch is not None: + return join( + project.models_dir, + matching[0], + f'{outcome}-{label}-HP0{tail}_epoch{epoch}' + ) + else: + return join(project.models_dir, matching[0])
+ + +
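The name matching in `find_model` can be illustrated in isolation. The sketch below assumes model folders carry a six-character run prefix (for example `00012-`) ahead of the `{outcome}-{label}-HP0` name, which is what the `o[6:]` slice implies. The folder names and the helper `match_model_dirs` are hypothetical and not part of Slideflow.

```python
def match_model_dirs(dirnames, label, outcome, kfold=None):
    """Return folder names whose suffix (after a 6-character prefix) matches."""
    tail = '' if kfold is None else f'-kfold{kfold}'
    model_name = f'{outcome}-{label}-HP0{tail}'
    # Drop the assumed 6-character run prefix before comparing.
    return [d for d in dirnames if d[6:] == model_name]


# Hypothetical directory listing.
dirs = [
    '00001-brca-exp1-HP0-kfold1',
    '00002-brca-exp1-HP0-kfold2',
    '00003-brca-exp1-HP0',
]
print(match_model_dirs(dirs, 'exp1', 'brca', kfold=2))
print(match_model_dirs(dirs, 'exp1', 'brca'))
```

Because the comparison is an exact match on the suffix, a search without `kfold` does not pick up the k-fold folders, and vice versa.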
+def model_exists(project, label, outcome, epoch=None, kfold=None):
+    """Check if matching model exists.
+
+    Args:
+        project (slideflow.Project): Project.
+        label (str): Experimental label.
+        outcome (str): Outcome name.
+        epoch (int, optional): Epoch number of saved model. Defaults to None.
+        kfold (int, optional): K-fold iteration. Defaults to None.
+
+    Returns:
+        bool: Whether a matching model exists.
+    """
+    try:
+        find_model(project, label, outcome, kfold=kfold, epoch=epoch)
+        return True
+    except ModelNotFoundError:
+        return False
+ + +
+def find_cv(project, label, outcome, epoch=None, k=3):
+    """Finds paths to cross-validation models.
+
+    Args:
+        project (slideflow.Project): Project.
+        label (str): Experimental label.
+        outcome (str): Outcome name.
+        epoch (int, optional): Epoch number of saved model. Defaults to None.
+        k (int, optional): Number of k-fold iterations. Defaults to 3.
+
+    Returns:
+        list(str): Paths to cross-validation models.
+    """
+    return [
+        find_model(project, label, outcome, epoch=epoch, kfold=_k)
+        for _k in range(1, k+1)
+    ]
+ + +
+def find_eval(project, label, outcome, epoch=1):
+    """Finds matching eval directory.
+
+    Args:
+        project (slideflow.Project): Project.
+        label (str): Experimental label.
+        outcome (str): Outcome name.
+        epoch (int, optional): Epoch number of saved model. Defaults to 1.
+
+    Raises:
+        MultipleModelsFoundError: If multiple matches are found.
+        ModelNotFoundError: If no match is found.
+
+    Returns:
+        str: Path to eval directory.
+    """
+    matching = [
+        o for o in os.listdir(project.eval_dir)
+        if o[11:] == f'{outcome}-{label}-HP0_epoch{epoch}'
+    ]
+    if len(matching) > 1:
+        raise MultipleModelsFoundError("Multiple matching eval experiments "
+                                       f"found for label {label}")
+    elif not len(matching):
+        raise ModelNotFoundError(f"No matching eval found for label {label}")
+    else:
+        return join(project.eval_dir, matching[0])
+ + +
+def eval_exists(project, label, outcome, epoch=1):
+    """Check if matching eval exists.
+
+    Args:
+        project (slideflow.Project): Project.
+        label (str): Experimental label.
+        outcome (str): Outcome name.
+        epoch (int, optional): Epoch number of saved model. Defaults to 1.
+
+    Returns:
+        bool: Whether a matching eval exists.
+    """
+    try:
+        find_eval(project, label, outcome, epoch=epoch)
+        return True
+    except ModelNotFoundError:
+        return False
+ + +# --- Thresholding and metrics functions -------------------------------------- + +
[docs]def read_group_predictions(path): + '''Reads patient- or slide-level predictions CSV or parquet file, + returning y_true and y_pred. + + Expects a binary categorical outcome. + + Compatible with Slideflow 1.1 and 1.2. + ''' + if not os.path.exists(path): + raise OSError(f"Could not find predictions file at {path}") + if sf.util.path_to_ext(path).lower() == 'csv': + df = pd.read_csv(path) + elif sf.util.path_to_ext(path).lower() in ('parquet', 'gzip'): + df = pd.read_parquet(path) + else: + raise ValueError(f"Unrecognized extension for prediction file {path}") + if 'y_true1' in df.columns: + y_true = df['y_true1'].to_numpy() + else: + y_true_cols = [c for c in df.columns if c.endswith('y_true')] + if len(y_true_cols) == 1: + y_true = df[y_true_cols[0]].to_numpy() + else: + raise ValueError(f"Could not find y_true column at {path}") + if 'percent_tiles_positive1' in df.columns: + y_pred = df['percent_tiles_positive1'].to_numpy() + else: + y_pred_cols = [c for c in df.columns if 'y_pred' in c] + if len(y_pred_cols) == 2: + y_pred = df[y_pred_cols[1]].to_numpy() + else: + raise ValueError(f"Expected exactly 2 y_pred columns at {path}; " + f"got {len(y_pred_cols)}") + return y_true, y_pred
+ + +
+def prediction_metrics(y_true, y_pred, threshold):
+    """Calculate prediction metrics (AUC confidence interval, accuracy,
+    sensitivity/specificity, and Youden's J with confidence interval).
+
+    Args:
+        y_true (np.ndarray): True labels.
+        y_pred (np.ndarray): Predictions.
+        threshold (float): Prediction threshold.
+
+    Returns:
+        dict: Prediction metrics.
+    """
+    yt = y_true.astype(bool)
+    yp = y_pred > threshold
+
+    alpha = 0.05
+    z = stats.norm.ppf((1 - alpha/2))
+    tp = np.logical_and(yt, yp).sum()
+    fp = np.logical_and(np.logical_not(yt), yp).sum()
+    tn = np.logical_and(np.logical_not(yt), np.logical_not(yp)).sum()
+    fn = np.logical_and(yt, np.logical_not(yp)).sum()
+    acc = (tp + tn) / (tp + tn + fp + fn)
+    sensitivity = tp / (tp + fn)
+    specificity = tn / (tn + fp)
+
+    # Youden's confidence interval, via BAC (bootstrap AC estimate)
+    # Bootstrapping performed with sample size n = 150 and iterations B = 500
+    all_jac = []
+    for _ in range(500):
+        bootstrap_i = np.random.choice(np.arange(yt.shape[0]), size=(150,))
+        _yt = yt[bootstrap_i]
+        _yp = yp[bootstrap_i]
+        _tp = np.logical_and(_yt, _yp).sum()
+        _fp = np.logical_and(np.logical_not(_yt), _yp).sum()
+        _tn = np.logical_and(np.logical_not(_yt), np.logical_not(_yp)).sum()
+        _fn = np.logical_and(_yt, np.logical_not(_yp)).sum()
+        _jac = (((_tn + 0.5 * z**2) / (_tn + _fp + z**2))
+                - ((_fn + 0.5 * z**2) / (_fn + _tp + z**2)))
+        all_jac += [_jac]
+
+    jac = mean(all_jac)
+    jac_var = variance(all_jac)
+    jac_low = jac - z * np.sqrt(jac_var)
+    jac_high = jac + z * np.sqrt(jac_var)
+
+    # AUC confidence intervals
+    if not np.array_equal(np.unique(y_true), [0, 1]):
+        sf.util.log.warn("Unable to calculate CI; NaNs exist")
+        ci = [None, None]
+    else:
+        delong_auc, auc_cov = delong_roc_variance(y_true, y_pred)
+        auc_std = np.sqrt(auc_cov)
+        lower_upper_q = np.abs(np.array([0, 1]) - alpha / 2)
+        ci = stats.norm.ppf(lower_upper_q, loc=delong_auc, scale=auc_std)
+        ci[ci > 1] = 1
+
+    return {
+        'auc_low': ci[0],
+        'auc_high': ci[1],
+        'acc': acc,
+        'sens': sensitivity,
+        'spec': specificity,
+        'youden': sensitivity + specificity - 1,
+        'youden_low': jac_low,
+        'youden_high': jac_high,
+    }
+ + +
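As a rough illustration of the point estimates returned by `prediction_metrics`, the following pure-Python sketch computes accuracy, sensitivity, specificity, and Youden's J at a single threshold. It leaves out the bootstrap and DeLong confidence intervals entirely, and the helper name `threshold_metrics` and the data are made up for the example.

```python
def threshold_metrics(y_true, y_pred, threshold):
    """Accuracy, sensitivity, specificity and Youden's J at one threshold."""
    yt = [bool(t) for t in y_true]
    yp = [p > threshold for p in y_pred]  # strict '>' as in the listing above
    tp = sum(t and p for t, p in zip(yt, yp))
    fp = sum((not t) and p for t, p in zip(yt, yp))
    tn = sum((not t) and (not p) for t, p in zip(yt, yp))
    fn = sum(t and (not p) for t, p in zip(yt, yp))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        'acc': (tp + tn) / (tp + tn + fp + fn),
        'sens': sens,
        'spec': spec,
        'youden': sens + spec - 1,
    }


# Made-up labels and scores.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0.1, 0.2, 0.6, 0.3, 0.4, 0.7, 0.8, 0.9]
print(threshold_metrics(y_true, y_pred, threshold=0.5))
```

The sketch does not guard against a class being absent, in which case the sensitivity or specificity denominator would be zero.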
+def auc_and_threshold(y_true, y_pred):
+    """Calculates AUC and optimal threshold (via Youden's J)
+
+    Args:
+        y_true (np.ndarray): Y true (labels).
+        y_pred (np.ndarray): Y pred (predictions).
+
+    Returns:
+        float: AUC
+        float: Optimal threshold
+    """
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", category=UndefinedMetricWarning)
+        fpr, tpr, threshold = metrics.roc_curve(y_true, y_pred)
+        roc_auc = metrics.auc(fpr, tpr)
+        max_j = max(zip(tpr, fpr), key=lambda x: x[0]-x[1])
+        optimal_threshold = threshold[list(zip(tpr, fpr)).index(max_j)]
+    return roc_auc, optimal_threshold
+ + +
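The threshold selection in `auc_and_threshold` maximizes Youden's J (true positive rate minus false positive rate) over the ROC curve. A minimal sketch of the same idea without scikit-learn is shown here; it scans every unique score as a candidate threshold, which is a simplification and not how `metrics.roc_curve` builds its threshold list. The helper name `youden_threshold` is hypothetical.

```python
def youden_threshold(y_true, y_pred):
    """Return (best_threshold, best_j), scanning each unique score."""
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    best_j, best_thresh = float('-inf'), None
    # Scan from the highest score down; a sample is called positive
    # when its score is >= the candidate threshold.
    for thresh in sorted(set(y_pred), reverse=True):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t and p >= thresh)
        fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p >= thresh)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # ties keep the higher threshold
            best_j, best_thresh = j, thresh
    return best_thresh, best_j


# Made-up labels and scores.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.4, 0.7, 0.8, 0.9]
print(youden_threshold(labels, scores))
```

Both classes must be present in `y_true`, otherwise one of the rates is undefined.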
+def auc(y_true, y_pred):
+    """Calculate area under the receiver operating characteristic curve
+    (AUC / AUROC).
+
+    Args:
+        y_true (np.ndarray): True labels.
+        y_pred (np.ndarray): Predictions.
+
+    Returns:
+        float: AUC, or NaN if the ROC curve cannot be calculated.
+    """
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore", category=UndefinedMetricWarning)
+        try:
+            fpr, tpr, threshold = metrics.roc_curve(y_true, y_pred)
+            return metrics.auc(fpr, tpr)
+        except ValueError:
+            sf.util.log.warn("Unable to calculate ROC")
+            return np.nan

+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
\ No newline at end of file
diff --git a/docs/_modules/slideflow/cellseg/index.html b/docs/_modules/slideflow/cellseg/index.html
new file mode 100644
index 000000000..5356369dc
--- /dev/null
+++ b/docs/_modules/slideflow/cellseg/index.html
@@ -0,0 +1,1142 @@
+slideflow.cellseg — slideflow 3.0.0 documentation

Source code for slideflow.cellseg

+import time
+import rasterio
+import cv2
+import threading
+import multiprocessing as mp
+import numpy as np
+import cellpose
+import cellpose.models
+import logging
+import slideflow as sf
+import zarr
+import torch
+import shapely.affinity as sa
+from queue import Queue
+from numcodecs import Blosc
+from matplotlib.colors import to_rgb
+from tqdm import tqdm
+from typing import Tuple, Union, Callable, Optional, Iterable, TYPE_CHECKING, List
+from functools import partial
+from PIL import Image, ImageDraw
+from cellpose.utils import outlines_list
+from cellpose.models import Cellpose
+from cellpose import transforms, plot, dynamics
+from slideflow.slide.utils import draw_roi
+from slideflow.util import batch_generator, log
+from slideflow.model import torch_utils
+
+from . import seg_utils
+
+if TYPE_CHECKING:
+    from rich.progress import Progress, TaskID
+    import shapely.geometry
+
+# -----------------------------------------------------------------------------
+
+
[docs]class Segmentation: + + def __init__( + self, + masks: np.ndarray, + *, + slide: Optional[sf.WSI] = None, + flows: Optional[np.ndarray] = None, + styles: Optional[np.ndarray] = None, + diams: Optional[np.ndarray] = None, + wsi_dim: Optional[Tuple[int, int]] = None, + wsi_offset: Optional[Tuple[int, int]] = None + ): + """Organizes a collection of cell segmentation masks for a slide. + + Args: + masks (np.ndarray): Array of masks, dtype int32, where 0 represents + non-segmented background, and each segmented mask is represented + by unique increasing integers. + + Keyword args: + slide (slideflow.WSI): If provided, ``Segmentation`` can coordinate + extracting tiles at mask centroids. Defaults to None. + flows (np.ndarray): Array of flows, dtype float32. Defaults to None. + wsi_dim (tuple(int, int)): Size of ``masks`` in the slide + pixel space (highest magnification). Used to align the mask + array to a corresponding slide. Required for calculating + centroids. Defaults to None. + wsi_offset (tuple(int, int)): Top-left starting location for + ``masks``, in slide pixel space (highest magnification). + Used to align the mask array to a corresponding slide. + Required for calculating centroids. Defaults to None. + styles (np.ndarray): Array of styles, currently ignored. + diams (np.ndarray): Array of diameters, currently ignored. + + """ + if not isinstance(masks, np.ndarray): + raise ValueError("First argument (masks) must be a numpy array.") + self.slide = slide + self.masks = masks + self.flows = flows + self._outlines = None + self._centroids = None + self.wsi_dim = wsi_dim + self.wsi_offset = wsi_offset + + @classmethod + def load(cls, path) -> "Segmentation": + """Alternate class initializer; load a saved Segmentation from *.zip. + + Args: + path (str): Path to *.zip containing saved Segmentation, as created + through :meth:`slideflow.cellseg.Segmentation.save`. 
+ + """ + loaded = zarr.load(path) + if 'masks' not in loaded: + raise TypeError(f"Unable to load '{path}'; 'masks' index not found.") + flows = None if 'flows' not in loaded else loaded['flows'] + obj = cls(slide=None, masks=loaded['masks'], flows=flows) + obj.wsi_dim = loaded['wsi_dim'] + obj.wsi_offset = loaded['wsi_offset'] + if 'centroids' in loaded: + obj._centroids = loaded['centroids'] + return obj + + @property + def outlines(self) -> np.ndarray: + """Calculate and return mask outlines as ``np.ndarray``.""" + if self._outlines is None: + self.calculate_outlines() + return self._outlines + + @property + def wsi_ratio(self) -> Optional[float]: + """Ratio of WSI base dimension to the mask shape. + + Returns `None` if ``wsi_dim`` was not set. + """ + if self.wsi_dim is not None: + return self.wsi_dim[1] / self.masks.shape[0] + else: + return None + + def apply_rois( + self, + scale: float, + annpolys: List["shapely.geometry.Polygon"] + ) -> None: + """Apply regions of interest (ROIs), excluding masks outside ROIs. + + Args: + scale (float): ROI scale (roi size / WSI base dimension size). + annpolys (list(``shapely.geometry.Polygon``)): List of ROI + polygons, as available in ``slideflow.WSI.rois``. + + """ + if self.wsi_ratio is not None and len(annpolys): + roi_seg_scale = scale / self.wsi_ratio + scaled_polys = [ + sa.scale( + poly, + xfact=roi_seg_scale, + yfact=roi_seg_scale, + origin=(0, 0) + ) for poly in annpolys + ] + roi_seg_mask = rasterio.features.rasterize( + scaled_polys, + out_shape=self.masks.shape, + all_touched=False + ).astype(bool) + self.masks *= roi_seg_mask + self.calculate_centroids(force=True) + elif self.wsi_ratio is None: + log.warning("Unable to apply ROIs; WSI dimensions not set.") + return + else: + # No ROIs to apply + return + + def centroids(self, wsi_dim: bool = False) -> np.ndarray: + """Calculate and return mask centroids. + + Args: + wsi_dim (bool): Convert centroids from mask space to WSI space. 
+ Requires that ``wsi_dim`` was provided during initialization. + + Returns: + A ``np.ndarray`` with shape ``(num_masks, 2)``. + + """ + if self._centroids is None: + self.calculate_centroids() + if wsi_dim: + if self.wsi_dim is None: + raise ValueError("Unable to calculate wsi_dim for centroids - " + "wsi_dim is not set.") + ratio = self.wsi_dim[1] / self.masks.shape[0] + return ((self._centroids * ratio)[:, ::-1] + self.wsi_offset).astype(np.int32) + else: + return self._centroids + + def _draw_centroid(self, img, color='green'): + pil_img = Image.fromarray(img) + draw = ImageDraw.Draw(pil_img) + for c in self.centroids(): + x, y = np.int32(c[1]), np.int32(c[0]) + draw.ellipse((x-3, y-3, x+3, y+3), fill=color) + return np.asarray(pil_img) + + def calculate_centroids(self, force: bool = False) -> None: + """Calculate centroids. + + Centroid values are buffered into ``Segmentation._centroids`` to + reduce unnecessary recalculations. + + Args: + force (bool): Recalculate centroids, even if calculated before. + + """ + if self._centroids is not None and not force: + return + mask_s = seg_utils.sparse_mask(self.masks) + self._centroids = seg_utils.get_sparse_centroid(self.masks, mask_s) + + def calculate_outlines(self, force: bool = False) -> None: + """Calculate mask outlines. + + Mask outlines are buffered into ``Segmentation._outlines`` to + reduce unnecessary recalculations. + + Args: + force (bool): Recalculate outlines, even if calculated before. + + """ + if self._outlines is not None and not force: + return + self._outlines = outlines_list(self.masks) + + def centroid_to_image(self, color: str = 'green') -> np.ndarray: + """Render an image with the location of all centroids as a point. + + Args: + color (str): Centroid color. Defaults to 'green'. 
+ + """ + img = np.zeros((self.masks.shape[0], self.masks.shape[1], 3), dtype=np.uint8) + return self._draw_centroid(img, color=color) + + def extract_centroids( + self, + slide: str, + tile_px: int = 128, + ) -> Callable: + """Return a generator which extracts tiles from a slide at mask centroids. + + Args: + slide (str): Path to a slide. + tile_px (int): Height/width of tile to extract at centroids. + Defaults to 128. + + Returns: + A generator which yields a numpy array, with shape + ``(tile_px, tile_px, 3)``, at each mask centroid. + """ + reader = sf.slide.wsi_reader(slide) + factor = reader.dimensions[1] / self.masks.shape[0] + + def generator(): + for c in self._centroids: + cf = c * factor + self.wsi_offset + yield reader.read_from_pyramid( + (cf[1]-(tile_px/2), cf[0]-(tile_px/2)), + (tile_px, tile_px), + (tile_px, tile_px), + convert='numpy', + flatten=True + ) + + return generator + + def mask_to_image(self, centroid=False, color='cyan', centroid_color='green'): + """Render an image of all masks. + + Masks are rendered on a black background. + + Args: + centroid (bool): Include centroids as points on the image. + Defaults to False. + color (str): Color of the masks. Defaults to 'cyan'. + centroid_color (str): Color of centroid points. Defaults to 'green'. + + Returns: + np.ndarray + """ + if isinstance(color, str): + color = [int(c * 255) for c in to_rgb(color)] + else: + assert len(color) == 3 + img = np.zeros((self.masks.shape[0], self.masks.shape[1], 3), dtype=np.uint8) + img[self.masks > 0] = color + if centroid: + return self._draw_centroid(img, color=centroid_color) + else: + return img + + def outline_to_image(self, centroid=False, color='red', centroid_color='green'): + """Render an image with the outlines of all masks. + + Args: + centroid (bool): Include centroids as points on the image. + Defaults to False. + color (str): Color of the mask outlines. Defaults to 'red'. + centroid_color (str): Color of centroid points. Defaults to 'green'. 
+ + Returns: + np.ndarray + """ + empty = np.zeros((self.masks.shape[0], self.masks.shape[1], 3), dtype=np.uint8) + img = draw_roi(empty, self.outlines, color=color) + if centroid: + return self._draw_centroid(img, color=centroid_color) + else: + return img + + def save( + self, + filename: str, + centroids: bool = True, + flows: bool = True + ) -> None: + """Save segmentation masks and metadata to \*.zip. + + A :class:`slideflow.cellseg.Segmentation` object can be loaded from + this file with ``.load()``. + + Args: + filename (str): Destination filename (ends with \*.zip) + centroids (bool): Save centroid locations. + flows (bool): Save flows. + + """ + if not filename.endswith('zip'): + filename += '.zip' + save_dict = dict( + masks=self.masks, + compressor=Blosc( + cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE + ) + ) + if centroids: + self.calculate_centroids() + if self._centroids is not None and centroids: + save_dict['centroids'] = self._centroids + if self.flows is not None and flows: + save_dict['flows'] = self.flows + if self.wsi_dim is not None: + save_dict['wsi_dim'] = self.wsi_dim + if self.wsi_offset is not None: + save_dict['wsi_offset'] = self.wsi_offset + seg_utils.save_zarr_compressed(filename, **save_dict)
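The coordinate conversion performed by `Segmentation.centroids(wsi_dim=True)` can be sketched without NumPy. This assumes centroids are stored as `(row, col)` pairs in mask space, which is inferred from the `[:, ::-1]` flip and from `_draw_centroid`; the helper name `centroids_to_wsi` and the example dimensions are hypothetical.

```python
def centroids_to_wsi(centroids, mask_shape, wsi_dim, wsi_offset=(0, 0)):
    """Map (row, col) mask-space centroids to (x, y) slide coordinates."""
    # Same ratio as Segmentation.wsi_ratio: slide height / mask rows.
    ratio = wsi_dim[1] / mask_shape[0]
    return [
        # Scale, swap to (x, y), add the offset, then truncate to int.
        (int(col * ratio + wsi_offset[0]), int(row * ratio + wsi_offset[1]))
        for row, col in centroids
    ]


# Hypothetical: a 512 x 512 mask covering a 2048 x 2048 px slide region.
print(centroids_to_wsi([(10, 20), (100.5, 300.25)], (512, 512), (2048, 2048)))
```

A single ratio is applied to both axes, so the sketch (like the listing) relies on the mask and slide region sharing the same aspect ratio.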
+ +# ----------------------------------------------------------------------------- + + +def follow_flows(dP_and_cellprob, cp_thresh, gpus=(0,), **kwargs): + dP, cellprob = dP_and_cellprob + if gpus is not None: + _id = mp.current_process()._identity + proc = 0 if not len(_id) else _id[0] + kwargs['device'] = torch.device(f'cuda:{gpus[proc % len(gpus)]}') + if np.any(cellprob > cp_thresh): + return dynamics.follow_flows( + dP * (cellprob > cp_thresh) / 5., + use_gpu=(gpus is not None), + **kwargs + ) + else: + return (None, None) + + +def remove_bad_flow(mask_and_dP, flow_threshold, gpus=(0,), **kwargs): + mask, dP = mask_and_dP + if gpus is not None: + _id = mp.current_process()._identity + proc = 0 if not len(_id) else _id[0] + kwargs['device'] = torch.device(f'cuda:{gpus[proc % len(gpus)]}') + if mask.max() > 0 and flow_threshold is not None and flow_threshold > 0: + mask = dynamics.remove_bad_flow_masks( + mask, + dP, + threshold=flow_threshold, + use_gpu=(gpus is not None), + **kwargs + ) + return mask + + +def resize_and_clean_mask(mask, target_size=None): + # Resizing + recast = mask.max() >= 2**16-1 + if target_size: + if recast: + mask = mask.astype(np.float32) + else: + mask = mask.astype(np.uint16) + mask = cv2.resize( + mask, + (target_size, target_size), + interpolation=cv2.INTER_NEAREST + ).astype(np.uint32) + elif not recast: + mask = mask.astype(np.uint16) + mask = dynamics.utils.fill_holes_and_remove_small_masks(mask, min_size=15) + if mask.dtype == np.uint32 and mask.max() == 65535: + log.warn(f'more than 65535 masks in image, masks returned as np.uint32') + return mask + + +def get_empty_mask(shape): + mask = np.zeros(shape, np.uint16) + p = np.zeros((len(shape), *shape), np.uint16) + return mask, p + + +def normalize_img(X): + X = X.float() + i99 = torch.quantile(X, 0.99) + i1 = torch.quantile(X, 0.01) + return (X - i1) / (i99 - i1) + + +def process_image(img, nchan): + return transforms.convert_image( + img, + channels=[[0, 0]], + 
channel_axis=None, + z_axis=None, + do_3D=False, + normalize=False, + invert=False, + nchan=nchan) + + +def process_batch(img_batch): + # Ensure Ly and Lx are divisible by 16 + assert not (img_batch.shape[1] % 16 or img_batch.shape[2] % 16) + + # Normalize and permute axes. + img_batch = normalize_img(img_batch) + img_batch = torch.permute(img_batch, (0, 3, 1, 2)) + return img_batch + + +def get_masks(args, cp_thresh): + p, inds, cellprob = args + if inds is None: + mask, p = get_empty_mask(cellprob.shape) + else: + mask = dynamics.get_masks(p, iscell=(cellprob > cp_thresh)) + return mask, p + + +def tile_processor(slide, q, batch_size, nchan): + tiles = batch_generator( + slide.torch( + incl_loc='grid', + num_threads=4, + to_tensor=False, + grayspace_fraction=1, + lazy_iter=True + ), + batch_size + ) + for tile_dict in tiles: + imgs = [t['image_raw'] for t in tile_dict] + imgs = np.array([process_image(img, nchan) for img in imgs]) + c = [(t['loc_x'], t['loc_y']) for t in tile_dict] + q.put((imgs, c)) + q.put(None) + + +
[docs]def segment_slide( + slide: Union[sf.WSI, str], + model: Union["cellpose.models.Cellpose", str] = 'cyto2', + *, + diam_um: Optional[float] = None, + diam_mean: Optional[int] = None, + window_size: Optional[int] = None, + downscale: Optional[float] = None, + batch_size: int = 8, + gpus: Optional[Union[int, Iterable[int]]] = (0,), + spawn_workers: bool = True, + pb: Optional["Progress"] = None, + pb_tasks: Optional[List["TaskID"]] = None, + show_progress: bool = True, + save_flow: bool = True, + cp_thresh: float = 0.0, + flow_threshold: float = 0.4, + interp: bool = True, + tile: bool = True, + verbose: bool = True, + device: Optional[str] = None, +) -> Segmentation: + """Segment cells in a whole-slide image, returning masks and centroids. + + Args: + slide (str, :class:`slideflow.WSI`): Whole-slide image. May be a path + (str) or WSI object (`slideflow.WSI`). + + Keyword arguments: + model (str, :class:`cellpose.models.Cellpose`): Cellpose model to use + for cell segmentation. May be any valid cellpose model. Defaults + to 'cyto2'. + diam_um (float, optional): Cell diameter to detect, in microns. + Determines tile extraction microns-per-pixel resolution to match + the given pixel diameter specified by `diam_mean`. Not used if + `slide` is a `sf.WSI` object. + diam_mean (int, optional): Cell diameter to detect, in pixels (without + image resizing). If None, uses Cellpose defaults (17 for the + 'nuclei' model, 30 for all others). + window_size (int): Window size, in pixels, at which to segment cells. + Not used if slide is a `sf.WSI` object. + downscale (float): Factor by which to downscale generated masks after + calculation. Defaults to None (keep masks at original size). + batch_size (int): Batch size for cell segmentation. Defaults to 8. + gpus (int, list(int)): GPUs to use for cell segmentation. + Defaults to 0 (first GPU). + spawn_workers (bool): Enable spawn-based multiprocessing. 
Increases + cell segmentation speed at the cost of higher memory utilization. + pb (:class:`rich.progress.Progress`, optional): Progress bar instance. + Used for external progress bar tracking. Defaults to None. + pb_tasks (list(:class:`rich.progress.TaskID`)): Progress bar tasks. + Used for external progress bar tracking. Defaults to None. + show_progress (bool): Show a tqdm progress bar. Defaults to True. + save_flow (bool): Save flow values for the whole-slide image. + Increases memory utilization. Defaults to True. + cp_thresh (float): Cell probability threshold. All pixels with value + above threshold kept for masks, decrease to find more and larger + masks. Defaults to 0. + flow_threshold (float): Flow error threshold (all cells with errors + below threshold are kept). Defaults to 0.4. + interp (bool): Interpolate during 2D dynamics. Defaults to True. + tile (bool): Tiles image to decrease GPU/CPU memory usage. + Defaults to True. + verbose (bool): Verbose log output at the INFO level. Defaults to True. + + Returns: + :class:`slideflow.cellseg.Segmentation` + """ + + # Quiet the logger to suppress warnings of empty masks + logging.getLogger('cellpose').setLevel(40) + if diam_mean is None: + diam_mean = 30 if model != 'nuclei' else 17 + + # Initial validation checks. 
---------------------------------------------- + if isinstance(slide, str): + assert diam_um is not None, "Must supply diam_um if slide is a path to a slide" + assert window_size is not None, "Must supply window_size if slide is a path to a slide" + tile_um = int(window_size * (diam_um / diam_mean)) + slide = sf.WSI(slide, tile_px=window_size, tile_um=tile_um, verbose=False) + elif window_size is not None or diam_um is not None: + raise ValueError("Invalid argument: cannot provide window_size or diam_um " + "when slide is a sf.WSI object") + else: + window_size = slide.tile_px + diam_um = diam_mean * (slide.tile_um/slide.tile_px) + if window_size % 16: + raise ValueError("Window size (tile_px) must be a multiple of 16.") + if downscale is None: + target_size = window_size + else: + target_size = int(window_size / downscale) + if slide.stride_div != 1: + log.warn("Whole-slide cell segmentation not configured for strides " + f"other than 1 (got: {slide.stride_div}).") + + # Set up model and parameters. 
--------------------------------------------
+    start_time = time.time()
+    device = torch_utils.get_device(device)
+    if device.type == 'cpu':
+        # Run from CPU if CUDA is not available
+        model = Cellpose(gpu=False, device=device)
+        gpus = None
+        log.info("No GPU detected - running from CPU")
+    else:
+        model = Cellpose(gpu=True, device=device)
+    cp = model.cp
+    cp.batch_size = batch_size
+    cp.net.load_model(cp.pretrained_model[0], cpu=(not cp.gpu))  # Modify to accept different models
+    cp.net.eval()
+    rescale = 1  # No rescaling, as we are manually setting diameter = diam_mean
+    mask_dim = (slide.stride * (slide.shape[0]-1) + slide.tile_px,
+                slide.stride * (slide.shape[1]-1) + slide.tile_px)
+    all_masks = np.zeros((slide.shape[1] * target_size,
+                          slide.shape[0] * target_size),
+                         dtype=np.uint32)
+    if save_flow:
+        all_flows = np.zeros((slide.shape[1] * target_size,
+                              slide.shape[0] * target_size, 3),
+                             dtype=np.uint8)
+
+    log_fn = log.info if verbose else log.debug
+    log_fn("=== Segmentation parameters ===")
+    log_fn(f"Diameter (px): {diam_mean}")
+    log_fn(f"Diameter (um): {diam_um}")
+    log_fn(f"Window size: {window_size}")
+    log_fn(f"Target size: {target_size}")
+    log_fn(f"Perform tiled: {tile}")
+    log_fn(f"Slide dimensions: {slide.dimensions}")
+    log_fn(f"Slide shape: {slide.shape}")
+    log_fn(f"Slide stride (px): {slide.stride}")
+    log_fn(f"Est. tiles: {slide.estimated_num_tiles}")
+    log_fn(f"Save flow: {save_flow}")
+    log_fn(f"Mask dimensions: {mask_dim}")
+    log_fn(f"Mask size: {all_masks.shape}")
+    log_fn("===============================")
+
+    # Processes and pools. ----------------------------------------------------
+    tile_q = mp.Queue(4)
+    y_q = Queue(2)
+    ctx = mp.get_context('spawn')
+    fork_pool = mp.Pool(
+        batch_size,
+        initializer=sf.util.set_ignore_sigint
+    )
+    if spawn_workers:
+        spawn_pool = ctx.Pool(
+            4,
+            initializer=sf.util.set_ignore_sigint
+        )
+    else:
+        spawn_pool = mp.dummy.Pool(4)
+    proc_fn = mp.Process if sf.slide_backend() != 'libvips' else threading.Thread
+    tile_process = proc_fn(
+        target=tile_processor,
+        args=(slide, tile_q, batch_size, cp.nchan)
+    )
+    tile_process.start()
+
+    def net_runner():
+        while True:
+            item = tile_q.get()
+            if item is None:
+                y_q.put(None)
+                break
+            imgs, c = item
+            torch_batch = cp._to_device(imgs)
+            torch_batch = process_batch(torch_batch)
+            if tile:
+                y, style = cp._run_tiled(
+                    torch_batch.cpu().numpy(),
+                    augment=False,
+                    bsize=224,
+                    return_conv=False
+                )
+            else:
+                y, style = cp.network(torch_batch)
+            y_q.put((y, style, c))
+
+    runner = threading.Thread(target=net_runner)
+    runner.start()
+
+    # Main loop. --------------------------------------------------------------
+    running_max = 0
+    if show_progress:
+        tqdm_pb = tqdm(total=slide.estimated_num_tiles)
+    while True:
+        item = y_q.get()
+        if item is None:
+            break
+        y, style, c = item
+
+        # Initial preparation
+        #style /= (style**2).sum()**0.5
+        y = np.transpose(y, (0,2,3,1))
+        cellprob = y[:, :, :, 2].astype(np.float32)
+        dP = y[:, :, :, :2].transpose((3,0,1,2))
+        del y, style
+        #styles = style.squeeze()
+
+        # Calculate flows
+        batch_p, batch_ind = zip(*spawn_pool.map(
+            partial(follow_flows,
+                    niter=(1 / rescale * 200),
+                    interp=interp,
+                    cp_thresh=cp_thresh,
+                    gpus=gpus),
+            zip([dP[:, i] for i in range(len(c))], cellprob)
+        ))
+
+        # Calculate masks
+        batch_masks, batch_p = zip(*fork_pool.map(
+            partial(get_masks, cp_thresh=cp_thresh),
+            zip(batch_p, batch_ind, cellprob)))
+
+        # Remove bad flow
+        batch_masks = spawn_pool.map(
+            partial(remove_bad_flow, flow_threshold=flow_threshold, gpus=gpus),
+            zip(batch_masks, [dP[:, i] for i in range(len(c))]))
+
+        # Resize masks and clean (remove small masks/holes)
+        batch_masks = fork_pool.map(
+            partial(resize_and_clean_mask,
+                    target_size=(None if target_size == window_size
+                                 else target_size)),
+            batch_masks)
+
+        dP = dP.squeeze()
+        cellprob = cellprob.squeeze()
+        #p = np.stack(batch_p, axis=0)
+        #flows = [plot.dx_to_circ(dP), dP, cellprob, p]
+
+        for i in range(len(c)):
+            x, y = c[i][0], c[i][1]
+            img_masks = batch_masks[i].astype(np.uint32)
+            max_in_mask = img_masks.max()
+            img_masks[np.nonzero(img_masks)] += running_max
+            running_max += max_in_mask
+            all_masks[y * target_size: (y+1)*target_size,
+                      x * target_size: (x+1)*target_size] = img_masks
+            if save_flow:
+                flow_plot = plot.dx_to_circ(dP[:, i])
+                if target_size != window_size:
+                    flow_plot = cv2.resize(flow_plot, (target_size, target_size))
+                all_flows[y * target_size: (y+1)*target_size,
+                          x * target_size: (x+1)*target_size, :] = flow_plot
+
+        # Final cleanup
+        del dP, cellprob
+
+        # Update progress bars
+        if show_progress:
+            tqdm_pb.update(batch_size)
+        if pb is not None and pb_tasks:
+            for task in pb_tasks:
+                pb.advance(task, batch_size)
+
+    # Close pools/processes and log time.
+    spawn_pool.close()
+    spawn_pool.join()
+    fork_pool.close()
+    fork_pool.join()
+    runner.join()
+    tile_process.join()
+    ttime = time.time() - start_time
+    log.info(f"Segmented {running_max} cells for {slide.name} ({ttime:.0f} s)")
+
+    # Calculate WSI dimensions and return final segmentation.
+    wsi_dim = (slide.shape[0] * slide.full_extract_px,
+               slide.shape[1] * slide.full_extract_px)
+    wsi_offset = (0, 0)
+
+    return Segmentation(
+        slide=slide,
+        masks=all_masks,
+        flows=None if not save_flow else all_flows,
+        wsi_dim=wsi_dim,
+        wsi_offset=wsi_offset)
+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/dataset/index.html b/docs/_modules/slideflow/dataset/index.html new file mode 100644 index 000000000..cab5f2ad1 --- /dev/null +++ b/docs/_modules/slideflow/dataset/index.html @@ -0,0 +1,4924 @@ + + + + + + + + + + + + slideflow.dataset — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.dataset

+"""Module for the ``Dataset`` class and its associated functions.
+
+The ``Dataset`` class handles management of collections of patients,
+clinical annotations, slides, extracted tiles, and assembly of images
+into torch DataLoader and tensorflow Dataset objects. The high-level
+overview of the structure of ``Dataset`` is as follows:
+
+
+ ──────────── Information Methods ───────────────────────────────
+   Annotations      Slides        Settings         TFRecords
+  ┌──────────────┐ ┌─────────┐   ┌──────────────┐ ┌──────────────┐
+  │Patient       │ │Paths to │   │Tile size (px)│ │ *.tfrecords  │
+  │Slide         │ │ slides  │   │Tile size (um)│ │  (generated) │
+  │Label(s)      │ └─────────┘   └──────────────┘ └──────────────┘
+  │ - Categorical│  .slides()     .tile_px         .tfrecords()
+  │ - Continuous │  .rois()       .tile_um         .manifest()
+  │ - Time Series│  .slide_paths()                 .num_tiles
+  └──────────────┘  .thumbnails()                  .img_format
+    .patients()
+    .rois()
+    .labels()
+    .harmonize_labels()
+    .is_float()
+
+
+ ─────── Filtering and Splitting Methods ──────────────────────
+  ┌────────────────────────────┐
+  │                            │
+  │ ┌─────────┐                │ .filter()
+  │ │Filtered │                │ .remove_filter()
+  │ │ Dataset │                │ .clear_filters()
+  │ └─────────┘                │ .split()
+  │               Full Dataset │
+  └────────────────────────────┘
+
+
+ ───────── Summary of Image Data Flow ──────────────────────────
+  ┌──────┐
+  │Slides├─────────────┐
+  └──┬───┘             │
+     │                 │
+     ▼                 │
+  ┌─────────┐          │
+  │TFRecords├──────────┤
+  └──┬──────┘          │
+     │                 │
+     ▼                 ▼
+  ┌────────────────┐ ┌─────────────┐
+  │torch DataLoader│ │Loose images │
+  │ / tf Dataset   │ │ (.png, .jpg)│
+  └────────────────┘ └─────────────┘
+
+ ──────── Slide Processing Methods ─────────────────────────────
+  ┌──────┐
+  │Slides├───────────────┐
+  └──┬───┘               │
+     │.extract_tiles()   │.extract_tiles(
+     ▼                   │    save_tiles=True
+  ┌─────────┐            │  )
+  │TFRecords├────────────┤
+  └─────────┘            │ .extract_tiles
+                         │  _from_tfrecords()
+
+                       ┌─────────────┐
+                       │Loose images │
+                       │ (.png, .jpg)│
+                       └─────────────┘
+
+
+ ─────────────── TFRecords Operations ─────────────────────────
+                      ┌─────────┐
+   ┌────────────┬─────┤TFRecords├──────────┐
+   │            │     └─────┬───┘          │
+   │.tfrecord   │.tfrecord  │ .balance()   │.resize_tfrecords()
+   │  _heatmap()│  _report()│ .clip()      │.split_tfrecords
+   │            │           │ .torch()     │  _by_roi()
+   │            │           │ .tensorflow()│
+   ▼            ▼           ▼              ▼
+  ┌───────┐ ┌───────┐ ┌────────────────┐┌─────────┐
+  │Heatmap│ │PDF    │ │torch DataLoader││TFRecords│
+  └───────┘ │ Report│ │ / tf Dataset   │└─────────┘
+            └───────┘ └────────────────┘
+"""
+
+import copy
+import csv
+import multiprocessing as mp
+import os
+import shutil
+import threading
+import time
+import types
+import tempfile
+import warnings
+from contextlib import contextmanager
+from collections import defaultdict
+from datetime import datetime
+from glob import glob
+from multiprocessing.dummy import Pool as DPool
+from os.path import basename, dirname, exists, isdir, join
+from queue import Queue
+from random import shuffle
+from tabulate import tabulate  # type: ignore[import]
+from pprint import pformat
+from functools import partial
+from typing import (TYPE_CHECKING, Any, Dict, List, Optional, Sequence, Tuple,
+                    Union, Callable)
+import numpy as np
+import pandas as pd
+import shapely.geometry as sg
+from rich.progress import track, Progress
+from tqdm import tqdm
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.slide import WSI, ExtractionReport, SlideReport
+from slideflow.util import (log, Labels, _shortname, path_to_name,
+                            tfrecord2idx, TileExtractionProgress, MultiprocessProgress)
+
+if TYPE_CHECKING:
+    import tensorflow as tf
+    import cellpose
+    from slideflow.model import BaseFeatureExtractor
+    from slideflow.model import ModelParams
+    from torch.utils.data import DataLoader
+    from slideflow.norm import StainNormalizer
+
+# -----------------------------------------------------------------------------
+
+
+def _prepare_slide(
+    path: str,
+    report_dir: Optional[str],
+    wsi_kwargs: Dict,
+    qc: Optional[str],
+    qc_kwargs: Dict,
+) -> Optional["sf.WSI"]:
+
+    try:
+        slide = sf.WSI(path, **wsi_kwargs)
+        if qc:
+            slide.qc(method=qc, **qc_kwargs)
+        return slide
+    except errors.MissingROIError:
+        log.debug(f'Missing ROI for slide {path}; skipping')
+        return None
+    except errors.IncompatibleBackendError:
+        log.error('Slide {} has type {}, which is incompatible with the active '
+                  'slide reading backend, {}. Consider using a different '
+                  'backend, which can be set with the environmental variable '
+                  'SF_SLIDE_BACKEND. See https://slideflow.dev/installation/#cucim-vs-libvips '
+                  'for more information.'.format(
+                    path,
+                    sf.util.path_to_ext(path).upper(),
+                    sf.slide_backend()
+                  ))
+        return None
+    except errors.SlideLoadError as e:
+        log.error(f'Error loading slide {path}: {e}. Skipping')
+        return None
+    except errors.QCError as e:
+        log.error(e)
+        return None
+    except errors.TileCorruptionError:
+        log.error(f'{path} corrupt; skipping')
+        return None
+    except (KeyboardInterrupt, SystemExit) as e:
+        print('Exiting...')
+        raise e
+    except Exception as e:
+        log.error(f'Error processing slide {path}: {e}. Skipping')
+        return None
+
+
+@contextmanager
+def _handle_slide_errors(path: str):
+    try:
+        yield
+    except errors.MissingROIError:
+        log.info(f'Missing ROI for slide {path}; skipping')
+    except errors.SlideLoadError as e:
+        log.error(f'Error loading slide {path}: {e}. Skipping')
+    except errors.QCError as e:
+        log.error(e)
+    except errors.TileCorruptionError:
+        log.error(f'{path} corrupt; skipping')
+    except (KeyboardInterrupt, SystemExit) as e:
+        print('Exiting...')
+        raise e
+
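The error-handling context manager above can be illustrated with a small, self-contained sketch. It is not the slideflow implementation: `MissingROIError` here is a hypothetical stand-in for `slideflow.errors.MissingROIError`, and `skip_on_slide_error` and `process_all` are names invented for illustration. The idea shown is that recoverable per-slide errors are absorbed so a batch can continue, while interrupts are always re-raised.

```python
from contextlib import contextmanager
from typing import Iterator, List


class MissingROIError(Exception):
    """Hypothetical stand-in for slideflow.errors.MissingROIError."""


@contextmanager
def skip_on_slide_error(path: str, skipped: List[str]) -> Iterator[None]:
    """Record `path` as skipped if the wrapped block raises a known error."""
    try:
        yield
    except MissingROIError:
        # Recoverable: note the slide and let the caller continue.
        skipped.append(path)
    except (KeyboardInterrupt, SystemExit):
        # Never swallow interrupts; re-raise so the program can exit.
        raise


def process_all(paths: List[str]) -> List[str]:
    """Process each path, returning the ones that were skipped."""
    skipped: List[str] = []
    for path in paths:
        with skip_on_slide_error(path, skipped):
            # Assumption for the demo: names ending in "_noroi.svs" lack ROIs.
            if path.endswith("_noroi.svs"):
                raise MissingROIError(path)
    return skipped
```

Because the manager absorbs the exception, the code after the `with` block runs normally, which is why a loop over many slides keeps going after one failure.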
+
+def _tile_extractor(
+    path: str,
+    tfrecord_dir: str,
+    tiles_dir: str,
+    reports: Dict,
+    qc: str,
+    wsi_kwargs: Dict,
+    generator_kwargs: Dict,
+    qc_kwargs: Dict,
+    render_thumb: bool = True
+) -> None:
+    """Extract tiles. Internal function.
+
+    Slide processing needs to be process-isolated when num_workers > 1.
+
+    Args:
+        path (str): Path to slide.
+        tfrecord_dir (str): Path to TFRecord directory.
+        tiles_dir (str): Path to tiles directory (loose format).
+        reports (dict): Multiprocessing-enabled dict.
+        qc (str): Quality control method.
+        wsi_kwargs (dict): Keyword arguments for sf.WSI.
+        generator_kwargs (dict): Keyword arguments for WSI.extract_tiles()
+        qc_kwargs (dict): Keyword arguments for quality control.
+        render_thumb (bool): Render the slide report thumbnail after
+            extraction. Defaults to True.
+    """
+    with _handle_slide_errors(path):
+        log.debug(f'Extracting tiles for {path_to_name(path)}')
+        slide = _prepare_slide(
+            path,
+            report_dir=tfrecord_dir,
+            wsi_kwargs=wsi_kwargs,
+            qc=qc,
+            qc_kwargs=qc_kwargs)
+        if slide is not None:
+            report = slide.extract_tiles(
+                tfrecord_dir=tfrecord_dir,
+                tiles_dir=tiles_dir,
+                **generator_kwargs
+            )
+            if render_thumb and isinstance(report, SlideReport):
+                _ = report.thumb
+            reports.update({path: report})
+
+
+def _buffer_slide(path: str, dest: str) -> str:
+    """Buffer a slide to a path."""
+    buffered = join(dest, basename(path))
+    shutil.copy(path, buffered)
+
+    # If this is an MRXS file, copy the associated folder.
+    if path.lower().endswith('mrxs'):
+        folder_path = join(dirname(path), path_to_name(path))
+        if exists(folder_path):
+            shutil.copytree(folder_path, join(dest, path_to_name(path)))
+        else:
+            log.debug("Could not find associated MRXS folder for slide buffer")
+
+    return buffered
+
+
+def _debuffer_slide(path: str) -> None:
+    """De-buffer a slide."""
+    os.remove(path)
+    # If this is an MRXS file, remove the associated folder.
+    if path.lower().endswith('mrxs'):
+        folder_path = join(dirname(path), path_to_name(path))
+        if exists(folder_path):
+            shutil.rmtree(folder_path)
+        else:
+            log.debug("Could not find MRXS folder for slide debuffer")
+
+
+def _fill_queue(
+    slide_list: Sequence[str],
+    q: Queue,
+    q_size: int,
+    buffer: Optional[str] = None
+) -> None:
+    """Fill a queue with slide paths, using an optional buffer."""
+    for path in slide_list:
+        warned = False
+        if buffer:
+            while True:
+                if q.qsize() < q_size:
+                    try:
+                        q.put(_buffer_slide(path, buffer))
+                        break
+                    except OSError:
+                        if not warned:
+                            slide = _shortname(path_to_name(path))
+                            log.debug(f'OSError for {slide}: buffer full?')
+                            log.debug(f'Queue size: {q.qsize()}')
+                            warned = True
+                        time.sleep(1)
+                else:
+                    time.sleep(1)
+        else:
+            q.put(path)
+    q.put(None)
+    q.join()
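The queue-filling pattern above (bounded queue, `None` sentinel, final `q.join()`) can be sketched in isolation as follows. This is a simplified illustration under stated assumptions rather than the slideflow code: the buffering and retry logic is omitted, and `fill_queue`, `consume`, and `run` are hypothetical names.

```python
import threading
from queue import Queue
from typing import List, Sequence


def fill_queue(items: Sequence[str], q: Queue) -> None:
    """Producer: enqueue every item, then a None sentinel, then wait."""
    for item in items:
        q.put(item)          # Blocks while the bounded queue is full.
    q.put(None)              # Sentinel tells the consumer to stop.
    q.join()                 # Returns once every put() has a task_done().


def consume(q: Queue, results: List[str]) -> None:
    """Consumer: process items until the sentinel arrives."""
    while True:
        item = q.get()
        if item is None:
            q.task_done()    # Account for the sentinel too, or join() hangs.
            break
        results.append(item.upper())
        q.task_done()


def run(items: Sequence[str]) -> List[str]:
    """Wire one producer and one consumer around a bounded queue."""
    q: Queue = Queue(maxsize=2)
    results: List[str] = []
    worker = threading.Thread(target=consume, args=(q, results))
    worker.start()
    fill_queue(items, q)
    worker.join()
    return results
```

The detail worth noting is that `q.join()` only returns when the consumer calls `task_done()` for every item put on the queue, including the sentinel.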
+
+
+def _count_otsu_tiles(wsi):
+    wsi.qc('otsu')
+    return wsi.estimated_num_tiles
+
+
+def _create_index(tfrecord, force=False):
+    index_name = join(
+        dirname(tfrecord),
+        path_to_name(tfrecord)+'.index'
+    )
+    if not tfrecord2idx.find_index(tfrecord) or force:
+        tfrecord2idx.create_index(tfrecord, index_name)
+
+
+def _get_tile_df(
+    slide_path: str,
+    tile_px: int,
+    tile_um: Union[int, str],
+    rois: Optional[List[str]],
+    stride_div: int,
+    roi_method: str
+) -> Optional[pd.DataFrame]:
+    try:
+        wsi = sf.WSI(
+            slide_path,
+            tile_px,
+            tile_um,
+            rois=rois,
+            stride_div=stride_div,
+            roi_method=roi_method,
+            verbose=False
+        )
+    except Exception as e:
+        log.warning("Skipping slide {}, error raised: {}".format(
+            path_to_name(slide_path), e
+        ))
+        return None
+    _df = wsi.get_tile_dataframe()
+    _df['slide'] = wsi.name
+    return _df
+
+# -----------------------------------------------------------------------------
+
+def split_patients_preserved_site(
+    patients_dict: Dict[str, Dict],
+    n: int,
+    balance: Optional[str] = None,
+    method: str = 'auto'
+) -> List[List[str]]:
+    """Split a dictionary of patients into n groups, with site balancing.
+
+    Splits are balanced according to key "balance", while preserving site.
+
+    Args:
+        patients_dict (dict): Nested dictionary mapping patient names to
+            dict of outcomes: labels
+        n (int): Number of splits to generate.
+        balance (str): Annotation header to balance splits across.
+        method (str): Solver method. 'auto', 'cplex', or 'bonmin'. If 'auto',
+            will use CPLEX if available, otherwise will default to pyomo/bonmin.
+
+    Returns:
+        List of patient splits
+    """
+    patient_list = list(patients_dict.keys())
+    shuffle(patient_list)
+
+    def flatten(arr):
+        """Flatten an array."""
+        return [y for x in arr for y in x]
+
+    # Get patient outcome labels
+    if balance is not None:
+        patient_outcome_labels = [
+            patients_dict[p][balance] for p in patient_list
+        ]
+    else:
+        patient_outcome_labels = [1 for _ in patient_list]
+    # Get unique outcomes
+    unique_labels = list(set(patient_outcome_labels))
+    n_unique = len(set(unique_labels))
+    # Delayed import in case CPLEX not installed
+    import slideflow.io.preservedsite.crossfolds as cv
+
+    site_list = [patients_dict[p]['site'] for p in patient_list]
+    df = pd.DataFrame(
+        list(zip(patient_list, patient_outcome_labels, site_list)),
+        columns=['patient', 'outcome_label', 'site']
+    )
+    df = cv.generate(
+        df, 'outcome_label', k=n, target_column='CV', method=method
+    )
+    log.info("[bold]Train/val split with Preserved-Site Cross-Val")
+    log.info("[bold]Category\t" + "\t".join(
+        [str(cat) for cat in range(n_unique)]
+    ))
+    for k in range(n):
+        def num_labels_matching(o):
+            match = df[(df.CV == str(k+1)) & (df.outcome_label == o)]
+            return str(len(match))
+        matching = [num_labels_matching(o) for o in unique_labels]
+        log.info(f"K-fold-{k}\t" + "\t".join(matching))
+    splits = [
+        df.loc[df.CV == str(ni+1), "patient"].tolist()
+        for ni in range(n)
+    ]
+    return splits
+
+
+def split_patients_balanced(
+    patients_dict: Dict[str, Dict],
+    n: int,
+    balance: str
+) -> List[List[str]]:
+    """Split a dictionary of patients into n groups, balancing by outcome.
+
+    Splits are balanced according to key "balance".
+
+    Args:
+        patients_dict (dict): Nested dictionary mapping patient names to
+            dict of outcomes: labels
+        n (int): Number of splits to generate.
+        balance (str): Annotation header to balance splits across.
+
+    Returns:
+        List of patient splits
+    """
+    patient_list = list(patients_dict.keys())
+    shuffle(patient_list)
+
+    def flatten(arr):
+        """Flatten an array."""
+        return [y for x in arr for y in x]
+
+    # Get patient outcome labels
+    patient_outcome_labels = [
+        patients_dict[p][balance] for p in patient_list
+    ]
+    # Get unique outcomes
+    unique_labels = list(set(patient_outcome_labels))
+    n_unique = len(set(unique_labels))
+
+    # Now, split patient_list according to outcomes
+    pt_by_outcome = [
+        [p for p in patient_list if patients_dict[p][balance] == uo]
+        for uo in unique_labels
+    ]
+    # Then, for each sublist, split into n components
+    pt_by_outcome_by_n = [
+        list(sf.util.split_list(sub_l, n)) for sub_l in pt_by_outcome
+    ]
+    # Print splitting as a table
+    log.info(
+        "[bold]Category\t" + "\t".join([str(cat) for cat in range(n_unique)])
+    )
+    for k in range(n):
+        matching = [str(len(clist[k])) for clist in pt_by_outcome_by_n]
+        log.info(f"K-fold-{k}\t" + "\t".join(matching))
+    # Join sublists
+    splits = [
+        flatten([
+            item[ni] for item in pt_by_outcome_by_n
+        ]) for ni in range(n)
+    ]
+    return splits
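The balanced split above can be reproduced in a few lines without slideflow. The sketch below is an approximation under stated assumptions: `split_list` is a hypothetical stand-in for `sf.util.split_list` (assumed here to return `n` near-equal contiguous chunks), and a seeded shuffle is used so the result is repeatable, which the original does not do.

```python
import random
from typing import Dict, List


def split_list(items: List[str], n: int) -> List[List[str]]:
    """Hypothetical stand-in for sf.util.split_list: n near-equal chunks."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)]
            for i in range(n)]


def split_balanced(patients: Dict[str, Dict], n: int, balance: str,
                   seed: int = 0) -> List[List[str]]:
    """Split patients into n folds, spreading each outcome across folds."""
    names = sorted(patients)
    random.Random(seed).shuffle(names)
    outcomes = sorted({patients[p][balance] for p in names})
    # Group patients by outcome, then cut each group into n chunks.
    by_outcome = [[p for p in names if patients[p][balance] == o]
                  for o in outcomes]
    chunks = [split_list(group, n) for group in by_outcome]
    # Fold k is the concatenation of chunk k from every outcome group.
    return [[p for group in chunks for p in group[k]] for k in range(n)]
```

With 6 patients labeled "A" and 6 labeled "B" split three ways, each fold receives two of each label, which is the property the table logged by the original function is meant to show.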
+
+
+def split_patients(patients_dict: Dict[str, Dict], n: int) -> List[List[str]]:
+    """Split a dictionary of patients into n groups.
+
+    Args:
+        patients_dict (dict): Nested dictionary mapping patient names to
+            dict of outcomes: labels
+        n (int): Number of splits to generate.
+
+    Returns:
+        List of patient splits
+    """
+    patient_list = list(patients_dict.keys())
+    shuffle(patient_list)
+    return list(sf.util.split_list(patient_list, n))
+
+# -----------------------------------------------------------------------------
+
+
+
+class Dataset:
+    """Supervises organization and processing of slides, tfrecords, and tiles.
+
+    Datasets can be composed of one or more sources, where a source is a
+    combination of slides and any associated regions of interest (ROI) and
+    extracted image tiles (stored as TFRecords or loose images).
+
+    Datasets can be created in two ways: either by loading one dataset source,
+    or by loading a dataset configuration that contains information about
+    multiple dataset sources.
+
+    For the first approach, the dataset source configuration is provided via
+    keyword arguments (``tiles``, ``tfrecords``, ``slides``, and ``roi``).
+    Each is a path to a directory containing the respective data.
+
+    For the second approach, the first argument ``config`` is either a nested
+    dictionary containing the configuration for multiple dataset sources, or
+    a path to a JSON file with this information. The second argument is a list
+    of dataset sources to load (keys from the ``config`` dictionary).
+
+    With either approach, slide/patient-level annotations are provided through
+    the ``annotations`` keyword argument, which can either be a path to a CSV
+    file, or a pandas DataFrame, which must contain at minimum the column
+    ``patient``.
+    """
+
+    def __init__(
+        self,
+        config: Optional[Union[str, Dict[str, Dict[str, str]]]] = None,
+        sources: Optional[Union[str, List[str]]] = None,
+        tile_px: Optional[int] = None,
+        tile_um: Optional[Union[str, int]] = None,
+        *,
+        tfrecords: Optional[str] = None,
+        tiles: Optional[str] = None,
+        roi: Optional[str] = None,
+        slides: Optional[str] = None,
+        filters: Optional[Dict] = None,
+        filter_blank: Optional[Union[List[str], str]] = None,
+        annotations: Optional[Union[str, pd.DataFrame]] = None,
+        min_tiles: int = 0,
+    ) -> None:
+        """Initialize a Dataset to organize processed images.
+
+        Examples
+            Load a dataset via keyword arguments.
+
+                .. code-block:: python
+
+                    dataset = Dataset(
+                        tfrecords='../path',
+                        slides='../path',
+                        annotations='../file.csv'
+                    )
+
+            Load a dataset configuration file and specify dataset source(s).
+
+                .. code-block:: python
+
+                    dataset = Dataset(
+                        config='../path/to/config.json',
+                        sources=['Lung_Adeno', 'Lung_Squam'],
+                        annotations='../file.csv'
+                    )
+
+        Args:
+            config (str, dict): Either a dictionary or a path to a JSON file.
+                If a dictionary, keys should be dataset source names, and
+                values should be dictionaries containing the keys 'tiles',
+                'tfrecords', 'roi', and/or 'slides', specifying directories for
+                each dataset source. If `config` is a str, it should be a path
+                to a JSON file containing a dictionary with the same
+                formatting. If None, tiles, tfrecords, roi and/or slides should
+                be manually provided via keyword arguments. Defaults to None.
+            sources (str or List[str]): Dataset source(s) to include from the
+                configuration. Required when loading from ``config``; a
+                ValueError is raised if it is missing. Defaults to None.
+            tile_px (int): Tile size in pixels.
+            tile_um (int or str): Tile size in microns (int) or magnification
+                (str, e.g. "20x").
+
+        Keyword args:
+            filters (dict, optional): Dataset filters to use for
+                selecting slides. See :meth:`slideflow.Dataset.filter` for
+                more information. Defaults to None.
+            filter_blank (list(str) or str, optional): Skip slides that have
+                blank values in these patient annotation columns.
+                Defaults to None.
+            min_tiles (int, optional): Only include slides with this
+                many tiles at minimum. Defaults to 0.
+            annotations (str or pd.DataFrame, optional): Path
+                to annotations file or pandas DataFrame with slide-level
+                annotations. Defaults to None.
+
+        Raises:
+            errors.SourceNotFoundError: If provided source does not exist
+                in the dataset config.
+        """
+        if isinstance(tile_um, str):
+            sf.util.assert_is_mag(tile_um)
+            tile_um = tile_um.lower()
+
+        self.tile_px = tile_px
+        self.tile_um = tile_um
+        self._filters = filters if filters else {}
+        if filter_blank is None:
+            self._filter_blank = []
+        else:
+            self._filter_blank = sf.util.as_list(filter_blank)
+        self._min_tiles = min_tiles
+        self._clip = {}  # type: Dict[str, int]
+        self.prob_weights = None  # type: Optional[Dict]
+        self._annotations = None  # type: Optional[pd.DataFrame]
+        self.annotations_file = None  # type: Optional[str]
+
+        if (any(arg is not None for arg in (tfrecords, tiles, roi, slides))
+           and (config is not None or sources is not None)):
+            raise ValueError(
+                "When initializing a Dataset object via keywords (tiles, "
+                "tfrecords, slides, roi), the arguments 'config' and 'sources'"
+                " are invalid."
+            )
+        elif any(arg is not None for arg in (tfrecords, tiles, roi, slides)):
+            config = dict(dataset=dict(
+                tfrecords=tfrecords, tiles=tiles, roi=roi, slides=slides
+            ))
+            sources = ['dataset']
+
+        if isinstance(config, str):
+            self._config = config
+            loaded_config = sf.util.load_json(config)
+        else:
+            self._config = "<dict>"
+            loaded_config = config
+
+        # Read dataset sources from the configuration
+        if sources is None:
+            raise ValueError("Missing argument 'sources'")
+        sources = sources if isinstance(sources, list) else [sources]
+        try:
+            self.sources = {
+                k: v for k, v in loaded_config.items() if k in sources
+            }
+            self.sources_names = list(self.sources.keys())
+        except KeyError:
+            sources_list = ', '.join(sources)
+            raise errors.SourceNotFoundError(sources_list, config)
+        missing_sources = [s for s in sources if s not in self.sources]
+        if len(missing_sources):
+            log.warning(
+                "The following sources were not found in the dataset "
+                f"configuration: {', '.join(missing_sources)}"
+            )
+        # Create labels for each source based on tile size
+        if (tile_px is not None) and (tile_um is not None):
+            label = sf.util.tile_size_label(tile_px, tile_um)
+        else:
+            label = None
+        for source in self.sources:
+            self.sources[source]['label'] = label
+
+        # Load annotations
+        if annotations is not None:
+            self.load_annotations(annotations)
+
+    def __repr__(self) -> str:  # noqa D105
+        _b = "Dataset(config={!r}, sources={!r}, tile_px={!r}, tile_um={!r})"
+        return _b.format(
+            self._config,
+            self.sources_names,
+            self.tile_px,
+            self.tile_um
+        )
+
+    @property
+    def annotations(self) -> Optional[pd.DataFrame]:
+        """Pandas DataFrame of all loaded clinical annotations."""
+        return self._annotations
+
+    @property
+    def num_tiles(self) -> int:
+        """Number of tiles in tfrecords after filtering/clipping."""
+        tfrecords = self.tfrecords()
+        m = self.manifest()
+        if not all([tfr in m for tfr in tfrecords]):
+            self.update_manifest()
+        n_tiles = [
+            m[tfr]['total'] if 'clipped' not in m[tfr] else m[tfr]['clipped']
+            for tfr in tfrecords
+        ]
+        return sum(n_tiles)
+
+    @property
+    def filters(self) -> Dict:
+        """Returns the active filters, if any."""
+        return self._filters
+
+    @property
+    def filter_blank(self) -> Union[str, List[str]]:
+        """Returns the active filter_blank filter, if any."""
+        return self._filter_blank
+
+    @property
+    def min_tiles(self) -> int:
+        """Returns the active min_tiles filter, if any (defaults to 0)."""
+        return self._min_tiles
+
+    @property
+    def filtered_annotations(self) -> Optional[pd.DataFrame]:
+        """Pandas DataFrame of clinical annotations, after filtering."""
+        if self.annotations is not None:
+            f_ann = self.annotations
+
+            # Only return slides with annotation values specified in "filters"
+            if self.filters:
+                for filter_key in self.filters.keys():
+                    if filter_key not in f_ann.columns:
+                        raise IndexError(
+                            f"Filter header {filter_key} not in annotations."
+                        )
+                    filter_vals = sf.util.as_list(self.filters[filter_key])
+                    f_ann = f_ann.loc[f_ann[filter_key].isin(filter_vals)]
+
+            # Filter out slides that are blank in a given annotation
+            # column ("filter_blank")
+            if self.filter_blank and self.filter_blank != [None]:
+                for fb in self.filter_blank:
+                    if fb not in f_ann.columns:
+                        raise errors.DatasetFilterError(
+                            f"Header {fb} not found in annotations."
+                        )
+                    f_ann = f_ann.loc[f_ann[fb].notna()]
+                    f_ann = f_ann.loc[~f_ann[fb].isin(sf.util.EMPTY)]
+
+            # Filter out slides that do not meet minimum number of tiles
+            if self.min_tiles:
+                manifest = self.manifest(key='name', filter=False)
+                man_slides = [s for s in manifest
+                              if manifest[s]['total'] >= self.min_tiles]
+                f_ann = f_ann.loc[f_ann.slide.isin(man_slides)]
+
+            return f_ann
+        else:
+            return None
+
+    @property
+    def img_format(self) -> Optional[str]:
+        """Format of images stored in TFRecords (jpg/png).
+
+        Verifies all tfrecords share the same image format.
+
+        Returns:
+            str: Image format of tfrecords (PNG or JPG), or None if no
+                tfrecords have been extracted.
+        """
+        return self.verify_img_format(progress=False)
+
+    def _tfrecords_set(self, source: str):
+        if source not in self.sources:
+            raise ValueError(f"Unrecognized dataset source {source}")
+        config = self.sources[source]
+        return 'tfrecords' in config and config['tfrecords']
+
+    def _tiles_set(self, source: str):
+        if source not in self.sources:
+            raise ValueError(f"Unrecognized dataset source {source}")
+        config = self.sources[source]
+        return 'tiles' in config and config['tiles']
+
+    def _roi_set(self, source: str):
+        if source not in self.sources:
+            raise ValueError(f"Unrecognized dataset source {source}")
+        config = self.sources[source]
+        return 'roi' in config and config['roi']
+
+    def _slides_set(self, source: str):
+        if source not in self.sources:
+            raise ValueError(f"Unrecognized dataset source {source}")
+        config = self.sources[source]
+        return 'slides' in config and config['slides']
+
+    def _assert_size_matches_hp(self, hp: Union[Dict, "ModelParams"]) -> None:
+        """Check if dataset tile size (px/um) matches the given parameters."""
+
+        if isinstance(hp, dict):
+            hp_px = hp['tile_px']
+            hp_um = hp['tile_um']
+        elif isinstance(hp, sf.ModelParams):
+            hp_px = hp.tile_px
+            hp_um = hp.tile_um
+        else:
+            raise ValueError(f"Unrecognized hyperparameter type {type(hp)}")
+        if self.tile_px != hp_px or self.tile_um != hp_um:
+            d_sz = f'({self.tile_px}px, tile_um={self.tile_um})'
+            m_sz = f'({hp_px}px, tile_um={hp_um})'
+            raise ValueError(
+                f"Dataset tile size {d_sz} does not match model {m_sz}"
+            )
+
+    def load_annotations(self, annotations: Union[str, pd.DataFrame]) -> None:
+        """Load annotations.
+
+        Args:
+            annotations (Union[str, pd.DataFrame]): Either path to annotations
+                in CSV format, or a pandas DataFrame.
+
+        Raises:
+            errors.AnnotationsError: If annotations are incorrectly formatted.
+        """
+        if isinstance(annotations, str):
+            if not exists(annotations):
+                raise errors.AnnotationsError(
+                    f'Unable to find annotations file {annotations}'
+                )
+            try:
+                ann_df = pd.read_csv(annotations, dtype=str)
+                ann_df.fillna('', inplace=True)
+                self._annotations = ann_df
+                self.annotations_file = annotations
+            except pd.errors.EmptyDataError:
+                log.error(f"Unable to load empty annotations {annotations}")
+        elif isinstance(annotations, pd.core.frame.DataFrame):
+            annotations.fillna('', inplace=True)
+            self._annotations = annotations
+        else:
+            raise errors.AnnotationsError(
+                'Invalid annotations format; expected path or DataFrame'
+            )
+
+        # Check annotations
+        assert self.annotations is not None
+        if len(self.annotations.columns) == 1:
+            raise errors.AnnotationsError(
+                "Only one annotations column detected (is it in CSV format?)"
+            )
+        if len(self.annotations.columns) != len(set(self.annotations.columns)):
+            raise errors.AnnotationsError(
+                "Annotations file has duplicate headers; all must be unique"
+            )
+        if 'patient' not in self.annotations.columns:
+            raise errors.AnnotationsError(
+                "Patient identifier 'patient' missing in annotations."
+            )
+        if 'slide' not in self.annotations.columns:
+            if isinstance(annotations, pd.DataFrame):
+                raise errors.AnnotationsError(
+                    "If loading annotations from a pandas DataFrame,"
+                    " must include column 'slide' containing slide names."
+                )
+            log.info("Column 'slide' missing in annotations.")
+            log.info("Attempting to associate patients with slides...")
+            self.update_annotations_with_slidenames(annotations)
+            self.load_annotations(annotations)
+
+        # Check for duplicate slides
+        ann = self.annotations.loc[self.annotations.slide.isin(self.slides())]
+        if not ann.slide.is_unique:
+            dup_slide_idx = ann.slide.duplicated()
+            dup_slides = ann.loc[dup_slide_idx].slide.to_numpy().tolist()
+            raise errors.DatasetError(
+                f"Duplicate slides found in annotations: {dup_slides}."
+            )
+
+    def balance(
+        self,
+        headers: Optional[Union[str, List[str]]] = None,
+        strategy: Optional[str] = 'category',
+        *,
+        force: bool = False,
+    ) -> "Dataset":
+        """Return a dataset with mini-batch balancing configured.
+
+        Mini-batch balancing can be configured at tile, slide, patient, or
+        category levels.
+
+        Balancing information is saved to the attribute ``prob_weights``, which
+        is used by the interleaving dataloaders when sampling from tfrecords
+        to create a batch.
+
+        Tile level balancing will create prob_weights reflective of the number
+        of tiles per slide, thus causing the batch sampling to mirror random
+        sampling from the entire population of tiles (rather than randomly
+        sampling from slides).
+
+        Slide level balancing assembles batches by randomly sampling from each
+        slide/tfrecord with equal probability. This is how the dataloaders
+        sample when no balancing is configured, so it is equivalent to no
+        balancing.
+
+        Patient level balancing is used to randomly sample from individual
+        patients with equal probability. This is distinct from slide level
+        balancing, as some patients may have multiple slides.
+
+        Category level balancing takes a list of annotation header(s) and
+        generates prob_weights such that each category is sampled equally.
+        This requires categorical outcomes.
+
+        Args:
+            headers (list of str, optional): List of annotation headers if
+                balancing by category. Defaults to None.
+            strategy (str, optional): 'tile', 'slide', 'patient' or 'category'.
+                Create prob_weights used to balance dataset batches to evenly
+                distribute slides, patients, or categories in a given batch.
+                Tile-level balancing generates prob_weights reflective of the
+                total number of tiles in a slide. Defaults to 'category'.
+            force (bool, optional): If using category-level balancing,
+                interpret all headers as categorical variables, even if the
+                header appears to be a float.
+
+        Returns:
+            balanced :class:`slideflow.Dataset` object.
+        """
+        ret = copy.deepcopy(self)
+        manifest = ret.manifest()
+        tfrecords = ret.tfrecords()
+        slides = [path_to_name(tfr) for tfr in tfrecords]
+        totals = {
+            tfr: (manifest[tfr]['total']
+                  if 'clipped' not in manifest[tfr]
+                  else manifest[tfr]['clipped'])
+            for tfr in tfrecords
+        }
+        if not tfrecords:
+            raise errors.DatasetBalanceError(
+                "Unable to balance; no tfrecords found."
+            )
+
+        if strategy == 'none' or strategy is None:
+            return self
+        if strategy == 'tile':
+            ret.prob_weights = {
+                tfr: totals[tfr] / sum(totals.values()) for tfr in tfrecords
+            }
+        if strategy == 'slide':
+            ret.prob_weights = {tfr: 1/len(tfrecords) for tfr in tfrecords}
+        if strategy == 'patient':
+            pts = ret.patients()  # Maps tfrecords to patients
+            r_pts = {}  # Maps patients to list of tfrecords
+            for slide in pts:
+                if slide not in slides:
+                    continue
+                if pts[slide] not in r_pts:
+                    r_pts[pts[slide]] = [slide]
+                else:
+                    r_pts[pts[slide]] += [slide]
+            ret.prob_weights = {
+                tfr: 1/(len(r_pts) * len(r_pts[pts[path_to_name(tfr)]]))
+                for tfr in tfrecords
+            }
+        if strategy == 'category':
+            if headers is None:
+                raise ValueError('Category balancing requires headers.')
+            # Ensure that header is not type 'float'
+            headers = sf.util.as_list(headers)
+            if any(ret.is_float(h) for h in headers) and not force:
+                raise errors.DatasetBalanceError(
+                    f"Headers {','.join(headers)} appear to be `float`. "
+                    "Categorical outcomes required for balancing. "
+                    "To force balancing with these outcomes, pass "
+                    "`force=True` to Dataset.balance()"
+                )
+            labels, _ = ret.labels(headers, use_float=False)
+            cats = {}  # type: Dict[str, Dict]
+            cat_prob = {}
+            tfr_cats = {}  # type: Dict[str, str]
+            for tfrecord in tfrecords:
+                slide = path_to_name(tfrecord)
+                balance_cat = sf.util.as_list(labels[slide])
+                balance_cat_str = '-'.join(map(str, balance_cat))
+                tfr_cats[tfrecord] = balance_cat_str
+                tiles = totals[tfrecord]
+                if balance_cat_str not in cats:
+                    cats.update({balance_cat_str: {
+                        'num_slides': 1,
+                        'num_tiles': tiles
+                    }})
+                else:
+                    cats[balance_cat_str]['num_slides'] += 1
+                    cats[balance_cat_str]['num_tiles'] += tiles
+            for category in cats:
+                min_cat_slides = min([
+                    cats[i]['num_slides'] for i in cats
+                ])
+                slides_in_cat = cats[category]['num_slides']
+                cat_prob[category] = min_cat_slides / slides_in_cat
+            total_prob = sum([cat_prob[tfr_cats[tfr]] for tfr in tfrecords])
+            ret.prob_weights = {
+                tfr: cat_prob[tfr_cats[tfr]]/total_prob for tfr in tfrecords
+            }
+        return ret
+
+    def build_index(
+        self,
+        force: bool = True,
+        *,
+        num_workers: Optional[int] = None
+    ) -> None:
+        """Build index files for TFRecords.
+
+        Args:
+            force (bool): Force re-build existing indices.
+
+        Keyword args:
+            num_workers (int, optional): Number of workers to use for
+                building indices. Defaults to num_cpus, up to maximum of 16.
+ + Returns: + None + """ + if num_workers is None: + num_workers = min(sf.util.num_cpu(), 16) + if force: + index_to_update = self.tfrecords() + # Remove existing indices + for tfr in self.tfrecords(): + index = tfrecord2idx.find_index(tfr) + if index: + os.remove(index) + else: + index_to_update = [] + for tfr in self.tfrecords(): + index = tfrecord2idx.find_index(tfr) + if not index: + index_to_update.append(tfr) + elif (not tfrecord2idx.index_has_locations(index) + and sf.io.tfrecord_has_locations(tfr)): + os.remove(index) + index_to_update.append(tfr) + if not index_to_update: + return + if num_workers == 0: + # Single thread. + for tfr in track(index_to_update, + description='Updating index files...', + total=len(index_to_update), + transient=True): + _create_index(tfr, force=force) + else: + # Multiprocessing, using the requested number of workers. + index_fn = partial(_create_index, force=force) + pool = mp.Pool( + num_workers, + initializer=sf.util.set_ignore_sigint + ) + for _ in track(pool.imap_unordered(index_fn, index_to_update), + description='Updating index files...', + total=len(index_to_update), + transient=True): + pass + pool.close() + + def cell_segmentation( + self, + diam_um: float, + dest: str, + *, + model: Union["cellpose.models.Cellpose", str] = 'cyto2', + window_size: int = 256, + diam_mean: Optional[int] = None, + qc: Optional[str] = None, + qc_kwargs: Optional[dict] = None, + buffer: Optional[str] = None, + q_size: int = 2, + force: bool = False, + save_centroid: bool = True, + save_flow: bool = False, + **kwargs + ) -> None: + """Perform cell segmentation on slides, saving segmentation masks. + + Args: + diam_um (float): Cell segmentation diameter, in microns. + dest (str): Destination in which to save cell segmentation masks. + + Keyword args: + batch_size (int): Batch size for cell segmentation. Defaults to 8. + cp_thresh (float): Cell probability threshold. All pixels with + value above threshold are kept for masks; decrease to find more + and larger masks. 
Defaults to 0. + diam_mean (int, optional): Cell diameter to detect, in pixels + (without image resizing). If None, uses Cellpose defaults (17 + for the 'nuclei' model, 30 for all others). + downscale (float, optional): Factor by which to downscale generated + masks after calculation. Defaults to None (keep masks at + original size). + flow_threshold (float): Flow error threshold (all cells with errors + below threshold are kept). Defaults to 0.4. + gpus (int, list(int)): GPUs to use for cell segmentation. + Defaults to 0 (first GPU). + interp (bool): Interpolate during 2D dynamics. Defaults to True. + qc (str, optional): Slide-level quality control method to use + before performing cell segmentation. Defaults to None (no + quality control). + model (str, :class:`cellpose.models.Cellpose`): Cellpose model to + use for cell segmentation. May be any valid cellpose model. + Defaults to 'cyto2'. + mpp (float): Microns-per-pixel at which cells should be segmented. + Defaults to 0.5. + num_workers (int, optional): Number of workers. + Defaults to 2 * num_gpus. + save_centroid (bool): Save mask centroids. Increases memory + utilization slightly. Defaults to True. + save_flow (bool): Save flow values for the whole-slide image. + Increases memory utilization. Defaults to False. + sources (List[str]): List of dataset sources to include from + configuration file. + tile (bool): Tile the image to decrease GPU/CPU memory usage. + Defaults to True. + verbose (bool): Verbose log output at the INFO level. + Defaults to True. + window_size (int): Window size at which to segment cells across + a whole-slide image. Defaults to 256. 
+ + Returns: + None + """ + from slideflow.cellseg import segment_slide + + if qc_kwargs is None: + qc_kwargs = {} + + slide_list = self.slide_paths() + if not force: + n_all = len(slide_list) + slide_list = [ + s for s in slide_list + if not exists( + join(dest, sf.util.path_to_name(s)+'-masks.zip') + ) + ] + n_skipped = n_all - len(slide_list) + if n_skipped: + log.info("Skipping {} slides (masks already generated)".format( + n_skipped + )) + if slide_list: + log.info(f"Segmenting cells for {len(slide_list)} slides.") + else: + log.info("No slides found.") + return + + if diam_mean is None: + diam_mean = 30 if model != 'nuclei' else 17 + tile_um = int(window_size * (diam_um / diam_mean)) + pb = TileExtractionProgress() + speed_task = pb.add_task( + "Speed: ", progress_type="speed", total=None + ) + slide_task = pb.add_task( + "Slides: ", progress_type="slide_progress", total=len(slide_list) + ) + q = Queue() # type: Queue + if buffer: + thread = threading.Thread( + target=_fill_queue, + args=(slide_list, q, q_size, buffer)) + thread.start() + + pb.start() + with sf.util.cleanup_progress(pb): + while True: + slide_path = q.get() + if slide_path is None: + q.task_done() + break + wsi = sf.WSI( + slide_path, + tile_px=window_size, + tile_um=tile_um, + verbose=False + ) + if qc is not None: + wsi.qc(qc, **qc_kwargs) + segment_task = pb.add_task( + "Segmenting... 
", + progress_type="slide_progress", + total=wsi.estimated_num_tiles + ) + # Perform segmentation and save + segmentation = segment_slide( + wsi, + pb=pb, + pb_tasks=[speed_task, segment_task], + show_progress=False, + model=model, + diam_mean=diam_mean, + save_flow=save_flow, + **kwargs) + mask_dest = dest if dest is not None else dirname(slide_path) + segmentation.save( + join(mask_dest, f'{wsi.name}-masks.zip'), + flows=save_flow, + centroids=save_centroid) + pb.advance(slide_task) + pb.remove_task(segment_task) + + if buffer: + _debuffer_slide(slide_path) + q.task_done() + if buffer: + thread.join() + + def check_duplicates( + self, + dataset: Optional["Dataset"] = None, + px: int = 64, + mse_thresh: int = 100 + ) -> List[Tuple[str, str]]: + """Check for duplicate slides by comparing slide thumbnails. + + Args: + dataset (`slideflow.Dataset`, optional): Also check for + duplicate slides between this dataset and the provided dataset. + px (int): Pixel size at which to compare thumbnails. + Defaults to 64. + mse_thresh (int): MSE threshold below which an image pair is + considered duplicate. Defaults to 100. + + Returns: + List[str], optional: List of path pairs of potential duplicates. 
+ """ + import cv2 + + thumbs = {} + dups = [] + + def mse(A, B): + """Calculate the mean squared error between two image matrices.""" + err = np.sum((A.astype("float") - B.astype("float")) ** 2) + err /= float(A.shape[0] * A.shape[1]) + return err + + def img_from_path(path): + """Read and resize an image.""" + img = cv2.imdecode( + np.fromfile(path, dtype=np.uint8), + cv2.IMREAD_UNCHANGED) + img = img[..., 0:3] + return cv2.resize(img, + dsize=(px, px), + interpolation=cv2.INTER_CUBIC) + + with tempfile.TemporaryDirectory() as temp_dir: + os.makedirs(join(temp_dir, 'this_dataset')) + self.thumbnails(join(temp_dir, 'this_dataset')) + if dataset: + os.makedirs(join(temp_dir, 'other_dataset')) + dataset.thumbnails(join(temp_dir, 'other_dataset')) + for subdir in os.listdir(temp_dir): + files = os.listdir(join(temp_dir, subdir)) + for file in tqdm(files, desc="Scanning for duplicates..."): + if dataset and subdir == 'other_dataset': + wsi_path = dataset.find_slide(slide=path_to_name(file)) + else: + wsi_path = self.find_slide(slide=path_to_name(file)) + assert wsi_path is not None + img = img_from_path(join(temp_dir, subdir, file)) + thumbs[wsi_path] = img + + # Check if this thumbnail has a duplicate + for existing_img in thumbs: + if wsi_path != existing_img: + img2 = thumbs[existing_img] + img_mse = mse(img, img2) + if img_mse < mse_thresh: + tqdm.write( + 'Possible duplicates: ' + '{} and {} (MSE: {})'.format( + wsi_path, + existing_img, + img_mse + ) + ) + dups += [(wsi_path, existing_img)] + if not dups: + log.info("No duplicates found.") + else: + log.info(f"{len(dups)} possible duplicates found.") + return dups + + def clear_filters(self) -> "Dataset": + """Return a dataset with all filters cleared. + + Returns: + :class:`slideflow.Dataset` object. 
+ + """ + ret = copy.deepcopy(self) + ret._filters = {} + ret._filter_blank = [] + ret._min_tiles = 0 + return ret + + def clip( + self, + max_tiles: int = 0, + strategy: Optional[str] = None, + headers: Optional[List[str]] = None + ) -> "Dataset": + """Return a dataset with TFRecords clipped to a max number of tiles. + + Clip the number of tiles per tfrecord to a given maximum value and/or + to the min number of tiles per patient or category. + + Args: + max_tiles (int, optional): Clip the maximum number of tiles per + tfrecord to this number. Defaults to 0 (do not perform + tfrecord-level clipping). + strategy (str, optional): 'slide', 'patient', or 'category'. + Clip the maximum number of tiles to the minimum tiles seen + across slides, patients, or categories. If 'category', headers + must be provided. Defaults to None (do not perform group-level + clipping). + headers (list of str, optional): List of annotation headers to use + if clipping by minimum category count (strategy='category'). + Defaults to None. + + Returns: + clipped :class:`slideflow.Dataset` object. + + """ + if strategy == 'category' and not headers: + raise errors.DatasetClipError( + "headers must be provided if clip strategy is 'category'." 
+ ) + if not max_tiles and strategy is None: + return self.unclip() + + ret = copy.deepcopy(self) + manifest = ret.manifest() + tfrecords = ret.tfrecords() + slides = [path_to_name(tfr) for tfr in tfrecords] + totals = {tfr: manifest[tfr]['total'] for tfr in tfrecords} + + if not tfrecords: + raise errors.DatasetClipError("No tfrecords found.") + if strategy == 'slide': + if max_tiles: + clip = min(min(totals.values()), max_tiles) + else: + clip = min(totals.values()) + ret._clip = { + tfr: (clip if totals[tfr] > clip else totals[tfr]) + for tfr in manifest + } + elif strategy == 'patient': + patients = ret.patients() # Maps slide name to patient + rev_patients = {} # Will map patients to list of slide names + slide_totals = {path_to_name(tfr): t for tfr, t in totals.items()} + for slide in patients: + if slide not in slides: + continue + if patients[slide] not in rev_patients: + rev_patients[patients[slide]] = [slide] + else: + rev_patients[patients[slide]] += [slide] + tiles_per_patient = { + pt: sum([slide_totals[slide] for slide in slide_list]) + for pt, slide_list in rev_patients.items() + } + if max_tiles: + clip = min(min(tiles_per_patient.values()), max_tiles) + else: + clip = min(tiles_per_patient.values()) + ret._clip = { + tfr: (clip + if slide_totals[path_to_name(tfr)] > clip + else totals[tfr]) + for tfr in manifest + } + elif strategy == 'category': + if headers is None: + raise ValueError("Category clipping requires arg `headers`") + labels, _ = ret.labels(headers, use_float=False) + categories = {} + cat_fraction = {} + tfr_cats = {} + for tfrecord in tfrecords: + slide = path_to_name(tfrecord) + balance_category = sf.util.as_list(labels[slide]) + balance_cat_str = '-'.join(map(str, balance_category)) + tfr_cats[tfrecord] = balance_cat_str + tiles = totals[tfrecord] + if balance_cat_str not in categories: + categories[balance_cat_str] = tiles + else: + categories[balance_cat_str] += tiles + + for category in categories: + min_cat_count = 
min([categories[i] for i in categories]) + cat_fraction[category] = min_cat_count / categories[category] + ret._clip = { + tfr: int(totals[tfr] * cat_fraction[tfr_cats[tfr]]) + for tfr in manifest + } + elif max_tiles: + ret._clip = { + tfr: (max_tiles if totals[tfr] > max_tiles else totals[tfr]) + for tfr in manifest + } + return ret + + def convert_xml_rois(self): + """Convert ImageScope XML ROI files to QuPath format CSV ROI files.""" + n_converted = 0 + xml_list = [] + for source in self.sources: + if self._roi_set(source): + xml_list = glob(join(self.sources[source]['roi'], "*.xml")) + if len(xml_list) == 0: + raise errors.DatasetError( + 'No XML files found. Check dataset configuration.' + ) + for xml in xml_list: + try: + sf.slide.utils.xml_to_csv(xml) + except errors.ROIError as e: + log.warning(f"Failed to convert XML roi {xml}: {e}") + else: + n_converted += 1 + log.info(f'Converted {n_converted} XML ROIs -> CSV') + + def get_tile_dataframe( + self, + roi_method: str = 'auto', + stride_div: int = 1, + ) -> pd.DataFrame: + """Generate a pandas dataframe with tile-level ROI labels. + + Returns: + Pandas dataframe of all tiles, with the following columns: + - ``loc_x``: X-coordinate of tile center + - ``loc_y``: Y-coordinate of tile center + - ``grid_x``: X grid index of the tile + - ``grid_y``: Y grid index of the tile + - ``roi_name``: Name of the ROI if tile is in an ROI, else None + - ``roi_desc``: Description of the ROI if tile is in ROI, else None + - ``label``: ROI label, if present. 
+ + """ + df = None + with mp.Pool(4, initializer=sf.util.set_ignore_sigint) as pool: + fn = partial( + _get_tile_df, + tile_px=self.tile_px, + tile_um=self.tile_um, + rois=self.rois(), + stride_div=stride_div, + roi_method=roi_method + ) + for _df in track(pool.imap_unordered(fn, self.slide_paths()), + description=f'Building...', + total=len(self.slide_paths()), + transient=True): + if df is None: + df = _df + else: + df = pd.concat([df, _df], axis=0, join='outer') + + return df + + def get_unique_roi_labels(self, allow_empty: bool = False) -> List[str]: + """Get a list of unique ROI labels for all slides in this dataset.""" + + # Get a list of unique labels. + roi_unique_labels = [] + for roi in self.rois(): + _df = pd.read_csv(roi) + if 'label' not in _df.columns: + continue + unique = [ + l for l in _df.label.unique().tolist() + if (l not in roi_unique_labels) + ] + roi_unique_labels += unique + without_nan = sorted([ + l for l in roi_unique_labels + if (not isinstance(l, float) or not np.isnan(l)) + ]) + if allow_empty and (len(roi_unique_labels) > len(without_nan)): + return without_nan + [None] + else: + return without_nan + + def extract_cells( + self, + masks_path: str, + **kwargs + ) -> Dict[str, SlideReport]: + """Extract cell images from slides, with a tile at each cell centroid. + + Requires that cells have already been segmented with + ``Dataset.cell_segmentation()``. + + Args: + masks_path (str): Location of saved segmentation masks. + + Keyword Args: + apply_masks (bool): Apply cell segmentation masks to the extracted + tiles. Defaults to True. + **kwargs: All other keyword arguments for + :meth:`Dataset.extract_tiles()`. + + Returns: + Dictionary mapping slide paths to each slide's SlideReport + (:class:`slideflow.slide.report.SlideReport`) + + """ + from slideflow.cellseg.seg_utils import ApplySegmentation + + # Add WSI segmentation as slide-level transformation. 
+ qc = [] if 'qc' not in kwargs else kwargs['qc'] + if not isinstance(qc, list): + qc = [qc] + qc.append(ApplySegmentation(masks_path)) + kwargs['qc'] = qc + + # Extract tiles from segmentation centroids. + return self.extract_tiles( + from_centroids=True, + **kwargs + ) + + def extract_tiles( + self, + *, + save_tiles: bool = False, + save_tfrecords: bool = True, + source: Optional[str] = None, + stride_div: int = 1, + enable_downsample: bool = True, + roi_method: str = 'auto', + roi_filter_method: Union[str, float] = 'center', + skip_extracted: bool = True, + tma: bool = False, + randomize_origin: bool = False, + buffer: Optional[str] = None, + q_size: int = 2, + qc: Optional[Union[str, Callable, List[Callable]]] = None, + report: bool = True, + use_edge_tiles: bool = False, + artifact_labels: Optional[Union[List[str], str]] = list(), + mpp_override: Optional[float] = None, + **kwargs: Any + ) -> Dict[str, SlideReport]: + r"""Extract tiles from a group of slides. + + Extracted tiles are saved in TFRecord format + (``save_tfrecords=True``, default) and/or as loose \*.jpg / \*.png + images (``save_tiles=True``). TFRecords and image tiles are saved in + the tfrecord and tile directories configured by + :class:`slideflow.Dataset`. + + Keyword Args: + save_tiles (bool, optional): Save tile images in loose format. + Defaults to False. + save_tfrecords (bool): Save compressed image data from + extracted tiles into TFRecords in the corresponding TFRecord + directory. Defaults to True. + source (str, optional): Name of dataset source from which to select + slides for extraction. Defaults to None. If not provided, will + default to all sources in project. + stride_div (int): Stride divisor for tile extraction. + A stride_div of 1 will extract non-overlapping tiles. + A stride_div of 2 will extract overlapping tiles, with a stride + equal to 50% of the tile width. Defaults to 1. 
+ enable_downsample (bool): Enable downsampling for slides. + This may result in corrupted image tiles if downsampled slide + layers are corrupted or incomplete. Defaults to True. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and skip the slide if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + roi_filter_method (str or float): Method of filtering tiles with + ROIs. Either 'center' or float (0-1). If 'center', tiles are + filtered with ROIs based on the center of the tile. If float, + tiles are filtered based on the proportion of the tile inside + the ROI, and ``roi_filter_method`` is interpreted as a + threshold. If the proportion of a tile inside the ROI is + greater than this number, the tile is included. For example, + if ``roi_filter_method=0.7``, a tile that is 80% inside of an + ROI will be included, and a tile that is 50% inside of an ROI + will be excluded. Defaults to 'center'. + skip_extracted (bool): Skip slides that have already + been extracted. Defaults to True. + tma (bool): Reads slides as Tumor Micro-Arrays (TMAs). + Deprecated argument; all slides are now read as standard WSIs. + randomize_origin (bool): Randomize pixel starting + position during extraction. Defaults to False. + buffer (str, optional): Slides will be copied to this directory + before extraction. Defaults to None. Using an SSD or ramdisk + buffer vastly improves tile extraction speed. + q_size (int): Size of queue when using a buffer. + Defaults to 2. + qc (str, optional): 'otsu', 'blur', 'both', or None. Perform blur + detection quality control - discarding tiles with detected + out-of-focus regions or artifact - and/or otsu's method. 
+ Increases tile extraction time. Defaults to None. + report (bool): Save a PDF report of tile extraction. + Defaults to True. + normalizer (str, optional): Normalization strategy. + Defaults to None. + normalizer_source (str, optional): Stain normalization preset or + path to a source image. Valid presets include 'v1', 'v2', and + 'v3'. If None, will use the default preset ('v3'). + Defaults to None. + whitespace_fraction (float, optional): Range 0-1. Discard tiles + with this fraction of whitespace. If 1, will not perform + whitespace filtering. Defaults to 1. + whitespace_threshold (int, optional): Range 0-255. Defaults to 230. + Threshold above which a pixel (RGB average) is whitespace. + grayspace_fraction (float, optional): Range 0-1. Defaults to 0.6. + Discard tiles with this fraction of grayspace. + If 1, will not perform grayspace filtering. + grayspace_threshold (float, optional): Range 0-1. Defaults to 0.05. + Pixels in HSV format with saturation below this threshold are + considered grayspace. + img_format (str, optional): 'png' or 'jpg'. Defaults to 'jpg'. + Image format to use in tfrecords. PNG (lossless) for fidelity, + JPG (lossy) for efficiency. + shuffle (bool, optional): Shuffle tiles prior to storage in + tfrecords. Defaults to True. + num_threads (int, optional): Number of worker processes for each + tile extractor. When using the cuCIM slide reading backend, + defaults to the total number of available CPU cores, using the + 'fork' multiprocessing method. With Libvips, this defaults to + the total number of available CPU cores or 32, whichever is + lower, using 'spawn' multiprocessing. + qc_blur_radius (int, optional): Quality control blur radius for + out-of-focus area detection. Only used if qc=True. + Defaults to 3. + qc_blur_threshold (float, optional): Quality control blur threshold + for detecting out-of-focus areas. Only used if qc=True. + Defaults to 0.1. + qc_filter_threshold (float, optional): Float between 0-1. 
Tiles + with more than this proportion of blur will be discarded. + Only used if qc=True. Defaults to 0.6. + qc_mpp (float, optional): Microns-per-pixel indicating image + magnification level at which quality control is performed. + Defaults to mpp=4 (effective magnification 2.5 X) + dry_run (bool, optional): Determine tiles that would be extracted, + but do not export any images. Defaults to None. + max_tiles (int, optional): Only extract this many tiles per slide. + Defaults to None. + use_edge_tiles (bool): Use edge tiles in extraction. Areas + outside the slide will be padded white. Defaults to False. + artifact_labels (list(str) or str, optional): List of ROI issue labels + to treat as artifacts. Whenever this is not None, all the ROIs with + referred label will be inverted with ROI.invert(). + Defaults to an empty list. + mpp_override (float, optional): Override the microns-per-pixel + for each slide. If None, will auto-detect microns-per-pixel + for all slides and raise an error if MPP is not found. + Defaults to None. + + Returns: + Dictionary mapping slide paths to each slide's SlideReport + (:class:`slideflow.slide.report.SlideReport`) + """ + if tma: + warnings.warn( + "tma=True is deprecated and will be removed in a future " + "version. Tumor micro-arrays are read as standard slides. 
" + ) + if not self.tile_px or not self.tile_um: + raise errors.DatasetError( + "Dataset tile_px and tile_um must be != 0 to extract tiles" + ) + if source: + sources = sf.util.as_list(source) # type: List[str] + else: + sources = list(self.sources.keys()) + all_reports = [] + self.verify_annotations_slides() + + if isinstance(artifact_labels, str): + artifact_labels = [artifact_labels] + + # Log the active slide reading backend + col = 'green' if sf.slide_backend() == 'cucim' else 'cyan' + log.info(f"Slide reading backend: [{col}]{sf.slide_backend()}[/]") + + # Set up kwargs for tile extraction generator and quality control + qc_kwargs = {k[3:]: v for k, v in kwargs.items() if k[:3] == 'qc_'} + kwargs = {k: v for k, v in kwargs.items() if k[:3] != 'qc_'} + sf.slide.log_extraction_params(**kwargs) + + for source in sources: + log.info(f'Working on dataset source [bold]{source}[/]...') + if self._roi_set(source): + roi_dir = self.sources[source]['roi'] + else: + roi_dir = None + src_conf = self.sources[source] + if 'dry_run' not in kwargs or not kwargs['dry_run']: + if save_tfrecords and not self._tfrecords_set(source): + log.error(f"tfrecords path not set for source {source}") + continue + elif save_tfrecords: + tfrecord_dir = join( + src_conf['tfrecords'], + src_conf['label'] + ) + else: + tfrecord_dir = None + if save_tiles and not self._tiles_set(source): + log.error(f"tiles path not set for source {source}") + continue + elif save_tiles: + tiles_dir = join(src_conf['tiles'], src_conf['label']) + else: + tiles_dir = None + if save_tfrecords and not exists(tfrecord_dir): + os.makedirs(tfrecord_dir) + if save_tiles and not exists(tiles_dir): + os.makedirs(tiles_dir) + else: + save_tfrecords, save_tiles = False, False + tfrecord_dir, tiles_dir = None, None + + # Prepare list of slides for extraction + slide_list = self.slide_paths(source=source) + + # Check for interrupted or already-extracted tfrecords + if skip_extracted and save_tfrecords: + done = [ + 
path_to_name(tfr) for tfr in self.tfrecords(source=source) + ] + _dir = tfrecord_dir if tfrecord_dir else tiles_dir + unfinished = glob(join((_dir), '*.unfinished')) + interrupted = [path_to_name(marker) for marker in unfinished] + if len(interrupted): + log.info(f'Re-extracting {len(interrupted)} interrupted:') + for interrupted_slide in interrupted: + log.info(interrupted_slide) + if interrupted_slide in done: + del done[done.index(interrupted_slide)] + + slide_list = [ + s for s in slide_list if path_to_name(s) not in done + ] + if len(done): + log.info(f'Skipping {len(done)} slides; already done.') + _tail = f"(tile_px={self.tile_px}, tile_um={self.tile_um})" + log.info(f'Extracting tiles from {len(slide_list)} slides {_tail}') + + # Use multithreading if specified, extracting tiles + # from all slides in the filtered list + if len(slide_list): + q = Queue() # type: Queue + # Forking incompatible with some libvips configurations + ptype = 'spawn' if sf.slide_backend() == 'libvips' else 'fork' + ctx = mp.get_context(ptype) + manager = ctx.Manager() + reports = manager.dict() + kwargs['report'] = report + + # Use a single shared multiprocessing pool + if 'num_threads' not in kwargs: + num_threads = sf.util.num_cpu() + if num_threads is None: + num_threads = 8 + if sf.slide_backend() == 'libvips': + num_threads = min(num_threads, 32) + else: + num_threads = kwargs['num_threads'] + if num_threads != 1: + pool = kwargs['pool'] = ctx.Pool( + num_threads, + initializer=sf.util.set_ignore_sigint + ) + qc_kwargs['pool'] = pool + else: + pool = None + ptype = None + log.info(f'Using {num_threads} processes (pool={ptype})') + + # Set up the multiprocessing progress bar + pb = TileExtractionProgress() + pb.add_task( + "Speed: ", + progress_type="speed", + total=None) + slide_task = pb.add_task( + f"Extracting ({source})...", + progress_type="slide_progress", + total=len(slide_list)) + + wsi_kwargs = { + 'tile_px': self.tile_px, + 'tile_um': self.tile_um, + 'stride_div': 
stride_div, + 'enable_downsample': enable_downsample, + 'roi_dir': roi_dir, + 'roi_method': roi_method, + 'roi_filter_method': roi_filter_method, + 'origin': 'random' if randomize_origin else (0, 0), + 'pb': pb, + 'use_edge_tiles': use_edge_tiles, + 'artifact_labels': artifact_labels, + 'mpp': mpp_override + } + extraction_kwargs = { + 'tfrecord_dir': tfrecord_dir, + 'tiles_dir': tiles_dir, + 'reports': reports, + 'qc': qc, + 'generator_kwargs': kwargs, + 'qc_kwargs': qc_kwargs, + 'wsi_kwargs': wsi_kwargs, + 'render_thumb': (buffer is not None) + } + pb.start() + with sf.util.cleanup_progress(pb): + if buffer: + # Start the worker threads + thread = threading.Thread( + target=_fill_queue, + args=(slide_list, q, q_size, buffer)) + thread.start() + + # Grab slide path from queue and start extraction + while True: + path = q.get() + if path is None: + q.task_done() + break + _tile_extractor(path, **extraction_kwargs) + pb.advance(slide_task) + _debuffer_slide(path) + q.task_done() + thread.join() + else: + for slide in slide_list: + with _handle_slide_errors(slide): + wsi = _prepare_slide( + slide, + report_dir=tfrecord_dir, + wsi_kwargs=wsi_kwargs, + qc=qc, + qc_kwargs=qc_kwargs) + if wsi is not None: + log.debug(f'Extracting tiles for {wsi.name}') + wsi_report = wsi.extract_tiles( + tfrecord_dir=tfrecord_dir, + tiles_dir=tiles_dir, + **kwargs + ) + reports.update({wsi.path: wsi_report}) + del wsi + pb.advance(slide_task) + + # Generate PDF report. 
+ if report: + log.info('Generating PDF (this may take some time)...', ) + rep_vals = list( + reports.copy().values() + ) # type: List[SlideReport] + all_reports += rep_vals + num_slides = len(slide_list) + img_kwargs = defaultdict(lambda: None) # type: Dict + img_kwargs.update(kwargs) + img_kwargs = sf.slide.utils._update_kw_with_defaults(img_kwargs) + report_meta = types.SimpleNamespace( + tile_px=self.tile_px, + tile_um=self.tile_um, + qc=qc, + total_slides=num_slides, + slides_skipped=len([r for r in rep_vals if r is None]), + roi_method=roi_method, + stride=stride_div, + gs_frac=img_kwargs['grayspace_fraction'], + gs_thresh=img_kwargs['grayspace_threshold'], + ws_frac=img_kwargs['whitespace_fraction'], + ws_thresh=img_kwargs['whitespace_threshold'], + normalizer=img_kwargs['normalizer'], + img_format=img_kwargs['img_format'] + ) + pdf_report = ExtractionReport( + [r for r in rep_vals if r is not None], + meta=report_meta, + pool=pool + ) + _time = datetime.now().strftime('%Y%m%d-%H%M%S') + pdf_dir = tfrecord_dir if tfrecord_dir else '' + pdf_report.save( + join(pdf_dir, f'tile_extraction_report-{_time}.pdf') + ) + pdf_report.update_csv( + join(pdf_dir, 'extraction_report.csv') + ) + warn_path = join(pdf_dir, f'warn_report-{_time}.txt') + if pdf_report.warn_txt: + with open(warn_path, 'w') as warn_f: + warn_f.write(pdf_report.warn_txt) + + # Close the multiprocessing pool. + if pool is not None: + pool.close() + + # Update manifest & rebuild indices + self.update_manifest(force_update=True) + self.build_index(True) + all_reports = [r for r in all_reports if r is not None] + return {report.path: report for report in all_reports} + + def extract_tiles_from_tfrecords(self, dest: str) -> None: + """Extract tiles from a set of TFRecords. + + Args: + dest (str): Path to directory in which to save tile images. + If None, uses dataset default. Defaults to None. 
+ + """ + for source in self.sources: + to_extract_tfrecords = self.tfrecords(source=source) + if dest: + tiles_dir = dest + elif self._tiles_set(source): + tiles_dir = join(self.sources[source]['tiles'], + self.sources[source]['label']) + if not exists(tiles_dir): + os.makedirs(tiles_dir) + else: + log.error(f"tiles directory not set for source {source}") + continue + for tfr in to_extract_tfrecords: + sf.io.extract_tiles(tfr, tiles_dir) + + def filter(self, *args: Any, **kwargs: Any) -> "Dataset": + """Return a filtered dataset. + + This method can either accept a single argument (``filters``) or any + combination of keyword arguments (``filters``, ``filter_blank``, or + ``min_tiles``). + + Keyword Args: + filters (dict, optional): Dictionary used for filtering + the dataset. Dictionary keys should be column headers in the + patient annotations, and the values should be the variable + states to be included in the dataset. For example, + ``filters={'HPV_status': ['negative', 'positive']}`` + would filter the dataset by the column ``HPV_status`` and only + include slides with values of either ``'negative'`` or + ``'positive'`` in this column. + See `Filtering <https://slideflow.dev/datasets_and_val/#filtering>`_ + for further discussion. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int): Filter out tfrecords that have less than this + minimum number of tiles. Defaults to 0. + + Returns: + :class:`slideflow.Dataset`: Dataset with filter added. 
+ """ + if len(args) == 1 and 'filters' not in kwargs: + kwargs['filters'] = args[0] + elif len(args): + raise ValueError( + "filter() accepts either one argument (filters), or any " + "combination of keywords (filters, filter_blank, min_tiles)" + ) + for kwarg in kwargs: + if kwarg not in ('filters', 'filter_blank', 'min_tiles'): + raise ValueError(f'Unknown filtering argument {kwarg}') + ret = copy.deepcopy(self) + if 'filters' in kwargs and kwargs['filters'] is not None: + if not isinstance(kwargs['filters'], dict): + raise TypeError("'filters' must be a dict.") + ret._filters.update(kwargs['filters']) + if 'filter_blank' in kwargs and kwargs['filter_blank'] is not None: + if not isinstance(kwargs['filter_blank'], list): + kwargs['filter_blank'] = [kwargs['filter_blank']] + ret._filter_blank += kwargs['filter_blank'] + if 'min_tiles' in kwargs and kwargs['min_tiles'] is not None: + if not isinstance(kwargs['min_tiles'], int): + raise TypeError("'min_tiles' must be an int.") + ret._min_tiles = kwargs['min_tiles'] + return ret + + def filter_bags_by_roi( + self, + bags_path: str, + dest: str, + *, + tile_df: Optional[pd.DataFrame] = None + ) -> None: + """Filter bags by tiles in an ROI.""" + import torch + + #TODO: extend to tfrecords + #TODO: accelerate with multiprocessing + #TODO: save filtered indices + #TODO: copy bags config + + if tile_df is None: + tile_df = self.get_tile_dataframe() + if not exists(dest): + os.makedirs(dest) + + # Subset the dataframe to only include tiles with an ROI + roi_df = tile_df.loc[tile_df.roi_name.notnull()] + + n_complete = 0 + for slide in tqdm(roi_df.slide.unique()): + if not exists(join(bags_path, slide+'.pt')): + continue + + # Get the bag + bag = torch.load(join(bags_path, slide+'.pt')) + bag_index = np.load(join(bags_path, slide+'.index.npz'))['arr_0'] + + # Subset the ROI based on this slide + slide_df = roi_df.loc[roi_df.slide == slide] + + # Get the common locations (in an ROI) + bag_locs = {tuple(r) for r in 
bag_index} + roi_locs = {tuple(r) for r in np.stack([slide_df.loc_x.values, slide_df.loc_y.values], axis=1)} + common_locs = bag_locs.intersection(roi_locs) + + # Find indices in the bag that match the common locations (in an ROI) + bag_i = [i for i, row in enumerate(bag_index) if tuple(row) in common_locs] + + if not len(bag_i): + log.debug("No common locations found for {}".format(slide)) + continue + + # Subset and save the bag + bag = bag[bag_i] + torch.save(bag, join(dest, slide+'.pt')) + log.debug("Subset size ({}): {} -> {}".format(slide, len(bag_index), len(bag))) + n_complete += 1 + + log.info("Bag filtering complete. {} bags filtered.".format(n_complete)) + + def find_rois(self, slide: str) -> Optional[str]: + """Find an ROI path from a given slide. + + Args: + slide (str): Slide name. + + Returns: + str: Matching path to ROI, if found. If not found, returns None + """ + rois = self.rois() + if not rois: + return None + for roi in rois: + if path_to_name(roi) == slide: + return roi + return None + + def find_slide( + self, + *, + slide: Optional[str] = None, + patient: Optional[str] = None + ) -> Optional[str]: + """Find a slide path from a given slide or patient. + + Keyword args: + slide (str): Find the slide path associated with this slide name. + patient (str): Find a slide path associated with this patient. + + Returns: + str: Matching path to slide, if found.
If not found, returns None + """ + if slide is None and patient is None: + raise ValueError("Must supply either slide or patient.") + if slide is not None and patient is not None: + raise ValueError("Must supply either slide or patient, not both.") + + if slide is not None: + filtered = self.filter({'slide': slide}) + if patient is not None: + filtered = self.filter({'patient': patient}) + matching = filtered.slide_paths() + if not len(matching): + return None + else: + return matching[0] + + def find_tfrecord( + self, + *, + slide: Optional[str] = None, + patient: Optional[str] = None + ) -> Optional[str]: + """Find a TFRecord path from a given slide or patient. + + Keyword args: + slide (str): Find a tfrecord associated with this slide name. + patient (str): Find a tfrecord associated with this patient. + + Returns: + str: Matching path to tfrecord, if found. Otherwise, returns None + """ + if slide is None and patient is None: + raise ValueError("Must supply either slide or patient.") + if slide is not None and patient is not None: + raise ValueError("Must supply either slide or patient, not both.") + + if slide is not None: + filtered = self.filter({'slide': slide}) + if patient is not None: + filtered = self.filter({'patient': patient}) + matching = filtered.tfrecords() + if not len(matching): + return None + else: + return matching[0] + + def generate_feature_bags( + self, + model: Union[str, "BaseFeatureExtractor"], + outdir: str, + *, + force_regenerate: bool = False, + batch_size: int = 32, + slide_batch_size: int = 16, + num_gpus: int = 0, + **kwargs: Any + ) -> None: + """Generate bags of tile-level features for slides for use with MIL models. + + Args: + model (str): Path to model from which to generate activations. + May provide either this or "pt_files" + outdir (str, optional): Save exported activations in .pt format. + + Keyword Args: + layers (list): Which model layer(s) generate activations.
+ If ``model`` is a saved model, this defaults to 'postconv'. + Not used if ``model`` is pretrained feature extractor. + Defaults to None. + force_regenerate (bool): Forcibly regenerate activations + for all slides even if .pt file exists. Defaults to False. + batch_size (int): Batch size during feature calculation. + Defaults to 32. + slide_batch_size (int): Interleave feature calculation across + this many slides. Higher values may improve performance + but require more memory. Defaults to 16. + num_gpus (int): Number of GPUs to use for feature extraction. + Defaults to 0. + **kwargs: Additional keyword arguments are passed to + :class:`slideflow.DatasetFeatures`. + + """ + if not sf.util.torch_available: + raise RuntimeError( + "Pytorch is required for generating feature bags. " + "Please install Pytorch and try again." + ) + + # Interpret model argument. + if isinstance(model, str) and sf.model.is_extractor(model): + # Model is a architecture name (for Imagenet pretrained model) + log.info(f"Building feature extractor: [green]{model}[/]") + layer_kw = dict(layers=kwargs['layers']) if 'layers' in kwargs else dict() + model = sf.build_feature_extractor(model, **layer_kw) + + elif isinstance(model, str) and exists(model): + # Model is a path to a trained slideflow model + log.info(f"Using model: [green]{model}[/]") + + elif isinstance(model, str) and not exists(model): + # Model is a string but not a path to a saved model + raise ValueError( + f"'{model}' is neither a path to a saved model nor the name " + "of a valid feature extractor (use sf.model.list_extractors() " + "for a list of all available feature extractors).") + + elif not isinstance(model, str): + # Model is a feature extractor object + from slideflow.model.base import BaseFeatureExtractor + if not isinstance(model, BaseFeatureExtractor): + raise ValueError( + f"'{model}' is neither a path to a saved model nor the name " + "of a valid feature extractor (use sf.model.list_extractors() " + "for a list 
of all available feature extractors).") + log.info(f"Using feature extractor: [green]{model.tag}[/]") + + # Create the pt_files directory + if not exists(outdir): + os.makedirs(outdir) + + # Detect already generated pt files + done = [ + path_to_name(f) for f in os.listdir(outdir) + if sf.util.path_to_ext(join(outdir, f)) == 'pt' + ] + + # Work from this dataset. + dataset = self + + if not force_regenerate and len(done): + all_slides = dataset.slides() + slides_to_generate = [s for s in all_slides if s not in done] + if len(slides_to_generate) != len(all_slides): + to_skip = len(all_slides) - len(slides_to_generate) + skip_p = f'{to_skip}/{len(all_slides)}' + log.info(f"Skipping {skip_p} finished slides.") + if not slides_to_generate: + log.warning("No slides for which to generate features.") + return outdir + dataset = dataset.filter(filters={'slide': slides_to_generate}) + filtered_slides_to_generate = dataset.slides() + log.info(f'Working on {len(filtered_slides_to_generate)} slides') + + # Verify TFRecords are available + n_tfrecords = len(dataset.tfrecords()) + n_slides = len(dataset.slides()) + if not n_tfrecords: + log.warning("Unable to generate features; no TFRecords found.") + return outdir + elif n_tfrecords < n_slides: + log.warning("{} tfrecords missing.".format(n_slides - n_tfrecords)) + + # Rebuild any missing index files. + # Must be done before the progress bar is started. + dataset.build_index(False) + + # Set up progress bar. + pb = sf.util.FeatureExtractionProgress() + pb.add_task( + "Speed: ", + progress_type="speed", + total=self.num_tiles + ) + slide_task = pb.add_task( + "Generating...", + progress_type="slide_progress", + total=n_slides + ) + pb.start() + + # Prepare keyword arguments. + dts_kwargs = dict( + include_preds=False, + include_uncertainty=False, + batch_size=batch_size, + verbose=False, + progress=False, + **kwargs + ) + + # Set up activations interface. + # Calculate features one slide at a time to reduce memory consumption.
+ with sf.util.cleanup_progress(pb): + if not num_gpus > 1: + sf.model.features._export_bags( + model, + dataset, + slides=dataset.slides(), + slide_batch_size=slide_batch_size, + pb=pb, + outdir=outdir, + slide_task=slide_task, + **dts_kwargs + ) + + else: + if not hasattr(model, 'dump_config'): + raise ValueError( + "Feature extraction with multiple GPUs is only " + "supported for feature extractors with a dump_config() " + "attribute. Please set num_gpus=1 or use a different " + "feature extractor." + ) + import torch + model_cfg = sf.model.extractors.extractor_to_config(model) + + # Mixed precision and channels_last config + if hasattr(model, "mixed_precision"): + mixed_precision = model.mixed_precision + else: + mixed_precision = None + if hasattr(model, "channels_last"): + channels_last = model.channels_last + else: + channels_last = None + + with MultiprocessProgress(pb) as mp_pb: + torch.multiprocessing.spawn( + sf.model.features._distributed_export, + args=( + model_cfg, + dataset, + [n.tolist() for n in np.array_split(dataset.slides(), + num_gpus)], + slide_batch_size, + mp_pb.tracker, + outdir, + slide_task, + dts_kwargs, + mixed_precision, + channels_last + ), + nprocs=num_gpus + ) + + def generate_rois( + self, + model: str, + *, + overwrite: bool = False, + dest: Optional[str] = None, + **kwargs + ): + """Generate ROIs using a U-Net model. + + Args: + model (str): Path to model (zip) or model configuration (json). + + Keyword args: + overwrite (bool, optional): Overwrite existing ROIs. Defaults to False. + dest (str, optional): Destination directory for generated ROIs. + If not provided, uses the dataset's default ROI directory. + sq_mm_threshold (float, optional): If not None, filter out ROIs with an area + less than the given threshold (in square millimeters). Defaults to None. + + """ + + # Load the model configuration. 
+ segment = sf.slide.qc.StridedSegment(model) + + for slide in track(self.slide_paths(), description='Generating...'): + + # Set the destination directory + source = self.get_slide_source(slide) + if 'roi' not in self.sources[source] and dest is None: + raise errors.DatasetError( + "No ROI directory set. Please set an ROI directory in the " + "dataset configuration, or provide a destination directory " + "with the `dest` argument." + ) + if dest is None: + dest = self.sources[source]['roi'] + if not exists(dest): + os.makedirs(dest) + + # Check if an ROI already exists. + existing_rois = [path_to_name(f) for f in os.listdir(dest) if f.endswith('csv')] + if path_to_name(slide) in existing_rois: + if overwrite: + log.info(f"Overwriting ROI for slide {path_to_name(slide)} at {dest}") + else: + log.info(f"ROI already exists for slide {path_to_name(slide)} at {dest}") + continue + + # Load the slide and remove any existing auto-loaded ROIs. + log.info("Working on {}...".format(slide)) + try: + wsi = sf.WSI(slide, 299, 512, verbose=False) + wsi.rois = [] + + # Generate and apply ROIs. + segment.generate_rois(wsi, apply=True, **kwargs) + except Exception as e: + log.error(f"Failed to generate ROIs for {slide}: {e}") + continue + + # Export ROIs to CSV. + wsi.export_rois(join(dest, wsi.name + '.csv')) + + def get_slide_source(self, slide: str) -> str: + """Return the source of a given slide. + + Args: + slide (str): Slide name. + + Returns: + str: Source name. + + """ + for source in self.sources: + paths = self.slide_paths(source=source) + names = [path_to_name(path) for path in paths] + if slide in paths or slide in names: + return source + raise errors.DatasetError(f"Could not find slide '{slide}'") + + def get_tfrecord_locations(self, slide: str) -> List[Tuple[int, int]]: + """Return a list of locations stored in an associated TFRecord. + + Args: + slide (str): Slide name. + + Returns: + List of tuples of (x, y) coordinates. 
+ + """ + tfr = self.find_tfrecord(slide=slide) + if tfr is None: + raise errors.TFRecordsError( + f"Could not find associated TFRecord for slide '{slide}'" + ) + tfr_idx = sf.util.tfrecord2idx.find_index(tfr) + if not tfr_idx: + _create_index(tfr) + elif tfr_idx.endswith('index'): + log.info(f"Updating index for {tfr}...") + os.remove(tfr_idx) + _create_index(tfr) + return sf.io.get_locations_from_tfrecord(tfr) + + def harmonize_labels( + self, + *args: "Dataset", + header: Optional[str] = None + ) -> Dict[str, int]: + """Harmonize labels with another dataset. + + Returns categorical label assignments converted to int, harmonized with + another dataset to ensure label consistency between datasets. + + Args: + *args (:class:`slideflow.Dataset`): Any number of Datasets. + header (str): Categorical annotation header. + + Returns: + Dict mapping slide names to categories. + + """ + if header is None: + raise ValueError("Must supply kwarg 'header'") + if not isinstance(header, str): + raise ValueError('Harmonized labels require a single header.') + + _, my_unique = self.labels(header, use_float=False) + other_uniques = [ + np.array(dts.labels(header, use_float=False)[1]) for dts in args + ] + other_uniques = other_uniques + [np.array(my_unique)] + uniques_list = np.concatenate(other_uniques).tolist() + all_unique = sorted(list(set(uniques_list))) + labels_to_int = dict(zip(all_unique, range(len(all_unique)))) + return labels_to_int + + def is_float(self, header: str) -> bool: + """Check if labels in the given header can all be converted to float. + + Args: + header (str): Annotations column header. + + Returns: + bool: If all values from header can be converted to float. 
+ + """ + if self.annotations is None: + raise errors.DatasetError("Annotations not loaded.") + filtered_labels = self.filtered_annotations[header] + try: + filtered_labels = [float(o) for o in filtered_labels] + return True + except ValueError: + return False + + def kfold_split( + self, + k: int, + *, + labels: Optional[Union[Dict, str]] = None, + preserved_site: bool = False, + site_labels: Optional[Union[str, Dict[str, str]]] = 'site', + splits: Optional[str] = None, + read_only: bool = False, + ) -> Tuple[Tuple["Dataset", "Dataset"], ...]: + """Split the dataset into k cross-folds. + + Args: + k (int): Number of cross-folds. + + Keyword args: + labels (dict or str, optional): Either a dictionary mapping slides + to labels, or an outcome label (``str``). Used for balancing + outcome labels in training and validation cohorts. If None, + will not balance k-fold splits by outcome labels. Defaults + to None. + preserved_site (bool): Split with site-preserved cross-validation. + Defaults to False. + site_labels (dict, optional): Dict mapping patients to site labels, + or an outcome column with site labels. Only used for site + preserved cross validation. Defaults to 'site'. + splits (str, optional): Path to JSON file containing validation + splits. Defaults to None. + read_only (bool): Prevents writing validation splits to file. + Defaults to False. 
+ + """ + if splits is None: + temp_dir = tempfile.TemporaryDirectory() + splits = join(temp_dir.name, '_splits.json') + else: + temp_dir = None + crossval_splits = [] + for k_fold_iter in range(k): + split_kw = dict( + labels=labels, + val_strategy=('k-fold-preserved-site' if preserved_site + else 'k-fold'), + val_k_fold=k, + k_fold_iter=k_fold_iter+1, + site_labels=site_labels, + splits=splits, + read_only=read_only + ) + crossval_splits.append(self.split(**split_kw)) + if temp_dir is not None: + temp_dir.cleanup() + return tuple(crossval_splits) + + def labels( + self, + headers: Union[str, List[str]], + use_float: Union[bool, Dict, str] = False, + assign: Optional[Dict[str, Dict[str, int]]] = None, + format: str = 'index' + ) -> Tuple[Labels, Union[Dict[str, Union[List[str], List[float]]], + List[str], + List[float]]]: + """Return a dict of slide names mapped to outcome label(s). + + Args: + headers (list(str)): Annotation header(s) that specify the label(s). + May be a list or string. + use_float (bool, dict, or str, optional): Either bool, dict, or 'auto'. + If true, convert data into float; if unable, raise TypeError. + If false, interpret all data as categorical. + If a dict(bool), look up each header to determine type. + If 'auto', will try to convert all data into float. For each + header in which this fails, will interpret as categorical. + assign (dict, optional): Dictionary mapping each header to a dict + of label names to label ids. If not provided, will map ids to + names by sorting alphabetically. + format (str, optional): Either 'index' or 'name'. Indicates which + format should be used for categorical outcomes when returning + the label dictionary. If 'name', uses the string label name. + If 'index', returns an int (index corresponding with the + returned list of unique outcomes as str). Defaults to 'index'.
+ + Returns: + A tuple containing + + **dict**: Dictionary mapping slides to outcome labels in + numerical format (float for continuous outcomes, int of outcome + label id for categorical outcomes). + + **list**: List of unique labels. For categorical outcomes, + this will be a list of str; indices correspond with the outcome + label id. + + """ + if self.annotations is None: + raise errors.DatasetError("Annotations not loaded.") + if not len(self.filtered_annotations): + raise errors.DatasetError( + "Cannot generate labels: dataset is empty after filtering." + ) + results = {} # type: Dict + headers = sf.util.as_list(headers) + unique_labels = {} + filtered_pts = self.filtered_annotations.patient + filtered_slides = self.filtered_annotations.slide + for header in headers: + if assign and (len(headers) > 1 or header in assign): + assigned_for_header = assign[header] + elif assign is not None: + raise errors.DatasetError( + f"Unable to read outcome assignments for header {header}" + f" (assign={assign})" + ) + else: + assigned_for_header = None + unique_labels_for_this_header = [] + try: + filtered_labels = self.filtered_annotations[header] + except KeyError: + raise errors.AnnotationsError(f"Missing column {header}.") + + # Determine whether values should be converted into float + if isinstance(use_float, dict) and header not in use_float: + raise ValueError( + f"use_float is dict, but header {header} is missing." + ) + elif isinstance(use_float, dict): + header_is_float = use_float[header] + elif isinstance(use_float, bool): + header_is_float = use_float + elif use_float == 'auto': + header_is_float = self.is_float(header) + else: + raise ValueError(f"Invalid use_float option {use_float}") + + # Ensure labels can be converted to desired type, + # then assign values + if header_is_float and not self.is_float(header): + raise TypeError( + f"Unable to convert all labels of {header} into 'float' " + f"({','.join(filtered_labels)})." 
+ ) + elif header_is_float: + log.debug(f'Interpreting column "{header}" as continuous') + filtered_labels = filtered_labels.astype(float) + else: + log.debug(f'Interpreting column "{header}" as categorical') + unique_labels_for_this_header = list(set(filtered_labels)) + unique_labels_for_this_header.sort() + for i, ul in enumerate(unique_labels_for_this_header): + n_matching_filtered = sum(f == ul for f in filtered_labels) + if assigned_for_header and ul not in assigned_for_header: + raise KeyError( + f"assign was provided, but label {ul} missing" + ) + elif assigned_for_header: + val_msg = assigned_for_header[ul] + n_s = str(n_matching_filtered) + log.debug( + f"{header} {ul} assigned {val_msg} [{n_s} slides]" + ) + else: + n_s = str(n_matching_filtered) + log.debug( + f"{header} {ul} assigned {i} [{n_s} slides]" + ) + + def _process_cat_label(o): + if assigned_for_header: + return assigned_for_header[o] + elif format == 'name': + return o + else: + return unique_labels_for_this_header.index(o) + + # Check for multiple, different labels per patient and warn + pt_assign = np.array(list(set(zip(filtered_pts, filtered_labels)))) + unique_pt, counts = np.unique(pt_assign[:, 0], return_counts=True) + for pt in unique_pt[np.argwhere(counts > 1)][:, 0]: + dup_vals = pt_assign[pt_assign[:, 0] == pt][:, 1] + dups = ", ".join([str(d) for d in dup_vals]) + log.error( + f'Multiple labels for patient "{pt}" (header {header}): ' + f'{dups}' + ) + + # Assemble results dictionary + for slide, lbl in zip(filtered_slides, filtered_labels): + if slide in sf.util.EMPTY: + continue + if not header_is_float: + lbl = _process_cat_label(lbl) + if slide in results: + results[slide] = sf.util.as_list(results[slide]) + results[slide] += [lbl] + elif header_is_float: + results[slide] = [lbl] + else: + results[slide] = lbl + unique_labels[header] = unique_labels_for_this_header + if len(headers) == 1: + return results, unique_labels[headers[0]] + else: + return results, unique_labels + + def 
load_indices(self, verbose=False) -> Dict[str, np.ndarray]: + """Return TFRecord indices.""" + pool = DPool(8) + tfrecords = self.tfrecords() + indices = {} + + def load_index(tfr): + tfr_name = path_to_name(tfr) + index = tfrecord2idx.load_index(tfr) + return tfr_name, index + + log.debug("Loading indices...") + for tfr_name, index in pool.imap(load_index, tfrecords): + indices[tfr_name] = index + pool.close() + return indices + + def manifest( + self, + key: str = 'path', + filter: bool = True + ) -> Dict[str, Dict[str, int]]: + """Generate a manifest of all tfrecords. + + Args: + key (str): Either 'path' (default) or 'name'. Determines key format + in the manifest dictionary. + filter (bool): Apply active filters to manifest. + + Returns: + dict: Dict mapping key (path or slide name) to number of tiles. + + """ + if key not in ('path', 'name'): + raise ValueError("'key' must be in ['path', 'name']") + + all_manifest = {} + for source in self.sources: + if self.sources[source]['label'] is None: + continue + if not self._tfrecords_set(source): + log.warning(f"tfrecords path not set for source {source}") + continue + tfrecord_dir = join( + self.sources[source]['tfrecords'], + self.sources[source]['label'] + ) + manifest_path = join(tfrecord_dir, "manifest.json") + if not exists(manifest_path): + log.debug(f"No manifest at {tfrecord_dir}; creating now") + sf.io.update_manifest_at_dir(tfrecord_dir) + + if exists(manifest_path): + relative_manifest = sf.util.load_json(manifest_path) + else: + relative_manifest = {} + global_manifest = {} + for record in relative_manifest: + k = join(tfrecord_dir, record) + global_manifest.update({k: relative_manifest[record]}) + all_manifest.update(global_manifest) + # Now filter out any tfrecords that would be excluded by filters + if filter: + filtered_tfrecords = self.tfrecords() + manifest_tfrecords = list(all_manifest.keys()) + for tfr in manifest_tfrecords: + if tfr not in filtered_tfrecords: + del all_manifest[tfr] + # Log
clipped tile totals if applicable + for tfr in all_manifest: + if tfr in self._clip: + all_manifest[tfr]['clipped'] = min(self._clip[tfr], + all_manifest[tfr]['total']) + else: + all_manifest[tfr]['clipped'] = all_manifest[tfr]['total'] + if key == 'path': + return all_manifest + else: + return {path_to_name(t): v for t, v in all_manifest.items()} + + def manifest_histogram( + self, + by: Optional[str] = None, + binrange: Optional[Tuple[int, int]] = None + ) -> None: + """Plot histograms of tiles-per-slide. + + Example + Create histograms of tiles-per-slide, stratified by site. + + .. code-block:: python + + import matplotlib.pyplot as plt + + dataset.manifest_histogram(by='site') + plt.show() + + Args: + by (str, optional): Stratify histograms by this annotation column + header. Defaults to None. + binrange (tuple(int, int)): Histogram bin ranges. If None, uses + full range. Defaults to None. + + """ + import seaborn as sns + import matplotlib.pyplot as plt + + if by is not None: + _, unique_vals = self.labels(by, format='name') + val_counts = [ + [ + m['total'] + for m in self.filter({by: val}).manifest().values() + ] + for val in unique_vals + ] + all_counts = [c for vc in val_counts for c in vc] + else: + unique_vals = [''] + all_counts = [m['total'] for m in self.manifest().values()] + val_counts = [all_counts] + if binrange is None: + max_count = (max(all_counts) // 20) * 20 + binrange = (0, max_count) + + fig, axes = plt.subplots(len(unique_vals), 1, + figsize=(3, len(unique_vals))) + if not isinstance(axes, np.ndarray): + axes = [axes] + fig.set_tight_layout({"pad": .0}) + for a, ax in enumerate(axes): + sns.histplot(val_counts[a], bins=20, binrange=binrange, ax=ax) + ax.yaxis.set_tick_params(labelleft=False) + ax.set_ylabel(unique_vals[a], rotation='horizontal', ha='right') + ax.set_xlim(binrange) + if a != (len(axes) - 1): + ax.xaxis.set_tick_params(labelbottom=False) + ax.set(xlabel=None) + ax.set(xlabel="Tiles per slide") + + def patients(self) -> 
Dict[str, str]: + """Return a dict mapping slide names to patient IDs.""" + if self.annotations is None: + raise errors.DatasetError("Annotations not loaded.") + result = {} # type: Dict[str, str] + pairs = list(zip( + self.filtered_annotations['slide'], + self.filtered_annotations['patient'] + )) + for slide, patient in pairs: + if slide in result and result[slide] != patient: + raise errors.AnnotationsError( + f'Slide "{slide}" assigned to multiple patients: ' + f"({patient}, {result[slide]})" + ) + else: + if slide not in sf.util.EMPTY: + result[slide] = patient + return result + + def pt_files(self, *args, **kwargs): + """Deprecated function. Please use `Dataset.get_bags()`.""" + warnings.warn( + "pt_files() is deprecated. Please use Dataset.get_bags()", + DeprecationWarning + ) + return self.get_bags(*args, **kwargs) + + def get_bags(self, path, warn_missing=True): + r"""Return list of all \*.pt files with slide names in this dataset. + + May return more than one \*.pt file for each slide. + + Args: + path (str, list(str)): Directory(ies) to search for \*.pt files. + warn_missing (bool): Log a warning if any slides in this dataset + do not have a \*.pt file. + + """ + slides = self.slides() + if isinstance(path, str): + path = [path] + + bags = [] + for p in path: + if not exists(p): + raise ValueError(f"Path {p} does not exist.") + bags_at_path = np.array([ + join(p, f) for f in os.listdir(p) + if f.endswith('.pt') and path_to_name(f) in slides + ]) + bags.append(bags_at_path) + bags = np.concatenate(bags) + unique_slides_with_bags = np.unique([path_to_name(b) for b in bags]) + if (len(unique_slides_with_bags) != len(slides)) and warn_missing: + log.warning(f"Bags missing for {len(slides) - len(unique_slides_with_bags)} slides.") + return bags + + def read_tfrecord_by_location( + self, + slide: str, + loc: Tuple[int, int], + decode: Optional[bool] = None + ) -> Any: + """Read a record from a TFRecord, indexed by location.
+ + Finds the associated TFRecord for a slide, and returns the record + inside which corresponds to a given tile location. + + Args: + slide (str): Name of slide. Will search for the slide's associated + TFRecord. + loc ((int, int)): ``(x, y)`` tile location. Searches the TFRecord + for the tile that corresponds to this location. + decode (bool): Decode the associated record, returning Tensors. + Defaults to True. + + Returns: + Unprocessed raw TFRecord bytes if ``decode=False``, otherwise a + tuple containing ``(slide, image)``, where ``image`` is a + uint8 Tensor. + + """ + tfr = self.find_tfrecord(slide=slide) + if tfr is None: + raise errors.TFRecordsError( + f"Could not find associated TFRecord for slide '{slide}'" + ) + if decode is None: + decode = True + else: + warnings.warn( + "The 'decode' argument to `Dataset.read_tfrecord_by_location` " + "is deprecated and will be removed in a future version. In the " + "future, all records will be decoded." + ) + return sf.io.get_tfrecord_by_location(tfr, loc, decode=decode) + + def remove_filter(self, **kwargs: Any) -> "Dataset": + """Remove a specific filter from the active filters. + + Keyword Args: + filters (list of str): Filter keys. Will remove filters with + these keys. + filter_blank (list of str): Will remove these headers stored in + filter_blank. + + Returns: + :class:`slideflow.Dataset`: Dataset with filter removed. 
+ + """ + for kwarg in kwargs: + if kwarg not in ('filters', 'filter_blank'): + raise ValueError(f'Unknown filtering argument {kwarg}') + ret = copy.deepcopy(self) + if 'filters' in kwargs: + if isinstance(kwargs['filters'], str): + kwargs['filters'] = [kwargs['filters']] + elif not isinstance(kwargs['filters'], list): + raise TypeError("'filters' must be a list.") + for f in kwargs['filters']: + if f not in ret._filters: + raise errors.DatasetFilterError( + f"Filter {f} not found in dataset (active filters: " + f"{','.join(list(ret._filters.keys()))})" + ) + else: + del ret._filters[f] + if 'filter_blank' in kwargs: + kwargs['filter_blank'] = sf.util.as_list(kwargs['filter_blank']) + for f in kwargs['filter_blank']: + if f not in ret._filter_blank: + raise errors.DatasetFilterError( + f"Filter_blank {f} not found in dataset (active " + f"filter_blank: {','.join(ret._filter_blank)})" + ) + else: + # _filter_blank is a list, so remove the entry by index. + del ret._filter_blank[ret._filter_blank.index(f)] + return ret + + def rebuild_index(self) -> None: + """Rebuild index files for TFRecords. + + Equivalent to ``Dataset.build_index(force=True)``. + + Args: + None + + Returns: + None + """ + self.build_index(force=True) + + def resize_tfrecords(self, tile_px: int) -> None: + """Resize images in a set of TFRecords to a given pixel size. + + Args: + tile_px (int): Target pixel size for resizing TFRecord images.
+ + """ + if not sf.util.tf_available: + raise NotImplementedError( + "Dataset.resize_tfrecords() requires Tensorflow, which is " + "not installed.") + + log.info(f'Resizing TFRecord tiles to ({tile_px}, {tile_px})') + tfrecords_list = self.tfrecords() + log.info(f'Resizing {len(tfrecords_list)} tfrecords') + for tfr in tfrecords_list: + sf.io.tensorflow.transform_tfrecord( + tfr, + tfr+'.transformed', + resize=tile_px + ) + + def rois(self) -> List[str]: + """Return a list of all ROIs.""" + rois_list = [] + for source in self.sources: + if self._roi_set(source): + rois_list += glob(join(self.sources[source]['roi'], "*.csv")) + else: + log.warning(f"roi path not set for source {source}") + slides = self.slides() + return [r for r in list(set(rois_list)) if path_to_name(r) in slides] + + def slide_manifest( + self, + roi_method: str = 'auto', + stride_div: int = 1, + tma: bool = False, + source: Optional[str] = None, + low_memory: bool = False + ) -> Dict[str, int]: + """Return a dictionary of slide names and estimated number of tiles. + + Uses Otsu thresholding for background filtering, and the ROI strategy. + + Args: + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and skip a slide if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + stride_div (int): Stride divisor for tile extraction. + A stride of 1 will extract non-overlapping tiles. + A stride_div of 2 will extract overlapping tiles, with a stride + equal to 50% of the tile width. Defaults to 1. + tma (bool): Deprecated argument. Tumor micro-arrays are read as + standard slides. Defaults to False. + source (str, optional): Dataset source name. 
+ Defaults to None (using all sources). + low_memory (bool): Operate in low-memory mode at the cost of + worse performance. + + Returns: + Dict[str, int]: Dictionary mapping slide names to number of + estimated non-background tiles in the slide. + + """ + if tma: + warnings.warn( + "tma=True is deprecated and will be removed in a future " + "version. Tumor micro-arrays are read as standard slides. " + ) + if self.tile_px is None or self.tile_um is None: + raise errors.DatasetError( + "tile_px and tile_um must be set to calculate a slide manifest" + ) + paths = self.slide_paths(source=source) + pb = Progress(transient=True) + read_task = pb.add_task('Reading slides...', total=len(paths)) + if not low_memory: + otsu_task = pb.add_task("Otsu thresholding...", total=len(paths)) + pb.start() + pool = mp.Pool( + sf.util.num_cpu(default=16), + initializer=sf.util.set_ignore_sigint + ) + wsi_list = [] + to_remove = [] + counts = [] + for path in paths: + try: + wsi = sf.WSI( + path, + self.tile_px, + self.tile_um, + rois=self.rois(), + stride_div=stride_div, + roi_method=roi_method, + verbose=False) + if low_memory: + wsi.qc('otsu') + counts += [wsi.estimated_num_tiles] + else: + wsi_list += [wsi] + pb.advance(read_task) + except errors.SlideLoadError as e: + log.error(f"Error reading slide {path}: {e}") + to_remove += [path] + for path in to_remove: + paths.remove(path) + pb.update(read_task, total=len(paths)) + if not low_memory: + # otsu_task only exists when low_memory is False + pb.update(otsu_task, total=len(paths)) + for count in pool.imap(_count_otsu_tiles, wsi_list): + counts += [count] + pb.advance(otsu_task) + pb.stop() + pool.close() + return {path: counts[p] for p, path in enumerate(paths)} + + def slide_paths( + self, + source: Optional[str] = None, + apply_filters: bool = True + ) -> List[str]: + """Return a list of paths to slides. + + Either returns a list of paths to all slides, or slides only matching + dataset filters. + + Args: + source (str, optional): Dataset source name. 
+ Defaults to None (using all sources). + apply_filters (bool, optional): Return only slide paths meeting + filter criteria. If False, return all slides. Defaults to True. + + Returns: + list(str): List of slide paths. + + """ + if source and source not in self.sources.keys(): + raise errors.DatasetError(f"Dataset {source} not found.") + # Get unfiltered paths + if source: + if not self._slides_set(source): + log.warning(f"slides path not set for source {source}") + return [] + else: + paths = sf.util.get_slide_paths(self.sources[source]['slides']) + else: + paths = [] + for src in self.sources: + if not self._slides_set(src): + log.warning(f"slides path not set for source {src}") + else: + paths += sf.util.get_slide_paths( + self.sources[src]['slides'] + ) + + # Remove any duplicates from shared dataset paths + paths = list(set(paths)) + # Filter paths + if apply_filters: + filtered_slides = self.slides() + filtered_paths = [ + p for p in paths if path_to_name(p) in filtered_slides + ] + return filtered_paths + else: + return paths + + def slides(self) -> List[str]: + """Return a list of slide names in this dataset.""" + if self.annotations is None: + raise errors.AnnotationsError( + "No annotations loaded; is the annotations file empty?" + ) + if 'slide' not in self.annotations.columns: + raise errors.AnnotationsError( + f"{'slide'} not found in annotations file." 
+ ) + ann = self.filtered_annotations + ann = ann.loc[~ann.slide.isin(sf.util.EMPTY)] + slides = ann.slide.unique().tolist() + return slides + + def split( + self, + model_type: Optional[str] = None, + labels: Optional[Union[Dict, str]] = None, + val_strategy: str = 'fixed', + splits: Optional[str] = None, + val_fraction: Optional[float] = None, + val_k_fold: Optional[int] = None, + k_fold_iter: Optional[int] = None, + site_labels: Optional[Union[str, Dict[str, str]]] = 'site', + read_only: bool = False, + from_wsi: bool = False, + ) -> Tuple["Dataset", "Dataset"]: + """Split this dataset into a training and validation dataset. + + If a validation split has already been prepared (e.g. K-fold iterations + were already determined), the previously generated split will be used. + Otherwise, create a new split and log the result in the TFRecord + directory so future models may use the same split for consistency. + + Args: + model_type (str): Either 'classification' or 'regression'. Defaults + to 'classification' if ``labels`` is provided. + labels (dict or str): Either a dictionary of slides: labels, + or an outcome label (``str``). Used for balancing outcome + labels in training and validation cohorts. Defaults to None. + val_strategy (str): Either 'k-fold', 'k-fold-preserved-site', + 'bootstrap', or 'fixed'. Defaults to 'fixed'. + splits (str, optional): Path to JSON file containing validation + splits. Defaults to None. + val_fraction (float, optional): Proportion of data for validation. + Not used if strategy is k-fold. Defaults to None. + val_k_fold (int): K, required if using K-fold validation. + Defaults to None. + k_fold_iter (int, optional): Which K-fold iteration to generate, + starting at 1. Required if using K-fold validation. + Defaults to None. 
+ site_labels (dict, optional): Dict mapping patients to site labels, + or an outcome column with site labels. Only used for + site-preserved cross-validation. Defaults to 'site'. + read_only (bool): Prevents writing validation splits to file. + Defaults to False. + + Returns: + A tuple containing + + :class:`slideflow.Dataset`: Training dataset. + + :class:`slideflow.Dataset`: Validation dataset. + """ + if (not k_fold_iter and 'k-fold' in val_strategy): + raise errors.DatasetSplitError( + "If strategy is 'k-fold', must supply k_fold_iter " + "(int starting at 1)" + ) + if (not val_k_fold and 'k-fold' in val_strategy): + raise errors.DatasetSplitError( + "If strategy is 'k-fold', must supply val_k_fold (K)" + ) + if val_strategy == 'k-fold-preserved-site' and not site_labels: + raise errors.DatasetSplitError( + "k-fold-preserved-site requires site_labels (dict of " + "patients:sites, or name of annotation column header)" + ) + if (val_strategy == 'k-fold-preserved-site' + and isinstance(site_labels, str)): + site_labels, _ = self.labels(site_labels, format='name') + if val_strategy == 'k-fold-preserved-site' and site_labels is None: + raise errors.DatasetSplitError( + f"Must supply site_labels for strategy {val_strategy}" + ) + if val_strategy in ('bootstrap', 'fixed') and val_fraction is None: + raise errors.DatasetSplitError( + f"Must supply val_fraction for strategy {val_strategy}" + ) + if isinstance(labels, str): + labels = self.labels(labels)[0] + if labels is None and model_type is None: + labels = self.patients() + model_type = 'regression' + elif model_type is None: + model_type = 'classification' + if model_type == 'categorical': + warnings.warn( + "model_type='categorical' is deprecated. Please use " + "'classification' instead." + ) + model_type = 'classification' + if model_type == 'linear': + warnings.warn( + "model_type='linear' is deprecated. Please use " + "'regression' instead." 
+ ) + model_type = 'regression' + if model_type not in ('classification', 'regression'): + raise ValueError( + f"Invalid model_type {model_type}; must be either " + "'classification' or 'regression'" + ) + + # Prepare dataset + patients = self.patients() + splits_file = splits + training_tfr = [] + val_tfr = [] + accepted_split = None + slide_list = list(labels.keys()) + + # Assemble dict of patients linking to list of slides & outcome labels + # dataset.labels() ensures no duplicate labels for a single patient + tfr_dir_list = self.tfrecords() if not from_wsi else self.slide_paths() + skip_tfr_verification = False + if not len(tfr_dir_list) and not from_wsi: + log.warning("No tfrecords found; splitting from annotations only.") + tfr_dir_list = tfr_dir_list_names = self.slides() + skip_tfr_verification = True + elif not len(tfr_dir_list): + log.warning("No slides found; splitting from annotations only.") + tfr_dir_list = tfr_dir_list_names = self.slides() + skip_tfr_verification = True + else: + tfr_dir_list_names = [ + sf.util.path_to_name(tfr) for tfr in tfr_dir_list + ] + patients_dict = {} + num_warned = 0 + for slide in slide_list: + patient = slide if not patients else patients[slide] + # Skip slides not found in directory + if slide not in tfr_dir_list_names: + log.debug(f"Slide {slide} missing tfrecord, skipping") + num_warned += 1 + continue + if patient not in patients_dict: + patients_dict[patient] = { + 'outcome_label': labels[slide], + 'slides': [slide] + } + elif patients_dict[patient]['outcome_label'] != labels[slide]: + ol = patients_dict[patient]['outcome_label'] + ok = labels[slide] + raise errors.DatasetSplitError( + f"Multiple labels found for {patient} ({ol}, {ok})" + ) + else: + patients_dict[patient]['slides'] += [slide] + + # Add site labels to the patients dict if doing + # preserved-site cross-validation + if val_strategy == 'k-fold-preserved-site': + assert site_labels is not None + site_slide_list = list(site_labels.keys()) + for slide 
in site_slide_list: + patient = slide if not patients else patients[slide] + # Skip slides not found in directory + if slide not in tfr_dir_list_names: + continue + if 'site' not in patients_dict[patient]: + patients_dict[patient]['site'] = site_labels[slide] + elif patients_dict[patient]['site'] != site_labels[slide]: + ol = patients_dict[patient]['site'] + ok = site_labels[slide] + _tail = f"{patient} ({ol}, {ok})" + raise errors.DatasetSplitError( + f"Multiple site labels found for {_tail}" + ) + if num_warned: + log.warning(f"{num_warned} slides missing tfrecords, skipping") + patients_list = list(patients_dict.keys()) + sorted_patients = [p for p in patients_list] + sorted_patients.sort() + shuffle(patients_list) + + # Create and log a validation subset + if val_strategy == 'none': + log.info("val_strategy is 'none'; skipping validation") + train_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in patients_dict.keys() + ]).tolist() + val_slides = [] + elif val_strategy == 'bootstrap': + assert val_fraction is not None + num_val = int(val_fraction * len(patients_list)) + log.info( + f"Bootstrap validation: selecting {num_val} " + "patients at random for validation testing" + ) + val_patients = patients_list[0:num_val] + train_patients = patients_list[num_val:] + if not len(val_patients) or not len(train_patients): + raise errors.InsufficientDataForSplitError + val_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in val_patients + ]).tolist() + train_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in train_patients + ]).tolist() + else: + # Try to load validation split + if (not splits_file or not exists(splits_file)): + loaded_splits = [] + else: + loaded_splits = sf.util.load_json(splits_file) + for split_id, split in enumerate(loaded_splits): + # First, see if strategy is the same + if split['strategy'] != val_strategy: + continue + # If k-fold, check that k-fold length is the same + 
if (val_strategy in ('k-fold', 'k-fold-preserved-site') + and len(list(split['tfrecords'].keys())) != val_k_fold): + continue + + # Then, check if patient lists are the same + sp_pts = list(split['patients'].keys()) + sp_pts.sort() + if sp_pts == sorted_patients: + # Finally, check if outcome variables are the same + c1 = [patients_dict[p]['outcome_label'] for p in sp_pts] + c2 = [split['patients'][p]['outcome_label']for p in sp_pts] + if c1 == c2: + log.info( + f"Using {val_strategy} validation split detected" + f" at [green]{splits_file}[/] (ID: {split_id})" + ) + accepted_split = split + break + + # If no split found, create a new one + if not accepted_split: + if splits_file: + log.info("No compatible train/val split found.") + log.info(f"Logging new split at [green]{splits_file}") + else: + log.info("No training/validation splits file provided.") + log.info("Unable to save or load validation splits.") + new_split = { + 'strategy': val_strategy, + 'patients': patients_dict, + 'tfrecords': {} + } # type: Any + if val_strategy == 'fixed': + assert val_fraction is not None + num_val = int(val_fraction * len(patients_list)) + val_patients = patients_list[0:num_val] + train_patients = patients_list[num_val:] + if not len(val_patients) or not len(train_patients): + raise errors.InsufficientDataForSplitError + val_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in val_patients + ]).tolist() + train_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in train_patients + ]).tolist() + new_split['tfrecords']['validation'] = val_slides + new_split['tfrecords']['training'] = train_slides + + elif val_strategy in ('k-fold', 'k-fold-preserved-site'): + assert val_k_fold is not None + if (val_strategy == 'k-fold-preserved-site'): + k_fold_patients = split_patients_preserved_site( + patients_dict, + val_k_fold, + balance=('outcome_label' + if model_type == 'classification' + else None) + ) + elif model_type == 'classification': 
+ k_fold_patients = split_patients_balanced( + patients_dict, + val_k_fold, + balance='outcome_label' + ) + else: + k_fold_patients = split_patients( + patients_dict, val_k_fold + ) + # Verify at least one patient is in each k_fold group + if (len(k_fold_patients) != val_k_fold + or not min([len(pl) for pl in k_fold_patients])): + raise errors.InsufficientDataForSplitError + train_patients = [] + for k in range(1, val_k_fold+1): + new_split['tfrecords'][f'k-fold-{k}'] = np.concatenate( + [patients_dict[patient]['slides'] + for patient in k_fold_patients[k-1]] + ).tolist() + if k == k_fold_iter: + val_patients = k_fold_patients[k-1] + else: + train_patients += k_fold_patients[k-1] + val_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in val_patients + ]).tolist() + train_slides = np.concatenate([ + patients_dict[patient]['slides'] + for patient in train_patients + ]).tolist() + else: + raise errors.DatasetSplitError( + f"Unknown validation strategy {val_strategy}." + ) + # Write the new split to log + loaded_splits += [new_split] + if not read_only and splits_file: + sf.util.write_json(loaded_splits, splits_file) + else: + # Use existing split + if val_strategy == 'fixed': + val_slides = accepted_split['tfrecords']['validation'] + train_slides = accepted_split['tfrecords']['training'] + elif val_strategy in ('k-fold', 'k-fold-preserved-site'): + assert val_k_fold is not None + k_id = f'k-fold-{k_fold_iter}' + val_slides = accepted_split['tfrecords'][k_id] + train_slides = np.concatenate([ + accepted_split['tfrecords'][f'k-fold-{ki}'] + for ki in range(1, val_k_fold+1) + if ki != k_fold_iter + ]).tolist() + else: + raise errors.DatasetSplitError( + f"Unknown val_strategy {val_strategy} requested." 
+ ) + + # Perform final integrity check to ensure no patients + # are in both training and validation slides + if patients: + validation_pt = list(set([patients[s] for s in val_slides])) + training_pt = list(set([patients[s] for s in train_slides])) + else: + validation_pt, training_pt = val_slides, train_slides + if sum([pt in training_pt for pt in validation_pt]): + raise errors.DatasetSplitError( + "At least one patient is in both val and training sets." + ) + + # Assemble list of tfrecords + if val_strategy != 'none': + val_tfr = [ + tfr for tfr in tfr_dir_list + if path_to_name(tfr) in val_slides or tfr in val_slides + ] + training_tfr = [ + tfr for tfr in tfr_dir_list + if path_to_name(tfr) in train_slides or tfr in train_slides + ] + if not len(val_tfr) == len(val_slides): + raise errors.DatasetError( + f"Number of validation tfrecords ({len(val_tfr)}) does " + f"not match number of validation slides ({len(val_slides)}). " + "This may happen if multiple tfrecords were found for a slide." + ) + if not len(training_tfr) == len(train_slides): + raise errors.DatasetError( + f"Number of training tfrecords ({len(training_tfr)}) does " + f"not match number of training slides ({len(train_slides)}). " + "This may happen if multiple tfrecords were found for a slide." 
+ ) + training_dts = copy.deepcopy(self) + training_dts = training_dts.filter(filters={'slide': train_slides}) + val_dts = copy.deepcopy(self) + val_dts = val_dts.filter(filters={'slide': val_slides}) + if not skip_tfr_verification and not from_wsi: + assert sorted(training_dts.tfrecords()) == sorted(training_tfr) + assert sorted(val_dts.tfrecords()) == sorted(val_tfr) + elif not skip_tfr_verification: + assert sorted(training_dts.slide_paths()) == sorted(training_tfr) + assert sorted(val_dts.slide_paths()) == sorted(val_tfr) + return training_dts, val_dts + + def split_tfrecords_by_roi( + self, + destination: str, + roi_filter_method: Union[str, float] = 'center' + ) -> None: + """Split dataset tfrecords into separate tfrecords according to ROI. + + Will generate two sets of tfrecords, with identical names: one with + tiles inside the ROIs, one with tiles outside the ROIs. Will skip any + tfrecords that are missing ROIs. Requires slides to be available. + + Args: + destination (str): Destination path. + roi_filter_method (str or float): Method of filtering tiles with + ROIs. Either 'center' or float (0-1). If 'center', tiles are + filtered with ROIs based on the center of the tile. If float, + tiles are filtered based on the proportion of the tile inside + the ROI, and ``roi_filter_method`` is interpreted as a + threshold. If the proportion of a tile inside the ROI is + greater than this number, the tile is included. For example, + if ``roi_filter_method=0.7``, a tile that is 80% inside of an + ROI will be included, and a tile that is 50% inside of an ROI + will be excluded. Defaults to 'center'. + + Returns: + None + """ + tfrecords = self.tfrecords() + slides = {path_to_name(s): s for s in self.slide_paths()} + rois = self.rois() + manifest = self.manifest() + + if self.tile_px is None or self.tile_um is None: + raise errors.DatasetError( + "tile_px and tile_um must be non-zero to process TFRecords." 
+ ) + + for tfr in tfrecords: + slidename = path_to_name(tfr) + if slidename not in slides: + continue + try: + slide = WSI( + slides[slidename], + self.tile_px, + self.tile_um, + rois=rois, + roi_method='inside', + roi_filter_method=roi_filter_method + ) + except errors.SlideLoadError as e: + log.error(e) + continue + parser = sf.io.get_tfrecord_parser( + tfr, + decode_images=False, + to_numpy=True + ) + if parser is None: + log.error(f"Could not read TFRecord {tfr}; skipping") + continue + reader = sf.io.TFRecordDataset(tfr) + if not exists(join(destination, 'inside')): + os.makedirs(join(destination, 'inside')) + if not exists(join(destination, 'outside')): + os.makedirs(join(destination, 'outside')) + in_path = join(destination, 'inside', f'{slidename}.tfrecords') + out_path = join(destination, 'outside', f'{slidename}.tfrecords') + inside_roi_writer = sf.io.TFRecordWriter(in_path) + outside_roi_writer = sf.io.TFRecordWriter(out_path) + for record in track(reader, total=manifest[tfr]['total']): + parsed = parser(record) + loc_x, loc_y = parsed['loc_x'], parsed['loc_y'] + tile_in_roi = any([ + roi.poly.contains(sg.Point(loc_x, loc_y)) + for roi in slide.rois + ]) + # Convert from a Tensor -> Numpy array + if hasattr(record, 'numpy'): + record = record.numpy() + if tile_in_roi: + inside_roi_writer.write(record) + else: + outside_roi_writer.write(record) + inside_roi_writer.close() + outside_roi_writer.close() + + def summary(self) -> None: + """Print a summary of this dataset.""" + # Get ROI information. 
+ patients = self.patients() + has_rois = defaultdict(bool) + slides_with_roi = {} + patients_with_roi = defaultdict(bool) + for r in self.rois(): + s = sf.util.path_to_name(r) + with open(r, 'r') as f: + has_rois[s] = len(f.read().split('\n')) > 2 + for sp in self.slide_paths(): + s = sf.util.path_to_name(sp) + slides_with_roi[s] = has_rois[s] + for s in self.slides(): + p = patients[s] + if s in slides_with_roi and slides_with_roi[s]: + patients_with_roi[p] = True + + # Print summary. + if self.annotations is not None: + num_patients = len(self.annotations.patient.unique()) + else: + num_patients = 0 + print("Overview:") + table = [("Configuration file:", self._config), + ("Tile size (px):", self.tile_px), + ("Tile size (um):", self.tile_um), + ("Slides:", len(self.slides())), + ("Patients:", num_patients), + ("Slides with ROIs:", len([s for s in slides_with_roi + if slides_with_roi[s]])), + ("Patients with ROIs:", len([p for p in patients_with_roi + if patients_with_roi[p]]))] + print(tabulate(table, tablefmt='fancy_outline')) + print("\nFilters:") + table = [("Filters:", pformat(self.filters)), + ("Filter Blank:", pformat(self.filter_blank)), + ("Min Tiles:", pformat(self.min_tiles))] + print(tabulate(table, tablefmt='fancy_grid')) + print("\nSources:") + if not self.sources: + print("<None>") + else: + for source in self.sources: + print(f"\n{source}") + d = self.sources[source] + print(tabulate(zip(d.keys(), d.values()), + tablefmt="fancy_outline")) + + print("\nNumber of tiles in TFRecords:", self.num_tiles) + print("Annotation columns:") + print("<NA>" if self.annotations is None else self.annotations.columns) + + def tensorflow( + self, + labels: Labels = None, + batch_size: Optional[int] = None, + from_wsi: bool = False, + **kwargs: Any + ) -> "tf.data.Dataset": + """Return a Tensorflow Dataset object that interleaves tfrecords. + + The returned dataset yields a batch of (image, label) for each tile. 
+ Labels may be specified either via a dict mapping slide names to + outcomes, or a parsing function which accepts an image and slide name, + returning a dict {'image_raw': image(tensor)} and label (int or float). + + Args: + labels (dict or str, optional): Dict or function. If dict, must + map slide names to outcome labels. If function, function must + accept an image (tensor) and slide name (str), and return a + dict {'image_raw': image (tensor)} and label (int or float). + If not provided, all labels will be None. + batch_size (int): Batch size. + + Keyword Args: + augment (str or bool): Image augmentations to perform. Augmentations include: + + * ``'x'``: Random horizontal flip + * ``'y'``: Random vertical flip + * ``'r'``: Random 90-degree rotation + * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) + * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer) + + Combine letters to define augmentations, such as ``'xyrjn'``. + A value of True will use ``'xyrjb'``. + deterministic (bool, optional): When num_parallel_calls is specified, + if this boolean is specified, it controls the order in which the + transformation produces elements. If set to False, the + transformation is allowed to yield elements out of order to trade + determinism for performance. Defaults to False. + drop_last (bool, optional): Drop the last non-full batch. + Defaults to False. + from_wsi (bool): Generate predictions from tiles dynamically + extracted from whole-slide images, rather than TFRecords. + Defaults to False (use TFRecords). + incl_loc (str, optional): 'coord', 'grid', or None. Return (x,y) + origin coordinates ('coord') for each tile center along with tile + images, or the (x,y) grid coordinates for each tile. + Defaults to 'coord'. + incl_slidenames (bool, optional): Include slidenames as third returned + variable. Defaults to False. 
+ infinite (bool, optional): Create an infinite dataset. WARNING: If + infinite is False and balancing is used, some tiles will be skipped. + Defaults to True. + img_size (int): Image width in pixels. + normalizer (:class:`slideflow.norm.StainNormalizer`, optional): + Normalizer to use on images. Defaults to None. + num_parallel_reads (int, optional): Number of parallel reads for each + TFRecordDataset. Defaults to 4. + num_shards (int, optional): Shard the tfrecord datasets, used for + multiprocessing datasets. Defaults to None. + pool (multiprocessing.Pool): Shared multiprocessing pool. Useful + if ``from_wsi=True``, for sharing a unified processing pool between + dataloaders. Defaults to None. + rois (list(str), optional): List of ROI paths. Only used if + from_wsi=True. Defaults to None. + roi_method (str, optional): Method for extracting ROIs. Only used if + from_wsi=True. Defaults to 'auto'. + shard_idx (int, optional): Index of the tfrecord shard to use. + Defaults to None. + standardize (bool, optional): Standardize images to (0,1). + Defaults to True. + tile_um (int, optional): Size of tiles to extract from WSI, in + microns. Only used if from_wsi=True. Defaults to None. + tfrecord_parser (Callable, optional): Custom parser for TFRecords. + Defaults to None. + transform (Callable, optional): Arbitrary transform function. + Performs transformation after augmentations but before + standardization. Defaults to None. + **decode_kwargs (dict): Keyword arguments to pass to + :func:`slideflow.io.tensorflow.decode_image`. 
+ + Returns: + tf.data.Dataset + + """ + from slideflow.io.tensorflow import interleave + + if self.tile_px is None: + raise errors.DatasetError("tile_px and tile_um must be non-zero " + "to create dataloaders.") + if self.prob_weights is not None and from_wsi: + log.warning("Dataset balancing is disabled when `from_wsi=True`") + if self._clip not in (None, {}) and from_wsi: + log.warning("Dataset clipping is disabled when `from_wsi=True`") + + if from_wsi: + tfrecords = self.slide_paths() + kwargs['rois'] = self.rois() + kwargs['tile_um'] = self.tile_um + kwargs['from_wsi'] = True + prob_weights = None + clip = None + else: + tfrecords = self.tfrecords() + prob_weights = self.prob_weights + clip = self._clip + if not tfrecords: + raise errors.TFRecordsNotFoundError + self.verify_img_format(progress=False) + + return interleave(paths=tfrecords, + labels=labels, + img_size=self.tile_px, + batch_size=batch_size, + prob_weights=prob_weights, + clip=clip, + **kwargs) + + def tfrecord_report( + self, + dest: str, + normalizer: Optional["StainNormalizer"] = None + ) -> None: + """Create a PDF report of TFRecords. + + Reports include 10 example tiles per TFRecord. Report is saved + in the target destination. + + Args: + dest (str): Directory in which to save the PDF report. + normalizer (:class:`slideflow.norm.StainNormalizer`, optional): + Normalizer to use on image tiles. Defaults to None. 
+ + """ + if normalizer is not None: + log.info(f'Using realtime {normalizer.method} normalization') + + tfrecord_list = self.tfrecords() + reports = [] + log.info('Generating TFRecords report...') + # Get images for report + for tfr in track(tfrecord_list, description='Generating report...'): + dataset = sf.io.TFRecordDataset(tfr) + parser = sf.io.get_tfrecord_parser( + tfr, + ('image_raw',), + to_numpy=True, + decode_images=False + ) + if not parser: + continue + sample_tiles = [] + for i, record in enumerate(dataset): + if i > 9: + break + image_raw_data = parser(record)[0] + if normalizer: + image_raw_data = normalizer.jpeg_to_jpeg(image_raw_data) + sample_tiles += [image_raw_data] + reports += [SlideReport(sample_tiles, + tfr, + tile_px=self.tile_px, + tile_um=self.tile_um, + ignore_thumb_errors=True)] + + # Generate and save PDF + log.info('Generating PDF (this may take some time)...') + pdf_report = ExtractionReport(reports, title='TFRecord Report') + timestring = datetime.now().strftime('%Y%m%d-%H%M%S') + if exists(dest) and isdir(dest): + filename = join(dest, f'tfrecord_report-{timestring}.pdf') + elif sf.util.path_to_ext(dest) == 'pdf': + filename = join(dest) + else: + raise ValueError(f"Could not find destination directory {dest}.") + pdf_report.save(filename) + log.info(f'TFRecord report saved to [green]{filename}') + + def tfrecord_heatmap( + self, + tfrecord: Union[str, List[str]], + tile_dict: Dict[int, float], + filename: str, + **kwargs + ) -> None: + """Create a tfrecord-based WSI heatmap. + + Uses a dictionary of tile values for heatmap display, and saves to + the specified directory. + + Args: + tfrecord (str or list(str)): Path(s) to tfrecord(s). + tile_dict (dict): Dictionary mapping tfrecord indices to a + tile-level value for display in heatmap format + filename (str): Destination filename for heatmap. 
+ + """ + slide_paths = { + sf.util.path_to_name(sp): sp for sp in self.slide_paths() + } + if not self.tile_px or not self.tile_um: + raise errors.DatasetError( + "Dataset tile_px & tile_um must be set to create TFRecords." + ) + for tfr in sf.util.as_list(tfrecord): + name = sf.util.path_to_name(tfr) + if name not in slide_paths: + raise errors.SlideNotFoundError(f'Unable to find slide {name}') + sf.util.tfrecord_heatmap( + tfrecord=tfr, + slide=slide_paths[name], + tile_px=self.tile_px, + tile_um=self.tile_um, + tile_dict=tile_dict, + filename=filename, + **kwargs + ) + + def tfrecords(self, source: Optional[str] = None) -> List[str]: + """Return a list of all tfrecords. + + Args: + source (str, optional): Only return tfrecords from this dataset + source. Defaults to None (return all tfrecords in dataset). + + Returns: + List of tfrecords paths. + + """ + if source and source not in self.sources.keys(): + log.error(f"Dataset {source} not found.") + return [] + if source is None: + sources_to_search = list(self.sources.keys()) # type: List[str] + else: + sources_to_search = [source] + + tfrecords_list = [] + folders_to_search = [] + for source in sources_to_search: + if not self._tfrecords_set(source): + log.warning(f"tfrecords path not set for source {source}") + continue + tfrecords = self.sources[source]['tfrecords'] + label = self.sources[source]['label'] + if label is None: + continue + tfrecord_path = join(tfrecords, label) + if not exists(tfrecord_path): + log.debug( + f"TFRecords path not found: {tfrecord_path}" + ) + continue + folders_to_search += [tfrecord_path] + for folder in folders_to_search: + tfrecords_list += glob(join(folder, "*.tfrecords")) + tfrecords_list = list(set(tfrecords_list)) + + # Filter the list by filters + if self.annotations is not None: + slides = self.slides() + filtered_tfrecords_list = [ + tfrecord for tfrecord in tfrecords_list + if path_to_name(tfrecord) in slides + ] + filtered = filtered_tfrecords_list + else: + 
log.warning("Error filtering TFRecords, are annotations empty?") + filtered = tfrecords_list + + # Filter by min_tiles + manifest = self.manifest(filter=False) + if not all([f in manifest for f in filtered]): + self.update_manifest() + manifest = self.manifest(filter=False) + if self.min_tiles: + return [ + f for f in filtered + if f in manifest and manifest[f]['total'] >= self.min_tiles + ] + else: + return [f for f in filtered + if f in manifest and manifest[f]['total'] > 0] + + def tfrecords_by_subfolder(self, subfolder: str) -> List[str]: + """Return a list of all tfrecords in a specific subfolder. + + Ignores any dataset filters. + + Args: + subfolder (str): Path to subfolder to check for tfrecords. + + Returns: + List of tfrecords paths. + """ + tfrecords_list = [] + folders_to_search = [] + for source in self.sources: + if self.sources[source]['label'] is None: + continue + if not self._tfrecords_set(source): + log.warning(f"tfrecords path not set for source {source}") + continue + base_dir = join( + self.sources[source]['tfrecords'], + self.sources[source]['label'] + ) + tfrecord_path = join(base_dir, subfolder) + if not exists(tfrecord_path): + raise errors.DatasetError( + f"Unable to find subfolder [bold]{subfolder}[/] in " + f"source [bold]{source}[/], tfrecord directory: " + f"[green]{base_dir}" + ) + folders_to_search += [tfrecord_path] + for folder in folders_to_search: + tfrecords_list += glob(join(folder, "*.tfrecords")) + return tfrecords_list + + def tfrecords_folders(self) -> List[str]: + """Return folders containing tfrecords.""" + folders = [] + for source in self.sources: + if self.sources[source]['label'] is None: + continue + if not self._tfrecords_set(source): + log.warning(f"tfrecords path not set for source {source}") + continue + folders += [join( + self.sources[source]['tfrecords'], + self.sources[source]['label'] + )] + return folders + + def tfrecords_from_tiles(self, delete_tiles: bool = False) -> None: + """Create tfrecord files 
from a collection of raw images. + + Images must be stored in the dataset source(s) tiles directory. + + Args: + delete_tiles (bool): Remove tiles after storing in tfrecords. + + Returns: + None + """ + if not self.tile_px or not self.tile_um: + raise errors.DatasetError( + "Dataset tile_px & tile_um must be set to create TFRecords." + ) + for source in self.sources: + log.info(f'Working on dataset source {source}') + config = self.sources[source] + if not (self._tiles_set(source) and self._tfrecords_set(source)): + log.error("tiles and/or tfrecords paths not set for " + f"source {source}") + continue + tfrecord_dir = join(config['tfrecords'], config['label']) + tiles_dir = join(config['tiles'], config['label']) + if not exists(tiles_dir): + log.warn(f'No tiles found for source [bold]{source}') + continue + sf.io.write_tfrecords_multi(tiles_dir, tfrecord_dir) + self.update_manifest() + if delete_tiles: + shutil.rmtree(tiles_dir) + + def tfrecords_have_locations(self) -> bool: + """Check if TFRecords have associated tile location information.""" + for tfr in self.tfrecords(): + try: + tfr_has_loc = sf.io.tfrecord_has_locations(tfr) + except errors.TFRecordsError: + # Encountered when the TFRecord is empty. + continue + if not tfr_has_loc: + log.info(f"{tfr}: Tile location information missing.") + return False + return True + + def thumbnails( + self, + outdir: str, + size: int = 512, + roi: bool = False, + enable_downsample: bool = True + ) -> None: + """Generate square slide thumbnails with black borders of fixed size. + + Saves thumbnails to the specified directory. + + Args: + size (int, optional): Width/height of thumbnail in pixels. + Defaults to 512. + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate activations. If not supplied, will + calculate activations for all tfrecords at the tile_px/tile_um + matching the supplied model, optionally using provided filters + and filter_blank. 
+ filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + roi (bool, optional): Include ROI in the thumbnail images. + Defaults to False. + enable_downsample (bool, optional): If True and a thumbnail is not + embedded in the slide file, downsampling is permitted to + accelerate thumbnail calculation. + """ + slide_list = self.slide_paths() + rois = self.rois() + log.info(f'Saving thumbnails to [green]{outdir}') + for slide_path in tqdm(slide_list, desc="Generating thumbnails..."): + log.debug(f'Working on [green]{path_to_name(slide_path)}[/]...') + try: + whole_slide = WSI(slide_path, + tile_px=1000, + tile_um=1000, + stride_div=1, + enable_downsample=enable_downsample, + rois=rois, + verbose=False) + except errors.MissingROIError: + log.info(f"Skipping {slide_path}; missing ROI") + continue + except Exception as e: + log.error( + f"Error generating thumbnail for {slide_path}: {e}" + ) + continue + if roi: + thumb = whole_slide.thumb(rois=True) + else: + thumb = whole_slide.square_thumb(size) + thumb.save(join(outdir, f'{whole_slide.name}.png')) + log.info('Thumbnail generation complete.') + + def train_val_split( + self, + *args: Any, + **kwargs: Any + ) -> Tuple["Dataset", "Dataset"]: + """Deprecated function.""" # noqa: D401 + warnings.warn( + "Dataset.train_val_split() is deprecated and will be " + "removed in a future version. Please use Dataset.split()", + DeprecationWarning + ) + return self.split(*args, **kwargs) + + def transform_tfrecords(self, dest: str, **kwargs) -> None: + """Transform TFRecords, saving to a target path. + + Tfrecords will be saved in the output directory nested by source name. + + Args: + dest (str): Destination. 
+ + """ + if not exists(dest): + os.makedirs(dest) + total = len(self.tfrecords()) + pb = tqdm(total=total) + for source in self.sources: + log.debug(f"Working on source {source}") + tfr_dest = join(dest, source) + if not exists(tfr_dest): + os.makedirs(tfr_dest) + for tfr in self.tfrecords(source=source): + sf.io.tensorflow.transform_tfrecord( + tfr, + join(tfr_dest, basename(tfr)), + **kwargs + ) + pb.update(1) + log.info(f"Saved {total} transformed tfrecords to {dest}.") + + def torch( + self, + labels: Optional[Union[Dict[str, Any], str, pd.DataFrame]] = None, + batch_size: Optional[int] = None, + rebuild_index: bool = False, + from_wsi: bool = False, + **kwargs: Any + ) -> "DataLoader": + """Return a PyTorch DataLoader object that interleaves tfrecords. + + The returned dataloader yields a batch of (image, label) for each tile. + + Args: + labels (dict, str, or pd.DataFrame, optional): If a dict is provided, + expect a dict mapping slide names to outcome labels. If a str, + will interpret as categorical annotation header. For regression + tasks, or outcomes with manually assigned labels, pass the + first result of dataset.labels(...). If None, returns slide + instead of label. + batch_size (int): Batch size. + rebuild_index (bool): Re-build index files even if already present. + Defaults to False. + + Keyword Args: + augment (str or bool): Image augmentations to perform. Augmentations include: + + * ``'x'``: Random horizontal flip + * ``'y'``: Random vertical flip + * ``'r'``: Random 90-degree rotation + * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) + * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer) + + Combine letters to define augmentations, such as ``'xyrjn'``. + A value of True will use ``'xyrjb'``. + chunk_size (int, optional): Chunk size for image decoding. + Defaults to 1.
+ drop_last (bool, optional): Drop the last non-full batch. + Defaults to False. + from_wsi (bool): Generate predictions from tiles dynamically + extracted from whole-slide images, rather than TFRecords. + Defaults to False (use TFRecords). + incl_loc (bool, optional): Include loc_x and loc_y (image tile + center coordinates, in base / level=0 dimension) as additional + returned variables. Defaults to False. + incl_slidenames (bool, optional): Include slidenames as third returned + variable. Defaults to False. + infinite (bool, optional): Infinitely repeat data. Defaults to True. + max_size (bool, optional): Unused argument present for legacy + compatibility; will be removed. + model_type (str, optional): Used to generate random labels + (for StyleGAN2). Not required. Defaults to 'classification'. + num_replicas (int, optional): Number of GPUs or unique instances which + will have their own DataLoader. Used to interleave results among + workers without duplications. Defaults to 1. + num_workers (int, optional): Number of DataLoader workers. + Defaults to 2. + normalizer (:class:`slideflow.norm.StainNormalizer`, optional): + Normalizer. Defaults to None. + onehot (bool, optional): Onehot encode labels. Defaults to False. + persistent_workers (bool, optional): Sets the DataLoader + persistent_workers flag. Defaults to None (4 if not using a SPAMS + normalizer, 1 if using SPAMS). + pin_memory (bool, optional): Pin memory to GPU. Defaults to True. + pool (multiprocessing.Pool): Shared multiprocessing pool. Useful + if from_wsi=True, for sharing a unified processing pool between + dataloaders. Defaults to None. + prefetch_factor (int, optional): Number of batches to prefetch in each + SlideflowIterator. Defaults to 1. + rank (int, optional): Worker ID to identify this worker. + Used to interleave results + among workers without duplications. Defaults to 0 (first worker). + rois (list(str), optional): List of ROI paths. Only used if + from_wsi=True. Defaults to None.
+ roi_method (str, optional): Method for extracting ROIs. Only used if + from_wsi=True. Defaults to 'auto'. + standardize (bool, optional): Standardize images to mean 0 and + variance of 1. Defaults to True. + tile_um (int, optional): Size of tiles to extract from WSI, in + microns. Only used if from_wsi=True. Defaults to None. + transform (Callable, optional): Arbitrary torchvision transform + function. Performs transformation after augmentations but + before standardization. Defaults to None. + tfrecord_parser (Callable, optional): Custom parser for TFRecords. + Defaults to None. + + """ + from slideflow.io.torch import interleave_dataloader + + if isinstance(labels, str) and not exists(labels): + labels = self.labels(labels)[0] + if self.tile_px is None: + raise errors.DatasetError("tile_px and tile_um must be non-zero " + "to create dataloaders.") + if self._clip not in (None, {}) and from_wsi: + log.warning("Dataset clipping is disabled when `from_wsi=True`") + + if from_wsi: + tfrecords = self.slide_paths() + kwargs['rois'] = self.rois() + kwargs['tile_um'] = self.tile_um + kwargs['img_size'] = self.tile_px + indices = None + clip = None + else: + self.build_index(rebuild_index) + tfrecords = self.tfrecords() + if not tfrecords: + raise errors.TFRecordsNotFoundError + self.verify_img_format(progress=False) + _idx_dict = self.load_indices() + indices = [_idx_dict[path_to_name(tfr)] for tfr in tfrecords] + clip = self._clip + + if self.prob_weights: + prob_weights = [self.prob_weights[tfr] for tfr in tfrecords] + else: + prob_weights = None + + return interleave_dataloader(tfrecords=tfrecords, + batch_size=batch_size, + labels=labels, + num_tiles=self.num_tiles, + prob_weights=prob_weights, + clip=clip, + indices=indices, + from_wsi=from_wsi, + **kwargs) + + def unclip(self) -> "Dataset": + """Return a dataset object with all clips removed. + + Returns: + :class:`slideflow.Dataset`: Dataset with clips removed.
+ + """ + ret = copy.deepcopy(self) + ret._clip = {} + return ret + + def update_manifest(self, force_update: bool = False) -> None: + """Update tfrecord manifests. + + Args: + force_update (bool, optional): Force regeneration of the + manifests from scratch. + + """ + tfrecords_folders = self.tfrecords_folders() + for tfr_folder in tfrecords_folders: + sf.io.update_manifest_at_dir( + directory=tfr_folder, + force_update=force_update + ) + + def update_annotations_with_slidenames( + self, + annotations_file: str + ) -> None: + """Automatically associate slide names and paths in the annotations. + + Attempts to automatically associate slide names from a directory + with patients in a given annotations file, skipping any slide names + that are already present in the annotations file. + + Args: + annotations_file (str): Path to annotations file. + + """ + header, _ = sf.util.read_annotations(annotations_file) + slide_list = self.slide_paths(apply_filters=False) + + # First, load all patient names from the annotations file + try: + patient_index = header.index('patient') + except ValueError: + raise errors.AnnotationsError( + f"Patient header {'patient'} not found in annotations." + ) + patients = [] + pt_to_slide = {} + with open(annotations_file) as csv_file: + csv_reader = csv.reader(csv_file, delimiter=',') + header = next(csv_reader) + for row in csv_reader: + patients.extend([row[patient_index]]) + patients = list(set(patients)) + log.debug(f"Number of patients in annotations: {len(patients)}") + log.debug(f"Slides found: {len(slide_list)}") + + # Then, check for sets of slides that would match to the same patient; + # due to ambiguity, these will be skipped.
+ n_occur = {} + for slide in slide_list: + if _shortname(slide) not in n_occur: + n_occur[_shortname(slide)] = 1 + else: + n_occur[_shortname(slide)] += 1 + slides_to_skip = [s for s in slide_list if n_occur[_shortname(s)] > 1] + + # Next, search through the slides folder for all valid slide files + for file in slide_list: + slide = path_to_name(file) + # First, skip this slide due to ambiguity if needed + if slide in slides_to_skip: + log.warning(f"Skipping slide {slide} due to ambiguity") + # Then, make sure the shortname and long name + # aren't both in the annotation file + if ((slide != _shortname(slide)) + and (slide in patients) + and (_shortname(slide) in patients)): + log.warning(f"Skipping slide {slide} due to ambiguity") + # Check if either the slide name or the shortened version + # are in the annotation file + if any(x in patients for x in [slide, _shortname(slide)]): + slide = slide if slide in patients else _shortname(slide) + pt_to_slide.update({slide: slide}) + + # Now, write the associations + n_updated = 0 + n_missing = 0 + with open(annotations_file) as csv_file: + csv_reader = csv.reader(csv_file, delimiter=',') + header = next(csv_reader) + with open('temp.csv', 'w') as csv_outfile: + csv_writer = csv.writer(csv_outfile, delimiter=',') + + # Write to existing "slide" column in the annotations file, + # otherwise create new column + try: + slide_index = header.index('slide') + except ValueError: + header.extend(['slide']) + csv_writer.writerow(header) + for row in csv_reader: + patient = row[patient_index] + if patient in pt_to_slide: + row.extend([pt_to_slide[patient]]) + n_updated += 1 + else: + row.extend([""]) + n_missing += 1 + csv_writer.writerow(row) + else: + csv_writer.writerow(header) + for row in csv_reader: + pt = row[patient_index] + # Only write column if no slide is in the annotation + if (pt in pt_to_slide) and (row[slide_index] == ''): + row[slide_index] = pt_to_slide[pt] + n_updated += 1 + elif ((pt not in pt_to_slide) + and
(row[slide_index] == '')): + n_missing += 1 + csv_writer.writerow(row) + if n_updated: + log.info(f"Done; associated slides with {n_updated} annotations.") + if n_missing: + log.info(f"Slides not found for {n_missing} annotations.") + elif n_missing: + log.debug(f"Slides missing for {n_missing} annotations.") + else: + log.debug("Annotations up-to-date, no changes made.") + + # Finally, backup the old annotation file and overwrite + # existing with the new data + backup_file = f"{annotations_file}.backup" + if exists(backup_file): + os.remove(backup_file) + assert isinstance(annotations_file, str) + shutil.move(annotations_file, backup_file) + shutil.move('temp.csv', annotations_file) + + def verify_annotations_slides(self) -> None: + """Verify that annotations are correctly loaded.""" + if self.annotations is None: + log.warn("Annotations not loaded.") + return + + # Verify no duplicate slide names are found + ann = self.annotations.loc[self.annotations.slide.isin(self.slides())] + if not ann.slide.is_unique: + raise errors.AnnotationsError( + "Duplicate slide names detected in the annotation file." + ) + + # Verify that there are no tfrecords with the same name. + # This is a problem because the tfrecord name is used to + # identify the slide. 
+ tfrecords = self.tfrecords() + if len(tfrecords): + tfrecord_names = [sf.util.path_to_name(tfr) for tfr in tfrecords] + if not len(set(tfrecord_names)) == len(tfrecord_names): + duplicate_tfrs = [ + tfr for tfr in tfrecords + if tfrecord_names.count(sf.util.path_to_name(tfr)) > 1 + ] + raise errors.AnnotationsError( + "Multiple TFRecords with the same names detected: {}".format( + ', '.join(duplicate_tfrs) + ) + ) + + # Verify all slides in the annotation column are valid + n_missing = len(self.annotations.loc[ + (self.annotations.slide.isin(['', ' ']) + | self.annotations.slide.isna()) + ]) + if n_missing == 1: + log.warn("1 patient does not have a slide assigned.") + if n_missing > 1: + log.warn(f"{n_missing} patients do not have a slide assigned.") + + def verify_img_format(self, *, progress: bool = True) -> Optional[str]: + """Verify that all tfrecords have the same image format (PNG/JPG). + + Returns: + str: image format (png or jpeg) + + """ + tfrecords = self.tfrecords() + if len(tfrecords): + with mp.Pool(sf.util.num_cpu(), + initializer=sf.util.set_ignore_sigint) as pool: + img_formats = [] + mapped = pool.imap_unordered( + sf.io.detect_tfrecord_format, + tfrecords + ) + if progress: + mapped = track( + mapped, + description="Verifying tfrecord formats...", + transient=True + ) + for *_, fmt in mapped: + if fmt is not None: + img_formats += [fmt] + if len(set(img_formats)) > 1: + log_msg = "Mismatched TFRecord image formats:\n" + for tfr, fmt in zip(tfrecords, img_formats): + log_msg += f"{tfr}: {fmt}\n" + log.error(log_msg) + raise errors.MismatchedImageFormatsError( + "Mismatched TFRecord image formats detected" + ) + if len(img_formats): + return img_formats[0] + else: + return None + else: + return None + + def verify_slide_names(self, allow_errors: bool = False) -> bool: + """Verify that slide names inside TFRecords match the file names. + + Args: + allow_errors (bool): Do not raise an error if there is a mismatch. + Defaults to False. 
+ + Returns: + bool: If all slide names inside TFRecords match the TFRecord + file names. + + Raises: + sf.errors.MismatchedSlideNamesError: If any slide names inside + TFRecords do not match the TFRecord file names, + and allow_errors=False. + + """ + tfrecords = self.tfrecords() + if len(tfrecords): + pb = track( + tfrecords, + description="Verifying tfrecord slide names...", + transient=True + ) + for tfr in pb: + first_record = sf.io.get_tfrecord_by_index(tfr, 0) + if first_record['slide'] == sf.util.path_to_name(tfr): + continue + elif allow_errors: + return False + else: + raise errors.MismatchedSlideNamesError( + "Mismatched slide name in TFRecord {}: expected slide " + "name {} based on filename, but found {}. ".format( + tfr, + sf.util.path_to_name(tfr), + first_record['slide'] + ) + ) + return True
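The duplicate-name check in `verify_annotations_slides` above calls `list.count` once per TFRecord, which scales quadratically with the number of files. A minimal sketch of the same check with `collections.Counter` follows. Both `find_duplicate_names` and the simplified `path_to_name` are hypothetical stand-ins written for this illustration; they are not Slideflow functions, and the real `sf.util.path_to_name` may handle more cases.

```python
from collections import Counter
from os.path import basename, splitext
from typing import List


def path_to_name(path: str) -> str:
    """Hypothetical stand-in: file name without directory or extension."""
    return splitext(basename(path))[0]


def find_duplicate_names(tfrecords: List[str]) -> List[str]:
    """Return every path whose slide name is shared with another path."""
    # Count each slide name once, then keep paths whose name occurs twice or more.
    counts = Counter(path_to_name(t) for t in tfrecords)
    return [t for t in tfrecords if counts[path_to_name(t)] > 1]
```

Calling `find_duplicate_names` on a list of TFRecord paths returns the offending paths in their original order, which is the list the error message above joins with commas.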

+ © Copyright 2023, James M Dolezal. + +

+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/gan/interpolate/index.html b/docs/_modules/slideflow/gan/interpolate/index.html new file mode 100644 index 000000000..fef780084 --- /dev/null +++ b/docs/_modules/slideflow/gan/interpolate/index.html @@ -0,0 +1,1123 @@ + + + + + + + + + + + + slideflow.gan.interpolate — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.gan.interpolate

+"""Tool to assist with embedding interpolation for a class-conditional GAN."""
+
+from typing import (Generator, List, Optional, Tuple, Union,
+                    TYPE_CHECKING, Any, Iterable)
+import warnings
+import numpy as np
+import pandas as pd
+import slideflow as sf
+import json
+from os.path import join, dirname, exists
+from PIL import Image
+from tqdm import tqdm
+from functools import partial
+
+from slideflow.gan.utils import crop, noise_tensor
+from slideflow import errors
+
+
+if TYPE_CHECKING:
+    import torch
+    import tensorflow as tf
+
+
[docs]class StyleGAN2Interpolator: + +
[docs] def __init__( + self, + gan_pkl: str, + start: int, + end: int, + *, + device: Optional["torch.device"] = None, + target_um: Optional[int] = None, + target_px: Optional[int] = None, + gan_um: Optional[int] = None, + gan_px: Optional[int] = None, + noise_mode: str = 'const', + truncation_psi: int = 1, + **gan_kwargs + ) -> None: + """Coordinates class and embedding interpolation for a trained + class-conditional StyleGAN2. + + Args: + gan_pkl (str): Path to saved network pkl. + start (int): Starting class index. + end (int): Ending class index. + + Keyword Args: + device (torch.device, optional): Torch device. If None, will + automatically select a GPU if available. Defaults to None. + target_um (int, optional): Target size in microns for the + interpolated images. GAN output will be cropped/resized to match + this target. If None, will match GAN output. Defaults to None. + target_px (int, optional): Target size in pixels for the + interpolated images. GAN output will be cropped/resized to match + this target. If None, will match GAN output. Defaults to None. + gan_um (int, optional): Size in microns of the GAN output. If None, + will attempt to auto-detect from training_options.json. + Defaults to None. + gan_px (int, optional): Size in pixels of the GAN output. If None, + will attempt to auto-detect from training_options.json. + Defaults to None. + noise_mode (str, optional): Noise mode for GAN. Defaults to 'const'. + truncation_psi (int, optional): Truncation psi for GAN. + Defaults to 1. + **gan_kwargs: Additional keyword arguments for GAN inference. 
+ + """ + from slideflow.model.torch_utils import get_device + from slideflow.gan.stylegan2.stylegan2 import embedding + + training_options = join(dirname(gan_pkl), 'training_options.json') + if exists(training_options): + with open(training_options, 'r') as f: + opt = json.load(f) + if 'slideflow_kwargs' in opt: + _gan_px = opt['slideflow_kwargs']['tile_px'] + _gan_um = opt['slideflow_kwargs']['tile_um'] + # Warn only if a user-provided size differs from the saved training options. + if ((gan_px is not None and gan_px != _gan_px) + or (gan_um is not None and gan_um != _gan_um)): + sf.log.warn("Provided GAN tile size (gan_px={}, gan_um={}) does " + "not match training_options.json (gan_px={}, " + "gan_um={})".format(gan_px, gan_um, _gan_px, _gan_um)) + if gan_px is None: + gan_px = _gan_px + if gan_um is None: + gan_um = _gan_um + if gan_px is None or gan_um is None: + raise ValueError("Unable to auto-detect gan_px/gan_um from " + "training_options.json. Must be set with gan_um " + "and gan_px.") + if target_px is None: + target_px = gan_px + if target_um is None: + target_um = gan_um + if device is None: + device = get_device() + self.E_G, self.G = embedding.load_embedding_gan(gan_pkl, device) + self.device = device + self.gan_kwargs = dict( + noise_mode=noise_mode, + truncation_psi=truncation_psi, + **gan_kwargs) + self.embeddings = embedding.get_embeddings(self.G, device=device) + self.embed0 = self.embeddings[start] + self.embed1 = self.embeddings[end] + self.features = None # type: Optional[sf.model.Features] + self.normalizer = None + self.target_px = target_px + self.crop_kw = dict( + gan_um=gan_um, + gan_px=gan_px, + target_um=target_um, + ) + self._classifier_backend = sf.backend()
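The constructor above fills in `gan_px` and `gan_um` from `training_options.json` when the caller leaves them as None, and raises if either is still unknown. The fallback order can be sketched in isolation as below. `resolve_gan_size` is a hypothetical helper written for this illustration, not part of Slideflow, and it takes the already-parsed options dict rather than reading the file.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_gan_size(
    opt: Dict[str, Any],
    gan_px: Optional[int] = None,
    gan_um: Optional[int] = None,
) -> Tuple[int, int]:
    """Prefer explicit arguments; fall back to saved training options."""
    saved = opt.get('slideflow_kwargs', {})
    if gan_px is None:
        gan_px = saved.get('tile_px')
    if gan_um is None:
        gan_um = saved.get('tile_um')
    # Mirror the constructor: fail loudly if either size is still unknown.
    if gan_px is None or gan_um is None:
        raise ValueError("gan_px and gan_um must be provided or auto-detected.")
    return gan_px, gan_um
```

Explicit arguments always win over the saved values, so a caller can override one dimension (for example `gan_px`) while still auto-detecting the other.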
+ + def _crop_and_convert_to_uint8(self, img: "torch.Tensor") -> Any: + """Convert a batch of GAN images to a resized/cropped uint8 tensor. + + Args: + img (torch.Tensor): Raw GAN output images (torch.float32) + + Returns: + Any: GAN images (torch.uint8) + """ + import torch + import slideflow.io.torch + if self._classifier_backend == 'tensorflow': + import tensorflow as tf + dtype = tf.uint8 + elif self._classifier_backend == 'torch': + dtype = torch.uint8 + else: + raise errors.UnrecognizedBackendError + img = crop(img, **self.crop_kw) # type: ignore + img = (img * 127.5 + 128).clamp(0, 255).to(torch.uint8) + img = sf.io.torch.preprocess_uint8(img, standardize=False, resize_px=self.target_px) + return sf.io.convert_dtype(img, dtype) + + def _preprocess_from_uint8( + self, + img: Any, + normalize: bool, + standardize: bool, + ) -> Any: + """Convert and resize a batch of uint8 tensors to + standardized/normalized tensors ready for input to the + classifier/feature model. + + Args: + img (Any): GAN images (uint8) + normalize (bool): Normalize the images. + standardize (bool): Standardize the images. + + Returns: + Any: Resized GAN images (uint8 or float32 if standardize=True) + """ + normalizer = self.normalizer if normalize else None + if self._classifier_backend == 'tensorflow': + return sf.io.tensorflow.preprocess_uint8( + img, + normalizer=normalizer, + standardize=standardize)['tile_image'] + elif self._classifier_backend == 'torch': + return sf.io.torch.preprocess_uint8( + img, + normalizer=normalizer, + standardize=standardize) + else: + raise errors.UnrecognizedBackendError + + def _standardize(self, img: Any) -> Any: + """Standardize image from uint8 to float. + + Args: + img (Any): uint8 image tensor. + + Returns: + Any: Standardized float image tensor. 
+ """ + if self._classifier_backend == 'tensorflow': + import tensorflow as tf + return sf.io.convert_dtype(img, tf.float32) + elif self._classifier_backend == 'torch': + import torch + return sf.io.convert_dtype(img, torch.float32) + else: + raise errors.UnrecognizedBackendError + + def _build_gan_dataset(self, generator) -> Iterable: + """Build a dataset from a given GAN generator. + + Args: + generator (Generator): Python generator which yields cropped + (but not resized) uint8 tensors. + + Returns: + Iterable: Iterable dataset which yields processed (resized and + normalized) images. + """ + if self._classifier_backend == 'tensorflow': + import tensorflow as tf + + sig = tf.TensorSpec(shape=(None, self.target_px, self.target_px, 3), dtype=tf.uint8) + dts = tf.data.Dataset.from_generator(generator, output_signature=sig) + return dts.map( + partial( + sf.io.tensorflow.preprocess_uint8, + normalizer=self.normalizer), + num_parallel_calls=tf.data.AUTOTUNE, + deterministic=True + ) + elif self._classifier_backend == 'torch': + return map( + partial( + sf.io.torch.preprocess_uint8, + normalizer=self.normalizer), + generator()) + else: + raise errors.UnrecognizedBackendError + +
[docs] def z(self, seed: Union[int, List[int]]) -> "torch.Tensor": + """Returns a noise tensor for a given seed. + + Args: + seed (int): Seed. + + Returns: + torch.tensor: Noise tensor for the corresponding seed. + """ + import torch + if isinstance(seed, int): + return noise_tensor(seed, self.E_G.z_dim).to(self.device) # type: ignore + elif isinstance(seed, list): + return torch.stack( + [noise_tensor(s, self.E_G.z_dim).to(self.device) + for s in seed], + dim=0) + else: + raise ValueError(f"Unrecognized seed: {seed}")
+ + def set_feature_model(self, *args, **kwargs): + warnings.warn( + "StyleGAN2Interpolator.set_feature_model() is deprecated. " + "Please use .set_classifier() instead.", + DeprecationWarning) + return self.set_classifier(*args, **kwargs) + +
[docs] def set_classifier( + self, + path: str, + layers: Optional[Union[str, List[str]]] = None, + **kwargs + ) -> None: + """Configures a classifier model to be used for generating features + and predictions during interpolation. + + Args: + path (str): Path to trained model. + layers (Union[str, List[str]], optional): Layers from which to + calculate activations for interpolated images. + Defaults to None. + """ + if sf.util.is_tensorflow_model_path(path): + from slideflow.model.tensorflow import Features + import slideflow.io.tensorflow + self.features = Features( + path, + layers=layers, + include_preds=True, + **kwargs) + self.normalizer = self.features.wsi_normalizer # type: ignore + self._classifier_backend = 'tensorflow' + elif sf.util.is_torch_model_path(path): + from slideflow.model.torch import Features + import slideflow.io.torch + self.features = Features( + path, + layers=layers, + include_preds=True, + **kwargs) + self.normalizer = self.features.wsi_normalizer # type: ignore + self._classifier_backend = 'torch' + else: + raise ValueError(f"Unrecognized backend for model {path}")
+ + + +
[docs] def plot_comparison( + self, + seeds: Union[int, List[int]], + titles: Optional[List[str]] = None + ) -> None: + """Plots side-by-side comparison of images from the starting + and ending interpolation classes. + + Args: + seeds (int or list(int)): Seeds to display. + """ + import matplotlib.pyplot as plt + + if not isinstance(seeds, list): + seeds = [seeds] + if titles is None: + titles = ['Start', 'End'] + assert len(titles) == 2 + + def _process_to_pil(_img): + _img = self._crop_and_convert_to_uint8(_img) + _img = self._preprocess_from_uint8(_img, standardize=False, normalize=False) + if self._classifier_backend == 'torch': + _img = sf.io.torch.cwh_to_whc(_img) + return Image.fromarray(sf.io.convert_dtype(_img[0], np.uint8)) + + scale = 5 + fig, ax = plt.subplots(len(seeds), 2, figsize=(2 * scale, len(seeds) * scale)) + for s, seed in enumerate(seeds): + img0 = _process_to_pil(self.generate_start(seed)) + img1 = _process_to_pil(self.generate_end(seed)) + + if len(seeds) == 1: + _ax0 = ax[0] + _ax1 = ax[1] + else: + _ax0 = ax[s, 0] + _ax1 = ax[s, 1] + if s == 0: + _ax0.set_title(titles[0]) + _ax1.set_title(titles[1]) + _ax0.imshow(img0) + _ax1.imshow(img1) + _ax0.axis('off') + _ax1.axis('off') + + fig.subplots_adjust(wspace=0.05, hspace=0)
+ +
[docs] def generate(self, seed: Union[int, List[int]], embedding: "torch.Tensor") -> "torch.Tensor": + """Generate an image from a given embedding. + + Args: + seed (int): Seed for noise vector. + embedding (torch.Tensor): Class embedding. + + Returns: + torch.Tensor: Image (float32, shape=(1, 3, height, width)) + """ + z = self.z(seed) + if z.shape[0] == 1 and embedding.shape[0] > 1: + z = z.repeat(embedding.shape[0], 1) + elif z.shape[0] > 1 and embedding.shape[0] == 1: + embedding = embedding.repeat(z.shape[0], 1) + return self.E_G(z, embedding, **self.gan_kwargs)
+ +
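`generate()` above accepts either one seed with several embeddings or several seeds with one embedding, and repeats whichever side has a batch size of 1 so the two batches line up. The rule can be sketched without torch using nested lists. `match_batch` is a hypothetical name used only for this illustration; the real method relies on `Tensor.repeat`.

```python
from typing import List, Tuple

Batch = List[List[float]]


def match_batch(z: Batch, embedding: Batch) -> Tuple[Batch, Batch]:
    """Repeat the size-1 side so both batches have the same length."""
    if len(z) == 1 and len(embedding) > 1:
        # One noise vector, many embeddings: reuse the noise for each embedding.
        z = [list(z[0]) for _ in embedding]
    elif len(z) > 1 and len(embedding) == 1:
        # Many noise vectors, one embedding: reuse the embedding for each seed.
        embedding = [list(embedding[0]) for _ in z]
    return z, embedding
```

As in the original method, batches that are both larger than 1 are passed through untouched, so a size mismatch in that case is left for the generator itself to report.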
[docs] def generate_start(self, seed: int) -> "torch.Tensor": + """Generate an image from the starting class. + + Args: + seed (int): Seed for noise vector. + + Returns: + torch.Tensor: Image (float32, shape=(1, 3, height, width)) + """ + return self.generate(seed, self.embed0)
+ +
[docs] def generate_end(self, seed: int) -> "torch.Tensor": + """Generate an image from the ending class. + + Args: + seed (int): Seed for noise vector. + + Returns: + torch.Tensor: Image (float32, shape=(1, 3, height, width)) + """ + return self.generate(seed, self.embed1)
+ +
[docs] def generate_np_from_embedding( + self, + seed: int, + embedding: "torch.Tensor" + ) -> np.ndarray: + """Generate a numpy image from a given embedding. + + Args: + seed (int): Seed for noise vector. + embedding (torch.Tensor): Class embedding. + + Returns: + np.ndarray: Image (uint8, shape=(height, width, 3)) + """ + import torch + img = self.generate(seed, embedding) + img = (img * 127.5 + 128).clamp(0, 255).to(torch.uint8)[0] + img = img.permute(1, 2, 0) + return sf.io.convert_dtype(img, np.uint8)
+ +
[docs] def generate_np_start(self, seed: int) -> np.ndarray: + """Generate a numpy image from the starting class. + + Args: + seed (int): Seed for noise vector. + + Returns: + np.ndarray: Image (uint8, shape=(height, width, 3)) + """ + return self.generate_np_from_embedding(seed, self.embed0)
+ +
[docs] def generate_np_end(self, seed: int) -> np.ndarray: + """Generate a numpy image from the ending class. + + Args: + seed (int): Seed for noise vector. + + Returns: + np.ndarray: Image (uint8, shape=(height, width, 3)) + """ + return self.generate_np_from_embedding(seed, self.embed1)
+ +
[docs] def generate_tf_from_embedding( + self, + seed: Union[int, List[int]], + embedding: "torch.Tensor" + ) -> Tuple["tf.Tensor", "tf.Tensor"]: + """Create a processed Tensorflow image from the GAN output from a given + seed and embedding. + + Args: + seed (int): Seed for noise vector. + embedding (torch.tensor): Class embedding. + + Returns: + A tuple containing + + tf.Tensor: Unprocessed resized image, uint8. + + tf.Tensor: Processed resized image, standardized and normalized. + """ + gan_out = self.generate(seed, embedding) + gan_out = self._crop_and_convert_to_uint8(gan_out) + gan_out = self._preprocess_from_uint8(gan_out, standardize=False, normalize=True) + standardized = self._standardize(gan_out) + if isinstance(seed, list) or (len(embedding.shape) > 1 and embedding.shape[0] > 1): + return gan_out, standardized + else: + return gan_out[0], standardized[0]
+ +
[docs] def generate_tf_start(self, seed: int) -> Tuple["tf.Tensor", "tf.Tensor"]: + """Create a processed Tensorflow image from the GAN output of a given + seed and the starting class embedding. + + Args: + seed (int): Seed for noise vector. + + Returns: + A tuple containing + + tf.Tensor: Unprocessed image (tf.Tensor), uint8. + + tf.Tensor: Processed image (tf.Tensor), standardized and normalized. + """ + return self.generate_tf_from_embedding(seed, self.embed0)
+ +
[docs] def generate_tf_end(self, seed: int) -> Tuple["tf.Tensor", "tf.Tensor"]: + """Create a processed Tensorflow image from the GAN output of a given + seed and the ending class embedding. + + Args: + seed (int): Seed for noise vector. + + Returns: + A tuple containing + + tf.Tensor: Unprocessed resized image, uint8. + + tf.Tensor: Processed resized image, standardized and normalized. + """ + return self.generate_tf_from_embedding(seed, self.embed1)
+ +
[docs] def class_interpolate(self, seed: int, steps: int = 100) -> Generator: + """Sets up a generator that returns images during class embedding + interpolation. + + Args: + seed (int): Seed for random noise vector. + steps (int, optional): Number of steps for interpolation. + Defaults to 100. + + Returns: + Generator: Generator which yields images (torch.tensor, uint8) + during interpolation. + + Yields: + Generator: images (torch.tensor, dtype=uint8) + """ + from slideflow.gan.stylegan2.stylegan2 import embedding + + return embedding.class_interpolate( + self.E_G, + self.z(seed), + self.embed0, + self.embed1, + device=self.device, + steps=steps, + **self.gan_kwargs + )
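The implementation of `embedding.class_interpolate` is not shown in this listing. As a rough illustration of what interpolating between two class embeddings can look like, here is a sketch that assumes simple linear blending; the function `interpolate_embeddings` is hypothetical and may differ from what the StyleGAN2 submodule actually does.

```python
import numpy as np
from typing import Iterator


def interpolate_embeddings(
    embed0: np.ndarray,
    embed1: np.ndarray,
    steps: int = 100
) -> Iterator[np.ndarray]:
    """Yield `steps` embeddings blended linearly from embed0 to embed1.

    Assumption: linear blending. Not the library's implementation.
    """
    denom = max(steps - 1, 1)  # Avoid division by zero when steps == 1.
    for i in range(steps):
        t = i / denom  # Interpolation weight, from 0.0 to 1.0.
        yield (1 - t) * embed0 + t * embed1


start = np.zeros((1, 8))
end = np.ones((1, 8))
path = list(interpolate_embeddings(start, end, steps=5))
```

Each yielded embedding would then be passed to the generator together with the same noise vector, so that only the class conditioning changes between frames.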
+ +
[docs] def linear_interpolate(self, seed: int, steps: int = 100) -> Generator: + """Sets up a generator that returns images during linear label + interpolation. + + Args: + seed (int): Seed for random noise vector. + steps (int, optional): Number of steps for interpolation. + Defaults to 100. + + Returns: + Generator: Generator which yields images (torch.tensor, uint8) + during interpolation. + + Yields: + Generator: images (torch.tensor, dtype=uint8) + """ + from slideflow.gan.stylegan2.stylegan2 import embedding + + return embedding.linear_interpolate( + self.G, + self.z(seed), + device=self.device, + steps=steps, + **self.gan_kwargs + )
+ +
[docs] def interpolate_and_predict( + self, + seed: int, + steps: int = 100, + outcome_idx: int = 0, + ) -> Tuple[List, ...]: + """Interpolates between starting and ending classes for a seed, + recording raw images, processed images, and predictions. + + Args: + seed (int): Seed for random noise vector. + steps (int, optional): Number of steps during interpolation. + Defaults to 100. + outcome_idx (int, optional): Index of the prediction output to + record at each step. Defaults to 0. + + Returns: + Tuple[List, ...]: Raw images, processed images, and predictions. + """ + if not isinstance(seed, int): + raise ValueError("Seed must be an integer.") + + import torch + import matplotlib.pyplot as plt + import seaborn as sns + + imgs = [] + proc_imgs = [] + preds = [] + + for img in tqdm(self.class_interpolate(seed, steps), + total=steps, + desc=f"Working on seed {seed}..."): + img = torch.from_numpy(np.expand_dims(img, axis=0)).permute(0, 3, 1, 2) + img = (img / 127.5) - 1 + img = self._crop_and_convert_to_uint8(img) + img = self._preprocess_from_uint8(img, standardize=False, normalize=True) + processed_img = self._standardize(img) + img = sf.io.convert_dtype(img, np.float32)[0] + + if self.features is not None: + pred = self.features(processed_img)[-1] + if self._classifier_backend == 'torch': + pred = pred.cpu() + pred = pred.numpy() + preds += [pred[0][outcome_idx]] + imgs += [img] + proc_imgs += [processed_img[0]] + + sns.lineplot(x=range(len(preds)), y=preds, label=f"Seed {seed}") + plt.axhline(y=0, color='black', linestyle='--') + plt.title("Prediction during interpolation") + plt.xlabel("Interpolation Step (Start -> End)") + plt.ylabel("Prediction") + + return imgs, proc_imgs, preds
+
+ +
+ +
+
+ + + + +
+ + + +
+

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + +
+ +
+
+ +
+
+
+ +
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/gan/utils/index.html b/docs/_modules/slideflow/gan/utils/index.html new file mode 100644 index 000000000..604558e78 --- /dev/null +++ b/docs/_modules/slideflow/gan/utils/index.html @@ -0,0 +1,454 @@ + + + + + + + + + + + + slideflow.gan.utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.gan.utils

+import numpy as np
+from typing import Any, TYPE_CHECKING
+
+if TYPE_CHECKING:
+    import torch
+
+
[docs]def crop( + img: "torch.Tensor", + gan_um: int, + gan_px: int, + target_um: int +) -> Any: + """Center-crop a batch of raw GAN output to a target size in microns. + + Args: + img (torch.Tensor): Raw batch of GAN images. + gan_um (int): Size of GAN output images, in microns. + gan_px (int): Size of GAN output images, in pixels. + target_um (int): Size of target images, in microns. + Will crop image to meet this target. + + Returns: + Cropped image. + """ + from torchvision import transforms + + # Calculate parameters for the center crop. + crop_factor = target_um / gan_um + crop_width = int(crop_factor * gan_px) + left = int(gan_px/2 - crop_width/2) + upper = int(gan_px/2 - crop_width/2) + + # Perform the crop. + return transforms.functional.crop(img, upper, left, crop_width, crop_width)
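The crop arithmetic above can be checked in isolation. This is a small NumPy sketch of the same center-crop calculation, using array slicing in place of `torchvision`; the helper names and the example sizes (400 microns, 512 pixels, 302 micron target) are illustrative assumptions only.

```python
import numpy as np
from typing import Tuple


def center_crop_params(gan_um: int, gan_px: int, target_um: int) -> Tuple[int, int]:
    """Return (offset, crop_width) in pixels for a centered square crop."""
    crop_factor = target_um / gan_um          # Fraction of the image to keep.
    crop_width = int(crop_factor * gan_px)    # Crop size in pixels (truncated).
    offset = int(gan_px / 2 - crop_width / 2)  # Same offset for top and left.
    return offset, crop_width


def center_crop(batch: np.ndarray, gan_um: int, gan_px: int, target_um: int) -> np.ndarray:
    """Crop the last two (height, width) axes of a batch around the center."""
    offset, width = center_crop_params(gan_um, gan_px, target_um)
    return batch[..., offset:offset + width, offset:offset + width]


batch = np.zeros((2, 3, 512, 512), dtype=np.float32)
cropped = center_crop(batch, gan_um=400, gan_px=512, target_um=302)
```

With these example values the crop keeps a 386-pixel square starting 63 pixels from the top-left corner. The function only crops; any resizing to the classifier's input size would need to happen elsewhere.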
+ + +
[docs]def noise_tensor(seed: int, z_dim: int) -> "torch.Tensor": + """Creates a noise tensor based on a given seed and dimension size. + + Args: + seed (int): Seed. + z_dim (int): Dimension of noise vector to create. + + Returns: + torch.Tensor: Noise vector of shape (1, z_dim) + """ + import torch + return torch.from_numpy(np.random.RandomState(seed).randn(1, z_dim))
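Seeding through `np.random.RandomState` makes the noise vector reproducible for a given seed. A minimal sketch of that property, without the PyTorch conversion (the function name `noise_vector` is hypothetical):

```python
import numpy as np


def noise_vector(seed: int, z_dim: int) -> np.ndarray:
    """Return a reproducible standard-normal noise vector of shape (1, z_dim)."""
    return np.random.RandomState(seed).randn(1, z_dim)


a = noise_vector(0, 512)
b = noise_vector(0, 512)  # Same seed: identical values.
c = noise_vector(1, 512)  # Different seed: different values.
```

Note that `randn` returns a float64 array, and `torch.from_numpy` preserves the dtype, so a tensor built this way may need casting before being passed to a float32 generator.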
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/grad/index.html b/docs/_modules/slideflow/grad/index.html new file mode 100644 index 000000000..405489a02 --- /dev/null +++ b/docs/_modules/slideflow/grad/index.html @@ -0,0 +1,853 @@ + + + + + + + + + + + + slideflow.grad — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.grad

+"""Submodule for calculating/displaying pixel attribution (saliency maps)."""
+
+from typing import Any, Callable, Dict, Optional
+
+import slideflow as sf
+import numpy as np
+import saliency.core as saliency
+from functools import partial
+from slideflow import errors
+from slideflow.grad.plot_utils import (comparison_plot, inferno, multi_plot,
+                                       oranges, overlay,
+                                       saliency_map_comparison)
+
+VANILLA = 0
+VANILLA_SMOOTH = 1
+INTEGRATED_GRADIENTS = 2
+INTEGRATED_GRADIENTS_SMOOTH = 3
+GUIDED_INTEGRATED_GRADIENTS = 4
+GUIDED_INTEGRATED_GRADIENTS_SMOOTH = 5
+BLUR_INTEGRATED_GRADIENTS = 6
+BLUR_INTEGRATED_GRADIENTS_SMOOTH = 7
+XRAI = 8
+XRAI_FAST = 9
+
+
[docs]class SaliencyMap: + +
[docs] def __init__(self, model: Callable, class_idx: int) -> None: + """Class to assist with calculation and display of saliency maps. + + Args: + model (Callable): Differentiable model from which saliency is + calculated. + class_idx (int): Index of class for backpropagating gradients. + """ + if not callable(model): + raise ValueError("'model' must be a differentiable model.") + self.model = model + self.feature_model = None + self.feature_model_layer = None + self.class_idx = class_idx + self.gradients = saliency.GradientSaliency() + self.gradcam_grads = saliency.GradCam() + self.ig = saliency.IntegratedGradients() + self.guided_ig = saliency.GuidedIG() + self.blur_ig = saliency.BlurIG() + self.xrai_grads = saliency.XRAI() + self.fast_xrai_params = saliency.XRAIParameters() + self.fast_xrai_params.algorithm = 'fast' + self._fn_map = { + VANILLA: self.vanilla, + VANILLA_SMOOTH: partial(self.vanilla, smooth=True), + INTEGRATED_GRADIENTS: self.integrated_gradients, + INTEGRATED_GRADIENTS_SMOOTH: partial(self.integrated_gradients, smooth=True), + GUIDED_INTEGRATED_GRADIENTS: self.guided_integrated_gradients, + GUIDED_INTEGRATED_GRADIENTS_SMOOTH: partial(self.guided_integrated_gradients, smooth=True), + BLUR_INTEGRATED_GRADIENTS: self.blur_integrated_gradients, + BLUR_INTEGRATED_GRADIENTS_SMOOTH: partial(self.blur_integrated_gradients, smooth=True), + XRAI: self.xrai, + XRAI_FAST: self.xrai_fast + }
+ + @property + def model_backend(self): + return sf.util.model_backend(self.model) + + @property + def device(self): + if self.model_backend == 'tensorflow': + return None + else: + return next(self.model.parameters()).device + + def _update_feature_model(self, layer): + if self.feature_model_layer == layer: + return + + # Cleanup old model + del self.feature_model + self.feature_model = None + self.feature_model_layer = layer + + if self.model_backend == 'tensorflow': + import tensorflow as tf + flattened = sf.model.tensorflow_utils.flatten(self.model) + conv_layer = flattened.get_layer(layer) + self.feature_model = tf.keras.models.Model([flattened.inputs], [conv_layer.output, flattened.output]) + else: + import torch + from slideflow.model import torch_utils + conv_layer = torch_utils.get_module_by_name(self.model, layer) + self._torch_conv_layer_outputs = {} + def conv_layer_forward(m, i, o): + # move the RGB dimension to the last dimension + self._torch_conv_layer_outputs[saliency.base.CONVOLUTION_LAYER_VALUES] = torch.movedim(o, 1, 3).detach().numpy() + def conv_layer_backward(m, i, o): + # move the RGB dimension to the last dimension + self._torch_conv_layer_outputs[saliency.base.CONVOLUTION_OUTPUT_GRADIENTS] = torch.movedim(o[0], 1, 3).detach().numpy() + conv_layer.register_forward_hook(conv_layer_forward) + conv_layer.register_full_backward_hook(conv_layer_backward) + + def _grad_fn_torch( + self, + image: np.ndarray, + call_model_args: Any = None, + expected_keys: Dict = None + ) -> Any: + """Calculate gradient attribution with PyTorch backend. + + Images are expected to be in W, H, C format. 
+ + """ + import torch + from slideflow.io.torch import whc_to_cwh + image = torch.tensor(image, requires_grad=True).to(torch.float32).to(self.device) # type: ignore + output = self.model(whc_to_cwh(image)) + if saliency.base.INPUT_OUTPUT_GRADIENTS in expected_keys: # type: ignore + outputs = output[:, self.class_idx] + grads = torch.autograd.grad(outputs, image, grad_outputs=torch.ones_like(outputs)) # type: ignore + gradients = grads[0].cpu().detach().numpy() + return {saliency.base.INPUT_OUTPUT_GRADIENTS: gradients} + else: + # For Grad-CAM + one_hot = torch.zeros_like(output) + one_hot[:, self.class_idx] = 1 + self.model.zero_grad() # type: ignore + output.backward(gradient=one_hot, retain_graph=True) + return self._torch_conv_layer_outputs + + def _grad_fn_tf( + self, + image: np.ndarray, + call_model_args: Any = None, + expected_keys: Dict = None + ) -> Any: + """Calculate gradient attribution with Tensorflow backend.""" + import tensorflow as tf + + image = tf.convert_to_tensor(image) + with tf.GradientTape() as tape: + if expected_keys == [saliency.base.INPUT_OUTPUT_GRADIENTS]: + # For vanilla gradient, Integrated Gradients, XRAI + tape.watch(image) + output = self.model(image)[:, self.class_idx] + gradients = tape.gradient(output, image) + return {saliency.base.INPUT_OUTPUT_GRADIENTS: gradients} + else: + # For Grad-CAM + conv_layer, output_layer = self.feature_model(image) + gradients = np.array(tape.gradient(output_layer, conv_layer)) + return {saliency.base.CONVOLUTION_LAYER_VALUES: conv_layer, + saliency.base.CONVOLUTION_OUTPUT_GRADIENTS: gradients} + + def _grad_fn( + self, + image: np.ndarray, + call_model_args: Any = None, + expected_keys: Dict = None + ) -> Any: + """Calculate gradient attribution.""" + if self.model_backend == 'tensorflow': + return self._grad_fn_tf(image, call_model_args, expected_keys) + elif self.model_backend == 'torch': + return self._grad_fn_torch(image, call_model_args, expected_keys) + else: + raise 
errors.UnrecognizedBackendError + + def _apply_mask_fn( + self, + img: np.ndarray, + grads: saliency.CoreSaliency, + baseline: bool = False, + smooth: bool = False, + **kwargs + ) -> np.ndarray: + """Applies a saliency masking function to a gradients map. + + Args: + img (np.ndarray or list(np.ndarray)): Image or list of images. + grads (saliency.CoreSaliency): Gradients for saliency. + baseline (bool): Requires x_baseline argument. + smooth (bool): Use a smoothed mask. + + Returns: + np.ndarray: Saliency map. + """ + mask_fn = grads.GetSmoothedMask if smooth else grads.GetMask + + def _get_mask(_img): + if baseline: + kwargs.update({'x_baseline': np.zeros(_img.shape)}) + out = mask_fn(_img, self._grad_fn, **kwargs) + return out + + if isinstance(img, list): + # Normalize together + image_3d = list(map(_get_mask, img)) + v_maxes, v_mins = zip(*[max_min(img3d) for img3d in image_3d]) + vmax = max(v_maxes) + vmin = min(v_mins) + return [grayscale(img3d, vmax=vmax, vmin=vmin) for img3d in image_3d] + else: + return grayscale(_get_mask(img)) + +
[docs] def all(self, img: np.ndarray) -> Dict: + """Calculate all saliency map methods. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + + Returns: + Dict: Dictionary mapping name of saliency method to saliency map. + """ + return { + 'Vanilla': self.vanilla(img), + 'Vanilla (Smoothed)': self.vanilla(img, smooth=True), + 'Integrated Gradients': self.integrated_gradients(img), + 'Integrated Gradients (Smooth)': self.integrated_gradients(img, smooth=True), + 'Guided Integrated Gradients': self.guided_integrated_gradients(img), + 'Guided Integrated Gradients (Smooth)': self.guided_integrated_gradients(img, smooth=True), + 'Blur Integrated Gradients': self.blur_integrated_gradients(img), + 'Blur Integrated Gradients (Smooth)': self.blur_integrated_gradients(img, smooth=True), + }
+ + def get(self, img: np.ndarray, method: int) -> np.ndarray: + return self._fn_map[method](img) + +
[docs] def vanilla( + self, + img: np.ndarray, + smooth: bool = False, + **kwargs + ) -> np.ndarray: + """Calculate gradient-based saliency map. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + smooth (bool, optional): Smooth gradients. Defaults to False. + + Returns: + np.ndarray: Saliency map. + """ + return self._apply_mask_fn( + img, + self.gradients, + smooth=smooth, + **kwargs + )
+ +
[docs] def gradcam( + self, + img: np.ndarray, + layer: str, + smooth: bool = False, + **kwargs + ) -> np.ndarray: + """Calculate saliency map using Grad-CAM. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + layer (str): Name of the convolutional layer from which + activations and gradients are taken. + smooth (bool, optional): Smooth gradients. Defaults to False. + + Returns: + np.ndarray: Saliency map. + """ + self._update_feature_model(layer) + return self._apply_mask_fn( + img, + self.gradcam_grads, + smooth=smooth, + **kwargs + )
+ +
[docs] def integrated_gradients( + self, + img: np.ndarray, + x_steps: int = 25, + batch_size: int = 20, + smooth: bool = False, + **kwargs + ) -> np.ndarray: + """Calculate saliency map using integrated gradients. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + x_steps (int, optional): Steps for gradient calculation. + Defaults to 25. + batch_size (int, optional): Batch size. Defaults to 20. + smooth (bool, optional): Smooth gradients. Defaults to False. + + Returns: + np.ndarray: Saliency map. + """ + return self._apply_mask_fn( + img, + self.ig, + smooth=smooth, + x_steps=x_steps, + batch_size=batch_size, + baseline=True, + **kwargs + )
+ +
[docs] def guided_integrated_gradients( + self, + img: np.ndarray, + x_steps: int = 25, + max_dist: float = 1.0, + fraction: float = 0.5, + smooth: bool = False, + **kwargs + ) -> np.ndarray: + """Calculate saliency map using guided integrated gradients. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + x_steps (int, optional): Steps for gradient calculation. + Defaults to 25. + max_dist (float, optional): Maximum distance for gradient + calculation. Defaults to 1.0. + fraction (float, optional): Fraction for gradient calculation. + Defaults to 0.5. + smooth (bool, optional): Smooth gradients. Defaults to False. + + Returns: + np.ndarray: Saliency map. + """ + return self._apply_mask_fn( + img, + self.guided_ig, + x_steps=x_steps, + max_dist=max_dist, + fraction=fraction, + smooth=smooth, + baseline=True, + **kwargs + )
+ +
[docs] def blur_integrated_gradients( + self, + img: np.ndarray, + batch_size: int = 20, + smooth: bool = False, + **kwargs + ) -> np.ndarray: + """Calculate saliency map using blur integrated gradients. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + batch_size (int, optional): Batch size. Defaults to 20. + smooth (bool, optional): Smooth gradients. Defaults to False. + + Returns: + np.ndarray: Saliency map. + """ + return self._apply_mask_fn( + img, + self.blur_ig, + smooth=smooth, + batch_size=batch_size, + **kwargs + )
+ +
[docs] def xrai( + self, + img: np.ndarray, + batch_size: int = 20, + **kwargs + ) -> np.ndarray: + """Calculate saliency map using XRAI. + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + batch_size (int, optional): Batch size. Defaults to 20. + + Returns: + np.ndarray: Saliency map. + """ + mask = self.xrai_grads.GetMask( + img, + self._grad_fn, + batch_size=batch_size, + **kwargs + ) + if isinstance(img, list): + # Normalize together + v_maxes, v_mins = zip(*[max_min(img3d) for img3d in mask]) + vmax = max(v_maxes) + vmin = min(v_mins) + return [normalize_xrai(img3d, vmax=vmax, vmin=vmin) for img3d in mask] + else: + return normalize_xrai(mask)
+ +
[docs] def xrai_fast( + self, + img: np.ndarray, + batch_size: int = 20, + **kwargs + ) -> np.ndarray: + """Calculate saliency map using XRAI (fast implementation). + + Args: + img (np.ndarray): Pre-processed input image in W, H, C format. + batch_size (int, optional): Batch size. Defaults to 20. + + Returns: + np.ndarray: Saliency map. + """ + mask = self.xrai_grads.GetMask( + img, + self._grad_fn, + batch_size=batch_size, + extra_parameters=self.fast_xrai_params, + **kwargs + ) + if isinstance(img, list): + # Normalize together + v_maxes, v_mins = zip(*[max_min(img3d) for img3d in mask]) + vmax = max(v_maxes) + vmin = min(v_mins) + return [normalize_xrai(img3d, vmax=vmax, vmin=vmin) for img3d in mask] + else: + return normalize_xrai(mask)
+ + +
[docs]def grayscale(image_3d, vmax=None, vmin=None, percentile=99): + """Returns a 3D tensor as a grayscale 2D tensor. + This method sums a 3D tensor across the absolute value of axis=2, and then + clips values at a given percentile. + """ + if vmax is None and vmin is None: + vmax, vmin = max_min(image_3d, percentile=percentile) + image_2d = np.sum(np.abs(image_3d), axis=2) + return np.clip((image_2d - vmin) / (vmax - vmin), 0, 1)
+ + +def normalize_xrai(mask, percentile=99): + vmax = np.percentile(mask, percentile) + vmin = np.min(mask) + return np.clip((mask - vmin) / (vmax - vmin), 0, 1) + + +def max_min(image_3d, percentile=99): + image_2d = np.sum(np.abs(image_3d), axis=2) + vmax = np.percentile(image_2d, percentile) + vmin = np.min(image_2d) + return vmax, vmin +
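The percentile-based normalization used by `grayscale` and `max_min` can be exercised on random data. The sketch below combines the two steps into one hypothetical helper, `to_grayscale`, and uses synthetic gradients rather than real model output.

```python
import numpy as np


def to_grayscale(image_3d: np.ndarray, percentile: int = 99) -> np.ndarray:
    """Collapse (W, H, C) attributions to a 2D map scaled to [0, 1]."""
    # Sum absolute attributions across the channel axis.
    image_2d = np.sum(np.abs(image_3d), axis=2)
    # Use a high percentile as the upper bound so outliers do not dominate.
    vmax = np.percentile(image_2d, percentile)
    vmin = np.min(image_2d)
    # Values above the percentile are clipped to 1.
    return np.clip((image_2d - vmin) / (vmax - vmin), 0, 1)


rng = np.random.RandomState(0)
grads = rng.randn(64, 64, 3)  # Synthetic stand-in for input gradients.
mask = to_grayscale(grads)
```

One caveat of this formula is that a constant input gives `vmax == vmin` and therefore a division by zero, so callers may want to guard against uniform attribution maps.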
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/grad/plot_utils/index.html b/docs/_modules/slideflow/grad/plot_utils/index.html new file mode 100644 index 000000000..7b861a10b --- /dev/null +++ b/docs/_modules/slideflow/grad/plot_utils/index.html @@ -0,0 +1,631 @@ + + + + + + + + + + + + slideflow.grad.plot_utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.grad.plot_utils

+"""Plotting functions for displaying saliency maps."""
+
+from typing import Any, Callable, Dict, List, Optional, TYPE_CHECKING
+
+import numpy as np
+from PIL import Image
+
+if TYPE_CHECKING:
+    import matplotlib.pyplot as plt
+
+
+
[docs]def inferno(img): + import matplotlib.pyplot as plt + cmap = plt.get_cmap('inferno') + return (cmap(img) * 255).astype(np.uint8)
+ + +
[docs]def oranges(img): + import matplotlib.pyplot as plt + cmap = plt.get_cmap('Oranges') + return (cmap(img) * 255).astype(np.uint8)
+ + +
[docs]def overlay(image, mask): + base = Image.fromarray(image) + cmap = Image.fromarray(oranges(mask)) + cmap.putalpha(int(0.6*255)) + base.paste(cmap, mask=cmap) + return np.array(base)
+ + +def remove_ticks(axis): + axis.spines.right.set_visible(False) + axis.spines.top.set_visible(False) + axis.spines.left.set_visible(False) + axis.spines.bottom.set_visible(False) + axis.set_xticklabels([]) + axis.set_xticks([]) + axis.set_yticklabels([]) + axis.set_yticks([]) + + +
[docs]def comparison_plot( + original: np.ndarray, + maps: Dict[str, np.ndarray], + cmap: Any = "gray", + n_rows: int = 3, + n_cols: int = 3, +) -> None: + """Plots comparison of many saliency maps for a single image in a grid. + + Args: + original (np.ndarray): Original (unprocessed) image. + maps (dict(str, np.ndarray)): Dictionary mapping saliency map names + to the numpy array maps. + cmap (matplotlib colormap, optional): Colormap for maps. + Defaults to 'gray'. + n_rows (int, optional): Number of rows in the grid. Defaults to 3. + n_cols (int, optional): Number of columns in the grid. + Defaults to 3. + """ + import matplotlib.pyplot as plt + + scale = 5 + ax_idx = [[i, j] for i in range(n_rows) for j in range(n_cols)] + fig, ax = plt.subplots( + n_rows, + n_cols, + figsize=(n_cols * scale, n_rows * scale) + ) + + ax[ax_idx[0][0], ax_idx[0][1]].axis('off') + ax[ax_idx[0][0], ax_idx[0][1]].imshow(original) + ax[ax_idx[0][0], ax_idx[0][1]].set_title('Original') + + for i, (map_name, map_img) in enumerate(maps.items()): + ax[ax_idx[i+1][0], ax_idx[i+1][1]].axis('off') + ax[ax_idx[i+1][0], ax_idx[i+1][1]].imshow(map_img, cmap=cmap, vmin=0, vmax=1) + ax[ax_idx[i+1][0], ax_idx[i+1][1]].set_title(map_name) + + fig.subplots_adjust(wspace=0, hspace=0.1)
+ + +
[docs]def multi_plot( + raw_imgs: List[np.ndarray], + processed_imgs: List[np.ndarray], + method: Callable, + cmap: str = 'inferno', + xlabels: Optional[List[str]] = None, + ylabels: Optional[List[str]] = None, + **kwargs +) -> None: + """Creates a plot of saliency maps and overlays for a given set of images. + + The first row will be the raw images. + The second row will be an overlay of the saliency map and the raw image. + The third row will be the saliency maps. + + Args: + raw_imgs (List[np.ndarray]): Raw, unprocessed images. + processed_imgs (List[np.ndarray]): Processed images. + method (Callable): Saliency method. + cmap (str, optional): Colormap. Defaults to 'inferno'. + xlabels (Optional[List[str]], optional): Labels for x-axis, one per + image. Defaults to None. + ylabels (Optional[List[str]], optional): Labels for y-axis, one per + row (length 3). Defaults to None. + + Raises: + ValueError: If length of raw_imgs, processed_imgs are not equal. + ValueError: If xlabels is provided and not a list. + ValueError: If ylabels is provided and not a list. + ValueError: If xlabels is provided and length does not equal raw_imgs. + ValueError: If ylabels is provided and length does not equal 3. 
+ """ + import matplotlib.pyplot as plt + + # Error checking + if len(raw_imgs) != len(processed_imgs): + raise ValueError( + "Length of raw_imgs ({}) and processed_imgs ({}) unequal".format( + len(raw_imgs), + len(processed_imgs) + ) + ) + if xlabels: + if not isinstance(xlabels, list): + raise ValueError("xlabels must be a list.") + if len(xlabels) != len(raw_imgs): + raise ValueError( + "Length of raw_imgs ({}) and xlabels ({}) unequal".format( + len(raw_imgs), + len(xlabels) + ) + ) + if ylabels: + if not isinstance(ylabels, list): + raise ValueError("ylabels must be a list of length 3.") + if len(ylabels) != 3: + raise ValueError( + f"Unexpected length for ylabels; expected 3, got {len(ylabels)}" + ) + + # Calculate masks and overlays + masks = [method(p_img, **kwargs) for p_img in processed_imgs] + overlays = [overlay(img, mask) for img, mask in zip(raw_imgs, masks)] + + # Initialize figure. + figsize = (len(raw_imgs)*5, 15) + fig, ax = plt.subplots(3, len(raw_imgs), figsize=figsize) + + # Plot labels if provided. + if xlabels: + for i in range(len(xlabels)): + ax[0, i].set_title(xlabels[i], fontsize=22) + if ylabels: + for i in range(len(ylabels)): + ax[i, 0].set_ylabel(ylabels[i], fontsize=22) + + # Plot the originals, overlays, and masks + for i, img in enumerate(raw_imgs): + remove_ticks(ax[0, i]) + ax[0, i].imshow(Image.fromarray(img)) + for i, ov in enumerate(overlays): + remove_ticks(ax[1, i]) + ax[1, i].imshow(Image.fromarray(ov)) + for i, mask in enumerate(masks): + remove_ticks(ax[2, i]) + ax[2, i].imshow(mask, cmap=cmap) + + fig.subplots_adjust(wspace=0, hspace=0)
+ + +
[docs]def saliency_map_comparison( + orig_imgs: List[np.ndarray], + saliency_fn: List[Callable], + process_fn: Callable, + saliency_labels: List[str] = None, + cmap: str = 'inferno', + **kwargs: Any +) -> None: + """Plots several saliency maps for a list of images. + + Each row is a unique image. + The first column is the original image. Each column after is a saliency + map generated each of the functions provided to `saliency_fn`. + + Args: + orig_imgs (list(np.ndarray)): Original (unprocessed) images for + which to generate saliency maps. + saliency_fn (list(Callable)): List of saliency map functions. + process_fn (Callable): Function for processing images. This function + will be applied to images before images are passed to the + saliency map function. + saliency_labels (list(str), optional): Labels for provided saliency + maps. Defaults to None. + cmap (str, optional): Colormap for saliency maps. + Defaults to 'inferno'. + """ + import matplotlib.pyplot as plt + + def apply_cmap(_img): + cmap_fn = plt.get_cmap(cmap) + return (cmap_fn(_img) * 255).astype(np.uint8) + + n_imgs = len(orig_imgs) + n_saliency = len(saliency_fn) + fig, ax = plt.subplots( + n_imgs, + n_saliency+1, + figsize=((n_saliency+1)*5, n_imgs*5) + ) + if saliency_labels is None: + saliency_labels = [f"Saliency{n}" for n in range(n_saliency)] + assert len(saliency_labels) == len(saliency_fn) + + ax[0, 0].set_title("Original") + for idx, orig in enumerate(orig_imgs): + ax[idx, 0].axis('off') + ax[idx, 0].imshow(orig) + for s, s_fn in enumerate(saliency_fn): + ax[0, s+1].set_title(saliency_labels[s]) + ax[idx, s+1].axis('off') + ax[idx, s+1].imshow(apply_cmap(s_fn(process_fn(orig), **kwargs))) + + fig.subplots_adjust(wspace=0, hspace=0)
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/heatmap/index.html b/docs/_modules/slideflow/heatmap/index.html new file mode 100644 index 000000000..374beb7cf --- /dev/null +++ b/docs/_modules/slideflow/heatmap/index.html @@ -0,0 +1,1459 @@ + + + + + + + + + + + + slideflow.heatmap — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.heatmap

+import os
+from collections import namedtuple
+from typing import (TYPE_CHECKING, Any, Callable, Dict, List, Optional, Tuple,
+                    Union)
+
+import numpy as np
+import shapely.geometry as sg
+from mpl_toolkits.axes_grid1.inset_locator import mark_inset, zoomed_inset_axes
+from threading import Thread
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.slide import WSI
+from slideflow.util import log
+
+if TYPE_CHECKING:
+    import matplotlib.pyplot as plt
+    from matplotlib.axes import Axes
+    from PIL import Image
+    try:
+        import tensorflow as tf
+    except ImportError:
+        pass
+    try:
+        import torch
+    except ImportError:
+        pass
+
+Inset = namedtuple("Inset", "x y zoom loc mark1 mark2 axes")
+
+# -----------------------------------------------------------------------------
+
+
[docs]class Heatmap: + """Generate a heatmap of predictions across a whole-slide image. + + This interface is designed to be used with tile-based models, and + does not support multiple-instance learning models. Attention heatmaps + of multiple-instance learning models can be generated using + :func:`slideflow.mil.predict_slide`. + + """ + + def __init__( + self, + slide: Union[str, WSI], + model: str, + stride_div: Optional[int] = None, + batch_size: int = 32, + num_threads: Optional[int] = None, + num_processes: Optional[int] = None, + img_format: str = 'auto', + generate: bool = True, + generator_kwargs: Optional[Dict[str, Any]] = None, + device: Optional["torch.device"] = None, + load_method: Optional[str] = None, + **wsi_kwargs + ) -> None: + """Initialize a heatmap from a path to a slide or a :class:`slideflow.WSI`. + + Examples + Create a heatmap from a path to a slide. + + .. code-block:: python + + model_path = 'path/to/saved_model' + heatmap = sf.Heatmap('slide.svs', model_path) + + Create a heatmap, with grayspace filtering disabled. + + .. code-block:: python + + heatmap = sf.Heatmap(..., grayspace_fraction=1) + + Create a heatmap from a ``sf.WSI`` object. + + .. code-block:: python + + # Load a slide + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + + # Apply Otsu's thresholding to the slide, + # so heatmap is only generated on areas with tissue. + wsi.qc('otsu') + + # Generate the heatmap + heatmap = sf.Heatmap(wsi, model_path) + + Args: + slide (str or slideflow.WSI): Path to slide, or a loaded + :class:`slideflow.WSI` object. + model (str): Path to Tensorflow or PyTorch model. + stride_div (int, optional): Divisor for stride when convolving + across slide. Defaults to 2. + roi_dir (str, optional): Directory in which slide ROI is contained. + Defaults to None. + rois (list, optional): List of paths to slide ROIs. Alternative to + providing roi_dir. Defaults to None. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. 
+ Determines how ROIs are used to extract tiles. 
+ If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + batch_size (int, optional): Batch size for calculating predictions. + Defaults to 32. + num_threads (int, optional): Number of tile worker threads. Cannot + supply both ``num_threads`` (uses thread pool) and + ``num_processes`` (uses multiprocessing pool). Defaults to + CPU core count. + num_processes (int, optional): Number of child processes to spawn + for multiprocessing pool. Defaults to None (does not use + multiprocessing). + enable_downsample (bool, optional): Enable the use of downsampled + slide image layers. Defaults to True. + img_format (str, optional): Image format (png, jpg) to use when + extracting tiles from slide. Must match the image format + the model was trained on. If 'auto', will use the format + logged in the model params.json. Defaults to 'auto'. + generate (bool): Generate the heatmap after initialization. + If False, heatmap will need to be manually generated by + calling :meth:`Heatmap.generate()`. Defaults to True. + generator_kwargs (dict, optional): Keyword arguments passed to + :meth:`slideflow.WSI.build_generator()`. + device (torch.device, optional): PyTorch device. Defaults to + initializing a new CUDA device. + + Keyword args: + Any keyword argument accepted by :class:`slideflow.WSI`. 
+ """ + if num_processes is not None and num_threads is not None: + raise ValueError("Invalid argument: cannot supply both " + "num_processes and num_threads") + self.insets = [] # type: List[Inset] + + model_config = sf.util.get_model_config(model) + self.uq = model_config['hp']['uq'] + if img_format == 'auto' and 'img_format' not in model_config: + raise errors.HeatmapError( + f"Unable to auto-detect image format from model at {model}. " + "Manually set to png or jpg with Heatmap(img_format=...)") + elif img_format == 'auto': + self.img_format = model_config['img_format'] + else: + self.img_format = img_format + + if sf.util.is_torch_model_path(model): + int_kw = {'device': device} + else: + int_kw = {} + if load_method is not None: + int_kw.update(dict(load_method=load_method)) + + if self.uq: + if sf.util.is_torch_model_path(model): + import slideflow.model.torch + interface_fn = sf.model.torch.UncertaintyInterface + else: + import slideflow.model.tensorflow + interface_fn = sf.model.tensorflow.UncertaintyInterface # type: ignore + self.interface = interface_fn(model, **int_kw) + else: + if sf.util.is_torch_model_path(model): + import slideflow.model.torch + interface_fn = sf.model.torch.Features + else: + import slideflow.model.tensorflow + interface_fn = sf.model.tensorflow.Features # type: ignore + self.interface = interface_fn( # type: ignore + model, + layers=None, + include_preds=True, + **int_kw) + + self.model_path = model + self.num_threads = num_threads + self.num_processes = num_processes + self.batch_size = batch_size + self.device = device + self.tile_px = model_config['tile_px'] + self.tile_um = model_config['tile_um'] + self.num_classes = self.interface.num_classes + self.num_features = self.interface.num_features + self.num_uncertainty = self.interface.num_uncertainty + self.predictions = None + self.uncertainty = None + self._thumb = None + + if isinstance(slide, str): + if stride_div is None: + stride_div = 2 + + self.slide_path = slide + 
self.stride_div = stride_div + try: + self.slide = WSI( + self.slide_path, + self.tile_px, + self.tile_um, + self.stride_div, + **wsi_kwargs # type: ignore + ) + except errors.SlideLoadError: + # self.slide is not assigned if WSI() fails, so report the path + raise errors.HeatmapError( + f'Error loading slide {self.slide_path} for heatmap') + elif isinstance(slide, WSI): + + if slide.tile_px != self.tile_px: + raise ValueError( + "Slide tile_px ({}) does not match model ({})".format( + slide.tile_px, self.tile_px)) + if slide.tile_um != self.tile_um: + raise ValueError( + "Slide tile_um ({}) does not match model ({})".format( + slide.tile_um, self.tile_um)) + if stride_div is not None: + log.warning("slide is a WSI; ignoring supplied stride_div.") + if wsi_kwargs: + log.warning("WSI provided; ignoring keyword arguments: " + + ", ".join(list(wsi_kwargs.keys()))) + + self.slide_path = slide.path + self.slide = slide + self.stride_div = slide.stride_div + else: + raise ValueError(f"Unrecognized value {slide} for argument slide") + + if generate: + if generator_kwargs is None: + generator_kwargs = {} + self.generate(**generator_kwargs) + elif generator_kwargs: + log.warning("Heatmap generate=False, ignoring generator_kwargs (" + f"{generator_kwargs})") + + @staticmethod + def _prepare_ax(ax: Optional["Axes"] = None) -> "Axes": + """Creates matplotlib figure and axis if one is not supplied, + otherwise clears the axis contents. + + Args: + ax (matplotlib.axes.Axes): Figure axis. If not supplied, + will create a new figure and axis. Otherwise, clears axis + contents. Defaults to None. + + Returns: + matplotlib.axes.Axes: Figure axes. + """ + import matplotlib.pyplot as plt + if ax is None: + fig = plt.figure(figsize=(18, 16)) + ax = fig.add_subplot(111) + fig.subplots_adjust(bottom=0.25, top=0.95) + else: + ax.clear() + return ax + + def generate( + self, + asynchronous: bool = False, + **kwargs + ) -> Optional[Tuple[np.ndarray, Thread]]: + """Manually generate the heatmap. 
+ + This function is automatically called when creating the heatmap if the + heatmap was initialized with ``generate=True`` (default behavior). + + Args: + asynchronous (bool, optional): Generate heatmap in a separate thread, + returning the numpy array which is updated in realtime with + heatmap predictions and the heatmap thread. Defaults to False, + returning None. + callback (Callable, optional): Callback function to call each time + the heatmap grid is updated. The callback function should accept + a single argument: a list of nested (x_idx, y_idx) lists, + indicating the grid indices updated. Defaults to None. + + Returns: + ``None`` if ``asynchronous=False``, otherwise returns a tuple containing + + **grid**: Numpy array updated in realtime + with heatmap predictions as they are calculated. + + **Thread**: Thread in which heatmap is generated. + """ + + # Run the model interface across the slide + def _generate(grid=None): + out = self.interface( + self.slide, + num_threads=self.num_threads, + num_processes=self.num_processes, + batch_size=self.batch_size, + img_format=self.img_format, + dtype=np.float32, + grid=grid, + **kwargs + ) + if self.uq: + self.predictions = out[:, :, :-(self.num_uncertainty)] + self.uncertainty = out[:, :, -(self.num_uncertainty):] + else: + self.predictions = out + self.uncertainty = None + log.info(f"Heatmap complete for [green]{self.slide.name}") + + if asynchronous: + it = self.interface + grid = np.ma.ones(( + self.slide.grid.shape[1], + self.slide.grid.shape[0], + it.num_features + it.num_classes + it.num_uncertainty), + dtype=np.float32) + heatmap_thread = Thread(target=_generate, args=(grid,)) + heatmap_thread.start() + return grid, heatmap_thread + else: + _generate() + return None + + def _format_ax( + self, + ax: "Axes", + thumb_size: Tuple[int, int], + show_roi: bool = True, + **kwargs + ) -> None: + """Formats matplotlib axis in preparation for heatmap plotting. + + Args: + ax (matplotlib.axes.Axes): Figure axis. 
+ show_roi (bool, optional): Include ROI on heatmap. Defaults to True. + """ + ax.tick_params( + axis='x', + top=True, + labeltop=True, + bottom=False, + labelbottom=False + ) + # Plot ROIs + if show_roi: + roi_scale = self.slide.dimensions[0] / thumb_size[0] + annPolys = [ + sg.Polygon(annotation.scaled_coords(roi_scale)) + for annotation in self.slide.rois + ] + for roi in self.slide.rois: + for hole in roi.holes.values(): + annPolys.append(sg.Polygon(hole.scaled_coords(roi_scale))) + for i, poly in enumerate(annPolys): + if poly.geom_type == 'Polygon': + x, y = poly.exterior.xy + ax.plot(x, y, zorder=20, **kwargs) + elif poly.geom_type in ('MultiPolygon', 'GeometryCollection'): + for p in poly.geoms: + if p.geom_type == 'Polygon': + x, y = p.exterior.xy + ax.plot(x, y, zorder=20, **kwargs) + else: + log.warning("Unable to plot ROI {} (geometry={})".format( + i, poly.geom_type + )) + + + + def add_inset( + self, + x: Tuple[int, int], + y: Tuple[int, int], + zoom: int = 5, + loc: int = 1, + mark1: int = 2, + mark2: int = 4, + axes: bool = True + ) -> Inset: + """Adds a zoom inset to the heatmap.""" + _inset = Inset( + x=x, + y=y, + zoom=zoom, + loc=loc, + mark1=mark1, + mark2=mark2, + axes=axes + ) + self.insets += [_inset] + return _inset + + def clear_insets(self) -> None: + """Removes zoom insets.""" + self.insets = [] + + def load(self, path: str) -> None: + """Load heatmap predictions and uncertainty from .npz file. + + This function is an alias for :meth:`slideflow.Heatmap.load_npz()`. + + Args: + path (str, optional): Source .npz file. Must have 'predictions' key + and optionally 'uncertainty'. + + Returns: + None + """ + self.load_npz(path) + + def load_npz(self, path: str) -> None: + """Load heatmap predictions and uncertainty from .npz file. + + Loads predictions from ``'predictions'`` in .npz file, and uncertainty from + ``'uncertainty'`` if present, as generated from + :meth:`slideflow.Heatmap.save_npz()``. 
This function is the same as + calling ``heatmap.load()``. + + Args: + path (str, optional): Source .npz file. Must have 'predictions' key + and optionally 'uncertainty'. + + Returns: + None + """ + npzfile = np.load(path) + if ('predictions' not in npzfile) and ('logits' in npzfile): + log.warn("Loading predictions from 'logits' key.") + self.predictions = npzfile['logits'] + else: + self.predictions = npzfile['predictions'] + if 'uncertainty' in npzfile: + self.uncertainty = npzfile['uncertainty'] + + def plot_thumbnail( + self, + show_roi: bool = False, + roi_color: str = 'k', + linewidth: int = 5, + width: Optional[int] = None, + mpp: Optional[float] = None, + ax: Optional["Axes"] = None, + ) -> "plt.image.AxesImage": + """Plot a thumbnail of the slide, with or without ROI. + + Args: + show_roi (bool, optional): Overlay ROIs onto heatmap image. + Defaults to True. + roi_color (str): ROI line color. Defaults to 'k' (black). + linewidth (int): Width of ROI line. Defaults to 5. + ax (matplotlib.axes.Axes, optional): Figure axis. If not supplied, + will prepare a new figure axis. + + Returns: + plt.image.AxesImage: Result from ax.imshow(). 
+ """ + ax = self._prepare_ax(ax) + if width is None and mpp is None: + width = 2048 + self._thumb = self.slide.thumb(width=width, mpp=mpp) + self._format_ax( + ax, + thumb_size=self._thumb.size, + show_roi=show_roi, + color=roi_color, + linewidth=linewidth, + ) + imshow_thumb = ax.imshow(self._thumb, zorder=0) + + for inset in self.insets: + axins = zoomed_inset_axes(ax, inset.zoom, loc=inset.loc) + axins.imshow(self._thumb) + axins.set_xlim(inset.x[0], inset.x[1]) + axins.set_ylim(inset.y[0], inset.y[1]) + mark_inset( + ax, + axins, + loc1=inset.mark1, + loc2=inset.mark2, + fc='none', + ec='0', + zorder=100 + ) + if not inset.axes: + axins.get_xaxis().set_ticks([]) + axins.get_yaxis().set_ticks([]) + + return imshow_thumb + + def plot_with_logit_cmap( + self, + logit_cmap: Union[Callable, Dict], + interpolation: str = 'none', + ax: Optional["Axes"] = None, + **thumb_kwargs, + ) -> None: + """Plot a heatmap using a specified logit colormap. + + Args: + logit_cmap (obj, optional): Either function or a dictionary use to + create heatmap colormap. Each image tile will generate a list + of predictions of length O, where O is the number of outcomes. + If logit_cmap is a function, then the logit prediction list + will be passed to the function, and the function is expected + to return [R, G, B] values for display. If logit_cmap is a + dictionary, it should map 'r', 'g', and 'b' to indices; the + prediction for these outcome indices will be mapped to the RGB + colors. Thus, the corresponding color will only reflect up to + three outcomes. Example mapping prediction for outcome 0 to the + red colorspace, 3 to green, etc: {'r': 0, 'g': 3, 'b': 1} + interpolation (str, optional): Interpolation strategy to use for + smoothing heatmap. Defaults to 'none'. + ax (matplotlib.axes.Axes, optional): Figure axis. If not supplied, + will prepare a new figure axis. + + Keyword args: + show_roi (bool, optional): Overlay ROIs onto heatmap image. + Defaults to True. 
+ roi_color (str): ROI line color. Defaults to 'k' (black). + linewidth (int): Width of ROI line. Defaults to 5. + """ + ax = self._prepare_ax(ax) + self.plot_thumbnail(ax=ax, **thumb_kwargs) + ax.set_facecolor("black") + if callable(logit_cmap): + map_logit = logit_cmap + else: + # Make heatmap with specific logit predictions mapped + # to r, g, and b + def map_logit(logit): + return (logit[logit_cmap['r']], + logit[logit_cmap['g']], + logit[logit_cmap['b']]) + extent = calculate_heatmap_extent( + self.slide, self._thumb, self.predictions + ) + ax.imshow( + [[map_logit(logit) for logit in row] for row in self.predictions], + extent=extent, + interpolation=interpolation, + zorder=10 + ) + ax.set_xlim(0, self._thumb.size[0]) + ax.set_ylim(self._thumb.size[1], 0) + + def plot_uncertainty( + self, + heatmap_alpha: float = 0.6, + cmap: str = 'coolwarm', + interpolation: str = 'none', + ax: Optional["Axes"] = None, + **thumb_kwargs + ): + """Plot heatmap of uncertainty. + + Args: + heatmap_alpha (float, optional): Alpha of heatmap overlay. + Defaults to 0.6. + cmap (str, optional): Matplotlib heatmap colormap. + Defaults to 'coolwarm'. + interpolation (str, optional): Interpolation strategy to use for + smoothing heatmap. Defaults to 'none'. + ax (matplotlib.axes.Axes, optional): Figure axis. If not supplied, + will prepare a new figure axis. + + Keyword args: + show_roi (bool, optional): Overlay ROIs onto heatmap image. + Defaults to True. + roi_color (str): ROI line color. Defaults to 'k' (black). + linewidth (int): Width of ROI line. Defaults to 5. 
+ """ + import matplotlib.colors as mcol + + ax = self._prepare_ax(ax) + implot = self.plot_thumbnail(ax=ax, **thumb_kwargs) + if heatmap_alpha == 1: + implot.set_alpha(0) + uqnorm = mcol.TwoSlopeNorm( + vmin=0, + vcenter=self.uncertainty.max()/2, + vmax=self.uncertainty.max() + ) + extent = calculate_heatmap_extent( + self.slide, self._thumb, self.predictions + ) + ax.imshow( + self.uncertainty, + norm=uqnorm, + extent=extent, + cmap=cmap, + alpha=heatmap_alpha, + interpolation=interpolation, + zorder=10 + ) + ax.set_xlim(0, self._thumb.size[0]) + ax.set_ylim(self._thumb.size[1], 0) + + def plot( + self, + class_idx: int, + heatmap_alpha: float = 0.6, + cmap: str = 'coolwarm', + interpolation: str = 'none', + vmin: float = 0, + vmax: float = 1, + vcenter: float = 0.5, + ax: Optional["Axes"] = None, + **thumb_kwargs + ) -> None: + """Plot a predictive heatmap. + + If in a Jupyter notebook, the heatmap will be displayed in the cell + output. If running via script or shell, the heatmap can then be + shown on screen using matplotlib ``plt.show()``: + + .. code-block:: + + import slideflow as sf + import matplotlib.pyplot as plt + + heatmap = sf.Heatmap(...) + heatmap.plot(0) # class_idx is required; 0 plots the first class + plt.show() + + Args: + class_idx (int): Class index to plot. + heatmap_alpha (float, optional): Alpha of heatmap overlay. + Defaults to 0.6. + show_roi (bool, optional): Overlay ROIs onto heatmap image. + Defaults to True. + cmap (str, optional): Matplotlib heatmap colormap. + Defaults to 'coolwarm'. + interpolation (str, optional): Interpolation strategy to use for + smoothing heatmap. Defaults to 'none'. + vmin (float): Minimum value to display on heatmap. + Defaults to 0. + vcenter (float): Center value for color display on heatmap. + Defaults to 0.5. + vmax (float): Maximum value to display on heatmap. + Defaults to 1. + ax (matplotlib.axes.Axes, optional): Figure axis. If not supplied, + will prepare a new figure axis. 
+ + Keyword args: + show_roi (bool, optional): Overlay ROIs onto heatmap image. + Defaults to True. + roi_color (str): ROI line color. Defaults to 'k' (black). + linewidth (int): Width of ROI line. Defaults to 5. + """ + import matplotlib.colors as mcol + + if self.predictions is None: + raise errors.HeatmapError( + "Cannot plot Heatmap which is not yet generated; generate with " + "either heatmap.generate() or Heatmap(..., generate=True)" + ) + + ax = self._prepare_ax(ax) + implot = self.plot_thumbnail(ax=ax, **thumb_kwargs) + if heatmap_alpha == 1: + implot.set_alpha(0) + ax.set_facecolor("black") + divnorm = mcol.TwoSlopeNorm( + vmin=vmin, + vcenter=vcenter, + vmax=vmax + ) + extent = calculate_heatmap_extent( + self.slide, self._thumb, self.predictions + ) + ax.imshow( + self.predictions[:, :, class_idx], + norm=divnorm, + extent=extent, + cmap=cmap, + alpha=heatmap_alpha, + interpolation=interpolation, + zorder=10 + ) + ax.set_xlim(0, self._thumb.size[0]) + ax.set_ylim(self._thumb.size[1], 0) + + def save_npz(self, path: Optional[str] = None) -> str: + """Save heatmap predictions and uncertainty in .npz format. + + Saves heatmap predictions to ``'predictions'`` in the .npz file. If uncertainty + was calculated, this is saved to ``'uncertainty'``. A Heatmap instance can + load a saved .npz file with :meth:`slideflow.Heatmap.load()`. + + Args: + path (str, optional): Destination filename for .npz file. Defaults + to {slidename}.npz + + Returns: + str: Path to .npz file. + """ + if path is None: + path = f'{self.slide.name}.npz' + np_kwargs = dict(predictions=self.predictions) + if self.uq: + np_kwargs['uncertainty'] = self.uncertainty + np.savez(path, **np_kwargs) + return path + + def save( + self, + outdir: str, + show_roi: bool = True, + interpolation: str = 'none', + logit_cmap: Optional[Union[Callable, Dict]] = None, + roi_color: str = 'k', + linewidth: int = 5, + **kwargs + ) -> None: + """Saves calculated predictions as heatmap overlays. 
+ + Args: + outdir (str): Path to directory in which to save heatmap images. + show_roi (bool, optional): Overlay ROIs onto heatmap image. + Defaults to True. + interpolation (str, optional): Interpolation strategy to use for + smoothing heatmap. Defaults to 'none'. + logit_cmap (obj, optional): Either a function or a dictionary used to + create heatmap colormap. Each image tile will generate a list + of predictions of length O, where O is the number of outcomes. + If logit_cmap is a function, then the logit prediction list + will be passed to the function, and the function is expected + to return [R, G, B] values for display. If logit_cmap is a + dictionary, it should map 'r', 'g', and 'b' to indices; the + prediction for these outcome indices will be mapped to the RGB + colors. Thus, the corresponding color will only reflect up to + three outcomes. Example mapping prediction for outcome 0 to the + red colorspace, 3 to green, etc: {'r': 0, 'g': 3, 'b': 1} + roi_color (str): ROI line color. Defaults to 'k' (black). + linewidth (int): Width of ROI line. Defaults to 5. + + Keyword args: + cmap (str, optional): Matplotlib heatmap colormap. + Defaults to 'coolwarm'. + vmin (float): Minimum value to display on heatmap. + Defaults to 0. + vcenter (float): Center value for color display on heatmap. + Defaults to 0.5. + vmax (float): Maximum value to display on heatmap. + Defaults to 1. 
+ + """ + with sf.util.matplotlib_backend('Agg'): + import matplotlib.pyplot as plt + + if self.predictions is None: + raise errors.HeatmapError( + "Cannot plot Heatmap which is not yet generated; generate with " + "either heatmap.generate() or Heatmap(..., generate=True)" + ) + + # Save heatmaps in .npz format + self.save_npz(os.path.join(outdir, f'{self.slide.name}.npz')) + + def _savefig(label, bbox_inches='tight', **kwargs): + plt.savefig( + os.path.join(outdir, f'{self.slide.name}-{label}.png'), + bbox_inches=bbox_inches, + **kwargs + ) + + log.info('Saving base figures...') + + # Prepare matplotlib figure + ax = self._prepare_ax() + + thumb_kwargs = dict(roi_color=roi_color, linewidth=linewidth) + + # Save base thumbnail as separate figure + self.plot_thumbnail(show_roi=False, ax=ax, **thumb_kwargs) # type: ignore + _savefig('raw') + + # Save thumbnail + ROI as separate figure + self.plot_thumbnail(show_roi=True, ax=ax, **thumb_kwargs) # type: ignore + _savefig('raw+roi') + + if logit_cmap: + self.plot_with_logit_cmap(logit_cmap, show_roi=show_roi, ax=ax) + _savefig('custom') + else: + heatmap_kwargs = dict( + show_roi=show_roi, + interpolation=interpolation, + **kwargs + ) + save_kwargs = dict( + bbox_inches='tight', + facecolor=ax.get_facecolor(), + edgecolor='none' + ) + # Make heatmap plots and sliders for each outcome category + for i in range(self.num_classes): + log.info(f'Making {i+1}/{self.num_classes}...') + self.plot(i, heatmap_alpha=0.6, ax=ax, **heatmap_kwargs) + _savefig(str(i), **save_kwargs) + + self.plot(i, heatmap_alpha=1, ax=ax, **heatmap_kwargs) + _savefig(f'{i}-solid', **save_kwargs) + + # Uncertainty map + if self.uq: + log.info('Making uncertainty heatmap...') + self.plot_uncertainty(heatmap_alpha=0.6, ax=ax, **heatmap_kwargs) + _savefig('UQ', **save_kwargs) + + self.plot_uncertainty(heatmap_alpha=1, ax=ax, **heatmap_kwargs) + _savefig('UQ-solid', **save_kwargs) + + plt.close() + log.info(f'Saved heatmaps for [green]{self.slide.name}') 
+ + def view(self): + """Load the Heatmap into Slideflow Studio for interactive view. + + See :ref:`studio` for more information. + + """ + from slideflow.studio import Studio + + studio = Studio() + studio.load_slide(self.slide.path, stride=self.stride_div) + studio.load_model(self.model_path) + studio.load_heatmap(self) + studio.run()
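The dictionary form of ``logit_cmap`` accepted by ``plot_with_logit_cmap()`` and ``save()`` can be illustrated without Slideflow. The following is a minimal sketch in plain Python, assuming predictions are a nested list of shape (rows, columns, outcomes); the helper name ``resolve_logit_cmap`` is hypothetical and is not part of the Slideflow API.

```python
from typing import Callable, Dict, Sequence, Tuple, Union

RGB = Tuple[float, float, float]


def resolve_logit_cmap(
    logit_cmap: Union[Callable[[Sequence[float]], RGB], Dict[str, int]]
) -> Callable[[Sequence[float]], RGB]:
    """Return a function mapping one tile's predictions to (R, G, B).

    Hypothetical helper: a callable is used as-is, while a dictionary
    maps the 'r', 'g', and 'b' channels to outcome indices.
    """
    if callable(logit_cmap):
        return logit_cmap

    def map_logit(logit: Sequence[float]) -> RGB:
        return (logit[logit_cmap['r']],
                logit[logit_cmap['g']],
                logit[logit_cmap['b']])

    return map_logit


# A 1 x 2 grid of tiles, each with four outcome predictions.
predictions = [[[0.7, 0.1, 0.0, 0.2], [0.1, 0.3, 0.5, 0.1]]]

# Outcome 0 drives red, outcome 3 drives green, outcome 1 drives blue.
to_rgb = resolve_logit_cmap({'r': 0, 'g': 3, 'b': 1})
rgb_grid = [[to_rgb(tile) for tile in row] for row in predictions]
print(rgb_grid)
```

Because only three channels are available, at most three outcomes are visible in the resulting image; outcome 2 in this example does not affect the color.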
+ +class ModelHeatmap(Heatmap): + + def __init__( + self, + slide: Union[str, WSI], + model: Union[str, "torch.nn.Module", "tf.keras.Model"], + *, + img_format: str, + tile_px: Optional[int] = None, + tile_um: Optional[int] = None, + stride_div: Optional[int] = None, + normalizer: Optional[sf.norm.StainNormalizer] = None, + batch_size: int = 32, + num_threads: Optional[int] = None, + num_processes: Optional[int] = None, + generate: bool = True, + uq: bool = False, + load_method: Optional[str] = None, + apply_softmax: Optional[bool] = None, + generator_kwargs: Optional[Dict[str, Any]] = None, + **wsi_kwargs + ): + """Convolutes across a whole slide, calculating predictions and saving + predictions internally for later use. + + Args: + slide (str): Path to slide. + model (str): Path to Tensorflow or PyTorch model. + + Keyword args: + img_format (str, optional): Image format (png, jpg) to use when + extracting tiles from slide. Must match the image format + the model was trained on. If 'auto', will use the format + logged in the model params.json. + tile_px (int): Tile width in pixels. Required if ``model`` is a + path. Defaults to None. + tile_um (int or str): Tile width in microns (int) or magnification + (str, e.g. "20x"). Required if ``model`` is a path. Defaults + to None. + stride_div (int, optional): Divisor for stride when convoluting + across slide. Defaults to 2. + normalizer (:class:`slideflow.norm.StainNormalizer`): Stain + normalizer to use when preprocessing image tiles. + Defaults to None. + batch_size (int, optional): Batch size for calculating predictions. + Defaults to 32. + num_threads (int, optional): Number of tile worker threads. Cannot + supply both ``num_threads`` (uses thread pool) and + ``num_processes`` (uses multiprocessing pool). Defaults to + CPU core count. + num_processes (int, optional): Number of child processes to spawn + for multiprocessing pool. Defaults to None (does not use + multiprocessing). 
+ generate (bool): Generate the heatmap after initialization. + If False, heatmap will need to be manually generated by + calling :meth:`Heatmap.generate()`. + uq (bool): Calculate uncertainty via dropout (requires model with + dropout layers). Defaults to False. + load_method (str): Either 'full' or 'weights'. Method to use + when loading a Tensorflow model. If 'full', loads the model with + ``tf.keras.models.load_model()``. If 'weights', will read the + ``params.json`` configuration file, build the model architecture, + and then load weights from the given model with + ``Model.load_weights()``. Loading with 'full' may improve + compatibility across Slideflow versions. Loading with 'weights' + may improve compatibility across hardware & environments. + apply_softmax (bool): Apply softmax transformation to logits. + Only used for PyTorch models (raises an error if this argument + is specified and the model is not a PyTorch model). + Defaults to True. + roi_dir (str, optional): Directory in which slide ROI is contained. + Defaults to None. + rois (list, optional): List of paths to slide ROIs. Alternative to + providing roi_dir. Defaults to None. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. 
+ + """ + if num_processes is not None and num_threads is not None: + raise ValueError("Invalid argument: cannot supply both " + "num_processes and num_threads") + self.uq = uq + self.img_format = img_format + self.num_threads = num_threads + self.num_processes = num_processes + self.batch_size = batch_size + self.insets = [] # type: List[Inset] + if generator_kwargs is None: + generator_kwargs = {} + if apply_softmax is not None: + if sf.util.model_backend(model) == 'tensorflow': + raise ValueError("Keyword argument 'apply_softmax' is invalid " + "for Tensorflow models.") + + if isinstance(slide, str): + + if tile_px is None: + raise ValueError("If slide is a path, must supply tile_px.") + if tile_um is None: + raise ValueError("If slide is a path, must supply tile_um.") + if stride_div is None: + stride_div = 2 + + self.slide_path = slide + self.tile_px = tile_px + self.tile_um = tile_um + self.stride_div = stride_div + try: + self.slide = WSI( + self.slide_path, + self.tile_px, + self.tile_um, + self.stride_div, + **wsi_kwargs # type: ignore + ) + except errors.SlideLoadError: + # self.slide is not assigned if WSI() fails, so report the path + raise errors.HeatmapError( + f'Error loading slide {self.slide_path} for heatmap') + elif isinstance(slide, WSI): + + if tile_px is not None: + log.warning("slide is a WSI; ignoring supplied tile_px.") + if tile_um is not None: + log.warning("slide is a WSI; ignoring supplied tile_um.") + if stride_div is not None: + log.warning("slide is a WSI; ignoring supplied stride_div.") + if wsi_kwargs: + log.warning("WSI provided; ignoring keyword arguments: " + + ", ".join(list(wsi_kwargs.keys()))) + + self.slide_path = slide.path + self.slide = slide + self.tile_px = slide.tile_px + self.tile_um = slide.tile_um + self.stride_div = slide.stride_div + else: + raise ValueError(f"Unrecognized value {slide} for argument slide") + + if uq and sf.util.model_backend(model) == 'tensorflow': + import slideflow.model.tensorflow + interface_class = sf.model.tensorflow.UncertaintyInterface # type: ignore + interface_kw 
= {} # type: Dict[str, Any] + elif uq and sf.util.model_backend(model) == 'torch': + import slideflow.model.torch + interface_class = sf.model.torch.UncertaintyInterface # type: ignore + interface_kw = dict(tile_px=self.tile_px, apply_softmax=apply_softmax) + elif sf.util.model_backend(model) == 'tensorflow': + import slideflow.model.tensorflow + interface_class = sf.model.tensorflow.Features # type: ignore + interface_kw = dict(include_preds=True) + elif sf.util.model_backend(model) == 'torch': + import slideflow.model.torch + interface_class = sf.model.torch.Features # type: ignore + interface_kw = dict( + include_preds=True, + tile_px=self.tile_px, + apply_softmax=apply_softmax + ) + else: + raise ValueError(f"Unable to interpret model {model}") + if load_method is not None: + interface_kw.update(dict(load_method=load_method)) + + if isinstance(model, str): + self.interface = interface_class( + model, + layers=None, + **interface_kw) + else: + self.interface = interface_class.from_model( + model, + layers=None, + wsi_normalizer=normalizer, + **interface_kw) + self.num_classes = self.interface.num_classes + self.num_features = self.interface.num_features + self.num_uncertainty = self.interface.num_uncertainty + self.predictions = None + self.uncertainty = None + self.model_path = None + + if generate: + self.generate(**generator_kwargs) + elif generator_kwargs: + log.warn("Heatmap generate=False, ignoring generator_kwargs (" + f"{generator_kwargs})") + + def view(self): + raise NotImplementedError + +# ----------------------------------------------------------------------------- + +def calculate_heatmap_extent( + wsi: "sf.WSI", + thumbnail: "Image", + grid: np.ndarray +) -> Tuple[float, float, float, float]: + """Calculate implot extent for a heatmap grid.""" + full_extract = int(wsi.tile_um / wsi.mpp) + wsi_stride = int(full_extract / wsi.stride_div) + _overlay_wsi_dim = (wsi_stride * (grid.shape[1]), + wsi_stride * (grid.shape[0])) + _overlay_offset_wsi_dim = ( 
+ full_extract/2 - wsi_stride/2, + full_extract/2 - wsi_stride/2 + ) + thumb_ratio = ( + wsi.dimensions[0] / thumbnail.size[0], + wsi.dimensions[1] / thumbnail.size[1] + ) + return ( + _overlay_offset_wsi_dim[0] / thumb_ratio[0], + _overlay_wsi_dim[0] / thumb_ratio[0], + _overlay_wsi_dim[1] / thumb_ratio[1], + _overlay_offset_wsi_dim[1] / thumb_ratio[1] + ) + +# ----------------------------------------------------------------------------- +
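The arithmetic in ``calculate_heatmap_extent()`` may be easier to follow with concrete numbers. The sketch below restates the same calculation on plain values rather than ``sf.WSI`` and PIL objects; the function name ``heatmap_extent`` and all input values are assumptions chosen for illustration. The returned tuple follows matplotlib's ``imshow(extent=...)`` order of (left, right, bottom, top).

```python
from typing import Tuple


def heatmap_extent(
    tile_um: float,
    mpp: float,
    stride_div: int,
    wsi_dimensions: Tuple[int, int],
    thumb_size: Tuple[int, int],
    grid_shape: Tuple[int, int],
) -> Tuple[float, float, float, float]:
    """Standalone restatement of the extent calculation (illustrative only)."""
    # Tile width in full-resolution pixels, and the stride between tiles.
    full_extract = int(tile_um / mpp)
    wsi_stride = int(full_extract / stride_div)
    # Size of the prediction grid in full-resolution pixels (width, height).
    overlay_dim = (wsi_stride * grid_shape[1], wsi_stride * grid_shape[0])
    # Offset so that each grid cell is centered on its tile.
    offset = full_extract / 2 - wsi_stride / 2
    # Scale factor from full-resolution slide to thumbnail coordinates.
    ratio = (wsi_dimensions[0] / thumb_size[0],
             wsi_dimensions[1] / thumb_size[1])
    return (offset / ratio[0],
            overlay_dim[0] / ratio[0],
            overlay_dim[1] / ratio[1],
            offset / ratio[1])


# Assumed example: 302 um tiles at 0.5 um/px, stride_div=2, an 8000 x 4000
# slide shown as a 2000 x 1000 thumbnail, and a 10-row by 20-column grid.
extent = heatmap_extent(
    tile_um=302, mpp=0.5, stride_div=2,
    wsi_dimensions=(8000, 4000), thumb_size=(2000, 1000),
    grid_shape=(10, 20),
)
print(extent)  # (37.75, 1510.0, 755.0, 37.75)
```

Here the tile is 604 px wide at full resolution and the stride is 302 px, so the overlay starts 151 px in from the slide origin, which is 37.75 px on a thumbnail downscaled by a factor of 4.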
+ © Copyright 2023, James M Dolezal. + +

\ No newline at end of file
diff --git a/docs/_modules/slideflow/io/index.html b/docs/_modules/slideflow/io/index.html
new file mode 100644
index 000000000..180c3212c
--- /dev/null
+++ b/docs/_modules/slideflow/io/index.html
@@ -0,0 +1,790 @@
+ slideflow.io — slideflow 3.0.0 documentation
Source code for slideflow.io

+"""TFRecord reading/writing utilities for both Tensorflow and PyTorch."""
+
+import cv2
+import copy
+import os
+import struct
+import numpy as np
+from PIL import Image
+from multiprocessing.dummy import Pool as DPool
+from os.path import exists, isdir, isfile, join
+from random import shuffle
+from typing import Any, Dict, Optional, Tuple, Union, List
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.io.io_utils import detect_tfrecord_format, convert_dtype
+from slideflow.util import log, tfrecord2idx
+from slideflow.util.tfrecord2idx import get_tfrecord_by_index, get_tfrecord_length
+from rich.progress import Progress
+
+# --- Backend-specific imports and configuration ------------------------------
+
+if sf.backend() == 'tensorflow':
+    from slideflow.io.tensorflow import (
+        get_tfrecord_parser, read_and_return_record, serialized_record
+    )
+    from slideflow.io.tensorflow import auto_decode_image as decode_image
+    from tensorflow.data import TFRecordDataset
+    from tensorflow.io import TFRecordWriter
+
+elif sf.backend() == 'torch':
+    from slideflow.io.torch import (
+        get_tfrecord_parser, read_and_return_record, serialized_record,
+        decode_image
+    )
+    from slideflow.tfrecord import TFRecordWriter
+    from slideflow.tfrecord.torch.dataset import TFRecordDataset
+
+else:
+    raise errors.UnrecognizedBackendError
+
+# -----------------------------------------------------------------------------
+
+
+
+def update_manifest_at_dir(
+    directory: str,
+    force_update: bool = False
+) -> Optional[Union[str, Dict]]:
+    """Log number of tiles in each TFRecord file present in the given
+    directory and all subdirectories, saving manifest to file within
+    the parent directory.
+
+    """
+    manifest_path = join(directory, "manifest.json")
+    if not exists(manifest_path):
+        manifest = {}
+    else:
+        manifest = sf.util.load_json(manifest_path)
+    prior_manifest = copy.deepcopy(manifest)
+    try:
+        rel_paths = sf.util.get_relative_tfrecord_paths(directory)
+    except FileNotFoundError:
+        log.debug(f"Failed to update manifest {directory}; no TFRecords")
+        return None
+
+    # Verify all tfrecords in manifest exist
+    for rel_tfr in prior_manifest.keys():
+        tfr = join(directory, rel_tfr)
+        if not exists(tfr):
+            log.warning(f"TFRecord {tfr} in manifest was not found; removing")
+            del(manifest[rel_tfr])
+
+    def process_tfr(rel_tfr):
+        tfr = join(directory, rel_tfr)
+        if ((not force_update)
+           and (rel_tfr in manifest)
+           and ('total' in manifest[rel_tfr])):
+            return None
+        rel_tfr_manifest = {rel_tfr: {}}
+        try:
+            total = get_tfrecord_length(tfr)
+        except (errors.TFRecordsError, OSError):
+            log.error(f"Corrupt or incomplete TFRecord at {tfr}; removing")
+            os.remove(tfr)
+            return None
+        if not total:
+            log.error(f"Empty TFRecord at {tfr}; removing")
+            os.remove(tfr)
+            return None
+        rel_tfr_manifest[rel_tfr]['total'] = total
+        return rel_tfr_manifest
+
+    pool = DPool(8)
+    if sf.getLoggingLevel() <= 20:
+        pb = Progress(transient=True)
+        task = pb.add_task("Updating tfrecord manifest...", total=len(rel_paths))
+        pb.start()
+    else:
+        pb = None
+    with sf.util.cleanup_progress(pb):
+        for m in pool.imap(process_tfr, rel_paths):
+            if pb is not None:
+                pb.advance(task)
+            if m is None:
+                continue
+            manifest.update(m)
+    # Write manifest file
+    if (manifest != prior_manifest) or (manifest == {}):
+        sf.util.write_json(manifest, manifest_path)
+    pool.close()
+    return manifest
+ + +
+def get_tfrecord_by_location(
+    tfrecord: str,
+    location: Tuple[int, int],
+    decode: bool = True,
+    *,
+    locations_array: Optional[List[Tuple[int, int]]] = None,
+    index_array: Optional[np.ndarray] = None
+) -> Any:
+    '''Reads and returns an individual record from a tfrecord by tile
+    location, including slide name and processed image data.
+
+    Args:
+        tfrecord (str): Path to TFRecord file.
+        location (tuple(int, int)): ``(x, y)`` tile location.
+            Searches the TFRecord for the tile that corresponds to this
+            location.
+        decode (bool): Decode the associated record, returning Tensors.
+            Defaults to True.
+
+    Returns:
+        A tuple containing ``(slide, image)``. ``image`` is a uint8 Tensor
+        if ``decode=True``, otherwise the raw image bytes. Returns
+        ``(False, False)`` if no record matches the location.
+    '''
+    if isinstance(location, list):
+        location = tuple(location)
+    if (not isinstance(location, tuple)
+       or len(location) != 2
+       or not isinstance(location[0], (int, np.integer))
+       or not isinstance(location[1], (int, np.integer))):
+        raise IndexError(f"location must be a tuple of two ints. Got: {location}")
+
+    # Use index files, if available.
+    index = tfrecord2idx.find_index(tfrecord)
+    if locations_array is not None or (index and tfrecord2idx.index_has_locations(index)):
+        if locations_array is None:
+            locations = tfrecord2idx.get_locations_from_index(index)
+        else:
+            locations = locations_array
+        try:
+            idx = locations.index(location)
+        except ValueError:
+            log.error(
+                f"Unable to find record with location {location} in {tfrecord}"
+            )
+            return False, False
+        record = tfrecord2idx.get_tfrecord_by_index(tfrecord, idx, index_array=index_array)
+        slide = record['slide']
+        image = sf.io.decode_image(record['image_raw']) if decode else record['image_raw']
+        return slide, image
+
+    else:
+        parser = get_tfrecord_parser(
+            tfrecord,
+            ('slide', 'image_raw', 'loc_x', 'loc_y'),
+            decode_images=decode
+        )
+        dataset = TFRecordDataset(tfrecord)
+        for i, record in enumerate(dataset):
+            slide, image, loc_x, loc_y = parser(record)
+            if (loc_x, loc_y) == location:
+                if decode:
+                    return slide, image
+                else:
+                    slide = bytes(record['slide']).decode('utf-8')
+                    images = bytes(record['image_raw'])
+                    return slide, images
+
+        log.error(
+            f"Unable to find record with location {location} in {tfrecord}"
+        )
+        return False, False
+ + +
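The index-backed branch of `get_tfrecord_by_location` comes down to validating the `(x, y)` pair and finding its position in a list of locations. A minimal sketch of that lookup in plain Python follows; `find_location_index` is a hypothetical helper written for illustration, not part of slideflow, and it returns `None` on a miss instead of the `(False, False)` pair used in the source.

```python
from typing import Optional, Sequence, Tuple


def find_location_index(
    locations: Sequence[Tuple[int, int]],
    location: Sequence[int],
) -> Optional[int]:
    """Return the position of ``location`` in ``locations``, or None.

    Hypothetical helper for illustration only (not a slideflow function).
    """
    target = tuple(location)  # accept lists as well as tuples
    if len(target) != 2 or not all(isinstance(v, int) for v in target):
        raise IndexError(f"location must be a tuple of two ints. Got: {location}")
    try:
        # list.index() returns the first match and raises ValueError on a miss
        return list(locations).index(target)
    except ValueError:
        return None


tile_locations = [(0, 0), (256, 0), (0, 256)]
found = find_location_index(tile_locations, [256, 0])
missing = find_location_index(tile_locations, (512, 512))
```

Here `found` is the position of the second tile and `missing` is `None`; the position is what would then be passed to an index-based record reader.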
+def write_tfrecords_multi(input_directory: str, output_directory: str) -> None:
+    """Write multiple tfrecords, one for each slide, from a directory of images.
+
+    Scans a folder for subfolders, treating each subfolder name as a slide
+    name. Collects all image tiles within each subfolder and exports them
+    into multiple tfrecord files, one for each slide.
+
+    Args:
+        input_directory (str): Directory of images.
+        output_directory (str): Directory in which to write TFRecord files.
+
+    """
+    log.info("No location data available; writing (0,0) for all locations.")
+    slide_dirs = [
+        _dir for _dir in os.listdir(input_directory)
+        if isdir(join(input_directory, _dir))
+    ]
+    total_tiles = 0
+    for slide_dir in slide_dirs:
+        total_tiles += write_tfrecords_single(
+            join(input_directory, slide_dir),
+            output_directory,
+            f'{slide_dir}.tfrecords',
+            slide_dir
+        )
+    log.info(
+        f"Wrote {total_tiles} tiles across {len(slide_dirs)} tfrecords "
+        f"in [green]{output_directory}"
+    )
+ + +
+def write_tfrecords_single(
+    input_directory: str,
+    output_directory: str,
+    filename: str,
+    slide: str
+) -> int:
+    """Scans a folder for image tiles, annotates using the provided slide,
+    exports into a single tfrecord file.
+
+    Args:
+        input_directory (str): Directory of images.
+        output_directory (str): Directory in which to write TFRecord file.
+        filename (str): TFRecord filename (without path).
+        slide (str): Slide name to assign to records inside TFRecord.
+
+    Returns:
+        int: Number of records written.
+
+    """
+    if not exists(output_directory):
+        os.makedirs(output_directory)
+    tfrecord_path = join(output_directory, filename)
+    image_labels = {}
+    files = [
+        f for f in os.listdir(input_directory)
+        if ((isfile(join(input_directory, f)))
+            and (sf.util.path_to_ext(f) in ("jpg", "jpeg", "png", "tif", "tiff")))
+    ]
+    for tile in files:
+        image_labels.update({
+            join(input_directory, tile): bytes(slide, 'utf-8')
+        })
+    keys = list(image_labels.keys())
+    shuffle(keys)
+    writer = TFRecordWriter(tfrecord_path)
+    for filename in keys:
+        label = image_labels[filename]
+        if filename.endswith(".tif") or filename.endswith(".tiff"):
+            img = np.array(Image.open(filename).convert("RGB"))
+            img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
+            image_string = cv2.imencode(".png", img)[1].tobytes()
+        else:
+            image_string = open(filename, 'rb').read()
+        record = serialized_record(label, image_string, 0, 0)
+        writer.write(record)
+    writer.close()
+    log.info(f"Wrote {len(keys)} images to {sf.util.green(tfrecord_path)}")
+    return len(keys)
+ + +
+def write_tfrecords_merge(
+    input_directory: str,
+    output_directory: str,
+    filename: str
+) -> int:
+    """Scans a folder for subfolders, treating each subfolder name as a
+    slide name. Assembles all image tiles within the subfolders, labels each
+    tile with its subfolder's slide name, and exports all tiles into a
+    single tfrecord file.
+
+    Args:
+        input_directory (str): Directory of images.
+        output_directory (str): Directory in which to write TFRecord file.
+        filename (str): TFRecord filename (without path).
+
+    Returns:
+        int: Number of records written.
+    """
+    tfrecord_path = join(output_directory, filename)
+    if not exists(output_directory):
+        os.makedirs(output_directory)
+    image_labels = {}
+    slide_dirs = [
+        _dir for _dir in os.listdir(input_directory)
+        if isdir(join(input_directory, _dir))
+    ]
+    for slide_dir in slide_dirs:
+        directory = join(input_directory, slide_dir)
+        files = [
+            f for f in os.listdir(directory)
+            if ((isfile(join(directory, f)))
+                and (sf.util.path_to_ext(f) in ("jpg", "jpeg", "png")))
+        ]
+        for tile in files:
+            tgt = join(input_directory, slide_dir, tile)
+            image_labels.update({
+                tgt: bytes(slide_dir, 'utf-8')
+            })
+    keys = list(image_labels.keys())
+    shuffle(keys)
+    writer = TFRecordWriter(tfrecord_path)
+    for filename in keys:
+        label = image_labels[filename]
+        image_string = open(filename, 'rb').read()
+        record = serialized_record(label, image_string, 0, 0)
+        writer.write(record)
+    writer.close()
+    log.info(f"Wrote {len(keys)} images to {sf.util.green(tfrecord_path)}")
+    return len(keys)
+ + +
+def extract_tiles(tfrecord: str, destination: str) -> None:
+    """Extracts images within a TFRecord to a destination folder.
+
+    Args:
+        tfrecord (str): Path to tfrecord.
+        destination (str): Destination path to write loose images.
+
+    """
+    if not exists(destination):
+        os.makedirs(destination)
+    log.info(f"Extracting tiles from tfrecord {sf.util.green(tfrecord)}")
+    log.info(f"Saving tiles to directory {sf.util.green(destination)}")
+
+    dataset = TFRecordDataset(tfrecord)
+    _, img_type = detect_tfrecord_format(tfrecord)
+    parser = get_tfrecord_parser(
+        tfrecord,
+        ('slide', 'image_raw'),
+        to_numpy=True,
+        decode_images=False
+    )
+    for i, record in enumerate(dataset):
+        slide, image_raw = parser(record)  # type: ignore
+        slidename = slide if isinstance(slide, str) else slide.decode('utf-8')
+        dest_folder = join(destination, slidename)
+        if not exists(dest_folder):
+            os.makedirs(dest_folder)
+        tile_filename = f"tile{i}.{img_type}"
+        # Write the raw (still encoded) image bytes; the file is closed on exit.
+        with open(join(dest_folder, tile_filename), 'wb') as image_file:
+            image_file.write(image_raw)
+ + +
+def get_locations_from_tfrecord(filename: str) -> List[Tuple[int, int]]:
+    """Return list of tile locations (X, Y) for all items in the TFRecord."""
+
+    # Use the TFRecord index file, if one exists and it has info stored.
+    index = tfrecord2idx.find_index(filename)
+    if index and tfrecord2idx.index_has_locations(index):
+        return tfrecord2idx.get_locations_from_index(index)
+
+    # Otherwise, read the TFRecord manually.
+    out_list = []
+    for i in range(sf.io.get_tfrecord_length(filename)):
+        record = sf.io.get_tfrecord_by_index(filename, i)
+        loc_x = record['loc_x']
+        loc_y = record['loc_y']
+        out_list.append((loc_x, loc_y))
+    return out_list
+ + +
+def tfrecord_has_locations(
+    filename: str,
+    check_x: bool = True,
+    check_y: bool = False
+) -> bool:
+    """Check if a given TFRecord has location information stored."""
+    index = tfrecord2idx.find_index(filename)
+    if index and tfrecord2idx.index_has_locations(index):
+        if check_y:
+            return np.load(index)['locations'].shape[1] == 2
+        return True
+    record = sf.io.get_tfrecord_by_index(filename, 0)
+    return (((not check_x) or 'loc_x' in record)
+            and ((not check_y) or 'loc_y' in record))

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/io/io_utils/index.html b/docs/_modules/slideflow/io/io_utils/index.html new file mode 100644 index 000000000..8bec6a695 --- /dev/null +++ b/docs/_modules/slideflow/io/io_utils/index.html @@ -0,0 +1,700 @@ + + + + + + + + + + + + slideflow.io.io_utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.io.io_utils

+from __future__ import absolute_import
+
+import imghdr
+import io
+import os
+import struct
+import sys
+import numpy as np
+from typing import List, Optional, Tuple, Any, Union, TYPE_CHECKING
+
+from slideflow import errors, log
+from slideflow.util import tfrecord2idx
+
+if TYPE_CHECKING:
+    import tensorflow as tf
+    import torch
+
+
+def _np_float_to_uint8(img):
+    return ((img + 1) * 127.5).clip(0, 255).astype(np.uint8)
+
+
+def _np_uint8_to_float(img):
+    return ((img.astype(np.float32) / 127.5) - 1)
+
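The two NumPy helpers above are inverse linear maps between uint8 pixels in [0, 255] and float32 values in [-1, 1]. The sketch below reproduces them standalone to show one detail worth knowing: `astype(np.uint8)` truncates rather than rounds, so float32 rounding error can leave a restored pixel one level below the original. `float_to_uint8_rounded` is a hypothetical variant added for comparison and is not in the source.

```python
import numpy as np


def uint8_to_float(img: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1]."""
    return (img.astype(np.float32) / 127.5) - 1


def float_to_uint8(img: np.ndarray) -> np.ndarray:
    """Inverse mapping as in the source; astype() truncates toward zero."""
    return ((img + 1) * 127.5).clip(0, 255).astype(np.uint8)


def float_to_uint8_rounded(img: np.ndarray) -> np.ndarray:
    """Hypothetical variant (not in the source): round before casting."""
    return np.rint((img + 1) * 127.5).clip(0, 255).astype(np.uint8)


pixels = np.arange(256, dtype=np.uint8)      # every possible uint8 value
standardized = uint8_to_float(pixels)
restored = float_to_uint8(standardized)
restored_rounded = float_to_uint8_rounded(standardized)

# Largest per-pixel difference after a truncating round trip
max_error = int(np.abs(restored.astype(int) - pixels.astype(int)).max())
```

With truncation, `max_error` is bounded by one intensity level; the rounded variant recovers every pixel exactly, at the cost of one extra pass over the array.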
+
+def _is_np_uint8(img):
+    return isinstance(img, np.ndarray) and img.dtype == np.uint8
+
+
+def _is_np_float32(img):
+    return isinstance(img, np.ndarray) and img.dtype == np.float32
+
+
+def _is_tf_uint8(img):
+    import tensorflow as tf
+    return isinstance(img, tf.Tensor) and img.dtype == tf.uint8
+
+
+def _is_tf_float(img):
+    import tensorflow as tf
+    return (isinstance(img, tf.Tensor) and
+            img.dtype in (tf.float16, tf.float32))
+
+
+def _is_torch_uint8(img):
+    import torch
+    return isinstance(img, torch.Tensor) and img.dtype == torch.uint8
+
+
+def _is_torch_float(img):
+    import torch
+    return (isinstance(img, torch.Tensor) and
+            img.dtype in (torch.float16, torch.float32))
+
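When a type check combines `isinstance` with several acceptable dtypes, the dtype alternatives need to be grouped, because `and` binds more tightly than `or` in Python. The sketch below illustrates the difference with stand-in objects; `Tensor` here is a hypothetical placeholder class, not a TensorFlow or PyTorch type, and dtypes are plain strings for simplicity.

```python
from types import SimpleNamespace


class Tensor:
    """Stand-in tensor type (hypothetical; not a real framework class)."""

    def __init__(self, dtype: str) -> None:
        self.dtype = dtype


def is_float_ungrouped(img) -> bool:
    # Parsed as: (isinstance(...) and dtype == "float16") or dtype == "float32"
    return isinstance(img, Tensor) and img.dtype == "float16" or img.dtype == "float32"


def is_float_grouped(img) -> bool:
    # The isinstance() check now guards both dtype alternatives.
    return isinstance(img, Tensor) and img.dtype in ("float16", "float32")


not_a_tensor = SimpleNamespace(dtype="float32")  # has a dtype, wrong type
half_tensor = Tensor("float16")

ungrouped_result = is_float_ungrouped(not_a_tensor)
grouped_result = is_float_grouped(not_a_tensor)
```

The ungrouped form accepts any object whose `dtype` equals the last alternative, regardless of its type; the grouped form rejects it while still accepting real tensors of either dtype.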
+
+
+def detect_tfrecord_format(tfr: str) -> Tuple[Optional[List[str]],
+                                              Optional[str]]:
+    '''Detects tfrecord format.
+
+    Args:
+        tfr (str): Path to tfrecord.
+
+    Returns:
+        A tuple containing
+
+            list(str): List of detected features.
+
+            str: Image file type (png/jpeg)
+
+        Both values are None if the tfrecord is empty.
+    '''
+    try:
+        record = tfrecord2idx.get_tfrecord_by_index(tfr, index=0)
+    except errors.EmptyTFRecordsError:
+        log.debug(f"Unable to detect format for {tfr}; file empty.")
+        return None, None
+    img_type = imghdr.what('', record['image_raw'])
+    return list(record.keys()), img_type
+ + +
[docs]def convert_dtype( + img: Any, + dtype: Union[np.dtype, "tf.dtypes.DType", "torch.dtype"] +) -> Any: + """Converts an image from one type to another. + + Images can be converted to and from numpy arrays, Torch Tensors and + Tensorflow Tensors. Images can also be converted from standardized + float images to RGB uint8 images, and vice versa. + + Supported formats for starting and ending dtype include: + + .. list-table:: + :widths: 20 80 + :header-rows: 0 + + * - ``np.uint8`` + - Image in RGB (WHC) uint8 format. + * - ``np.float32`` + - RGB (WHC) image. If the source image is a numpy uint8 or torch uint8, + it will be standardized with ``(img / 127.5) - 1``. + If the source image is a tensorflow image, + standardization uses ``tf.image.per_image_standardization()``. + * - ``torch.uint8`` + - Image in RGB (CWH) uint8 format. + * - ``torch.float32`` + - Image converted with ``(img / 127.5) - 1`` and WHC -> CWH. + * - ``tf.uint8`` + - Image in RGB (WHC) uint8 format. + * - ``tf.float32`` + - Image converted with ``tf.image.per_image_standardization()`` + + Args: + img (Any): Input image or batch of images. + start_dtype (type): Starting dtype. + end_dtype (type): Target dtype for conversion. + + Returns: + Converted image or batch of images. + """ + + # Import necessary packages + if 'tensorflow' in sys.modules: + import tensorflow as tf + if 'torch' in sys.modules: + import torch + from slideflow.io.torch import cwh_to_whc, whc_to_cwh + + # Verify dtypes are valid + def _valid_dtype(_dtype): + if 'tensorflow' in sys.modules: + if _dtype in (tf.uint8, tf.float32, tf.float16): + return True + if 'torch' in sys.modules: + if _dtype in (torch.uint8, torch.float32, torch.float16): + return True + return _dtype in (np.uint8, np.float32) + + _valid_str = ("np.uint8, np.float32, " + "tf.uint8, tf.float16, tf.float32, " + "torch.uint8, torch.float16, torch.float32") + if not _valid_dtype(dtype): + raise ValueError(f"Unrecognized dtype {dtype}. 
Expected: {_valid_str}") + if not _valid_dtype(img.dtype): + raise ValueError(f"Image has unrecognized dtype {img.dtype}. " + f"Expected: {_valid_str}") + + # --- np.uint8 conversions ------------------------------------------------ + elif _is_np_uint8(img): + + if dtype is np.uint8: + return img + + if dtype is np.float32: + return _np_uint8_to_float(img) + + if 'torch' in sys.modules and dtype is torch.uint8: + return whc_to_cwh(torch.from_numpy(img)) + + if 'torch' in sys.modules and dtype in (torch.float16, torch.float32): + assert isinstance(dtype, torch._C.dtype) + return (whc_to_cwh(torch.from_numpy(img).to(dtype)) / 127.5) - 1 + + if 'tensorflow' in sys.modules and dtype is tf.uint8: + return tf.convert_to_tensor(img, dtype=tf.uint8) + + if 'tensorflow' in sys.modules and dtype in (tf.float16, tf.float32): + return tf.cast( + tf.image.per_image_standardization( + tf.convert_to_tensor(img, dtype=tf.uint8)), dtype) + + # --- np.float32 conversions ---------------------------------------------- + elif _is_np_float32(img): + + if dtype is np.float32: + return img + + if dtype is np.uint8: + return _np_float_to_uint8(img) + + if 'torch' in sys.modules and dtype is torch.uint8: + return whc_to_cwh(torch.from_numpy(_np_float_to_uint8(img))) + + if 'torch' in sys.modules and dtype in (torch.float16, torch.float32): + assert isinstance(dtype, torch._C.dtype) + return whc_to_cwh(torch.from_numpy(img).to(dtype)) + + if 'tensorflow' in sys.modules and dtype is tf.uint8: + return tf.convert_to_tensor(_np_float_to_uint8(img)) + + if 'tensorflow' in sys.modules and dtype in (tf.float16, tf.float32): + return tf.cast( + tf.image.per_image_standardization( + tf.convert_to_tensor(_np_float_to_uint8(img))), dtype) + + # --- torch.uint8 conversions --------------------------------------------- + elif 'torch' in sys.modules and _is_torch_uint8(img): + + if dtype is torch.uint8: + return img + + if dtype is np.uint8: + return img.cpu().numpy() + + if dtype is np.float32: + 
return _np_uint8_to_float(img.cpu().numpy()) + + if dtype in (torch.float16, torch.float32): + return (img.to(dtype) / 127.5) - 1 + + if 'tensorflow' in sys.modules and dtype is tf.uint8: + return tf.convert_to_tensor(cwh_to_whc(img).cpu().numpy()) + + if 'tensorflow' in sys.modules and dtype in (tf.float16, tf.float32): + return tf.cast( + tf.image.per_image_standardization( + tf.convert_to_tensor(cwh_to_whc(img).cpu().numpy())), dtype) + + # --- torch.float32 conversions ------------------------------------------- + elif 'torch' in sys.modules and _is_torch_float(img): + + if dtype in (torch.float16, torch.float32) and dtype == img.dtype: + return img + + if dtype is np.uint8: + return _np_float_to_uint8(cwh_to_whc(img).cpu().numpy()) + + if dtype is np.float32: + return cwh_to_whc(img).cpu().numpy() + + if dtype is torch.uint8: + return ((img + 1) * 127.5).clamp(0, 255).to(torch.uint8) + + if 'tensorflow' in sys.modules and dtype is tf.uint8: + return tf.convert_to_tensor( + cwh_to_whc( + ((img + 1) * 127.5).clamp(0, 255).to(torch.uint8)).cpu().numpy()) + + if 'tensorflow' in sys.modules and dtype in (tf.float16, tf.float32): + return tf.cast( + tf.image.per_image_standardization( + tf.convert_to_tensor( + cwh_to_whc( + ((img + 1) * 127.5).clamp(0, 255).to(torch.uint8)).cpu().numpy())), dtype) + + # --- tf.uint8 conversions ------------------------------------------------ + elif 'tensorflow' in sys.modules and _is_tf_uint8(img): + + if dtype is tf.uint8: + return img + + if dtype is np.uint8: + return img.numpy() + + if dtype is np.float32: + return tf.cast( + tf.image.per_image_standardization(img), tf.float32).numpy() + + if 'torch' in sys.modules and dtype in (torch.float16, torch.float32): + assert isinstance(dtype, torch._C.dtype) + return (torch.from_numpy(img.numpy()).to(dtype) / 127.5) - 1 + + if 'torch' in sys.modules and dtype is torch.uint8: + return torch.from_numpy(img.numpy()) + + if dtype in (tf.float16, tf.float32): + return tf.cast( + 
tf.image.per_image_standardization(img), dtype) + + # --- tf.float32 conversions ---------------------------------------------- + elif 'tensorflow' in sys.modules and _is_tf_float(img): + + if dtype in (tf.float16, tf.float32) and dtype == img.dtype: + return img + + if dtype is np.float32: + return img.numpy() + + if (dtype in (tf.uint8, np.uint8) + or ('torch' in sys.modules and dtype is torch.uint8)): + raise ValueError( + "Unable to convert standardized Tensorflow tensors to " + "uint8 (Tensorflow standardization is uni-directional)") + + if 'torch' in sys.modules and dtype in (torch.float16, torch.float32): + raise ValueError( + "Unable to convert standardized Tensorflow tensors to " + "PyTorch-standardized tensors (Tensorflow standardization is " + "uni-directional)") + + else: + raise ValueError(f"Unable to convert from {img.dtype} to {dtype}")
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/io/tensorflow/index.html b/docs/_modules/slideflow/io/tensorflow/index.html new file mode 100644 index 000000000..e30f83a2c --- /dev/null +++ b/docs/_modules/slideflow/io/tensorflow/index.html @@ -0,0 +1,1479 @@ + + + + + + + + + + + + slideflow.io.tensorflow — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.io.tensorflow

+import os
+import shutil
+import numpy as np
+import multiprocessing as mp
+import tensorflow as tf
+from functools import partial
+from glob import glob
+from os import listdir
+from os.path import exists, isfile, join
+from random import randint, shuffle
+from rich.progress import track, Progress
+from rich import print as richprint
+from typing import (TYPE_CHECKING, Any, Callable, Dict, Iterable, List,
+                    Optional, Tuple, Union)
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.io import gaussian
+from slideflow.io.io_utils import detect_tfrecord_format
+from slideflow.util import Labels
+from slideflow.util import log
+
+if TYPE_CHECKING:
+    from slideflow.norm import StainNormalizer
+    from tensorflow.core.example.feature_pb2 import Example, Feature
+
+
+FEATURE_DESCRIPTION = {
+    'slide': tf.io.FixedLenFeature([], tf.string),
+    'image_raw': tf.io.FixedLenFeature([], tf.string),
+    'loc_x': tf.io.FixedLenFeature([], tf.int64),
+    'loc_y': tf.io.FixedLenFeature([], tf.int64)
+}
+
+
+def _bytes_feature(value: bytes) -> "Feature":
+    """Returns a bytes_list from a string / byte."""
+    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
+
+
+def _int64_feature(value: int) -> "Feature":
+    """Returns an int64_list from a bool / enum / int / uint."""
+    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
+
+
+def _apply_otsu(wsi):
+    wsi.qc('otsu')
+    return wsi
+
+
+
+def read_and_return_record(
+    record: bytes,
+    parser: Callable,
+    assign_slide: Optional[bytes] = None
+) -> bytes:
+    """Process raw TFRecord bytes into a format that can be written with
+    ``tf.io.TFRecordWriter``.
+
+    Args:
+        record (bytes): Raw TFRecord bytes (unparsed)
+        parser (Callable): TFRecord parser, as returned by
+            :func:`sf.io.get_tfrecord_parser()`
+        assign_slide (str, optional): Slide name to override the record with.
+            Defaults to None.
+
+    Returns:
+        bytes: Serialized record, as produced by ``SerializeToString()``.
+    """
+    features = parser(record)
+    if assign_slide:
+        features['slide'] = assign_slide
+    tf_example = tfrecord_example(**features)
+    return tf_example.SerializeToString()
+
+
+def _print_record(filename: str) -> None:
+    dataset = tf.data.TFRecordDataset(filename)
+    parser = get_tfrecord_parser(
+        filename,
+        ('slide', 'loc_x', 'loc_y'),
+        to_numpy=True,
+        error_if_invalid=False
+    )
+    if parser is None:
+        raise errors.TFRecordsError(f"Unable to read TFRecord {filename}")
+    for i, record in enumerate(dataset):
+        slide, loc_x, loc_y = parser(record)
+        line = f"[magenta]{filename}[/]: Record {i}: Slide: "
+        line += f"[green]{str(slide)}[/] Loc: {(loc_x, loc_y)}"
+        richprint(line)
+
+
[docs]@tf.function +def preprocess_uint8( + img: tf.Tensor, + normalizer: Optional["StainNormalizer"] = None, + standardize: bool = True, + resize_px: Optional[int] = None, + resize_method: str = 'lanczos3', + resize_aa: bool = True, + as_dict: bool = True +) -> Dict[str, tf.Tensor]: + """Process batch of tensorflow images, resizing, normalizing, + and standardizing. + + Args: + img (tf.Tensor): Batch of tensorflow images (uint8). + normalizer (sf.norm.StainNormalizer, optional): Normalizer. + Defaults to None. + standardize (bool, optional): Standardize images. Defaults to True. + resize_px (Optional[int], optional): Resize images. Defaults to None. + resize_method (str, optional): Resize method. Defaults to 'lanczos3'. + resize_aa (bool, optional): Apply antialiasing during resizing. + Defaults to True. + + Returns: + Dict[str, tf.Tensor]: Processed image. + """ + if resize_px is not None: + img = tf.image.resize( + img, + (resize_px, resize_px), + method=resize_method, + antialias=resize_aa + ) + img = tf.cast(img, tf.uint8) + if normalizer is not None: + img = normalizer.tf_to_tf(img) # type: ignore + if standardize: + img = tf.image.per_image_standardization(img) + if as_dict: + return {'tile_image': img} + else: + return img
+ + +
[docs]@tf.function +def process_image( + record: Union[tf.Tensor, Dict[str, tf.Tensor]], + *args: Any, + standardize: bool = False, + augment: Union[bool, str] = False, + transform: Optional[Callable] = None, + size: Optional[int] = None +) -> Tuple[Union[Dict, tf.Tensor], ...]: + """Applies augmentations and/or standardization to an image Tensor. + + Args: + record (Union[tf.Tensor, Dict[str, tf.Tensor]]): Image Tensor. + + Keyword Args: + standardize (bool, optional): Standardize images. Defaults to False. + augment (str or bool): Image augmentations to perform. Augmentations include: + + * ``'x'``: Random horizontal flip + * ``'y'``: Random vertical flip + * ``'r'``: Random 90-degree rotation + * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) + + Combine letters to define augmentations, such as ``'xyrj'``. + A value of True will use ``'xyrjb'``. + Note: this function does not support stain augmentation. + transform (Callable, optional): Arbitrary transform function. + Performs transformation after augmentations but before standardization. + Defaults to None. + size (int, optional): Set the image shape. Defaults to None. 
+ + """ + if isinstance(record, dict): + image = record['tile_image'] + else: + image = record + if size is not None: + image.set_shape([size, size, 3]) + if augment is True or (isinstance(augment, str) and 'j' in augment): + # Augment with random compession + image = tf.cond(tf.random.uniform( + shape=[], # pylint: disable=unexpected-keyword-arg + minval=0, + maxval=1, + dtype=tf.float32 + ) < 0.5, + true_fn=lambda: tf.image.adjust_jpeg_quality( + image, tf.random.uniform( + shape=[], # pylint: disable=unexpected-keyword-arg + minval=50, + maxval=100, + dtype=tf.int32 + ) + ), + false_fn=lambda: image) + if augment is True or (isinstance(augment, str) and 'r' in augment): + # Rotate randomly 0, 90, 180, 270 degrees + image = tf.image.rot90( + image, + tf.random.uniform(shape=[], minval=0, maxval=4, dtype=tf.int32) + ) # pylint: disable=unexpected-keyword-arg + # Random flip and rotation + if augment is True or (isinstance(augment, str) and 'x' in augment): + image = tf.image.random_flip_left_right(image) + if augment is True or (isinstance(augment, str) and 'y' in augment): + image = tf.image.random_flip_up_down(image) + if augment is True or (isinstance(augment, str) and 'b' in augment): + # Augment with random gaussian blur (p=0.1) + uniform_kwargs = { + 'shape': [], + 'minval': 0, + 'maxval': 1, + 'dtype': tf.float32 + } + image = tf.cond( + tf.random.uniform(**uniform_kwargs) < 0.1, + true_fn=lambda: tf.cond( + tf.random.uniform(**uniform_kwargs) < 0.5, + true_fn=lambda: tf.cond( + tf.random.uniform(**uniform_kwargs) < 0.5, + true_fn=lambda: tf.cond( + tf.random.uniform(**uniform_kwargs) < 0.5, + true_fn=lambda: gaussian.auto_gaussian(image, sigma=2.0), + false_fn=lambda: gaussian.auto_gaussian(image, sigma=1.5), + ), + false_fn=lambda: gaussian.auto_gaussian(image, sigma=1.0), + ), + false_fn=lambda: gaussian.auto_gaussian(image, sigma=0.5), + ), + false_fn=lambda: image + ) + if isinstance(augment, str) and 'i' in augment: + raise NotImplementedError("Random 
pixel interpolation not implemented.") + if transform is not None: + image = transform(image) + if standardize: + image = tf.image.per_image_standardization(image) + + if isinstance(record, dict): + to_return = {k: v for k, v in record.items() if k != 'tile_image'} + to_return['tile_image'] = image + return tuple([to_return] + list(args)) + else: + return tuple([image] + list(args))
+ + +
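The nested `tf.cond` calls in the blur branch of `process_image` form a small probability tree: blur is applied 10% of the time, and within that branch each further coin flip halves the remaining probability, giving sigma 0.5 half the time, 1.0 a quarter of the time, and 1.5 or 2.0 an eighth of the time each. A plain-Python sketch of the same tree follows, assuming each draw is independent; `sample_blur_sigma` is a hypothetical name used only for illustration.

```python
import random
from collections import Counter
from typing import Optional


def sample_blur_sigma(rng: random.Random) -> Optional[float]:
    """Sample a blur sigma following the nested-conditional tree.

    Returns None when no blur is applied. Hypothetical helper for
    illustration; it mirrors the branch structure, not the TensorFlow ops.
    """
    if rng.random() >= 0.1:   # 90%: leave the image unchanged
        return None
    if rng.random() >= 0.5:   # half of the blurred cases
        return 0.5
    if rng.random() >= 0.5:   # a quarter of the blurred cases
        return 1.0
    # Remaining quarter is split evenly between the two largest sigmas
    return 2.0 if rng.random() < 0.5 else 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(sample_blur_sigma(rng) for _ in range(100_000))
```

Inspecting `counts` shows most samples under `None`, with the blurred samples concentrated on the smallest sigma, which matches the intent of a mild, occasional augmentation.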
[docs]@tf.function +def decode_image( + img_string: bytes, + img_type: str, + crop_left: Optional[int] = None, + crop_width: Optional[int] = None, + resize_target: Optional[int] = None, + resize_method: str = 'lanczos3', + resize_aa: bool = True, + size: Optional[int] = None +) -> tf.Tensor: + """Decodes an image. + + Args: + img_string (bytes): Image bytes (JPG/PNG). + img_type (str): Type of image data; 'jpg', 'jpeg', or 'png'. + crop_left (int, optional): Crop image starting at this top-left + coordinate. Defaults to None. + crop_width (int, optional): Crop image to this width. + Defaults to None. + resize_target (int, optional): Resize image, post-crop, to this target + size in pixels. Defaults to None. + resize_method (str, optional): Resizing method, if applicable. + Defaults to 'lanczos3'. + resize_aa (bool, optional): If resizing, use antialiasing. + Defaults to True. + size (int, optional): Set the image size/width (pixels). + Defaults to None. + + Returns: + tf.Tensor: Processed image (uint8). + """ + tf_decoders = { + 'png': tf.image.decode_png, + 'jpeg': tf.image.decode_jpeg, + 'jpg': tf.image.decode_jpeg + } + decoder = tf_decoders[img_type.lower()] + image = decoder(img_string, channels=3) + if crop_left is not None: + image = tf.image.crop_to_bounding_box( + image, crop_left, crop_left, crop_width, crop_width + ) + if resize_target is not None: + image = tf.image.resize(image, (resize_target, resize_target), method=resize_method, antialias=resize_aa) + image.set_shape([resize_target, resize_target, 3]) + elif size: + image.set_shape([size, size, 3]) + return image
+ + +def auto_decode_image(img_string: bytes, *, img_type: Optional[str] = None): + if img_type is None: + import imghdr + img_type = imghdr.what('', img_string) + return decode_image(img_string, img_type) + + +
[docs]def get_tfrecord_parser( + tfrecord_path: str, + features_to_return: Optional[Iterable[str]] = None, + to_numpy: bool = False, + decode_images: bool = True, + img_size: Optional[int] = None, + error_if_invalid: bool = True, + **decode_kwargs: Any +) -> Optional[Callable]: + + """Returns a tfrecord parsing function based on the specified parameters. + + Args: + tfrecord_path (str): Path to tfrecord to parse. + features_to_return (list or dict, optional): Designates format for how + features should be returned from parser. If a list of feature names + is provided, the parsing function will return tfrecord features as + a list in the order provided. If a dictionary of labels (keys) + mapping to feature names (values) is provided, features will be + returned from the parser as a dictionary matching the same format. + If None, will return all features as a list. + to_numpy (bool, optional): Convert records from tensors->numpy arrays. + Defaults to False. + decode_images (bool, optional): Decode image strings into arrays. + Defaults to True. + standardize (bool, optional): Standardize images into the range (0,1). + Defaults to False. + img_size (int): Width of images in pixels. Will call tf.set_shape(...) + if provided. Defaults to False. + normalizer (:class:`slideflow.norm.StainNormalizer`): Stain normalizer + to use on images. Defaults to None. + augment (str or bool): Image augmentations to perform. Augmentations include: + + * ``'x'``: Random horizontal flip + * ``'y'``: Random vertical flip + * ``'r'``: Random 90-degree rotation + * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) + * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer) + + Combine letters to define augmentations, such as ``'xyrjn'``. + A value of True will use ``'xyrjb'``. + error_if_invalid (bool, optional): Raise an error if a tfrecord cannot + be read. 
Defaults to True. + """ + features, img_type = detect_tfrecord_format(tfrecord_path) + if features is None: + log.debug(f"Unable to read tfrecord at {tfrecord_path} - is it empty?") + return None + if features_to_return is None: + features_to_return = {k: k for k in features} + feature_description = { + k: v for k, v in FEATURE_DESCRIPTION.items() + if k in features + } + + def parser(record): + features = tf.io.parse_single_example(record, feature_description) + + def process_feature(f): + if f not in features and f in ('loc_x', 'loc_y'): + return None + elif f not in features and error_if_invalid: + raise errors.TFRecordsError(f"Unknown TFRecord feature {f}") + elif f not in features: + return None + elif f == 'image_raw' and decode_images: + return decode_image( + features['image_raw'], + img_type, + size=img_size, + **decode_kwargs + ) + elif to_numpy: + return features[f].numpy() + else: + return features[f] + + if type(features_to_return) == dict: + return { + label: process_feature(f) + for label, f in features_to_return.items() + } + else: + return [process_feature(f) for f in features_to_return] + + return parser
+ + +
[docs]def parser_from_labels(labels: Labels) -> Callable: + """Create a label parsing function used for parsing slides into single + or multi-outcome labels. + + Args: + labels (dict): Dictionary mapping slide names to outcome labels. + + Returns: + Callable: Label parsing function. + + """ + outcome_labels = np.array(list(labels.values())) + slides = list(labels.keys()) + if len(outcome_labels.shape) == 1: + outcome_labels = np.expand_dims(outcome_labels, axis=1) + with tf.device('/cpu'): + annotations_tables = [] + for oi in range(outcome_labels.shape[1]): + annotations_tables += [tf.lookup.StaticHashTable( + tf.lookup.KeyValueTensorInitializer( + slides, + outcome_labels[:, oi] + ), -1 + )] + + def label_parser(image, slide): + if outcome_labels.shape[1] > 1: + label = [ + annotations_tables[oi].lookup(slide) + for oi in range(outcome_labels.shape[1]) + ] + else: + label = annotations_tables[0].lookup(slide) + return image, label + + return label_parser
+ + +
[docs]def interleave( + paths: List[str], + *, + augment: bool = False, + batch_size: Optional[int], + clip: Optional[Dict[str, int]] = None, + deterministic: bool = False, + drop_last: bool = False, + from_wsi: bool = False, + incl_loc: Optional[str] = None, + incl_slidenames: bool = False, + infinite: bool = True, + img_size: int, + labels: Optional[Labels] = None, + normalizer: Optional["StainNormalizer"] = None, + num_parallel_reads: int = 4, + num_shards: Optional[int] = None, + pool: Optional["mp.pool.Pool"] = None, + prob_weights: Optional[Dict[str, float]] = None, + rois: Optional[List[str]] = None, + roi_method: str = 'auto', + shard_idx: Optional[int] = None, + standardize: bool = True, + tile_um: Optional[int] = None, + tfrecord_parser: Optional[Callable] = None, + transform: Optional[Callable] = None, + **decode_kwargs: Any +) -> Iterable: + """Generate an interleaved dataset from a collection of tfrecord files. + + The interleaved dataset samples from tfrecord files randomly according to + balancing, if provided. Requires manifest for balancing. Assumes TFRecord + files are named by slide. + + Args: + paths (list(str)): List of paths to TFRecord files or whole-slide + images. + + Keyword Args: + augment (str or bool): Image augmentations to perform. Augmentations include: + + * ``'x'``: Random horizontal flip + * ``'y'``: Random vertical flip + * ``'r'``: Random 90-degree rotation + * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) + * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer) + + Combine letters to define augmentations, such as ``'xyrjn'``. + A value of True will use ``'xyrjb'``. + batch_size (int): Batch size. + clip (dict, optional): Dict mapping tfrecords to number of tiles to + take per tfrecord. Defaults to None. 
+ deterministic (bool, optional): When num_parallel_calls is specified, + if this boolean is specified, it controls the order in which the + transformation produces elements. If set to False, the + transformation is allowed to yield elements out of order to trade + determinism for performance. Defaults to False. + drop_last (bool, optional): Drop the last non-full batch. + Defaults to False. + from_wsi (bool): Generate predictions from tiles dynamically + extracted from whole-slide images, rather than TFRecords. + Defaults to False (use TFRecords). + incl_loc (str, optional): 'coord', 'grid', or None. Return (x,y) + coordinates ('coord') for each tile center along with tile + images, or the (x,y) grid coordinates for each tile. + Defaults to None. + incl_slidenames (bool, optional): Include slidenames as third returned + variable. Defaults to False. + infinite (bool, optional): Create an infinite dataset. WARNING: If + infinite is False and balancing is used, some tiles will be skipped. + Defaults to True. + img_size (int): Image width in pixels. + labels (dict or Callable, optional): Dict or function. If dict, must map + slide names to outcome labels. If function, function must accept an + image (tensor) and slide name (str), and return a dict + {'image_raw': image (tensor)} and label (int or float). If not + provided, all labels will be None. + normalizer (:class:`slideflow.norm.StainNormalizer`, optional): + Normalizer to use on images. Defaults to None. + num_parallel_reads (int, optional): Number of parallel reads for each + TFRecordDataset. Defaults to 4. + num_shards (int, optional): Shard the tfrecord datasets, used for + multiprocessing datasets. Defaults to None. + pool (multiprocessing.Pool): Shared multiprocessing pool. Useful + if ``from_wsi=True``, for sharing a unified processing pool between + dataloaders. Defaults to None. + prob_weights (dict, optional): Dict mapping tfrecords to probability of + including in batch. Defaults to None. 
+ rois (list(str), optional): List of ROI paths. Only used if + from_wsi=True. Defaults to None. + roi_method (str, optional): Method for extracting ROIs. Only used if + from_wsi=True. Defaults to 'auto'. + shard_idx (int, optional): Index of the tfrecord shard to use. + Defaults to None. + standardize (bool, optional): Standardize images to (0,1). + Defaults to True. + tile_um (int, optional): Size of tiles to extract from WSI, in + microns. Only used if from_wsi=True. Defaults to None. + tfrecord_parser (Callable, optional): Custom parser for TFRecords. + Defaults to None. + transform (Callable, optional): Arbitrary transform function. + Performs transformation after augmentations but before + standardization. Defaults to None. + **decode_kwargs (dict): Keyword arguments to pass to + :func:`slideflow.io.tensorflow.decode_image`. + + """ + if not len(paths): + raise errors.TFRecordsNotFoundError + _path_type = "slides" if from_wsi else "tfrecords" + log.debug( + f'Interleaving {len(paths)} {_path_type}: infinite={infinite}, ' + f'num_parallel_reads={num_parallel_reads}') + if from_wsi and not tile_um: + raise ValueError("`tile_um` required for interleave() " + "if `from_wsi=True`") + + if num_shards: + log.debug(f'num_shards={num_shards}, shard_idx={shard_idx}') + if isinstance(labels, dict): + label_parser = parser_from_labels(labels) + elif callable(labels) or labels is None: + label_parser = labels # type: ignore + else: + raise ValueError( + f"Unrecognized type for labels: {type(labels)} (must be dict" + " or function)") + + datasets = [] + weights = [] if prob_weights else None # type: Optional[List] + + + if from_wsi: + pb = Progress(transient=True) + read_task = pb.add_task('Reading slides...', total=len(paths), visible=False) + otsu_task = pb.add_task("Otsu thresholding...", total=len(paths), visible=False) + interleave_task = pb.add_task('Interleaving...', total=len(paths)) + pb.start() + else: + pb = None + with tf.device('cpu'), 
sf.util.cleanup_progress(pb): + features_to_return = ['image_raw', 'slide'] + if incl_loc: + features_to_return += ['loc_x', 'loc_y'] + + if from_wsi: + assert tile_um is not None + pb.update(read_task, visible=True) + pb.update(otsu_task, visible=True) + + def base_parser(record): + return tuple([record[f] for f in features_to_return]) + + # Load slides and apply Otsu's thresholding + if pool is None and sf.slide_backend() == 'cucim': + pool = mp.Pool( + sf.util.num_cpu(default=8), + initializer=sf.util.set_ignore_sigint + ) + elif pool is None: + pool = mp.dummy.Pool(sf.util.num_cpu(default=16)) + wsi_list = [] + to_remove = [] + otsu_list = [] + for path in paths: + try: + wsi = sf.WSI( + path, + img_size, + tile_um, + rois=rois, + roi_method=roi_method, + verbose=False + ) + wsi_list += [wsi] + pb.advance(read_task) + except errors.SlideLoadError as e: + log.error(f"Error reading slide {path}: {e}") + to_remove += [path] + for path in to_remove: + paths.remove(path) + for task in (read_task, otsu_task, interleave_task): + pb.update(task, total=len(paths)) + for wsi in pool.imap(_apply_otsu, wsi_list): + otsu_list += [wsi] + pb.advance(otsu_task) + est_num_tiles = sum([wsi.estimated_num_tiles for wsi in otsu_list]) + elif tfrecord_parser is None: + base_parser = None # type: ignore + for i in range(len(paths)): + if base_parser is not None: + continue + if i > 0: + log.debug(f"Failed to get parser, trying again (n={i})...") + base_parser = get_tfrecord_parser( + paths[i], + features_to_return, + img_size=img_size, + **decode_kwargs) + else: + base_parser = tfrecord_parser + + for t, tfr in enumerate(paths): + if from_wsi: + tf_dts = otsu_list[t].tensorflow( + pool=pool, + lazy_iter=True, + incl_slidenames=True, + grayspace_fraction=1, + incl_loc=incl_loc, + ) + tfr = sf.util.path_to_name(tfr) + else: + tf_dts = tf.data.TFRecordDataset( + tfr, + num_parallel_reads=num_parallel_reads + ) + if num_shards: + tf_dts = tf_dts.shard(num_shards, index=shard_idx) + if 
clip: + tf_dts = tf_dts.take( + clip[tfr] // (num_shards if num_shards else 1) + ) + if infinite: + tf_dts = tf_dts.repeat() + datasets += [tf_dts] + if prob_weights: + weights += [prob_weights[tfr]] # type: ignore + if from_wsi: + pb.advance(interleave_task) + + # ------- Interleave and parse datasets ------------------------------- + sampled_dataset = tf.data.Dataset.sample_from_datasets( + datasets, + weights=weights + ) + dataset = _get_parsed_datasets( + sampled_dataset, + base_parser=base_parser, # type: ignore + label_parser=label_parser, + include_slidenames=incl_slidenames, + include_loc=incl_loc, + deterministic=deterministic + ) + # ------- Apply normalization ----------------------------------------- + if normalizer: + if not isinstance(normalizer, sf.norm.StainNormalizer): + raise ValueError( + f"Expected normalizer to be type StainNormalizer, got: {type(normalizer)}" + ) + if normalizer.vectorized: + log.debug("Using vectorized normalization") + norm_batch_size = 32 if not batch_size else batch_size + dataset = dataset.batch(norm_batch_size, drop_remainder=drop_last) + else: + log.debug("Using per-image normalization") + dataset = dataset.map( + partial(normalizer.tf_to_tf, augment=(isinstance(augment, str) + and 'n' in augment)), + num_parallel_calls=tf.data.AUTOTUNE, + deterministic=deterministic, + ) + if normalizer.vectorized: + dataset = dataset.unbatch() + elif isinstance(augment, str) and 'n' in augment: + raise ValueError( + "Stain augmentation (n) requires a stain normalizer, which was not " + "provided. 
Augmentation string: {}".format(augment) + ) + + # ------- Standardize and augment images ------------------------------ + dataset = dataset.map( + partial( + process_image, + standardize=standardize, + augment=augment, + transform=transform, + size=img_size + ), + num_parallel_calls=tf.data.AUTOTUNE, + deterministic=deterministic + ) + # ------- Batch and prefetch ------------------------------------------ + if batch_size: + dataset = dataset.batch(batch_size, drop_remainder=drop_last) + dataset = dataset.prefetch(tf.data.AUTOTUNE) + if from_wsi: + dataset.est_num_tiles = est_num_tiles + return dataset
+ + +def _get_parsed_datasets( + tfrecord_dataset: tf.data.Dataset, + base_parser: Callable, + label_parser: Optional[Callable] = None, + include_slidenames: bool = False, + include_loc: Optional[str] = None, + deterministic: bool = False +) -> tf.data.Dataset: + """Return a parsed dataset. + + Args: + tfrecord_dataset (tf.data.Dataset): Dataset to be parsed; should be + a raw TFRecord reading dataset, yielding bytes. + base_parser (Callable): Base TFRecord parser which parses bytes into + features. + label_parser (Optional[Callable], optional): Function to parse input + (image, slide) into (image, label). Defaults to None. + include_slidenames (bool, optional): Yield slide names as a third + returned value. Defaults to False. + include_loc (Optional[str], optional): Yield location X and Y coords + as two additional values. If include_slidenames is true, these will + follow slide names. Defaults to None. + deterministic (bool, optional): Read from TFRecords in order, at the + expense of performance. Defaults to False. + + Returns: + tf.data.Dataset: Parsed dataset. + """ + + def final_parser(record): + if include_loc: + image, slide, loc_x, loc_y = base_parser(record) + else: + image, slide = base_parser(record) + image, label = label_parser(image, slide) if label_parser else (image, None) + + to_return = [image, label] + if include_slidenames: + to_return += [slide] + if include_loc: + to_return += [loc_x, loc_y] + return tuple(to_return) + + return tfrecord_dataset.map( + final_parser, + num_parallel_calls=tf.data.AUTOTUNE, + deterministic=deterministic + ) + + +
[docs]def tfrecord_example( + slide: bytes, + image_raw: bytes, + loc_x: Optional[int] = 0, + loc_y: Optional[int] = 0 +) -> "Example": + """Return a Tensorflow Data example for TFRecord storage. + + Args: + slide (bytes): Slide name. + image_raw (bytes): Image bytes. + loc_x (Optional[int], optional): X coordinate of image. Defaults to 0. + loc_y (Optional[int], optional): Y coordinate of image. Defaults to 0. + + Returns: + Example: Tensorflow Data example. + + """ + feature = { + 'slide': _bytes_feature(slide), + 'image_raw': _bytes_feature(image_raw), + } + if loc_x is not None: + feature.update({'loc_x': _int64_feature(loc_x)}) + if loc_y is not None: + feature.update({'loc_y': _int64_feature(loc_y)}) + return tf.train.Example(features=tf.train.Features(feature=feature))
+ + +
[docs]def serialized_record( + slide: bytes, + image_raw: bytes, + loc_x: int = 0, + loc_y: int = 0 +) -> bytes: + """Serialize a record for TFRecord storage. + + The serialized record will be in a data format ready to be written + by a TFRecordWriter. + + Args: + slide (bytes): Slide name. + image_raw (bytes): Image bytes. + loc_x (int, optional): X coordinate of image. Defaults to 0. + loc_y (int, optional): Y coordinate of image. Defaults to 0. + + Returns: + bytes: Serialized record. + + """ + return tfrecord_example(slide, image_raw, loc_x, loc_y).SerializeToString()
+ + +
[docs]def multi_image_example(slide: bytes, image_dict: Dict) -> "Example": + """Returns a Tensorflow Data example for storage with multiple images. + + Args: + slide (bytes): Slide name. + image_dict (Dict): Dictionary of image names and image bytes. + + Returns: + Example: Tensorflow Data example. + + """ + feature = { + 'slide': _bytes_feature(slide) + } + for image_label in image_dict: + feature.update({ + image_label: _bytes_feature(image_dict[image_label]) + }) + return tf.train.Example(features=tf.train.Features(feature=feature))
+
+
+def join_tfrecord(
+    input_folder: str,
+    output_file: str,
+    assign_slide: Optional[str] = None
+) -> None:
+    """Randomly sample from tfrecords in the input folder with shuffling,
+    and combine into a single tfrecord file.
+
+    Args:
+        input_folder (str): Folder containing tfrecord files.
+        output_file (str): Output tfrecord file.
+        assign_slide (str, optional): Assign a slide name to all images.
+            Defaults to None.
+
+    """
+    writer = tf.io.TFRecordWriter(output_file)
+    tfrecord_files = glob(join(input_folder, "*.tfrecords"))
+    datasets = []
+    # Always define `slide`, so the write call below does not raise a
+    # NameError when no slide name is assigned.
+    slide = assign_slide.encode('utf-8') if assign_slide else None
+    features, img_type = detect_tfrecord_format(tfrecord_files[0])
+    parser = get_tfrecord_parser(
+        tfrecord_files[0],
+        decode_images=False,
+        to_numpy=True
+    )
+    for tfrecord in tfrecord_files:
+        n_feat, n_img_type = detect_tfrecord_format(tfrecord)
+        if n_feat != features or n_img_type != img_type:
+            raise errors.TFRecordsError(
+                "Mismatching tfrecord format found, unable to merge"
+            )
+        dataset = tf.data.TFRecordDataset(tfrecord)
+        dataset = dataset.shuffle(1000)
+        dataset_iter = iter(dataset)
+        datasets += [dataset_iter]
+    while len(datasets):
+        index = randint(0, len(datasets)-1)
+        try:
+            record = next(datasets[index])
+        except StopIteration:
+            del datasets[index]
+            continue
+        writer.write(
+            read_and_return_record(record, parser, slide)  # type: ignore
+        )
+    # Flush and close the output file once all records are written.
+    writer.close()
+ + +
[docs]def split_tfrecord(tfrecord_file: str, output_folder: str) -> None: + """Split records from a single tfrecord into individual tfrecord + files, stratified by slide. + + Args: + tfrecord_file (str): Path to tfrecord file. + output_folder (str): Path to output folder. + + """ + dataset = tf.data.TFRecordDataset(tfrecord_file) + parser = get_tfrecord_parser(tfrecord_file, ['slide'], to_numpy=True) + full_parser = get_tfrecord_parser( + tfrecord_file, + decode_images=False, + to_numpy=True + ) + writers = {} # type: ignore + for record in dataset: + slide = parser(record) # type: ignore + shortname = sf.util._shortname(slide.decode('utf-8')) + if shortname not in writers.keys(): + tfrecord_path = join(output_folder, f"{shortname}.tfrecords") + writer = tf.io.TFRecordWriter(tfrecord_path) + writers.update({shortname: writer}) + else: + writer = writers[shortname] + writer.write( + read_and_return_record(record, full_parser) # type: ignore + ) + for slide in writers.keys(): + writers[slide].close()
+ + + + + +
[docs]def checkpoint_to_tf_model(models_dir: str, model_name: str) -> None: + """Convert a checkpoint file into a saved model. + + Args: + models_dir: Directory containing the model. + model_name: Name of the model to convert. + + """ + checkpoint = join(models_dir, model_name, "cp.ckpt") + tf_model = join(models_dir, model_name, "untrained_model") + updated_tf_model = join(models_dir, model_name, "checkpoint_model") + model = tf.keras.models.load_model(tf_model) + model.load_weights(checkpoint) + try: + model.save(updated_tf_model) + except KeyError: + # Not sure why this happens, something to do with the optimizer? + log.debug("KeyError encountered in checkpoint_to_tf_model") + pass
+ + +
[docs]def transform_tfrecord( + origin: str, + target: str, + assign_slide: Optional[str] = None, + hue_shift: Optional[float] = None, + resize: Optional[float] = None, +) -> None: + """Transform images in a single tfrecord. + + Can perform hue shifting, resizing, or re-assigning slide label. + + Args: + origin: Path to the original tfrecord file. + target: Path to the new tfrecord file. + assign_slide: If provided, will assign this slide name to all + records in the new tfrecord. + hue_shift: If provided, will shift the hue of all images by + this amount. + resize: If provided, will resize all images to this size. + + """ + log.info(f"Transforming tiles in tfrecord [green]{origin}") + log.info(f"Saving to new tfrecord at [green]{target}") + if assign_slide: + log.info(f"Assigning slide name [bold]{assign_slide}") + if hue_shift: + log.info(f"Shifting hue by [bold]{hue_shift}") + if resize: + log.info(f"Resizing records to ({resize}, {resize})") + dataset = tf.data.TFRecordDataset(origin) + writer = tf.io.TFRecordWriter(target) + parser = get_tfrecord_parser( + origin, + ('slide', 'image_raw', 'loc_x', 'loc_y'), + decode_images=(hue_shift is not None or resize is not None), + error_if_invalid=False, + to_numpy=True + ) + + def process_image(image_string): + if hue_shift: + decoded_image = tf.image.decode_png(image_string, channels=3) + adjusted_image = tf.image.adjust_hue(decoded_image, hue_shift) + encoded_image = tf.io.encode_jpeg(adjusted_image, quality=80) + return encoded_image.numpy() + elif resize: + decoded_image = tf.image.decode_png(image_string, channels=3) + resized_image = tf.image.resize( + decoded_image, + (resize, resize), + method=tf.image.ResizeMethod.NEAREST_NEIGHBOR + ) + encoded_image = tf.io.encode_jpeg(resized_image, quality=80) + return encoded_image.numpy() + else: + return image_string + + for record in dataset: + slide, image_raw, loc_x, loc_y = parser(record) # type: ignore + if assign_slide and isinstance(assign_slide, str): + slidename 
= bytes(assign_slide, 'utf-8') + elif assign_slide: + slidename = bytes(assign_slide(slide), 'utf-8') + else: + slidename = slide + image_processed_data = process_image(image_raw) + tf_example = tfrecord_example( + slidename, + image_processed_data, + loc_x, + loc_y + ) + writer.write(tf_example.SerializeToString()) + writer.close()
+ + +
[docs]def shuffle_tfrecord(target: str) -> None: + """Shuffle records in a TFRecord, saving the original to a .old file. + + Args: + target: Path to the tfrecord file. + + """ + + old_tfrecord = target+".old" + shutil.move(target, old_tfrecord) + dataset = tf.data.TFRecordDataset(old_tfrecord) + writer = tf.io.TFRecordWriter(target) + extracted_tfrecord = [] + for record in dataset: + extracted_tfrecord += [record.numpy()] + shuffle(extracted_tfrecord) + for record in extracted_tfrecord: + writer.write(record) + writer.close()
+ + +
[docs]def shuffle_tfrecords_by_dir(directory: str) -> None: + """For each TFRecord in a directory, shuffle records in the TFRecord, + saving the original to a .old file. + + Args: + directory: Path to the directory containing tfrecord files. + + """ + records = [tfr for tfr in listdir(directory) if tfr[-10:] == ".tfrecords"] + for record in records: + log.info(f'Working on {record}') + shuffle_tfrecord(join(directory, record))
+ © Copyright 2023, James M Dolezal.
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/io/torch/data_utils/index.html b/docs/_modules/slideflow/io/torch/data_utils/index.html new file mode 100644 index 000000000..a4749fe43 --- /dev/null +++ b/docs/_modules/slideflow/io/torch/data_utils/index.html @@ -0,0 +1,677 @@ + + + + + + + + + + + + slideflow.io.torch.data_utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.io.torch.data_utils

+"""Data utilities for Torch datasets."""
+
+import pandas as pd
+import numpy as np
+
+from slideflow import errors
+from slideflow.util import tfrecord2idx, to_onehot
+from slideflow.io.io_utils import detect_tfrecord_format
+from typing import (TYPE_CHECKING, Any, Callable, Dict, Iterable,
+                    Optional, Tuple, Union)
+
+from .augment import compose_augmentations
+from .img_utils import decode_image
+
+if TYPE_CHECKING:
+    from slideflow.norm import StainNormalizer
+
+# -------------------------------------------------------------------------
+
+
+FEATURE_DESCRIPTION = {
+    'image_raw': 'byte',
+    'slide': 'byte',
+    'loc_x': 'int',
+    'loc_y': 'int'
+}
+
+# -------------------------------------------------------------------------
+
+def process_labels(
+    labels: Optional[Dict[str, Any]] = None,
+    onehot: bool = False
+) -> Tuple[Optional[Union[Dict[str, Any], pd.DataFrame]],
+           Optional[np.ndarray],
+           Optional[np.ndarray],
+           int]:
+    """Analyze labels to determine unique labels, label probabilities, and
+    number of outcomes.
+
+    Args:
+        labels (dict): Dict mapping slide names to labels.
+        onehot (bool, optional): Onehot encode outcomes. Defaults to False.
+
+    Returns:
+        labels (dict): Dict mapping slide names to labels.
+        unique_labels (np.ndarray): Unique labels.
+        label_prob (np.ndarray): Label probabilities.
+        num_outcomes (int): Number of outcomes.
+
+    """
+    # Weakly supervised labels from slides.
+    if labels is not None and not isinstance(labels, (str, pd.DataFrame)):
+        if onehot:
+            _all_labels_raw = np.array(list(labels.values()))
+            _unique_raw = np.unique(_all_labels_raw)
+            max_label = np.max(_unique_raw)
+            labels = {
+                k: to_onehot(v, max_label+1)  # type: ignore
+                for k, v in labels.items()
+            }
+            num_outcomes = 1
+        else:
+            first_label = list(labels.values())[0]
+            if not isinstance(first_label, list):
+                num_outcomes = 1
+            else:
+                num_outcomes = len(first_label)
+
+        _all_labels = np.array(list(labels.values()))
+        unique_labels = np.unique(_all_labels, axis=0)
+        _lbls = np.array([
+            np.sum(_all_labels == i)
+            for i in unique_labels
+        ])
+        label_prob = _lbls / len(_all_labels)
+
+    # Strongly supervised tile labels from a dataframe.
+    elif isinstance(labels, (pd.DataFrame, str)):
+        if isinstance(labels, str):
+            df = pd.read_parquet(labels)
+        else:
+            df = labels
+        if 'label' not in df.columns:
+            raise ValueError('Could not find column "label" in the '
+                             f'tile labels dataframe at {labels}.')
+        labels = df
+        unique_labels = None
+        label_prob = None
+        num_outcomes = 1
+    else:
+        unique_labels = None
+        label_prob = None  # type: ignore
+        num_outcomes = 1
+    return labels, unique_labels, label_prob, num_outcomes
+
+# -------------------------------------------------------------------------
+
+def load_index(tfr):
+    if isinstance(tfr, bytes):
+        tfr = tfr.decode('utf-8')
+    try:
+        index = tfrecord2idx.load_index(tfr)
+    except OSError:
+        raise errors.TFRecordsError(
+            f"Could not find index path for TFRecord {tfr}"
+        )
+    return index
+
+
+def read_and_return_record(
+    record: bytes,
+    parser: Callable,
+    assign_slide: Optional[str] = None
+) -> Dict:
+    """Process raw TFRecord bytes into a format that can be written with
+    ``tf.io.TFRecordWriter``.
+
+    Args:
+        record (bytes): Raw TFRecord bytes (unparsed)
+        parser (Callable): TFRecord parser, as returned by
+            :func:`sf.io.get_tfrecord_parser()`
+        assign_slide (str, optional): Slide name to override the record with.
+            Defaults to None.
+
+    Returns:
+        Dictionary mapping record key to a tuple containing (bytes, dtype).
+
+    """
+    parsed = parser(record)
+    if assign_slide:
+        parsed['slide'] = assign_slide
+    parsed['slide'] = parsed['slide'].encode('utf-8')
+    return {k: (v, FEATURE_DESCRIPTION[k]) for k, v in parsed.items()}
+
+
+def serialized_record(
+    slide: bytes,
+    image_raw: bytes,
+    loc_x: int = 0,
+    loc_y: int = 0
+):
+    """Returns a serialized example for TFRecord storage, ready to be written
+    by a TFRecordWriter."""
+
+    example = {
+        'image_raw': (image_raw, FEATURE_DESCRIPTION['image_raw']),
+        'slide': (slide, FEATURE_DESCRIPTION['slide']),
+        'loc_x': (loc_x, FEATURE_DESCRIPTION['loc_x']),
+        'loc_y': (loc_y, FEATURE_DESCRIPTION['loc_y']),
+    }
+    return example
+
+
+def get_tfrecord_parser(
+    tfrecord_path: str,
+    features_to_return: Optional[Iterable[str]] = None,
+    decode_images: bool = True,
+    standardize: bool = False,
+    normalizer: Optional["StainNormalizer"] = None,
+    augment: Union[bool, str] = False,
+    **kwargs
+) -> Callable:
+
+    """Gets tfrecord parser using dareblopy reader. This is the Torch
+    implementation, which differs from the one in sf.io.tensorflow.
+
+    Args:
+        tfrecord_path (str): Path to tfrecord to parse.
+        features_to_return (list or dict, optional): Designates format for how
+            features should be returned from parser. If a list of feature names
+            is provided, the parsing function will return tfrecord features as
+            a list in the order provided. If a dictionary of labels (keys)
+            mapping to feature names (values) is provided, features will be
+            returned from the parser as a dictionary matching the same format.
+            If None, will return all features as a dictionary keyed by
+            feature name.
+        decode_images (bool, optional): Decode raw image strings into image
+            arrays. Defaults to True.
+        standardize (bool, optional): Standardize images into the range (0,1).
+            Defaults to False.
+        normalizer (:class:`slideflow.norm.StainNormalizer`): Stain normalizer
+            to use on images. Defaults to None.
+        augment (str or bool): Image augmentations to perform. Augmentations include:
+
+            * ``'x'``: Random horizontal flip
+            * ``'y'``: Random vertical flip
+            * ``'r'``: Random 90-degree rotation
+            * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100)
+            * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0)
+
+            Combine letters to define augmentations, such as ``'xyrjb'``.
+            A value of True will use ``'xyrjb'``.
+            Note: this function does not support stain augmentation.
+
+    Returns:
+        Callable: Parsing function.
+    """
+
+    features, img_type = detect_tfrecord_format(tfrecord_path)
+    if features is None or img_type is None:
+        raise errors.TFRecordsError(f"Unable to read TFRecord {tfrecord_path}")
+    if features_to_return is None:
+        features_to_return = {k: k for k in features}
+    elif not all(f in features for f in features_to_return):
+        detected = ",".join(features)
+        # list() works for both a list of names and a dict (yielding keys).
+        _ftrs = list(features_to_return)
+        raise errors.TFRecordsError(
+            f'Not all features {",".join(_ftrs)} '
+            f'were found in the tfrecord {detected}'
+        )
+
+    # Build the transformations / augmentations.
+    transform = compose_augmentations(
+        augment=augment,
+        standardize=standardize,
+        normalizer=normalizer,
+        whc=True
+    )
+    parser = TFRecordParser(
+        features_to_return,
+        decode_images,
+        img_type,
+        transform
+    )
+    return parser
+
+# -------------------------------------------------------------------------
+
+class TFRecordParser:
+
+    def __init__(self, features_to_return, decode_images, img_type, transform=None):
+        self.features_to_return = features_to_return
+        self.decode_images = decode_images
+        self.img_type = img_type
+        self.transform = transform
+
+    def __call__(self, record):
+        """Each item in args is an array with one item, as the dareblopy reader
+        returns items in batches and we have set our batch_size = 1 for
+        interleaving.
+        """
+        features = {}
+        if ('slide' in self.features_to_return):
+            slide = bytes(record['slide']).decode('utf-8')
+            features['slide'] = slide
+        if ('image_raw' in self.features_to_return):
+            img = bytes(record['image_raw'])
+            if self.decode_images:
+                features['image_raw'] = decode_image(
+                    img,
+                    img_type=self.img_type,
+                    transform=self.transform
+                )
+            else:
+                features['image_raw'] = img
+        if ('loc_x' in self.features_to_return):
+            features['loc_x'] = record['loc_x'][0]
+        if ('loc_y' in self.features_to_return):
+            features['loc_y'] = record['loc_y'][0]
+        if isinstance(self.features_to_return, dict):
+            return {
+                label: features[f]
+                for label, f in self.features_to_return.items()
+            }
+        else:
+            return [features[f] for f in self.features_to_return]
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/io/torch/indexed/index.html b/docs/_modules/slideflow/io/torch/indexed/index.html new file mode 100644 index 000000000..b1d6c9958 --- /dev/null +++ b/docs/_modules/slideflow/io/torch/indexed/index.html @@ -0,0 +1,727 @@ + + + + + + + + + + + + slideflow.io.torch.indexed — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.io.torch.indexed

+"""Indexable, map-style multi-TFRecord dataset & weighted sampler."""
+
+import slideflow as sf
+import multiprocessing as mp
+import numpy as np
+import pandas as pd
+import torch
+
+from typing import Any, Callable, Dict, List, Optional, Union, TYPE_CHECKING
+from slideflow.io import detect_tfrecord_format
+from slideflow.tfrecord.torch.dataset import IndexedMultiTFRecordDataset
+from slideflow.util import Labels, detuple, log
+
+from .img_utils import decode_image, whc_to_cwh
+from .data_utils import process_labels, get_tfrecord_parser, load_index
+from .augment import compose_augmentations
+
+if TYPE_CHECKING:
+    from slideflow.norm import StainNormalizer
+
+# -----------------------------------------------------------------------------
+
+class WeightedInfiniteSampler(torch.utils.data.Sampler):
+    """Sample from a dataset with weighted TFRecord probabilities.
+
+    Args:
+        dataset (torch.utils.data.Dataset): Dataset to sample from.
+        weights (list(float), optional): Sampling weight for each TFRecord
+            in the dataset. If None, will sample from all TFRecords with
+            equal probability. Defaults to None.
+
+    """
+    def __init__(self, dataset, weights=None):
+        self.dataset = dataset
+        if weights is None:
+            # Equal weights; normalized to probabilities below.
+            weights = np.ones(len(dataset.tfrecords))
+        self.weights = np.asarray(weights, dtype=np.float64) / np.sum(weights)
+        self.num_tfrecords = len(weights)
+
+    def __iter__(self):
+        while True:
+            # Choose a random TFRecord.
+            tfr_idx = np.random.choice(self.num_tfrecords, p=self.weights)
+            # Find matching tiles in the sampled tfrecord
+            all_tile_idx = (self.dataset.interleave_index[:, 0] == tfr_idx)
+            if not all_tile_idx.any():
+                # TFRecord is empty.
+                continue
+            # Return a random tile from the tfrecord
+            yield np.random.choice(np.where(all_tile_idx)[0])
+
+    def __len__(self):
+        # The sampler is infinite; report the dataset size as a nominal length.
+        return len(self.dataset)
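The two-stage sampling in `WeightedInfiniteSampler.__iter__` (pick a TFRecord by weight, then pick a tile uniformly from that TFRecord) can be sketched without PyTorch or NumPy. This is an illustrative stand-in, not the library's implementation: `weighted_tile_sampler` and the flat `tile_to_tfrecord` list (playing the role of the first column of `interleave_index`) are assumptions.

```python
import random
from itertools import islice
from typing import Iterator, List, Optional, Sequence


def weighted_tile_sampler(
    tile_to_tfrecord: Sequence[int],
    num_tfrecords: int,
    weights: Optional[Sequence[float]] = None,
    seed: Optional[int] = None,
) -> Iterator[int]:
    """Yield global tile indices forever, sampling TFRecords by weight."""
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * num_tfrecords
    # Group global tile indices by the TFRecord they belong to.
    tiles_by_tfr: List[List[int]] = [[] for _ in range(num_tfrecords)]
    for tile_idx, tfr_idx in enumerate(tile_to_tfrecord):
        tiles_by_tfr[tfr_idx].append(tile_idx)
    # Guard against an endless loop when nothing can ever be sampled.
    if not any(tiles_by_tfr[i] and weights[i] > 0 for i in range(num_tfrecords)):
        raise ValueError("No non-empty TFRecord has a positive weight.")
    while True:
        # Stage 1: choose a TFRecord according to its weight.
        tfr_idx = rng.choices(range(num_tfrecords), weights=weights, k=1)[0]
        if not tiles_by_tfr[tfr_idx]:
            continue  # TFRecord is empty; draw again.
        # Stage 2: choose a tile uniformly from that TFRecord.
        yield rng.choice(tiles_by_tfr[tfr_idx])


# Six tiles across four TFRecords; TFRecord 3 is empty, TFRecord 0 has weight 0.
tile_to_tfrecord = [0, 0, 0, 1, 2, 2]
sample = list(islice(
    weighted_tile_sampler(tile_to_tfrecord, 4, weights=[0, 1, 1, 5], seed=0),
    200,
))
```

Because the generator never terminates, callers bound it themselves (here with `islice`), much as a training loop bounds an infinite sampler by a step count.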
+
+
+
+class IndexedInterleaver(IndexedMultiTFRecordDataset):
+
+    def __init__(
+        self,
+        tfrecords: List[str],
+        *,
+        labels: Optional[Labels] = None,
+        incl_slidenames: bool = False,
+        incl_loc: bool = False,
+        rank: int = 0,
+        num_replicas: int = 1,
+        augment: Union[bool, str] = False,
+        standardize: bool = True,
+        normalizer: Optional["StainNormalizer"] = None,
+        clip: Optional[Dict[str, int]] = None,
+        use_labels: bool = True,
+        onehot: bool = False,
+        indices: Optional[List[np.ndarray]] = None,
+        transform: Optional[Any] = None,
+        tfrecord_parser: Optional[Callable] = None,
+        **kwargs
+    ):
+        """Interleave TFRecords with an indexable ``torch.utils.data.Dataset``.
+
+        Provides an alternative TFRecord IO pipeline to ``InterleaveIterator``,
+        which only supports Iterable-style datasets. This class supports
+        both Iterable and Indexable datasets.
+
+        Differences from ``InterleaveIterator``:
+         - Supports direct indexing.
+         - No "infinite" argument. Looping is handled by the dataloader.
+         - No "prob_weights" argument. Sampling is handled by the dataloader.
+         - Does not support dynamic reading from WSI ("from_wsi", "tile_um",
+           "rois", "roi_method", and "pool" arguments).
+
+        Args:
+            tfrecords (list(str)): Paths to tfrecord files to interleave.
+
+        Keyword Args:
+            labels (dict, optional): Dict mapping slide names to labels.
+                Defaults to None.
+            incl_slidenames (bool, optional): Include slide names when iterated
+                (returns image, label, slide). Defaults to False.
+            incl_loc (bool, optional): Include location info (tile center
+                coordinates). Returns samples in the form ``(..., loc_x,
+                loc_y)``. Defaults to False.
+            rank (int, optional): Which GPU replica this dataset is used for.
+                Assists with synchronization across GPUs. Defaults to 0.
+            num_replicas (int, optional): Total number of GPU replicas.
+                Defaults to 1.
+            augment (str or bool): Image augmentations to perform. Augmentations include:
+
+                * ``'x'``: Random horizontal flip
+                * ``'y'``: Random vertical flip
+                * ``'r'``: Random 90-degree rotation
+                * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100)
+                * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0)
+                * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer)
+
+                Combine letters to define augmentations, such as ``'xyrjn'``.
+                A value of True will use ``'xyrjb'``.
+            standardize (bool, optional): Standardize images to mean 0 and
+                variance of 1. Defaults to True.
+            normalizer (:class:`slideflow.norm.StainNormalizer`, optional):
+                Normalizer. Defaults to None.
+            clip (dict, optional): Dict mapping tfrecord paths to the maximum
+                number of tiles to take from each tfrecord. Defaults to None.
+            use_labels (bool, optional): Enable use of labels (disabled for
+                non-conditional GANs). Defaults to True.
+            onehot (bool, optional): Onehot encode outcomes. Defaults to False.
+            indices (list(numpy.ndarray), optional): Index array for each
+                tfrecord, as loaded with
+                np.loadtxt(index_path, dtype=np.int64). Defaults to None.
+            transform (Callable, optional): Arbitrary torchvision transform
+                function. Performs transformation after augmentations but
+                before standardization. Defaults to None.
+            tfrecord_parser (Callable, optional): Custom parser for TFRecords.
+                Defaults to None.
+            compression_type (str, optional): Compression type for TFRecords.
+                Either 'gzip' or None. Defaults to None.
+            shuffle (bool): Shuffle records within TFRecord files during
+                reading. Defaults to False.
+            seed (int, optional): Seed for random TFRecord interleaving and
+                intra-tfrecord shuffling. Defaults to None.
+ + """ + self.readers = [] + self.tfrecords = np.array(tfrecords).astype(np.string_) + if not len(self.tfrecords): + raise ValueError("No tfrecords provided.") + self.indices = self._load_indices(indices) + self.incl_slidenames = incl_slidenames + self.incl_loc = incl_loc + self.use_labels = use_labels + self.onehot = onehot + self.rank = rank + self.num_replicas = num_replicas + self.parser = self.build_parser(tfrecord_parser) + self.img_format = detect_tfrecord_format(self.tfrecords[0])[1] + (self.labels, + self.unique_labels, + self.label_prob, + self.num_outcomes) = process_labels(labels, onehot=onehot) + if isinstance(self.labels, pd.DataFrame): + self._prepare_tfrecord_subsample() + + # Automatically set shard to rank/num_replicas + if self.rank == 0: + log.info( + f'Interleaving {len(self.tfrecords)} tfrecords: ' + f'num_replicas={self.num_replicas}' + ) + + # Clip tfrecords. + if clip: + _clip = [clip[(t if isinstance(t, str) else t.decode('utf-8'))] for t in self.tfrecords] + else: + _clip = None + + # Set up image transformations. + self._image_transform = compose_augmentations( + augment=augment, + standardize=standardize, + normalizer=normalizer, + transform=transform + ) + + # Prepare TFRecord interleaver. 
+        super().__init__(
+            tfrecords,
+            indices=self.indices,
+            shard=(self.rank, self.num_replicas),
+            clip=_clip,
+            transform=self.parser,
+            **kwargs
+        )
+
+    def __repr__(self) -> str:
+        n_records = self.tfrecords.shape[0]
+        msg = f"<IndexedInterleaver object (num_records={n_records}, num_tiles"
+        msg += f"={self.num_tiles}, rank=({self.rank} of {self.num_replicas}))>"
+        return msg
+
+    def _load_indices(self, indices: Optional[List[np.ndarray]] = None) -> List[np.ndarray]:
+        """Load TFRecord index files."""
+        if indices is None:
+            indices = []
+            with mp.dummy.Pool(16) as pool:
+                log.debug("Loading indices...")
+                for index in pool.imap(load_index, self.tfrecords):
+                    indices += [index]
+            return indices
+        else:
+            log.debug("Using provided indices.")
+            return indices
+
+    def _label_parser(
+        self,
+        slide: str,
+        loc_x: Optional[int] = None,
+        loc_y: Optional[int] = None
+    ):
+        """Parse labels and location information from a record."""
+
+        # Label.
+        if self.labels is not None and self.num_outcomes > 1:
+            label = self.labels[slide]
+            label = {
+                f'out-{i}': torch.tensor(l)
+                for i, l in enumerate(label)  # type: ignore
+            }
+        elif isinstance(self.labels, pd.DataFrame):
+            label = self.labels.loc[f'{slide}-{loc_x}-{loc_y}'].label
+            label = torch.tensor(label)
+        elif self.labels is not None:
+            label = self.labels[slide]
+            label = torch.tensor(label)
+        else:
+            label = torch.tensor(0)
+
+        # Slide/location information.
+ if self.incl_slidenames and self.incl_loc: + return label, slide, loc_x, loc_y + elif self.incl_slidenames: + return label, slide + elif self.incl_loc: + return label, loc_x, loc_y + else: + return label, + + def _prepare_tfrecord_subsample(self): + """Prepare custom TFRecord indices to only read tiles in the labels dataframe.""" + + # Prepare TFRecord subsample if there are fewer tiles in the + # tiles dataframe than there are in the TFRecords + + self.indices = [] + if self.rank == 0: + log.debug("Subsampling TFRecords using tile-level labels...") + + n_tiles = 0 + orig_n_tiles = 0 + with mp.dummy.Pool(16) as pool: + + # Load the original (full) indices + for index, tfr in zip(pool.imap(load_index, self.tfrecords), self.tfrecords): + orig_n_tiles += len(index) + tfr = tfr.decode('utf-8') + slide = sf.util.path_to_name(tfr) + loc = sf.io.get_locations_from_tfrecord(tfr) + + # Check which TFRecord indices are in the labels dataframe + in_df = np.array([f'{slide}-{x}-{y}' in self.labels.index for (x,y) in loc]) + + # Subsample indices based on what is in the labels dataframe + ss_index = index[in_df] + n_tiles += len(ss_index) + + self.indices += [ss_index] + + if not n_tiles: + raise ValueError("No tiles found in TFRecords matching the " + "labels dataframe.") + + if self.rank == 0: + diff = orig_n_tiles - n_tiles + log.debug( + "TFRecord subsampling complete (kept: {}, removed: {}).".format( + n_tiles, diff + )) + if len(self.labels) - n_tiles: + log.debug( + "{} labels in the dataframe have no corresponding tile.".format( + len(self.labels) - n_tiles + ) + ) + if diff: + log.warning(f"Labels not found for {diff} tiles. These " + "tiles will be skipped.") + + + def build_parser( + self, + tfrecord_parser: Optional[Callable] = None + ) -> Callable: + """Build a parser function for TFRecords. + + The parser will be responsible for processing images and labels. 
+ + """ + ftrs = ['image_raw', 'slide', 'loc_x', 'loc_y'] + base_parser = tfrecord_parser or get_tfrecord_parser(self.tfrecords[0], + ftrs, + decode_images=False) + + def parser(*args): + """Parse an image and slide/location information.""" + img, *out = base_parser(*args) + img = decode_image(img, img_type=self.img_format) + img = whc_to_cwh(img) + img = self._image_transform(img) + out = self._label_parser(*out) + return detuple(img, out) + + return parser
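`_prepare_tfrecord_subsample` keeps only the index entries whose tile has a row in the tile-level labels dataframe, matching on the key `{slide}-{loc_x}-{loc_y}`. A minimal sketch of that filtering step, assuming a plain dict stands in for the dataframe index and a list of integers stands in for the TFRecord index array; `subsample_index` and all sample values are hypothetical:

```python
from typing import Dict, List, Sequence, Tuple


def subsample_index(
    slide: str,
    locations: Sequence[Tuple[int, int]],
    index: Sequence[int],
    tile_labels: Dict[str, int],
) -> List[int]:
    """Keep index entries whose '{slide}-{x}-{y}' key has a label."""
    # One boolean per tile, in the same order as the index entries.
    keep = [f"{slide}-{x}-{y}" in tile_labels for (x, y) in locations]
    return [entry for entry, k in zip(index, keep) if k]


# Three tiles in the TFRecord for "slideA"; only two of them are labeled.
locations = [(0, 0), (0, 256), (256, 0)]
index = [100, 200, 300]
tile_labels = {"slideA-0-0": 1, "slideA-256-0": 0, "slideB-0-0": 1}

kept = subsample_index("slideA", locations, index, tile_labels)
n_removed = len(index) - len(kept)             # tiles without a label
n_unmatched_labels = len(tile_labels) - len(kept)  # labels without a tile
```

The two counts correspond to the "removed" tiles and the "labels in the dataframe have no corresponding tile" messages logged by the method.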
\ No newline at end of file
diff --git a/docs/_modules/slideflow/io/torch/iterable/index.html b/docs/_modules/slideflow/io/torch/iterable/index.html
new file mode 100644
index 000000000..9f71ac6ed
--- /dev/null
+++ b/docs/_modules/slideflow/io/torch/iterable/index.html
@@ -0,0 +1,1242 @@
+ slideflow.io.torch.iterable — slideflow 3.0.0 documentation

Source code for slideflow.io.torch.iterable

+"""Iterable-style TFRecord interleavers for PyTorch.
+
+Includes support for streaming data from whole-slide images, as well as StyleGAN2
+compatibility.
+
+"""
+
+import multiprocessing as mp
+import random
+import threading
+import numpy as np
+import pandas as pd
+import torchvision
+import torch
+from queue import Queue
+from typing import (TYPE_CHECKING, Any, Callable, Dict, Iterable, List,
+                    Optional, Tuple, Union)
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.io import detect_tfrecord_format
+from slideflow.tfrecord.torch.dataset import MultiTFRecordDataset
+from slideflow.tfrecord.iterator_utils import RandomSampler
+from slideflow.util import Labels, log
+
+from .img_utils import whc_to_cwh
+from .img_utils import decode_image
+from .data_utils import process_labels, load_index, get_tfrecord_parser
+
+from .augment import compose_augmentations
+
+if TYPE_CHECKING:
+    from slideflow.norm import StainNormalizer
+
+# -----------------------------------------------------------------------------
+
+
[docs]class InterleaveIterator(torch.utils.data.IterableDataset): + """Pytorch Iterable Dataset that interleaves tfrecords with the + interleave() function below. Serves as a bridge between the python + generator returned by interleave() and the pytorch DataLoader class. + """ + + def __init__( + self, + tfrecords: List[str], + *, + img_size: Optional[int] = None, + labels: Optional[Labels] = None, + incl_slidenames: bool = False, + incl_loc: bool = False, + rank: int = 0, + num_replicas: int = 1, + augment: Union[str, bool] = False, + standardize: bool = True, + num_tiles: Optional[int] = None, + infinite: bool = True, + prob_weights: Optional[Dict[str, float]] = None, + normalizer: Optional["StainNormalizer"] = None, + clip: Optional[List[int]] = None, + chunk_size: int = 1, + use_labels: bool = True, + model_type: str = 'classification', + onehot: bool = False, + indices: Optional[np.ndarray] = None, + from_wsi: bool = False, + tile_um: Optional[int] = None, + rois: Optional[List[str]] = None, + roi_method: str = 'auto', + pool: Optional[Any] = None, + transform: Optional[Any] = None, + **interleave_kwargs + ) -> None: + """Pytorch IterableDataset that interleaves tfrecords with + :func:`slideflow.io.torch.interleave`. + + Args: + tfrecords (list(str)): Path to tfrecord files to interleave. + + Keyword Args: + img_size (int): Image width in pixels. + labels (dict, optional): Dict mapping slide names to labels. + Defaults to None. + incl_slidenames (bool, optional): Include slide names when iterated + (returns image, label, slide). Defaults to False. + incl_loc (bool, optional): Include location info (tile center + coordinates). Returns samples in the form ``(returns ..., loc_x, + loc_y)``. Defaults to False. + rank (int, optional): Which GPU replica this dataset is used for. + Assists with synchronization across GPUs. Defaults to 0. + num_replicas (int, optional): Total number of GPU replicas. + Defaults to 1. 
+            augment (str or bool): Image augmentations to perform. Augmentations include:
+
+                * ``'x'``: Random horizontal flip
+                * ``'y'``: Random vertical flip
+                * ``'r'``: Random 90-degree rotation
+                * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100)
+                * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0)
+                * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer)
+
+                Combine letters to define augmentations, such as ``'xyrjn'``.
+                A value of True will use ``'xyrjb'``.
+            standardize (bool, optional): Standardize images to mean 0 and
+                variance of 1. Defaults to True.
+            num_tiles (int, optional): Total number of tiles across all
+                tfrecords. Defaults to None.
+            infinite (bool, optional): Infinitely loop through dataset.
+                Defaults to True.
+            prob_weights (list(float), optional): Probability weights for
+                interleaving tfrecords. Defaults to None.
+            normalizer (:class:`slideflow.norm.StainNormalizer`, optional):
+                Normalizer. Defaults to None.
+            clip (list(int), optional): Array of maximum tiles to take for each
+                tfrecord. Defaults to None.
+            chunk_size (int, optional): Chunk size for image decoding.
+                Defaults to 1.
+            use_labels (bool, optional): Enable use of labels (disabled for
+                non-conditional GANs). Defaults to True.
+            model_type (str, optional): Used to generate random labels
+                (for StyleGAN2). Not required. Defaults to 'classification'.
+            onehot (bool, optional): Onehot encode outcomes. Defaults to False.
+            indices (numpy.ndarray, optional): Indices in form of array,
+                with np.loadtxt(index_path, dtype=np.int64) for each tfrecord.
+                Defaults to None.
+            max_size (bool, optional): Unused argument present for legacy
+                compatibility; will be removed.
+            from_wsi (bool): Generate predictions from tiles dynamically
+                extracted from whole-slide images, rather than TFRecords.
+                Defaults to False (use TFRecords).
+            tile_um (int, optional): Size of tiles to extract from WSI, in
+                microns.
Only used if from_wsi=True. Defaults to None. + rois (list(str), optional): List of ROI paths. Only used if + from_wsi=True. Defaults to None. + roi_method (str, optional): Method for extracting ROIs. Only used if + from_wsi=True. Defaults to 'auto'. + pool (multiprocessing.Pool): Shared multiprocessing pool. Useful + if ``from_wsi=True``, for sharing a unified processing pool between + dataloaders. Defaults to None. + transform (Callable, optional): Arbitrary torchvision transform + function. Performs transformation after augmentations but + before standardization. Defaults to None. + tfrecord_parser (Callable, optional): Custom parser for TFRecords. + Defaults to None. + """ + if normalizer is not None and not isinstance(normalizer, sf.norm.StainNormalizer): + raise ValueError( + f"Expected normalizer to be type StainNormalizer, got: {type(normalizer)}" + ) + self.tfrecords = np.array(tfrecords).astype(np.string_) + self.prob_weights = None if prob_weights is None else np.array(prob_weights) + self.clip = clip + self.indices = indices + self.img_size = img_size + self.rank = rank + self.num_replicas = num_replicas + self.augment = augment + self.standardize = standardize + self.infinite = infinite + self.use_labels = use_labels + self.chunk_size = chunk_size + self.normalizer = normalizer + self.onehot = onehot + self.incl_slidenames = incl_slidenames + self.incl_loc = incl_loc + self.num_tiles = num_tiles + self.model_type = model_type + self.from_wsi = from_wsi + self.tile_um = tile_um + self.rois = rois + self.roi_method = roi_method + self.pool = pool + self.transform = transform + self.interleave_kwargs = interleave_kwargs + self._label_shape = None + (self.labels, + self.unique_labels, + self.label_prob, + self.num_outcomes) = process_labels(labels, onehot=onehot) + if isinstance(self.labels, pd.DataFrame): + self._prepare_tfrecord_subsample() + + @property + def name(self) -> str: + return 'slideflow-interleave-iterator' + + def __len__(self) -> 
Optional[int]: + return self.num_tiles + + def _parser( + self, + image: torch.Tensor, + slide: str, + loc_x: Optional[int] = None, + loc_y: Optional[int] = None + ) -> List[torch.Tensor]: + """Parse a standardize PyTorch image (WHC) and slide/location + information, to a CWH image formatted for model input.""" + + if self.labels is not None and not isinstance(self.labels, pd.DataFrame): + label = self.labels[slide] + elif self.labels is not None: + label = self.labels.loc[f'{slide}-{loc_x}-{loc_y}'].label + else: + label = 0 + + image = whc_to_cwh(image) + to_return = [image] # type: List[Any] + + # Support for multiple outcome labels + if self.num_outcomes > 1: + to_return += [{ + f'out-{i}': torch.tensor(l) + for i, l in enumerate(label) # type: ignore + }] + else: + to_return += [torch.tensor(label)] + + if self.incl_slidenames: + to_return += [slide] + if self.incl_loc: + to_return += [loc_x, loc_y] + return to_return + + def __repr__(self) -> str: + n_records = self.tfrecords.shape[0] + msg = f"<InterleaveIterator object (num_records={n_records}, num_tiles" + msg += f"={self.num_tiles}, infinite={self.infinite}, rank=(" + msg += f"{self.rank} of {self.num_replicas}), augment={self.augment}, " + msg += f"standardize={self.standardize})>" + return msg + + def __del__(self): + self.close() + + def __iter__(self): + worker_info = torch.utils.data.get_worker_info() + worker_id = 0 if not worker_info else worker_info.id + num_workers = 1 if not worker_info else worker_info.num_workers + + queue_retriever = interleave( + self.tfrecords, + incl_loc=True, # Read from TFRecord. 
Handle with ._parser() + standardize=self.standardize, + augment=self.augment, + prob_weights=self.prob_weights, + clip=self.clip, + infinite=self.infinite, + normalizer=self.normalizer, + num_replicas=self.num_replicas * num_workers, + rank=self.rank + worker_id, + chunk_size=self.chunk_size, + indices=self.indices, + tile_px=self.img_size, + from_wsi=self.from_wsi, + tile_um=self.tile_um, + rois=self.rois, + roi_method=self.roi_method, + pool=self.pool, + transform=self.transform, + **self.interleave_kwargs + ) + self.close = queue_retriever.close + try: + for record in queue_retriever: + yield self._parser(*record) + # Closes open files if iterator terminated early + except GeneratorExit as e: + log.debug("Generator exit triggered") + queue_retriever.close() + del(queue_retriever) + raise e + + def _prepare_tfrecord_subsample(self): + """Prepare custom TFRecord indices to only read tiles in the labels dataframe.""" + + # Prepare TFRecord subsample if there are fewer tiles in the + # tiles dataframe than there are in the TFRecords + if (self.num_tiles != len(self.labels)): + + self.indices = [] + if self.rank == 0: + log.debug("Subsampling TFRecords using tile-level labels...") + + n_tiles = 0 + with mp.dummy.Pool(16) as pool: + + # Load the original (full) indices + for index, tfr in zip(pool.imap(load_index, self.tfrecords), self.tfrecords): + tfr = tfr.decode('utf-8') + slide = sf.util.path_to_name(tfr) + loc = sf.io.get_locations_from_tfrecord(tfr) + + # Check which TFRecord indices are in the labels dataframe + in_df = np.array([f'{slide}-{x}-{y}' in self.labels.index for (x,y) in loc]) + + # Subsample indices based on what is in the labels dataframe + ss_index = index[in_df] + n_tiles += len(ss_index) + + self.indices += [ss_index] + + if not n_tiles: + raise ValueError("No tiles found in TFRecords matching the " + "labels dataframe.") + + if self.rank == 0: + diff = self.num_tiles - n_tiles + log.debug( + "TFRecord subsampling complete (kept: {}, removed: 
{}).".format( + n_tiles, diff + )) + if len(self.labels) - n_tiles: + log.debug( + "{} labels in the dataframe have no corresponding tile.".format( + len(self.labels) - n_tiles + ) + ) + if diff: + log.warning(f"Labels not found for {diff} tiles. These " + "tiles will be skipped.") + self.num_tiles = n_tiles + + def close(self) -> None: + pass
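`InterleaveIterator.__iter__` passes `num_replicas * num_workers` as the shard count and `rank + worker_id` as the shard index to `interleave()`. A minimal sketch of one way to give every (GPU rank, DataLoader worker) pair its own shard. Both the `rank * num_workers + worker_id` formula and the modulo-based `read_shard` helper are assumptions made for illustration; they are not taken from slideflow.

```python
from typing import List, Tuple


def shard_for_worker(
    rank: int, num_replicas: int, worker_id: int, num_workers: int
) -> Tuple[int, int]:
    """Return (shard_index, total_shards) for one DataLoader worker."""
    # Flattening (rank, worker_id) this way gives each pair a distinct index.
    return rank * num_workers + worker_id, num_replicas * num_workers


def read_shard(n_records: int, shard_idx: int, n_shards: int) -> List[int]:
    """Hypothetical sharding rule: every n_shards-th record, offset by shard_idx."""
    return [i for i in range(n_records) if i % n_shards == shard_idx]


# Two GPU replicas with two DataLoader workers each, reading ten records.
shards = [shard_for_worker(r, 2, w, 2) for r in range(2) for w in range(2)]
records = [read_shard(10, idx, n) for idx, n in shards]
```

With distinct shard indices, every record is read by exactly one worker. A plain sum such as `rank + worker_id` yields index 1 for both (rank 0, worker 1) and (rank 1, worker 0), so it is worth verifying shard uniqueness when combining multiple GPUs with multiple workers.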
+ + +class StyleGAN2Interleaver(InterleaveIterator): + """Iterator to enable compatibility with StyleGAN2.""" + + def __init__( + self, + resolution=None, # Ignored argument, for StyleGAN2/3 compatibility. + xflip=None, # Ignored argument, for StyleGAN2/3 compatibility. + normalizer=None, + normalizer_source=None, + crop=None, + resize=None, + max_size=None, #ignore argument, for StyleGAN2/3 compatibility. + **kwargs + ): + super().__init__(**kwargs) + + # Assemble crop/resize transformations. + transforms = [] + if crop is not None: + transforms.append(torchvision.transforms.RandomCrop(crop)) + if resize is not None: + transforms.append(torchvision.transforms.Resize(resize)) + if len(transforms): + self.transform = torchvision.transforms.Compose(transforms) + + # Update the final image size. + if resize is not None: + self.img_size = resize + elif crop is not None: + self.img_size = crop + if self.img_size is None: + raise ValueError("Must specify either crop, resize, or img_size.") + + if normalizer: + self.normalizer = sf.norm.autoselect( + normalizer, + source=normalizer_source, + device='cpu', + backend='torch' + ) + + @property + def resolution(self) -> int: + """For use with StyleGAN2""" + return self.img_size + + @property + def image_shape(self) -> Tuple[int, int, int]: + """For use with StyleGAN2""" + return (3, self.resolution, self.resolution) + + @property + def num_channels(self) -> int: + """For use with StyleGAN2""" + assert len(self.image_shape) == 3 # CHW + return self.image_shape[0] + + @property + def label_shape(self) -> Union[int, Tuple[int, ...]]: + """For use with StyleGAN2""" + if self.use_labels and self.unique_labels is not None: + return self.unique_labels[0].shape + elif self._label_shape is not None: + return self._label_shape + else: + return 0 + + @property + def label_dim(self) -> int: + """For use with StyleGAN2""" + if self.use_labels: + assert len(self.label_shape) == 1 # type: ignore + return self.label_shape[0] # type: ignore + 
else: + return 0 + + @property + def has_labels(self) -> bool: + """For use with StyleGAN2""" + return (self.use_labels + and any(x != 0 for x in self.label_shape)) # type: ignore + + def get_label(self, idx: Any) -> Any: + """Returns a random label. Used for compatibility with StyleGAN2.""" + if self.use_labels and self.model_type == 'classification': + return random.choices( + self.unique_labels, + weights=self.label_prob, # type: ignore + k=1 + )[0] + elif self.use_labels: + return [np.random.rand()] + else: + return np.zeros((1,)) + + +class TileLabelInterleaver(StyleGAN2Interleaver): + """Pytorch Iterable Dataset that interleaves tfrecords with the + as the `InterleaveIterator`, but applies tile-specific labels. + + Labels should be onehot encoded. + + """ + def __init__( + self, + tile_labels: str, + resolution: Any = None, # Ignored, for StyleGAN2/3 compatibility. + xflip: Any = None, # Ignored, for StyleGAN2/3 compatibility. + labels: Any = None, # Ignored, for StyleGAN2/3 compatibility. + **kwargs: Any, + ) -> None: + """Initializes an InterleaveIterator modified to use tile-level labels. + + Args: + tile_labels (str): Location of parquet-format pandas DataFrame + containing tile-level labels. Labels are indexed by the slide + name and X/Y location, with the format {slide}-{loc_x}-{loc_y}. + Labels are determined by the `label` columns. Labels should + be onehot encoded. + """ + super().__init__(labels=tile_labels, **kwargs) + self._process_labels_df() + if not isinstance(self.labels, pd.DataFrame): + raise ValueError("Labels must be a pandas DataFrame.") + + def _process_labels_df(self) -> None: + assert isinstance(self.labels, pd.DataFrame) + first_row = next(self.labels.itertuples()) + self._label_shape = first_row.label.shape + if self.rank == 0 and (self.num_tiles != len(self.labels)): + log.warning(f"Number of tiles ({self.num_tiles}) does not equal the " + f"number of labels ({len(self.labels)}). 
") + + def get_label(self, idx: Any) -> Any: + """Returns a random label. Used for compatibility with StyleGAN2.""" + idx = np.random.randint(len(self.labels)) + return self.labels.iloc[idx].label + +# ----------------------------------------------------------------------------- + +def _apply_otsu(wsi): + wsi.qc('otsu') + return wsi + + +def multi_slide_loader( + slides: List["sf.WSI"], + weights: Optional[Union[List[float], np.ndarray]] = None, + shard: Optional[Tuple[int, int]] = None, + infinite: bool = True, + **kwargs +) -> Iterable[Union[Dict[str, np.ndarray], + Tuple[Dict[str, np.ndarray], + Dict[str, List[np.ndarray]]]]]: + """Create an iterator by reading and merging multiple slide dataloaders. + + Args: + slides (list of str): List of slide paths. + weights (list of float): Weights for sampling from each slide. + If not provided, will perform uniform sampling. + shard (tuple(int, int), optional): If provided, will only extract + tiles from the shard with index `shard[0]` out of `shard[1]` + shards. Defaults to None. + infinite (bool, optional): Whether the returned iterator should be + infinite or not. Defaults to True. + + Returns: + + it (iterator): A repeating iterator that generates batches of data, + interleaving from the provided slides. 
+ + """ + if weights is not None: + weights_list = weights + else: + weights_list = np.array( # type: ignore + [0.5 for _ in range(len(slides))] + ) + loaders = [slide.torch(lazy_iter=True, + shard=shard, + infinite=infinite, + **kwargs) + for slide in slides] + return RandomSampler(loaders, weights_list, shard=None) + + +def interleave( + paths: List[str], + prob_weights: Optional[Dict[str, float]] = None, + incl_loc: bool = False, + clip: Optional[Dict[str, int]] = None, + infinite: bool = True, + augment: Union[bool, str] = False, + standardize: bool = True, + normalizer: Optional["StainNormalizer"] = None, + num_threads: int = 4, + chunk_size: int = 1, + num_replicas: int = 1, + rank: int = 0, + indices: Optional[List[str]] = None, + from_wsi: bool = False, + tile_px: Optional[int] = None, + tile_um: Optional[int] = None, + rois: Optional[List[str]] = None, + roi_method: str = 'auto', + pool: Optional[Any] = None, + transform: Optional[Any] = None, + tfrecord_parser: Optional[Callable] = None, +): + + """Returns a generator that interleaves records from a collection of + tfrecord files, sampling from tfrecord files randomly according to + balancing if provided (requires manifest). Assumes TFRecord files are + named by slide. + + Different than tensorflow backend implementation (sf.io.tensorflow). + Supports Pytorch. Use interleave_dataloader for the torch DataLoader class; + use this function directly to get images from a generator with no PyTorch + data processing. + + Args: + paths (list(str)): List of paths to TFRecord files or slides. + prob_weights (dict, optional): Dict mapping tfrecords to probability of + including in batch. Defaults to None. + incl_loc (bool, optional): Include location info (tile center + coordinates). Returns samples in the form ``(returns ..., loc_x, + loc_y)``. Defaults to False. + clip (dict, optional): Dict mapping tfrecords to number of tiles to + take per tfrecord. Defaults to None. 
+        infinite (bool, optional): Create an infinite dataset. WARNING: If
+            infinite is False and balancing is used, some tiles will be skipped.
+            Defaults to True.
+        augment (str or bool): Image augmentations to perform. Augmentations include:
+
+            * ``'x'``: Random horizontal flip
+            * ``'y'``: Random vertical flip
+            * ``'r'``: Random 90-degree rotation
+            * ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100)
+            * ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0)
+            * ``'n'``: Random :ref:`stain_augmentation` (requires stain normalizer)
+
+            Combine letters to define augmentations, such as ``'xyrjn'``.
+            A value of True will use ``'xyrjb'``.
+        standardize (bool, optional): Standardize images to (0,1).
+            Defaults to True.
+        normalizer (:class:`slideflow.norm.StainNormalizer`, optional):
+            Normalizer to use on images. Defaults to None.
+        num_threads (int, optional): Number of threads to use for decoding
+            images. Defaults to 4.
+        chunk_size (int, optional): Chunk size for image decoding.
+            Defaults to 1.
+        num_replicas (int, optional): Number of total workers reading the
+            dataset with this interleave function, defined as number of
+            gpus * number of torch DataLoader workers. Used to interleave
+            results among workers without duplications. Defaults to 1.
+        rank (int, optional): Worker ID to identify which worker this
+            represents. Used to interleave results among workers without
+            duplications. Defaults to 0 (first worker).
+        indices (list(str)): Paths to TFRecord index files. If not provided,
+            will generate. Defaults to None.
+        from_wsi (bool): Generate predictions from tiles dynamically
+            extracted from whole-slide images, rather than TFRecords.
+            Defaults to False (use TFRecords).
+        tile_px (int, optional): Size of tiles to extract from WSI, in pixels.
+            Only used if from_wsi=True. Defaults to None.
+        tile_um (int, optional): Size of tiles to extract from WSI, in
+            microns. Only used if from_wsi=True.
Defaults to None. + rois (list(str), optional): List of ROI paths. Only used if + from_wsi=True. Defaults to None. + roi_method (str, optional): Method for extracting ROIs. Only used if + from_wsi=True. Defaults to 'auto'. + pool (multiprocessing.Pool): Shared multiprocessing pool. Useful + if ``from_wsi=True``, for sharing a unified processing pool between + dataloaders. Defaults to None. + transform (Callable, optional): Arbitrary torchvision transform + function. Performs transformation after augmentations but + before standardization. Defaults to None. + tfrecord_parser (Callable, optional): Custom parser for TFRecords. + Defaults to None. + + """ + if not len(paths): + raise errors.TFRecordsNotFoundError + if rank == 0: + _path_type = "slides" if from_wsi else "tfrecords" + log.debug( + f'Interleaving {len(paths)} {_path_type}: ' + f'infinite={infinite}, num_replicas={num_replicas}' + ) + if from_wsi and (not tile_um or not tile_px): + raise ValueError("`tile_um` and `tile_px` required for interleave() " + "if `from_wsi=True`") + if prob_weights is not None: + assert len(prob_weights) == len(paths) + else: + prob_weights = None + should_close = False if pool is not None else True + + if incl_loc: + features_to_return = ['image_raw', 'slide', 'loc_x', 'loc_y'] + else: + features_to_return = ['image_raw', 'slide'] + + if from_wsi: + assert tile_um is not None and tile_px is not None + if rank == 0: + log.info(f"Reading {len(paths)} slides and thresholding...") + + # ---- Load slides and apply Otsu thresholding ------------------------ + if pool is None and sf.slide_backend() == 'cucim': + pool = mp.Pool( + sf.util.num_cpu(default=8), + initializer=sf.util.set_ignore_sigint + ) + elif pool is None: + pool = mp.dummy.Pool(sf.util.num_cpu(default=16)) + wsi_list = [] + to_remove = [] + otsu_list = [] + for path in paths: + if isinstance(path, bytes): + path= path.decode('utf-8') + try: + wsi = sf.WSI( + path, + tile_px, + tile_um, + rois=rois, + 
roi_method=roi_method, + verbose=False + ) + wsi_list += [wsi] + except errors.SlideLoadError as e: + log.error(f"Error reading slide {path}: {e}") + to_remove += [path] + for path in to_remove: + paths.remove(path) + for wsi in pool.imap(_apply_otsu, wsi_list): + otsu_list += [wsi] + + # ---- Prepare parsing ----------------------------------------------- + img_type = 'numpy' + + def base_parser(record): + if type(features_to_return) == dict: + return { + label: record[f] + for label, f in features_to_return.items() + } + else: + return [record[f] for f in features_to_return] + + # ---- Interleave from slides ----------------------------------------- + random_sampler = multi_slide_loader( + otsu_list, + pool=pool, + weights=prob_weights, + shard=(rank, num_replicas), + incl_slidenames=True, + incl_loc=incl_loc, + grayspace_fraction=1, + infinite=infinite + ) + sampler_iter = iter(random_sampler) + else: + # ---- Get the base TFRecord parser, based on the first tfrecord ------ + _, img_type = detect_tfrecord_format(paths[0]) + if tfrecord_parser is not None: + base_parser = tfrecord_parser + else: + base_parser = get_tfrecord_parser( + paths[0], + features_to_return, + decode_images=False + ) + # ---- Set up TFRecord indexes for sharding --------------------------- + # Index files not created in this interleave function, as there may be + # multiple instances of this function running across processes, + # & having each create indices would result in conflicts / corruption. 
+ if indices is None: + indices = [] + if pool is None: + pool = mp.dummy.Pool(16) + log.debug("Loading indices...") + for index in pool.imap(load_index, paths): + indices += [index] + pool.close() + else: + log.debug("Using provided indices.") + + # ---- Interleave and batch datasets ---------------------------------- + random_sampler = MultiTFRecordDataset( + paths, + indices, + prob_weights, + shard=(rank, num_replicas), + clip=[clip[(t if isinstance(t, str) else t.decode('utf-8'))] for t in paths] if clip else None, + infinite=infinite + ) + sampler_iter = iter(random_sampler) + + # Compose augmentation transformations + transform_fn = compose_augmentations( + augment=augment, + standardize=standardize, + normalizer=normalizer, + transform=transform, + whc=True + ) + + # Worker to decode images and process records + def threading_worker(record): + record = base_parser(record) + record[0] = decode_image( + record[0], # Image is the first returned variable + img_type=img_type, + transform=transform_fn, + ) + return record + + # Randomly interleaves datasets according to weights, reading parsed + # records to a buffer and sending parsed results to a queue after + # reaching a set buffer size + class QueueRetriever: + def __init__(self, sampler, num_threads): + self.sampler = sampler + self.closed = False + self.raw_q = Queue(num_threads) + self.proc_q = Queue(num_threads) + self.n_threads = num_threads + self.n_closed = 0 + self.il_closed = False + self._close_complete = False + + def interleaver(): + msg = [] + while not self.closed: + try: + record = next(sampler_iter) + msg += [record] + if len(msg) < chunk_size: + continue + else: + self.raw_q.put(msg) + msg = [] + except (StopIteration): + break + except (ValueError, OSError): # Occurs when files closed + break + self.raw_q.put(msg) + for _ in range(self.n_threads): + self.raw_q.put(None) + self.il_closed = True + + # Reads a buffer batch of images/labels and processes images + def decoder(): + while True: + 
records = self.raw_q.get() + if records is None: + break + decoded = [threading_worker(record) for record in records] + self.proc_q.put(decoded) + self.proc_q.put(None) + + # Parallelize the tfrecord reading interleaver + # and the image processing decoder + self.il_thread = threading.Thread(target=interleaver) + self.il_thread.start() + self.proc_threads = [ + threading.Thread(target=decoder) + for _ in range(self.n_threads) + ] + for proc in self.proc_threads: + proc.start() + + def __iter__(self): + while True: + record = self.proc_q.get() + if record is None: + self.n_closed += 1 + if self.n_closed == self.n_threads: + break + else: + for item in record: + yield item + + def __del__(self): + self.close() + + def close(self): + if self._close_complete: + return + log.debug("Closing QueueRetriever") + self.closed = True + + # Clear out the queue + while self.n_closed < self.n_threads: + record = self.proc_q.get() + if record is None: + self.n_closed += 1 + + if from_wsi and should_close: + pool.close() + else: + self.sampler.close() + self._close_complete = True + + return QueueRetriever(random_sampler, num_threads) +
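The `QueueRetriever` above pairs one reader thread with several decoder threads and uses `None` sentinels to signal shutdown. A minimal standalone sketch of that pattern, using only the standard library, might look like the following. The name `chunked_pipeline` and its parameters are hypothetical and are not part of Slideflow; the sketch omits the `close()` handling and file-closed error handling of the original.

```python
import threading
from queue import Queue


def chunked_pipeline(records, decode, num_threads=4, chunk_size=8):
    """Yield decode(record) for every record, decoding in worker threads.

    Hypothetical helper illustrating the reader/decoder queue pattern;
    output order is not guaranteed to match input order.
    """
    raw_q = Queue(num_threads)   # chunks of raw records
    proc_q = Queue(num_threads)  # chunks of decoded records

    def interleaver():
        # Group records into chunks, then send one sentinel per decoder.
        chunk = []
        for record in records:
            chunk.append(record)
            if len(chunk) == chunk_size:
                raw_q.put(chunk)
                chunk = []
        if chunk:
            raw_q.put(chunk)
        for _ in range(num_threads):
            raw_q.put(None)

    def decoder():
        # Decode chunks until a sentinel arrives, then forward one sentinel.
        while True:
            chunk = raw_q.get()
            if chunk is None:
                break
            proc_q.put([decode(r) for r in chunk])
        proc_q.put(None)

    threads = [threading.Thread(target=interleaver, daemon=True)]
    threads += [
        threading.Thread(target=decoder, daemon=True)
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()

    # The consumer stops once every decoder has reported completion.
    n_closed = 0
    while n_closed < num_threads:
        chunk = proc_q.get()
        if chunk is None:
            n_closed += 1
        else:
            yield from chunk
```

Counting sentinels on the consumer side, rather than joining threads, lets iteration end as soon as the last decoder finishes. The bounded queues apply backpressure so the reader cannot run far ahead of the decoders.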

+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/mil/_params/index.html b/docs/_modules/slideflow/mil/_params/index.html new file mode 100644 index 000000000..f10f4146b --- /dev/null +++ b/docs/_modules/slideflow/mil/_params/index.html @@ -0,0 +1,1310 @@ + + + + + + + + + + + + slideflow.mil._params — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.mil._params

+"""Model and trainer configuration for MIL models."""
+
+import numpy as np
+import os
+import torch
+import slideflow as sf
+import pandas as pd
+from torch import nn
+from typing import Optional, Union, Callable, List, Tuple, Any, TYPE_CHECKING
+from slideflow import log, errors, Dataset
+
+from ._registry import get_trainer, build_model_config
+
+if TYPE_CHECKING:
+    from fastai.learner import Learner
+
+# -----------------------------------------------------------------------------
+
+
[docs]def mil_config(model: Union[str, Callable], trainer: str = 'fastai', **kwargs): + """Create a multiple-instance learning (MIL) training configuration. + + All models by default are trained with the FastAI trainer. Additional + trainers and additional models can be installed with ``slideflow-extras``. + + Args: + model (str, Callable): Either the name of a model, or a custom torch + module. Valid model names include ``"attention_mil"``, + ``"transmil"``, and ``"bistro.transformer"``. + trainer (str): Type of MIL trainer to use. Only 'fastai' is available, + unless additional trainers are installed. + **kwargs: All additional keyword arguments are passed to + :class:`slideflow.mil.TrainerConfig` + + """ + return get_trainer(trainer)(model=model, **kwargs)
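`mil_config` dispatches on the trainer name through `get_trainer`, whose implementation in `_registry` is not shown here. One common way to build such a lookup is a tag-keyed registry, sketched below. Everything in this block is a hypothetical stand-in: the registry dict, the decorator, the simplified config class, and the `ValueError` on unknown names are assumptions, not Slideflow's actual code.

```python
from typing import Dict

# Hypothetical registry mapping a trainer tag to its configuration class.
_TRAINERS: Dict[str, type] = {}


def register_trainer(cls):
    """Class decorator: register a config class under its ``tag``."""
    _TRAINERS[cls.tag] = cls
    return cls


def get_trainer(name: str) -> type:
    """Look up a registered configuration class by tag (assumed behavior)."""
    if name not in _TRAINERS:
        raise ValueError(f"Unrecognized trainer: {name!r}")
    return _TRAINERS[name]


@register_trainer
class SketchTrainerConfig:
    """Simplified stand-in for a trainer configuration."""

    tag = 'fastai'

    def __init__(self, model='attention_mil', *, lr=None, epochs=32, **kwargs):
        self.model = model
        self.lr = lr
        self.epochs = epochs
        self.model_kwargs = kwargs  # remaining kwargs go to the model config


def mil_config(model, trainer='fastai', **kwargs):
    """Mirror of the dispatch shown above: pick the class, then build it."""
    return get_trainer(trainer)(model=model, **kwargs)


cfg = mil_config('transmil', epochs=10)
print(type(cfg).__name__, cfg.model, cfg.epochs)
```

With this layout, an add-on package could make a new trainer available by decorating its own config class, without any change to `mil_config`.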
+ +# ----------------------------------------------------------------------------- + +
[docs]class TrainerConfig: + + tag = 'fastai' + + def __init__( + self, + model: Union[str, Callable] = 'attention_mil', + *, + aggregation_level: str = 'slide', + lr: Optional[float] = None, + wd: float = 1e-5, + bag_size: int = 512, + max_val_bag_size: Optional[int] = None, + fit_one_cycle: bool = True, + epochs: int = 32, + batch_size: int = 64, + drop_last: bool = True, + save_monitor: str = 'valid_loss', + weighted_loss: bool = True, + **kwargs + ): + r"""Training configuration for FastAI MIL models. + + This configuration should not be created directly, but rather should + be created through :func:`slideflow.mil.mil_config`, which will create + and prepare an appropriate trainer configuration. + + Args: + model (str, Callable): Either the name of a model, or a custom torch + module. Valid model names include ``"attention_mil"``, + ``"transmil"``, and ``"bistro.transformer"``. + + Keyword args: + aggregation_level (str): When equal to ``'slide'`` each bag + contains tiles from a single slide. When equal to ``'patient'`` + tiles from all slides of a patient are grouped together. + lr (float, optional): Learning rate. If ``fit_one_cycle=True``, + this is the maximum learning rate. If None, uses the Leslie + Smith `LR Range test <https://arxiv.org/abs/1506.01186>`_ to + find an optimal learning rate. Defaults to None. + wd (float): Weight decay. Only used if ``fit_one_cycle=False``. + Defaults to 1e-5. + bag_size (int): Bag size. Defaults to 512. + max_val_bag_size (int, optional): Maximum validation bag size. If + None, all validation bags will be unclipped and unpadded (full size). + Defaults to None. + fit_one_cycle (bool): Use `1cycle <https://sgugger.github.io/the-1cycle-policy.html>`_ + learning rate schedule. Defaults to True. + epochs (int): Maximum number of epochs. Defaults to 32. + batch_size (int): Batch size. Defaults to 64. + **kwargs: All additional keyword arguments are passed to + :class:`slideflow.mil.MILModelConfig`. 
+ + """ + self._aggregation_level = aggregation_level + self.lr = lr + self.wd = wd + self.bag_size = bag_size + self.max_val_bag_size = max_val_bag_size + self.fit_one_cycle = fit_one_cycle + self.epochs = epochs + self.batch_size = batch_size + self.drop_last = drop_last + self.save_monitor = save_monitor + self.weighted_loss = weighted_loss + if isinstance(model, str): + self.model_config = build_model_config(model, **kwargs) + else: + sf.log.info("Attempting to load custom model class for MIL training.") + from slideflow.mil import MILModelConfig + self.model_config = MILModelConfig(model, **kwargs) + self.model_config.verify_trainer(self) + + def __str__(self): + out = f"{self.__class__.__name__}(" + for p, val in self.to_dict().items(): + if p != 'model_config': + out += '\n {}={!r}'.format(p, val) + out += '\n)' + return out + + @property + def model_fn(self): + """MIL model architecture (class/module).""" + return self.model_config.model_fn + + @property + def loss_fn(self): + """MIL loss function.""" + return self.model_config.loss_fn + + @property + def is_multimodal(self): + """Whether the model is multimodal.""" + return self.model_config.is_multimodal + + @property + def model_type(self): + """Type of model (classification or regression).""" + return self.model_config.model_type + + @property + def aggregation_level(self): + """Aggregation level for MIL training.""" + if hasattr(self, '_aggregation_level'): + return self._aggregation_level + else: + return 'slide' + + @aggregation_level.setter + def aggregation_level(self, value): + if value not in ('slide', 'patient'): + raise ValueError("Aggregation level must be either 'slide' or 'patient'.") + self._aggregation_level = value + + def _verify_eval_params(self, **kwargs): + pass + + def is_classification(self): + """Whether the model is a classification model.""" + return self.model_config.is_classification() + + def get_metrics(self): + """Get model metrics. 
+ + Returns: + List[Callable]: List of metrics to use for model evaluation. + Defaults to RocAuc for classification models, and mse and Pearson + correlation coefficient for regression models. + + """ + from fastai.vision.all import RocAuc, mse, PearsonCorrCoef + + model_metrics = self.model_config.get_metrics() + + if self.is_classification(): + fallback = [RocAuc()] + else: + fallback = [mse, PearsonCorrCoef()] + return model_metrics or fallback + + def prepare_training( + self, + outcomes: Union[str, List[str]], + exp_label: Optional[str], + outdir: Optional[str] + ) -> str: + """Prepare for training. + + Sets up the output directory for the model. + + Args: + outcomes (str, list(str)): Outcomes. + exp_label (str): Experiment label. + outdir (str): Output directory. + + Returns: + str: Output directory. + + """ + log.info("Training FastAI MIL model with config:") + log.info(f"{str(self)}") + # Set up experiment label + if exp_label is None: + try: + if isinstance(self.model_config.model, str): + model_name = self.model_config.model + else: + model_name = self.model_config.model.__name__ + exp_label = '{}-{}'.format( + model_name, + "-".join(outcomes if isinstance(outcomes, list) else [outcomes]) + ) + except Exception: + exp_label = 'no_label' + # Set up output model directory + if outdir: + if not os.path.exists(outdir): + os.makedirs(outdir) + outdir = sf.util.create_new_model_dir(outdir, exp_label) + return outdir + + def build_model(self, n_in: int, n_out: int, **kwargs): + """Build the model. + + Args: + n_in (int): Number of input features. + n_out (int): Number of output features. + + Keyword args: + **kwargs: Additional keyword arguments to pass to the model constructor. + + Returns: + torch.nn.Module: PyTorch model. 
+ + """ + if self.model_config.model_kwargs: + model_kw = self.model_config.model_kwargs + else: + model_kw = dict() + return self.model_config.build_model(n_in, n_out, **model_kw, **kwargs) + + def to_dict(self): + """Converts this training configuration to a dictionary.""" + d = {k:v for k,v in vars(self).items() + if k not in ( + 'self', + 'model_fn', + 'loss_fn', + 'build_model', + 'is_multimodal' + ) and not k.startswith('_')} + if self.model_config is None: + return d + else: + d.update(self.model_config.to_dict()) + del d['model_config'] + return d + + def json_dump(self): + """Converts this training configuration to a JSON-compatible dict.""" + return dict( + trainer=self.tag, + params=self.to_dict() + ) + + def predict(self, model, bags, attention=False, **kwargs): + """Generate model prediction from bags. + + Args: + model (torch.nn.Module): Loaded PyTorch MIL model. + bags (torch.Tensor): Bags, with shape ``(n_bags, n_tiles, n_features)``. + + Keyword args: + attention (bool): Whether to return attention maps. + + Returns: + Tuple[np.ndarray, List[np.ndarray]]: Predictions and attention. + + """ + self._verify_eval_params(**kwargs) + return self.model_config.predict(model, bags, attention=attention, **kwargs) + + def batched_predict( + self, + model: "torch.nn.Module", + loaded_bags: torch.Tensor, + **kwargs + ) -> Tuple[np.ndarray, List[np.ndarray]]: + """Generate predictions from a batch of bags. + + Args: + model (torch.nn.Module): Loaded PyTorch MIL model. + loaded_bags (torch.Tensor): Loaded bags, with shape ``(n_bags, n_tiles, n_features)``. + + Keyword args: + device (torch.device, optional): Device on which to run the model. + If None, uses the default device. + forward_kwargs (dict, optional): Additional keyword arguments to + pass to the model's forward function. + attention (bool): Whether to return attention maps. + attention_pooling (str): Attention pooling strategy. Either 'avg' + or 'max'. Defaults to 'avg'. 
+ uq (bool): Whether to return uncertainty quantification. + + Returns: + Tuple[np.ndarray, List[np.ndarray]]: Predictions and attention. + + """ + return self.model_config.batched_predict(model, loaded_bags, **kwargs) + + def train( + self, + train_dataset: Dataset, + val_dataset: Optional[Dataset], + outcomes: Union[str, List[str]], + bags: Union[str, List[str]], + *, + outdir: str = 'mil', + exp_label: Optional[str] = None, + **kwargs + ) -> "Learner": + """Train a multiple-instance learning (MIL) model. + + Args: + config (:class:`slideflow.mil.TrainerConfig`): + Trainer and model configuration. + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + bags (str): Either a path to directory with \*.pt files, or a list + of paths to individual \*.pt files. Each file should contain + exported feature vectors, with each file containing all tile + features for one patient. + + Keyword args: + outdir (str): Directory in which to save model and results. + exp_label (str): Experiment label, used for naming the subdirectory + in the ``{project root}/mil`` folder, where training history + and the model will be saved. + attention_heatmaps (bool): Generate attention heatmaps for slides. + Not available for multi-modal MIL models. Defaults to False. + interpolation (str, optional): Interpolation strategy for smoothing + attention heatmaps. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. 
+ + """ + from slideflow.mil.train import _train_mil, _train_multimodal_mil + + # Prepare output directory + outdir = self.prepare_training(outcomes, exp_label, outdir) + + # Use training data as validation if no validation set is provided + if val_dataset is None: + sf.log.info( + "Training without validation; metrics will be calculated on training data." + ) + val_dataset = train_dataset + + # Check if multimodal training + if self.is_multimodal: + train_fn = _train_multimodal_mil + else: + train_fn = _train_mil + + # Execute training + return train_fn( + self, + train_dataset, + val_dataset, + outcomes, + bags, + outdir=outdir, + **kwargs + ) + + def eval( + self, + model: torch.nn.Module, + dataset: Dataset, + outcomes: Union[str, List[str]], + bags: Union[str, List[str]], + *, + outdir: str = 'mil', + attention_heatmaps: bool = False, + uq: bool = False, + aggregation_level: Optional[str] = None, + params: Optional[dict] = None, + **heatmap_kwargs + ) -> pd.DataFrame: + """Evaluate a multiple-instance learning model. + + Saves results for the evaluation in the target folder, including + predictions (parquet format), attention (Numpy format for each slide), + and attention heatmaps (if ``attention_heatmaps=True``). + + Logs classifier metrics (AUROC and AP) to the console. + + Args: + model (torch.nn.Module): Loaded PyTorch MIL model. + dataset (sf.Dataset): Dataset to evaluate. + outcomes (str, list(str)): Outcomes. + bags (str, list(str)): Path to bags, or list of bag file paths. + Each bag should contain a PyTorch array of features from all tiles in + a slide, with the shape ``(n_tiles, n_features)``. + + Keyword arguments: + outdir (str): Path at which to save results. + attention_heatmaps (bool): Generate attention heatmaps for slides. + Not available for multi-modal MIL models. Defaults to False. + interpolation (str, optional): Interpolation strategy for smoothing + attention heatmaps. Defaults to 'bicubic'.
+ cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. + + Returns: + pd.DataFrame: Dataframe of predictions. + """ + from slideflow.mil.eval import run_eval + + params_to_verify = dict( + attention_heatmaps=attention_heatmaps, + heatmap_kwargs=heatmap_kwargs, + uq=uq, + aggregation_level=aggregation_level + ) + + self._verify_eval_params(**params_to_verify) + self.model_config._verify_eval_params(**params_to_verify) + + eval_kwargs = dict( + dataset=dataset, + outcomes=outcomes, + bags=bags, + config=self, + outdir=outdir, + params=params, + aggregation_level=(aggregation_level or self.aggregation_level) + ) + + return run_eval( + model, + attention_heatmaps=attention_heatmaps, + uq=uq, + **heatmap_kwargs, + **eval_kwargs + ) + + def _build_dataloader( + self, + bags, + targets, + encoder, + dataset_kwargs, + dataloader_kwargs, + ) -> "torch.utils.DataLoader": + + return self.model_config._build_dataloader( + bags, + targets, + encoder, + dataset_kwargs=dataset_kwargs, + dataloader_kwargs=dataloader_kwargs + ) + + def build_train_dataloader( + self, + bags, + targets, + encoder, + *, + dataset_kwargs = None, + dataloader_kwargs = None + ) -> "torch.utils.DataLoader": + """Build a training dataloader. + + Args: + bags (list): List of bags. + targets (list): List of targets. + encoder (torch.nn.Module): Encoder for bags. + + Keyword args: + dataset_kwargs (dict): Keyword arguments for the dataset. + dataloader_kwargs (dict): Keyword arguments for the dataloader. + + Returns: + torch.utils.DataLoader: Training dataloader. 
+ + """ + dataset_kwargs = dataset_kwargs or dict() + dataloader_kwargs = dataloader_kwargs or dict() + + # Dataset kwargs + if 'bag_size' not in dataset_kwargs: + dataset_kwargs['bag_size'] = self.bag_size + + # Dataloader kwargs + if 'drop_last' not in dataloader_kwargs: + dataloader_kwargs['drop_last'] = self.drop_last + if 'batch_size' not in dataloader_kwargs: + dataloader_kwargs['batch_size'] = self.batch_size + if 'shuffle' not in dataloader_kwargs: + dataloader_kwargs['shuffle'] = True + + return self._build_dataloader( + bags, + targets, + encoder, + dataset_kwargs=dataset_kwargs, + dataloader_kwargs=dataloader_kwargs + ) + + def build_val_dataloader( + self, + bags, + targets, + encoder, + *, + dataset_kwargs = None, + dataloader_kwargs = None + ) -> "torch.utils.DataLoader": + """Build a validation dataloader. + + Args: + bags (list): List of bags. + targets (list): List of targets. + encoder (torch.nn.Module): Encoder for bags. + + Keyword args: + dataset_kwargs (dict): Keyword arguments for the dataset. + dataloader_kwargs (dict): Keyword arguments for the dataloader. + + Returns: + torch.utils.DataLoader: Validation dataloader. + + """ + dataset_kwargs = dataset_kwargs or dict() + dataloader_kwargs = dataloader_kwargs or dict() + + # Dataset kwargs + if 'bag_size' not in dataset_kwargs: + dataset_kwargs['bag_size'] = None + if 'max_bag_size' not in dataset_kwargs: + dataset_kwargs['max_bag_size'] = self.max_val_bag_size + + # Dataloader kwargs + if 'batch_size' not in dataloader_kwargs: + dataloader_kwargs['batch_size'] = 1 + + return self._build_dataloader( + bags, + targets, + encoder, + dataset_kwargs=dataset_kwargs, + dataloader_kwargs=dataloader_kwargs + ) + + def inspect_batch(self, batch) -> Tuple[int, int]: + """Inspect a batch of data. + + Args: + batch: One batch of data. + + Returns: + Tuple[int, int]: Number of input and output features. 
+ + """ + return self.model_config.inspect_batch(batch) + + def run_metrics(self, df, level='slide', outdir=None): + """Run metrics and save plots to disk. + + Args: + df (pd.DataFrame): Dataframe with predictions and outcomes. + level (str): Level at which to calculate metrics. Either 'slide' or 'patient'. + outdir (str): Output directory for saving metrics. + + """ + self.model_config.run_metrics(df, level=level, outdir=outdir)
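The two dataloader builders in `TrainerConfig` differ only in the defaults they fill in when the caller has not set a key. The sketch below isolates that default-filling logic in plain Python. The function names are hypothetical, and unlike the original (which may update a non-empty dict passed by the caller in place), this sketch copies its inputs.

```python
def train_loader_kwargs(bag_size=512, batch_size=64, drop_last=True,
                        dataset_kwargs=None, dataloader_kwargs=None):
    """Fill training defaults without overriding caller-supplied values."""
    dataset_kwargs = dict(dataset_kwargs or {})
    dataloader_kwargs = dict(dataloader_kwargs or {})
    dataset_kwargs.setdefault('bag_size', bag_size)
    dataloader_kwargs.setdefault('drop_last', drop_last)
    dataloader_kwargs.setdefault('batch_size', batch_size)
    dataloader_kwargs.setdefault('shuffle', True)
    return dataset_kwargs, dataloader_kwargs


def val_loader_kwargs(max_val_bag_size=None,
                      dataset_kwargs=None, dataloader_kwargs=None):
    """Fill validation defaults: full-size bags, one bag per batch."""
    dataset_kwargs = dict(dataset_kwargs or {})
    dataloader_kwargs = dict(dataloader_kwargs or {})
    dataset_kwargs.setdefault('bag_size', None)
    dataset_kwargs.setdefault('max_bag_size', max_val_bag_size)
    dataloader_kwargs.setdefault('batch_size', 1)
    return dataset_kwargs, dataloader_kwargs


ds_kw, dl_kw = train_loader_kwargs(dataloader_kwargs={'batch_size': 8})
print(ds_kw, dl_kw)
```

In the code above, validation bags are left unclipped by default (`bag_size=None`), which is presumably why the validation batch size defaults to 1: bags with different tile counts cannot be stacked into one tensor without padding.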
+ + +# ----------------------------------------------------------------------------- + +
[docs]class MILModelConfig: + + losses = { + 'cross_entropy': nn.CrossEntropyLoss, + 'mse': nn.MSELoss, + 'mae': nn.L1Loss, + 'huber': nn.SmoothL1Loss + } + + def __init__( + self, + model: Union[str, Callable] = 'attention_mil', + *, + use_lens: Optional[bool] = None, + apply_softmax: bool = True, + model_kwargs: Optional[dict] = None, + validate: bool = True, + loss: Union[str, Callable] = 'cross_entropy', + **kwargs + ) -> None: + """Model configuration for an MIL model. + + Args: + model (str, Callable): Either the name of a model, or a custom torch + module. Valid model names include ``"attention_mil"`` and + ``"transmil"``. Defaults to 'attention_mil'. + + Keyword args: + use_lens (bool, optional): Whether the model expects a second + argument to its ``.forward()`` function, an array with the + bag size for each slide. If None, will default to True for + ``'attention_mil'`` models and False otherwise. + Defaults to None. + apply_softmax (bool): Whether to apply softmax to model outputs. + Defaults to True. Ignored if the model is not a classification + model. + model_kwargs (dict, optional): Additional keyword arguments to pass + to the model constructor. Defaults to None. + validate (bool): Whether to validate the keyword arguments. If True, + will raise an error if any unrecognized keyword arguments are + passed. Defaults to True. + loss (str, Callable): Loss function. Defaults to 'cross_entropy'. 
+ + """ + self.model = model + self._apply_softmax = apply_softmax + self.model_kwargs = model_kwargs + self.loss = loss + if use_lens is None and (hasattr(self.model_fn, 'use_lens') + and self.model_fn.use_lens): + self.use_lens = True + elif use_lens is None: + self.use_lens = False + else: + self.use_lens = use_lens + if kwargs and validate: + raise errors.UnrecognizedHyperparameterError("Unrecognized parameters: {}".format( + ', '.join(list(kwargs.keys())) + )) + elif kwargs: + log.warning("Ignoring unrecognized parameters: {}".format( + ', '.join(list(kwargs.keys())) + )) + + @property + def apply_softmax(self): + """Whether softmax will be applied to model outputs.""" + return self.is_classification() and self._apply_softmax + + @property + def model_fn(self): + """MIL model architecture (class/module).""" + if not isinstance(self.model, str): + return self.model + return sf.mil.get_model(self.model) + + @property + def loss_fn(self): + """MIL loss function.""" + return self.losses[self.loss] + + @property + def is_multimodal(self): + """Whether the model is multimodal.""" + return ((isinstance(self.model, str) and self.model.lower() == 'mm_attention_mil') + or (hasattr(self.model_fn, 'is_multimodal') + and self.model_fn.is_multimodal)) + + @property + def rich_name(self): + return f"[bold]{self.model_fn.__name__}[/]" + + @property + def model_type(self): + """Type of model (classification or regression).""" + if self.loss == 'cross_entropy': + return 'classification' + else: + return 'regression' + + def is_classification(self): + """Whether the model is a classification model.""" + return self.model_type == 'classification' + + def verify_trainer(self, trainer): + pass + + def get_metrics(self): + return None + + def to_dict(self): + """Converts this model configuration to a dictionary.""" + d = {k:v for k,v in vars(self).items() + if k not in ( + 'self', + 'model_fn', + 'loss_fn', + 'build_model', + 'is_multimodal' + ) and not k.startswith('_')} + if not 
isinstance(d['model'], str): + d['model'] = d['model'].__name__ + return d + + def _verify_eval_params(self, **kwargs): + """Verify evaluation parameters for the model.""" + + if self.is_multimodal: + if kwargs.get('attention_heatmaps'): + raise ValueError( + "Attention heatmaps cannot yet be exported for multi-modal " + "models. Please use Slideflow Studio for visualization of " + "multi-modal attention." + ) + if kwargs.get('heatmap_kwargs'): + kwarg_names = ', '.join(list(kwargs['heatmap_kwargs'].keys())) + raise ValueError( + f"Unrecognized keyword arguments: '{kwarg_names}'. Attention " + "heatmap keyword arguments are not currently supported for " + "multi-modal models." + ) + + def inspect_batch(self, batch) -> Tuple[int, int]: + """Inspect a batch of data. + + Args: + batch: One batch of data. + + Returns: + Tuple[int, int]: Number of input and output features. + + """ + if self.is_multimodal: + if self.use_lens: + n_in = [b[0].shape[-1] for b in batch[:-1]] + else: + n_in = [b.shape[-1] for b in batch[:-1][0]] + else: + n_in = batch[0].shape[-1] + targets = batch[-1] + if len(targets.shape) == 1: + n_out = 1 + else: + n_out = targets.shape[-1] + return n_in, n_out + + def build_model(self, n_in: int, n_out: int, **kwargs): + """Build the model. + + Args: + n_in (int): Number of input features. + n_out (int): Number of output features. + + Keyword args: + **kwargs: Additional keyword arguments to pass to the model constructor. + + Returns: + torch.nn.Module: PyTorch model. 
+ + """ + log.info(f"Building model {self.rich_name} (n_in={n_in}, n_out={n_out})") + return self.model_fn(n_in, n_out, **kwargs) + + def _build_dataloader( + self, + bags, + targets, + encoder, + *, + dataset_kwargs = None, + dataloader_kwargs = None + ) -> "torch.utils.DataLoader": + from fastai.vision.all import DataLoader + from slideflow.mil import data as data_utils + + dataset_kwargs = dataset_kwargs or dict() + dataloader_kwargs = dataloader_kwargs or dict() + + if 'use_lens' not in dataset_kwargs: + dataset_kwargs['use_lens'] = self.use_lens + + if self.is_multimodal: + dts_fn = data_utils.build_multibag_dataset + else: + dts_fn = data_utils.build_dataset + + dataset = dts_fn(bags, targets, encoder=encoder, **dataset_kwargs) + dataloader = DataLoader(dataset, **dataloader_kwargs) + return dataloader + + def predict(self, model, bags, attention=False, apply_softmax=None, **kwargs): + """Generate model prediction from bags. + + Args: + model (torch.nn.Module): Loaded PyTorch MIL model. + bags (torch.Tensor): Bags, with shape ``(n_bags, n_tiles, n_features)``. + + Keyword args: + attention (bool): Whether to return attention maps. + apply_softmax (bool): Whether to apply softmax to model outputs. + attention_pooling (bool): Whether to pool attention maps with + average pooling. Defaults to None. + + Returns: + Tuple[np.ndarray, List[np.ndarray]]: Predictions and attention. 
+ + """ + self._verify_eval_params(**kwargs) + + from slideflow.mil.eval import predict_from_bags, predict_from_multimodal_bags + + if apply_softmax is None: + apply_softmax = self.apply_softmax + + pred_fn = predict_from_multimodal_bags if self.is_multimodal else predict_from_bags + return pred_fn( + model, + bags, + attention=attention, + use_lens=self.use_lens, + apply_softmax=apply_softmax, + **kwargs + ) + + def batched_predict( + self, + model: "torch.nn.Module", + loaded_bags: torch.Tensor, + *, + device: Optional[Any] = None, + forward_kwargs: Optional[dict] = None, + attention: bool = False, + attention_pooling: Optional[str] = None, + uq: bool = False, + apply_softmax: Optional[bool] = None + ) -> Tuple[np.ndarray, List[np.ndarray]]: + """Generate predictions from a batch of bags. + + More efficient than calling :meth:`predict` multiple times. + + Args: + model (torch.nn.Module): Loaded PyTorch MIL model. + loaded_bags (torch.Tensor): Loaded bags, with shape ``(n_bags, n_tiles, n_features)``. + + Keyword args: + device (torch.device, optional): Device on which to run the model. + If None, uses the default device. + forward_kwargs (dict, optional): Additional keyword arguments to + pass to the model's forward function. + attention (bool): Whether to return attention maps. + attention_pooling (str): Attention pooling strategy. Either 'avg' + or 'max'. Defaults to None. + uq (bool): Whether to return uncertainty quantification. + + Returns: + Tuple[np.ndarray, List[np.ndarray]]: Predictions and attention. + + """ + from slideflow.mil.eval import run_inference + + if apply_softmax is None: + apply_softmax = self.apply_softmax + + return run_inference( + model, + loaded_bags, + attention=attention, + attention_pooling=attention_pooling, + forward_kwargs=forward_kwargs, + apply_softmax=apply_softmax, + use_lens=self.use_lens, + device=device, + uq=uq, + ) + + def run_metrics(self, df, level='slide', outdir=None) -> None: + """Run metrics and save plots to disk. 
+ + Args: + df (pd.DataFrame): Dataframe with predictions and outcomes. + level (str): Level at which to calculate metrics. Either 'slide' or 'patient'. + outdir (str): Output directory for saving metrics. + + """ + if self.is_classification(): + sf.stats.metrics.classification_metrics(df, level=level, data_dir=outdir) + else: + sf.stats.metrics.regression_metrics(df, level=level, data_dir=outdir)
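`MILModelConfig` infers the model type from the loss name, and only applies softmax when the model is a classifier. The sketch below restates that logic, plus the output-size rule from `inspect_batch`, without any PyTorch dependency. `SketchModelConfig` and `n_out_from_target_shape` are hypothetical names, and target shapes are passed as plain tuples for illustration.

```python
class SketchModelConfig:
    """Simplified stand-in showing how loss choice drives model type."""

    def __init__(self, loss='cross_entropy', apply_softmax=True):
        self.loss = loss
        self._apply_softmax = apply_softmax

    @property
    def model_type(self):
        # Only the cross-entropy loss is treated as classification.
        if self.loss == 'cross_entropy':
            return 'classification'
        return 'regression'

    def is_classification(self):
        return self.model_type == 'classification'

    @property
    def apply_softmax(self):
        # Softmax is ignored for regression models, whatever the flag says.
        return self.is_classification() and self._apply_softmax


def n_out_from_target_shape(shape):
    """1-D targets imply one output; otherwise use the last axis size."""
    return 1 if len(shape) == 1 else shape[-1]


print(SketchModelConfig('mse').apply_softmax)
print(n_out_from_target_shape((64, 3)))
```

One consequence of this rule is that any loss other than `'cross_entropy'`, including `'mae'` and `'huber'`, is treated as regression. As written above, `loss_fn` also looks the loss up in the `losses` dict, so a callable loss appears to require a subclass that overrides that property.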
+ +# ----------------------------------------------------------------------------- + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/mil/eval/index.html b/docs/_modules/slideflow/mil/eval/index.html new file mode 100644 index 000000000..788923166 --- /dev/null +++ b/docs/_modules/slideflow/mil/eval/index.html @@ -0,0 +1,1595 @@ + + + + + + + + + + + + slideflow.mil.eval — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.mil.eval

+"""Tools for evaluating MIL models."""
+
+import os
+import inspect
+import pandas as pd
+import slideflow as sf
+import numpy as np
+
+from rich.progress import Progress, track
+from os.path import join, exists, dirname
+from typing import Union, List, Optional, Callable, Tuple, Any, TYPE_CHECKING
+from slideflow import Dataset, log, errors
+from slideflow.util import path_to_name
+from slideflow.model.extractors import rebuild_extractor
+from slideflow.stats.metrics import ClassifierMetrics
+from ._params import TrainerConfig
+from . import utils
+
+if TYPE_CHECKING:
+    import torch
+    from .features import MILFeatures
+    from slideflow.norm import StainNormalizer
+    from slideflow.model.base import BaseFeatureExtractor
+
+# -----------------------------------------------------------------------------
+# User-facing API for evaluation and prediction.
+
+
+def eval_mil(
+    weights: str,
+    dataset: Dataset,
+    outcomes: Union[str, List[str]],
+    bags: Union[str, List[str]],
+    config: Optional[TrainerConfig] = None,
+    *,
+    outdir: str = 'mil',
+    attention_heatmaps: bool = False,
+    uq: bool = False,
+    aggregation_level: Optional[str] = None,
+    **heatmap_kwargs
+) -> pd.DataFrame:
+    """Evaluate a multiple-instance learning model.
+
+    Saves results for the evaluation in the target folder, including
+    predictions (parquet format), attention (Numpy format for each slide),
+    and attention heatmaps (if ``attention_heatmaps=True``).
+
+    Logs classifier metrics (AUROC and AP) to the console.
+
+    Args:
+        weights (str): Path to model weights to load.
+        dataset (sf.Dataset): Dataset to evaluate.
+        outcomes (str, list(str)): Outcomes.
+        bags (str, list(str)): Path to bags, or list of bag file paths.
+            Each bag should contain PyTorch array of features from all tiles in
+            a slide, with the shape ``(n_tiles, n_features)``.
+        config (:class:`slideflow.mil.TrainerConfig`):
+            Configuration for building model. If ``weights`` is a path to a
+            model directory, will attempt to read ``mil_params.json`` from this
+            location and load saved configuration. Defaults to None.
+
+    Keyword arguments:
+        outdir (str): Path at which to save results.
+        attention_heatmaps (bool): Generate attention heatmaps for slides.
+            Not available for multi-modal MIL models. Defaults to False.
+        uq (bool): Whether to generate uncertainty estimates.
+            Defaults to False.
+        interpolation (str, optional): Interpolation strategy for smoothing
+            attention heatmaps. Defaults to 'bicubic'.
+        aggregation_level (str, optional): Aggregation level for predictions.
+            Either 'slide' or 'patient'. Defaults to None (uses the model
+            configuration).
+        cmap (str, optional): Matplotlib colormap for heatmap. Can be any
+            valid matplotlib colormap. Defaults to 'inferno'.
+        norm (str, optional): Normalization strategy for assigning heatmap
+            values to colors. Either 'two_slope', or any other valid value
+            for the ``norm`` argument of ``matplotlib.pyplot.imshow``.
+            If 'two_slope', normalizes values less than 0 and greater than 0
+            separately. Defaults to None.
+
+    """
+    if isinstance(bags, str):
+        utils._verify_compatible_tile_size(weights, bags)
+
+    model, config = utils.load_model_weights(weights, config)
+    model.eval()
+    params = {
+        'model_path': weights,
+        'eval_bags': bags,
+        'eval_filters': dataset._filters,
+        'mil_params': sf.util.load_json(join(weights, 'mil_params.json'))
+    }
+    return config.eval(
+        model,
+        dataset,
+        outcomes,
+        bags,
+        outdir=outdir,
+        attention_heatmaps=attention_heatmaps,
+        uq=uq,
+        params=params,
+        aggregation_level=aggregation_level,
+        **heatmap_kwargs
+    )
+ + +
[docs]def predict_mil( + model: Union[str, Callable], + dataset: "sf.Dataset", + outcomes: Union[str, List[str]], + bags: Union[str, np.ndarray, List[str]], + *, + config: Optional[TrainerConfig] = None, + attention: bool = False, + aggregation_level: Optional[str] = None, + **kwargs +) -> Union[pd.DataFrame, Tuple[pd.DataFrame, List[np.ndarray]]]: + """Generate predictions for a dataset from a saved MIL model. + + Args: + model (torch.nn.Module): Model from which to generate predictions. + dataset (sf.Dataset): Dataset from which to generation predictions. + outcomes (str, list(str)): Outcomes. + bags (str, list(str)): Path to bags, or list of bag file paths. + Each bag should contain PyTorch array of features from all tiles in + a slide, with the shape ``(n_tiles, n_features)``. + + Keyword args: + config (:class:`slideflow.mil.TrainerConfig`): + Configuration for the MIL model. Required if model is a loaded ``torch.nn.Module``. + Defaults to None. + attention (bool): Whether to calculate attention scores. Defaults to False. + uq (bool): Whether to generate uncertainty estimates. Experimental. Defaults to False. + aggregation_level (str): Aggregation level for predictions. Either 'slide' + or 'patient'. Defaults to None. + attention_pooling (str): Attention pooling strategy. Either 'avg' + or 'max'. Defaults to None. + + Returns: + pd.DataFrame: Dataframe of predictions. + + list(np.ndarray): Attention scores (if ``attention=True``) + """ + # Load the model + if isinstance(model, str): + model_path = model + model, config = utils.load_model_weights(model_path, config) + model.eval() + + if isinstance(bags, str): + utils._verify_compatible_tile_size(model_path, bags) + elif config is None: + raise ValueError("If model is not a path, a TrainerConfig object must be provided via the 'config' argument.") + + # Validate aggregation level. 
+ if aggregation_level is None: + aggregation_level = config.aggregation_level + if aggregation_level not in ('slide', 'patient'): + raise ValueError( + f"Unrecognized aggregation level: '{aggregation_level}'. " + "Must be either 'patient' or 'slide'." + ) + + # Prepare labels. + labels, _ = utils.get_labels(dataset, outcomes, config.is_classification(), format='id') + + # Prepare bags and targets. + slides = list(labels.keys()) + if isinstance(bags, str): + bags = dataset.get_bags(bags) + else: + bags = np.array([b for b in bags if path_to_name(b) in slides]) + + # Aggregate bags by slide or patient. + if aggregation_level == 'patient': + # Get nested list of bags, aggregated by slide. + slide_to_patient = dataset.patients() + n_slide_bags = len(bags) + bags, y_true = utils.aggregate_bags_by_patient(bags, labels, slide_to_patient) + log.info(f"Aggregated {n_slide_bags} slide bags to {len(bags)} patient bags.") + + # Create prediction dataframe. + patients = [slide_to_patient[path_to_name(b[0])] for b in bags] + df_dict = dict(patient=patients, y_true=y_true) + + else: + # Ensure slide names are sorted according to the bags. + slides = [path_to_name(b) for b in bags] + y_true = np.array([labels[s] for s in slides]) + + # Create prediction dataframe. + df_dict = dict(slide=slides) + + # Handle continous outcomes. + if len(y_true.shape) > 1: + for i in range(y_true.shape[-1]): + df_dict[f'y_true{i}'] = y_true[:, i] + else: + df_dict['y_true'] = y_true + + # Inference. + model.eval() + pred_out = config.predict(model, bags, attention=attention, **kwargs) + if kwargs.get('uq'): + y_pred, y_att, y_uq = pred_out + else: + y_pred, y_att = pred_out + + # Update dataframe with predictions. + for i in range(y_pred.shape[-1]): + df_dict[f'y_pred{i}'] = y_pred[:, i] + if kwargs.get('uq'): + for i in range(y_uq.shape[-1]): + df_dict[f'uncertainty{i}'] = y_uq[:, i] + df = pd.DataFrame(df_dict) + + if attention: + return df, y_att + else: + return df
+ + +
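The patient-level aggregation branch of `predict_mil` turns a flat list of slide bags into one nested list per patient, with one label per patient. The sketch below illustrates that idea in plain Python. It is not the implementation of `utils.aggregate_bags_by_patient`: `group_bags_by_patient` is a hypothetical helper, and it assumes that each bag's file stem is the slide name and that all slides from one patient share a label.

```python
from collections import defaultdict
from os.path import basename, splitext
from typing import Dict, List, Tuple


def group_bags_by_patient(
    bags: List[str],
    labels: Dict[str, int],
    slide_to_patient: Dict[str, str],
) -> Tuple[List[List[str]], List[int]]:
    """Group slide-level bag paths by patient (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    patient_label: Dict[str, int] = {}
    for bag in bags:
        # Assumption: the bag file stem is the slide name.
        slide = splitext(basename(bag))[0]
        patient = slide_to_patient[slide]
        grouped[patient].append(bag)
        # Assumption: slides from the same patient must agree on the label.
        if patient in patient_label and patient_label[patient] != labels[slide]:
            raise ValueError(f"Conflicting labels for patient {patient}")
        patient_label[patient] = labels[slide]
    patients = list(grouped)  # dicts preserve insertion order
    return [grouped[p] for p in patients], [patient_label[p] for p in patients]


nested_bags, y_true = group_bags_by_patient(
    ["bags/slide_a.pt", "bags/slide_b.pt", "bags/slide_c.pt"],
    labels={"slide_a": 0, "slide_b": 0, "slide_c": 1},
    slide_to_patient={"slide_a": "p1", "slide_b": "p1", "slide_c": "p2"},
)
```

Each inner list then plays the role of a single patient-level bag, which is why the prediction dataframe is keyed by `patient` rather than `slide` in that branch.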
[docs]def predict_multimodal_mil( + model: Union[str, Callable], + dataset: "sf.Dataset", + outcomes: Union[str, List[str]], + bags: Union[np.ndarray, List[List[str]]], + *, + config: Optional[TrainerConfig] = None, + attention: bool = False, + aggregation_level: Optional[str] = None, + **kwargs +) -> Union[pd.DataFrame, Tuple[pd.DataFrame, List[np.ndarray]]]: + """Generate predictions for a dataset from a saved multimodal MIL model. + + Args: + model (torch.nn.Module): Model from which to generate predictions. + dataset (sf.Dataset): Dataset from which to generation predictions. + outcomes (str, list(str)): Outcomes. + bags (str, list(str)): Path to bags, or list of bag file paths. + Each bag should contain PyTorch array of features from all tiles in + a slide, with the shape ``(n_tiles, n_features)``. + + Keyword args: + config (:class:`slideflow.mil.TrainerConfig`): + Configuration for the MIL model. Required if model is a loaded ``torch.nn.Module``. + Defaults to None. + attention (bool): Whether to calculate attention scores. Defaults to False. + uq (bool): Whether to generate uncertainty estimates. Defaults to False. + aggregation_level (str): Aggregation level for predictions. Either 'slide' + or 'patient'. Defaults to None. + attention_pooling (str): Attention pooling strategy. Either 'avg' + or 'max'. Defaults to None. + + Returns: + pd.DataFrame: Dataframe of predictions. + + list(np.ndarray): Attention scores (if ``attention=True``) + """ + + # Load the model + if isinstance(model, str): + model_path = model + model, config = utils.load_model_weights(model_path, config) + model.eval() + + # Verify tile size compatibility for each bag source. + for b in bags: + if isinstance(b, str): + utils._verify_compatible_tile_size(model_path, b) + elif config is None: + raise ValueError("If model is not a path, a TrainerConfig object must be provided via the 'config' argument.") + + # Validate aggregation level. 
+ if aggregation_level is not None and aggregation_level != 'slide': + raise ValueError( + f"Unrecognized aggregation level: '{aggregation_level}'. " + "Multimodal MIL models only support 'slide' aggregation." + ) + + # Prepare labels. + labels, _ = utils.get_labels(dataset, outcomes, config.is_classification(), format='id') + + # Prepare bags and targets. + slides = list(labels.keys()) + + # Load multimodal bags. + if isinstance(bags[0], str): + bags, val_slides = utils._get_nested_bags(dataset, bags) + + # This is where we would aggregate bags by slide or patient. + # This is not yet supported. + + # Ensure slide names are sorted according to the bags. + slides = [path_to_name(b[0]) for b in bags] + y_true = np.array([labels[s] for s in slides]) + + # Create prediction dataframe. + df_dict = dict(slide=slides) + + # Handle continous outcomes. + if len(y_true.shape) > 1: + for i in range(y_true.shape[-1]): + df_dict[f'y_true{i}'] = y_true[:, i] + else: + df_dict['y_true'] = y_true + + # Inference. + model.eval() + y_pred, y_att = config.predict(model, bags, attention=attention, **kwargs) + + # Update dataframe with predictions. + for i in range(y_pred.shape[-1]): + df_dict[f'y_pred{i}'] = y_pred[:, i] + df = pd.DataFrame(df_dict) + + if attention: + return df, y_att + else: + return df
+ + +
[docs]def predict_slide( + model: str, + slide: Union[str, sf.WSI], + extractor: Optional["BaseFeatureExtractor"] = None, + *, + normalizer: Optional["StainNormalizer"] = None, + config: Optional[TrainerConfig] = None, + attention: bool = False, + native_normalizer: Optional[bool] = True, + extractor_kwargs: Optional[dict] = None, + **kwargs +) -> Tuple[np.ndarray, Optional[np.ndarray]]: + """Generate predictions (and attention) for a single slide. + + Args: + model (str): Path to MIL model. + slide (str): Path to slide. + extractor (:class:`slideflow.mil.BaseFeatureExtractor`, optional): + Feature extractor. If not provided, will attempt to auto-detect + extractor from model. + + .. note:: + If the extractor has a stain normalizer, this will be used to + normalize the slide before extracting features. + + Keyword Args: + normalizer (:class:`slideflow.stain.StainNormalizer`, optional): + Stain normalizer. If not provided, will attempt to use stain + normalizer from extractor. + config (:class:`slideflow.mil.TrainerConfig`): + Configuration for building model. If None, will attempt to read + ``mil_params.json`` from the model directory and load saved + configuration. Defaults to None. + attention (bool): Whether to return attention scores. Defaults to + False. + attention_pooling (str): Attention pooling strategy. Either 'avg' + or 'max'. Defaults to None. + native_normalizer (bool, optional): Whether to use PyTorch/Tensorflow-native + stain normalization, if applicable. If False, will use the OpenCV/Numpy + implementations. Defaults to None, which auto-detects based on the + slide backend (False if libvips, True if cucim). This behavior is due + to performance issued when using native stain normalization with + libvips-compatible multiprocessing. + + Returns: + Tuple[np.ndarray, Optional[np.ndarray]]: Predictions and attention scores. + Attention scores are None if ``attention`` is False. 
For single-channel attention, + this is a masked 2D array with the same shape as the slide grid (arranged as a + heatmap, with unused tiles masked). For multi-channel attention, this is a + masked 3D array with shape ``(n_channels, X, Y)``. + + """ + # Try to auto-determine the extractor + if native_normalizer is None: + native_normalizer = (sf.slide_backend() == 'cucim') + if extractor is None: + extractor, detected_normalizer = rebuild_extractor( + model, allow_errors=True, native_normalizer=native_normalizer + ) + if extractor is None: + raise ValueError( + "Unable to auto-detect feature extractor used for model {}. " + "Please specify an extractor.".format(model) + ) + else: + detected_normalizer = None + + # Determine stain normalization + if detected_normalizer is not None and normalizer is not None: + log.warning( + "Bags were generated with a stain normalizer, but a different stain " + "normalizer was provided to this function. Overriding with provided " + "stain normalizer." + ) + elif detected_normalizer is not None: + normalizer = detected_normalizer + + # Load model + model_fn, config = utils.load_model_weights(model, config) + model_fn.eval() + mil_params = sf.util.load_json(join(model, 'mil_params.json')) + if 'bags_extractor' not in mil_params: + raise ValueError( + "Unable to determine extractor used for model {}. " + "Please specify an extractor.".format(model) + ) + bags_params = mil_params['bags_extractor'] + + # Load slide + if isinstance(slide, str): + if not all(k in bags_params for k in ('tile_px', 'tile_um')): + raise ValueError( + "Unable to determine tile size for slide {}. 
" + "Either slide must be a slideflow.WSI object, or tile_px and " + "tile_um must be specified in mil_params.json.".format(slide) + ) + slide = sf.WSI( + slide, + tile_px=bags_params['tile_px'], + tile_um=bags_params['tile_um'] + ) + elif not isinstance(slide, sf.WSI): + raise ValueError("slide must either be a str (path to a slide) or a " + "WSI object.") + + # Verify that the slide has the same tile size as the bags + if 'tile_px' in bags_params and 'tile_um' in bags_params: + bag_px, bag_um = bags_params['tile_px'], bags_params['tile_um'] + if not sf.util.is_tile_size_compatible(slide.tile_px, slide.tile_um, bag_px, bag_um): + log.error(f"Slide tile size (px={slide.tile_px}, um={slide.tile_um}) does not match the tile size " + f"used for bags (px={bag_px}, um={bag_um}). Predictions may be unreliable.") + + # Convert slide to bags + if extractor_kwargs is None: + extractor_kwargs = dict() + masked_bags = extractor(slide, normalizer=normalizer, **extractor_kwargs) + original_shape = masked_bags.shape + masked_bags = masked_bags.reshape((-1, masked_bags.shape[-1])) + if len(masked_bags.mask.shape): + mask = masked_bags.mask.any(axis=1) + valid_indices = np.where(~mask) + bags = masked_bags[valid_indices] + else: + valid_indices = np.arange(masked_bags.shape[0]) + bags = masked_bags + bags = np.expand_dims(bags, axis=0).astype(np.float32) + + sf.log.info("Generated feature bags for {} tiles".format(bags.shape[1])) + + # Generate predictions. 
+ y_pred, raw_att = config.predict(model_fn, bags, attention=attention, **kwargs) + + # Reshape attention to match original shape + if attention and raw_att is not None and len(raw_att): + y_att = raw_att[0] + + # If attention is a 1D array + if len(y_att.shape) == 1: + # Create a fully masked array of shape (X, Y) + att_heatmap = np.ma.masked_all(masked_bags.shape[0], dtype=y_att.dtype) + + # Unmask and fill the transformed data into the corresponding positions + att_heatmap[valid_indices] = y_att + y_att = np.reshape(att_heatmap, original_shape[0:2]) + + # If attention is a 2D array (multi-channel attention) + elif len(y_att.shape) == 2: + att_heatmaps = [] + for i in range(y_att.shape[0]): + att = y_att[i] + att_heatmap = np.ma.masked_all(masked_bags.shape[0], dtype=att.dtype) + att_heatmap[valid_indices] = att + att_heatmap = np.reshape(att_heatmap, original_shape[0:2]) + att_heatmaps.append(att_heatmap) + y_att = np.ma.stack(att_heatmaps, axis=0) + else: + y_att = None + + return y_pred, y_att
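The final step of `predict_slide` scatters per-tile attention back onto the slide grid, leaving positions without a valid tile masked. A minimal stand-alone sketch of that scatter-and-reshape step follows. It uses plain lists with `None` standing in for masked entries, whereas the code above uses NumPy masked arrays; `scatter_attention` is a hypothetical name introduced for illustration.

```python
from typing import List, Optional, Sequence


def scatter_attention(
    attention: Sequence[float],
    valid_indices: Sequence[int],
    grid_shape: Sequence[int],
) -> List[List[Optional[float]]]:
    """Place attention values at their flat grid positions (hypothetical helper)."""
    rows, cols = grid_shape
    if len(attention) != len(valid_indices):
        raise ValueError("Expected one attention value per valid tile.")
    # Start fully "masked", then fill only the positions that had a tile.
    flat: List[Optional[float]] = [None] * (rows * cols)
    for idx, value in zip(valid_indices, attention):
        flat[idx] = value
    # Row-major reshape from the flat grid to (rows, cols).
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


heatmap = scatter_attention(
    [0.7, 0.1, 0.2], valid_indices=[0, 3, 4], grid_shape=(2, 3)
)
```

For multi-channel attention, the same scatter would be repeated once per channel and the resulting grids stacked along a leading axis.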
+ +# ----------------------------------------------------------------------------- +# Prediction from bags. + +
[docs]def predict_from_bags( + model: "torch.nn.Module", + bags: Union[np.ndarray, List[str]], + *, + attention: bool = False, + attention_pooling: Optional[str] = None, + use_lens: bool = False, + device: Optional[Any] = None, + apply_softmax: Optional[bool] = None, + uq: bool = False +) -> Tuple[np.ndarray, List[np.ndarray]]: + """Generate MIL predictions for a list of bags. + + Predictions are generated for each bag in the list one at a time, and not batched. + + Args: + model (torch.nn.Module): Loaded PyTorch MIL model. + bags (np.ndarray, list(str)): Bags to generate predictions for. Each bag + should contain PyTorch array of features from all tiles in a slide, + with the shape ``(n_tiles, n_features)``. + + Keyword Args: + attention (bool): Whether to calculate attention scores. Defaults to False. + attention_pooling (str, optional): Pooling strategy for attention scores. + Can be 'avg', 'max', or None. Defaults to None. + use_lens (bool): Whether to use the length of each bag as an additional + input to the model. Defaults to False. + device (str, optional): Device on which to run inference. Defaults to None. + apply_softmax (bool): Whether to apply softmax to the model output. Defaults + to True for categorical outcomes, False for continuous outcomes. + uq (bool): Whether to generate uncertainty estimates. Defaults to False. + + Returns: + Tuple[np.ndarray, List[np.ndarray]]: Predictions and attention scores. + + """ + import torch + + attention, uq = utils._validate_model(model, attention, uq, allow_errors=True) + model.eval() + + y_pred = [] + y_att = [] + uncertainty = [] + device = utils._detect_device(model, device, verbose=True) + + for bag in bags: + if utils._is_list_of_paths(bag): + # If bags are passed as a list of paths, load them individually. 
+ loaded = torch.cat([utils._load_bag(b).to(device) for b in bag], dim=0) + else: + loaded = utils._load_bag(bag).to(device) + loaded = torch.unsqueeze(loaded, dim=0) + + with torch.inference_mode(): + # Run inference. + _y_pred, _y_att, _y_uq = run_inference( + model, + loaded, + attention=attention, + attention_pooling=attention_pooling, + uq=uq, + apply_softmax=apply_softmax, + device=device, + use_lens=use_lens + ) + + # Convert to numpy. + if _y_pred is not None: + _y_pred = _y_pred.cpu().numpy() + if _y_att is not None: + _y_att = _y_att.cpu().numpy() + if _y_uq is not None: + _y_uq = _y_uq.cpu().numpy() + + # Append to running lists. + y_pred.append(_y_pred) + if _y_att is not None: + y_att.append(_y_att) + if _y_uq is not None: + uncertainty.append(_y_uq) + + yp = np.concatenate(y_pred, axis=0) + if uq: + uncertainty = np.concatenate(uncertainty, axis=0) + return yp, y_att, uncertainty + else: + return yp, y_att
+ + +
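When `apply_softmax` is truthy, `predict_from_bags` asks the low-level runner to convert raw model outputs into per-class probabilities along the class dimension. The sketch below shows what a row-wise softmax computes, in plain Python rather than PyTorch; `softmax_rows` is a hypothetical name, and subtracting the row maximum is a common numerical-stability choice assumed here, not something taken from the source.

```python
import math
from typing import List


def softmax_rows(logits: List[List[float]]) -> List[List[float]]:
    """Apply softmax independently to each row of logits (illustrative)."""
    out = []
    for row in logits:
        shift = max(row)  # subtract the max so exp() cannot overflow
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


probs = softmax_rows([[2.0, 0.0], [0.0, 0.0]])
```

Each row of `probs` sums to one, so the `y_pred{i}` columns of the prediction dataframe can be read as class probabilities for classification models. For continuous outcomes softmax would distort the outputs, which is why it is left off in that case.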
+def predict_from_multimodal_bags(
+    model: "torch.nn.Module",
+    bags: Union[List[np.ndarray], List[List[str]]],
+    *,
+    attention: bool = True,
+    attention_pooling: Optional[str] = None,
+    use_lens: bool = True,
+    device: Optional[Any] = None,
+    apply_softmax: Optional[bool] = None,
+) -> Tuple[np.ndarray, List[List[np.ndarray]]]:
+    """Generate multimodal MIL predictions for a nested list of bags.
+
+    Args:
+        model (torch.nn.Module): Loaded PyTorch MIL model.
+        bags (list(list(str))): Nested list of bags to generate predictions for.
+            Each bag should contain PyTorch array of features from all tiles in a slide,
+            with the shape ``(n_tiles, n_features)``.
+
+    Keyword Args:
+        attention (bool): Whether to calculate attention scores. Defaults to True.
+        attention_pooling (str, optional): Pooling strategy for attention scores.
+            Can be 'avg', 'max', or None. Defaults to None.
+        use_lens (bool): Whether to use the length of each bag as an additional
+            input to the model. Defaults to True.
+        device (str, optional): Device on which to run inference. Defaults to None.
+        apply_softmax (bool): Whether to apply softmax to the model output.
+            Typically True for categorical outcomes and False for continuous
+            outcomes. Defaults to None (softmax is not applied).
+
+    Returns:
+        Tuple[np.ndarray, List[List[np.ndarray]]]: Predictions and attention scores.
+
+    """
+    import torch
+
+    y_pred = []
+    n_mag = len(bags[0])
+    y_att = [[] for _ in range(n_mag)]
+    device = utils._detect_device(model, device, verbose=True)
+
+    # Ensure the model has attention capabilities.
+    if attention and not hasattr(model, 'calculate_attention'):
+        log.warning(
+            "Model '{}' does not have a method 'calculate_attention'. "
+            "Unable to calculate or display attention heatmaps.".format(
+                model.__class__.__name__
+            )
+        )
+        attention = False
+
+    for bag in bags:
+        loaded = [torch.unsqueeze(utils._load_bag(b).to(device), dim=0)
+                  for b in bag]
+        with torch.inference_mode():
+            if use_lens:
+                model_args = [(mag_bag, torch.from_numpy(np.array([mag_bag.shape[1]])).to(device))
+                              for mag_bag in loaded]
+            else:
+                model_args = (loaded,)
+            model_out = model(*model_args)
+            if attention:
+                raw_att = model.calculate_attention(*model_args)
+                for mag in range(n_mag):
+                    att = torch.squeeze(raw_att[mag], dim=0)
+                    att = utils._pool_attention(torch.squeeze(att), pooling=attention_pooling)
+                    # If we have multi-channel attention, then the attention channel (last) needs to
+                    # be moved to the first dimension.
+                    if len(att.shape) == 2:
+                        att = torch.moveaxis(att, -1, 0)
+                    y_att[mag].append(att.cpu().numpy())
+            if apply_softmax:
+                model_out = torch.nn.functional.softmax(model_out, dim=1)
+            y_pred.append(model_out.cpu().numpy())
+    yp = np.concatenate(y_pred, axis=0)
+    return yp, y_att
+
+# -----------------------------------------------------------------------------
+# Low-level runners for inference and evaluation.
+
+def run_inference(
+    model: "torch.nn.Module",
+    loaded_bags: "torch.Tensor",
+    *,
+    attention: bool = False,
+    attention_pooling: Optional[str] = None,
+    uq: bool = False,
+    forward_kwargs: Optional[dict] = None,
+    apply_softmax: Optional[bool] = None,
+    use_lens: Union[bool, "torch.Tensor"] = False,
+    device: Optional[Any] = None
+) -> Tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]:
+    """Low-level interface for running inference on a MIL model.
+
+    Args:
+        model (torch.nn.Module): Loaded PyTorch MIL model.
+        loaded_bags (torch.Tensor): Loaded bags to run inference on.
+
+    Keyword Args:
+        attention (bool): Whether to calculate attention scores. Defaults to False.
+        attention_pooling (str, optional): Pooling strategy for attention scores.
+            Can be 'avg', 'max', or None. Defaults to None.
+        uq (bool): Whether to generate uncertainty estimates. Defaults to False.
+        forward_kwargs (dict, optional): Additional keyword arguments to pass to
+            the model's forward function. Defaults to None.
+        apply_softmax (bool): Whether to apply softmax to the model output.
+            Typically True for categorical outcomes and False for continuous
+            outcomes. Defaults to None (softmax is not applied).
+        use_lens (bool, torch.Tensor): Whether to use the length of each bag as an
+            additional input to the model. If a tensor is passed, this will be used
+            as the lens. Defaults to False.
+        device (str, optional): Device on which to run inference. Defaults to None.
+
+    Returns:
+        Tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]: Predictions,
+        attention scores, and uncertainty estimates. For multi-dimensional attention,
+        the first dimension of the attention scores will be the attention channel.
+
+    """
+    import torch
+
+    if forward_kwargs is None:
+        forward_kwargs = dict()
+
+    y_pred, y_att, y_uncertainty = None, None, None
+
+    # Prepare lens
+    device = utils._detect_device(model, device, verbose=False)
+    if isinstance(use_lens, bool) and use_lens:
+        lens = torch.full((loaded_bags.shape[0],), loaded_bags.shape[1], device=device)
+        model_args = (loaded_bags, lens)
+    elif use_lens is not False and use_lens is not None:
+        model_args = (loaded_bags, use_lens)
+    else:
+        model_args = (loaded_bags,)
+
+    if uq and 'uq' in inspect.signature(model.forward).parameters:
+        kw = dict(uq=True, **forward_kwargs)
+    elif uq:
+        raise RuntimeError("Model does not support UQ.")
+    else:
+        kw = forward_kwargs
+
+    # Check if the model can return attention during inference.
+    # If so, this saves us a forward pass through the model.
+    if attention and 'return_attention' in inspect.signature(model.forward).parameters:
+        model_out, y_att = model(*model_args, return_attention=True, **kw)
+    # Otherwise, use the model's `calculate_attention` function directly.
+    elif attention:
+        model_out = model(*model_args, **kw)
+        y_att = model.calculate_attention(*model_args)
+    else:
+        model_out = model(*model_args, **kw)
+
+    # Parse uncertainty from model output.
+    if uq:
+        y_pred, y_uncertainty = model_out
+    else:
+        y_pred = model_out
+
+    if attention:
+        y_att = utils._pool_attention(torch.squeeze(y_att), pooling=attention_pooling)
+        # If we have multi-channel attention, then the attention channel (last) needs to
+        # be moved to the first dimension.
+        if len(y_att.shape) == 2:
+            y_att = torch.moveaxis(y_att, -1, 0)
+
+    if apply_softmax:
+        y_pred = torch.nn.functional.softmax(y_pred, dim=1)
+    return y_pred, y_att, y_uncertainty
+
+
+def run_eval(
+    model: "torch.nn.Module",
+    dataset: Dataset,
+    outcomes: Union[str, List[str]],
+    bags: Union[str, List[str]],
+    config: TrainerConfig,
+    *,
+    outdir: str = 'mil',
+    attention_heatmaps: bool = False,
+    uq: bool = False,
+    params: Optional[dict] = None,
+    aggregation_level: Optional[str] = None,
+    **heatmap_kwargs
+) -> pd.DataFrame:
+    """Evaluate a standard, single-mode multi-instance learning model.
+
+    Args:
+        model (torch.nn.Module): Loaded PyTorch MIL model.
+        dataset (sf.Dataset): Dataset to evaluate.
+        outcomes (str, list(str)): Outcomes.
+        bags (str, list(str)): Path to bags, or list of bag file paths.
+            Each bag should contain PyTorch array of features from all tiles in
+            a slide, with the shape ``(n_tiles, n_features)``.
+        config (:class:`slideflow.mil.TrainerConfig`):
+            Configuration for building model.
+
+    Keyword arguments:
+        outdir (str): Path at which to save results.
+        attention_heatmaps (bool): Generate attention heatmaps for slides.
+            Defaults to False.
+        interpolation (str, optional): Interpolation strategy for smoothing
+            attention heatmaps. Defaults to 'bicubic'.
+        cmap (str, optional): Matplotlib colormap for heatmap. Can be any
+            valid matplotlib colormap. Defaults to 'inferno'.
+        norm (str, optional): Normalization strategy for assigning heatmap
+            values to colors. Either 'two_slope', or any other valid value
+            for the ``norm`` argument of ``matplotlib.pyplot.imshow``.
+            If 'two_slope', normalizes values less than 0 and greater than 0
+            separately. Defaults to None.
+
+    Returns:
+        pd.DataFrame: Dataframe of predictions.
+    """
+    # Generate predictions.
+    predict_kwargs = dict(
+        model=model,
+        dataset=dataset,
+        config=config,
+        outcomes=outcomes,
+        bags=bags,
+        attention=True,
+        aggregation_level=aggregation_level
+    )
+    if config.is_multimodal:
+        if uq:
+            log.warning("Uncertainty estimates are not supported for multi-modal models.")
+        df, y_att = predict_multimodal_mil(**predict_kwargs)
+    else:
+        df, y_att = predict_mil(uq=uq, **predict_kwargs)
+
+    # Save results.
+    if outdir:
+        if not exists(outdir):
+            os.makedirs(outdir)
+        model_dir = sf.util.get_new_model_dir(outdir, config.model_config.model)
+        if params is not None:
+            sf.util.write_json(params, join(model_dir, 'mil_params.json'))
+        pred_out = join(model_dir, 'predictions.parquet')
+        df.to_parquet(pred_out)
+        log.info(f"Predictions saved to [green]{pred_out}[/]")
+    else:
+        model_dir = None
+
+    # Print classification metrics, including per-category accuracy.
+    metrics_df = utils.rename_df_cols(df, outcomes, categorical=config.is_classification())
+    config.run_metrics(metrics_df, level='slide', outdir=model_dir)
+
+    # Export attention
+    if outdir and y_att:
+        if 'slide' in df.columns:
+            slides_or_patients = df.slide.values
+        elif 'patient' in df.columns:
+            slides_or_patients = df.patient.values
+        else:
+            raise ValueError("Malformed dataframe; cannot find 'slide' or 'patient' column.")
+        utils._export_attention(join(model_dir, 'attention'), y_att, slides_or_patients)
+
+    # Attention heatmaps
+    # Not supported for multimodal models
+    if attention_heatmaps and config.is_multimodal:
+        log.warning("Cannot generate attention heatmaps for multi-modal models.")
+    elif outdir and y_att and attention_heatmaps:
+        generate_attention_heatmaps(
+            outdir=join(model_dir, 'heatmaps'),
+            dataset=dataset,
+            bags=bags,  # type: ignore
+            attention=y_att,
+            **heatmap_kwargs
+        )
+
+    return df
+
+# -----------------------------------------------------------------------------
+# Tile-level predictions.
+
[docs]def get_mil_tile_predictions( + weights: str, + dataset: "sf.Dataset", + bags: Union[str, np.ndarray, List[str]], + *, + config: Optional[TrainerConfig] = None, + outcomes: Union[str, List[str]] = None, + dest: Optional[str] = None, + uq: bool = False, + device: Optional[Any] = None, + tile_batch_size: int = 512, + **kwargs +) -> pd.DataFrame: + """Generate tile-level predictions for a MIL model. + + Args: + weights (str): Path to model weights to load. + dataset (:class:`slideflow.Dataset`): Dataset. + bags (str, list(str)): Path to bags, or list of bag file paths. + Each bag should contain PyTorch array of features from all tiles in + a slide, with the shape ``(n_tiles, n_features)``. + + Keyword Args: + config (:class:`slideflow.mil.TrainerConfig`): + Configuration for building model. If ``weights`` is a path to a + model directory, will attempt to read ``mil_params.json`` from this + location and load saved configuration. Defaults to None. + outcomes (str, list(str)): Outcomes. + dest (str): Path at which to save tile predictions. + uq (bool): Whether to generate uncertainty estimates. Defaults to False. + device (str, optional): Device on which to run inference. Defaults to None. + tile_batch_size (int): Batch size for tile-level predictions. Defaults + to 512. + attention_pooling (str): Attention pooling strategy. Either 'avg' + or 'max'. Defaults to None. + + Returns: + pd.DataFrame: Dataframe of tile predictions. + + """ + import torch + + if isinstance(bags, str): + utils._verify_compatible_tile_size(weights, bags) + + # Load model and configuration. + model, config = utils.load_model_weights(weights, config) + device = utils._detect_device(model, device, verbose=True) + model.eval() + model.to(device) + + if outcomes is not None: + labels, _ = utils.get_labels(dataset, outcomes, config.is_classification(), format='id') + + # Prepare bags. 
+ slides = dataset.slides() + if isinstance(bags, str): + bags = dataset.get_bags(bags) + else: + bags = np.array([b for b in bags if path_to_name(b) in slides]) + + # Ensure slide names are sorted according to the bags. + slides = [path_to_name(b) for b in bags] + + log.info("Generating predictions for {} slides and {} bags.".format(len(slides), len(bags))) + + # Set model to eval, and prepare bags. + use_attention, uq = utils._validate_model(model, True, uq, allow_errors=True) + + # First, start with slide-level inference and attention. + slide_pred, attention = config.predict(model, bags, attention=use_attention, **kwargs) + + df_slides = [] + df_attention = [] + df_preds = [] + df_uq = [] + df_true = [] + df_loc_x = [] + df_loc_y = [] + + # Then, generate tile predictions for each slide: + for i, (bag, slide) in track(enumerate(zip(bags, slides)), + description="Generating tile predictions", + total=len(bags)): + + # Prepare bags, and resize bag dimension to the batch dimension. + loaded_bags = torch.unsqueeze(utils._load_bag(bag, device=device), dim=1) + + # Split loaded bags into smaller batches for inference (tile_batch_size) + if len(loaded_bags) > tile_batch_size: + loaded_bags = torch.split(loaded_bags, tile_batch_size, dim=0) + else: + loaded_bags = [loaded_bags] + + _running_pred = [] + _running_uq = [] + + # Run inference on each batch. + for batch in loaded_bags: + with torch.inference_mode(): + pred_out = config.batched_predict(model, batch, uq=uq, device=device, attention=True, **kwargs) + + if uq or len(pred_out) == 3: + _pred, _att, _uq = utils._output_to_numpy(*pred_out) + if _uq is not None and len(_uq): + _running_uq.append(_uq) + else: + _pred, _att = utils._output_to_numpy(*pred_out) + _running_pred.append(_pred) + + # Concatenate predictions and attention. + tile_pred = np.concatenate(_running_pred, axis=0) + if len(_running_uq): + tile_uq = np.concatenate(_running_uq, axis=0) + + # Verify the shapes are consistent. 
+ if attention is not None and len(attention): + assert len(tile_pred) == attention[i].shape[-1] + n_bags = len(tile_pred) + + # Find the associated locations. + bag_index = join(dirname(bag), f'{slide}.index.npz') + if exists(bag_index): + locations = np.load(bag_index)['arr_0'] + assert len(locations) == n_bags + df_loc_x.append(locations[:, 0]) + df_loc_y.append(locations[:, 1]) + + # Add to dataframe lists. + df_preds.append(tile_pred) + if uq: + df_uq.append(tile_uq) + if attention is not None and len(attention): + df_attention.append(attention[i]) + df_slides += [slide for _ in range(n_bags)] + if outcomes is not None: + _label = labels[slide] + df_true += [_label for _ in range(n_bags)] + + # Update dataframe with predictions. + df_dict = dict(slide=df_slides) + if len(df_attention): + df_attention = np.concatenate(df_attention, axis=-1) + df_preds = np.concatenate(df_preds, axis=0) + + # Tile location + if df_loc_x: + df_dict['loc_x'] = np.concatenate(df_loc_x, axis=0) + df_dict['loc_y'] = np.concatenate(df_loc_y, axis=0) + + # Attention + if attention is not None and len(attention): + if len(df_attention.shape) == 1: + df_dict['attention'] = df_attention + else: + for _a in range(len(df_attention)): + df_dict[f'attention-{_a}'] = df_attention[_a] + + # Uncertainty + if uq: + df_uq = np.concatenate(df_uq, axis=0) + for i in range(df_uq[0].shape[0]): + df_dict[f'uncertainty{i}'] = df_uq[:, i] + + # Ground truth + if outcomes is not None: + df_dict['y_true'] = df_true + + # Predictions + for i in range(df_preds[0].shape[0]): + df_dict[f'y_pred{i}'] = df_preds[:, i] + + # Final processing to dataframe & disk + df = pd.DataFrame(df_dict) + if dest is not None: + df.to_parquet(dest) + log.info("{} tile predictions exported to [green]{}[/]".format( + df_preds.shape[0], + dest + )) + return df
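The tile-batching step in `get_mil_tile_predictions` can be sketched without a trained model. This is a minimal illustration, not the Slideflow implementation: `toy_predict` is a hypothetical stand-in for `config.batched_predict`, NumPy arrays replace PyTorch tensors, and the `unsqueeze` that turns each tile into its own single-tile bag is omitted. It only shows how a bag is split into chunks of `tile_batch_size`, how the per-chunk outputs are concatenated, and how the `y_pred{i}` columns are derived.

```python
import numpy as np


def batched_tile_predictions(bag, predict_fn, tile_batch_size=512):
    """Run predict_fn over a (n_tiles, n_features) bag in fixed-size chunks."""
    running = []
    for start in range(0, len(bag), tile_batch_size):
        batch = bag[start:start + tile_batch_size]
        running.append(predict_fn(batch))
    # Stitch the per-chunk outputs back into one (n_tiles, n_outputs) array.
    return np.concatenate(running, axis=0)


def predictions_to_columns(tile_pred):
    """Build one 'y_pred{i}' column per model output, as in the dataframe."""
    return {f'y_pred{i}': tile_pred[:, i] for i in range(tile_pred.shape[1])}


def toy_predict(batch):
    # Hypothetical stand-in for the model: two pseudo-probabilities per tile.
    p = 1.0 / (1.0 + np.exp(-batch[:, 0]))
    return np.stack([1.0 - p, p], axis=1)


bag = np.random.default_rng(0).normal(size=(1300, 8))
tile_pred = batched_tile_predictions(bag, toy_predict, tile_batch_size=512)
columns = predictions_to_columns(tile_pred)
```

Because each tile is predicted independently in this sketch, chunking changes memory use but not the result.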
+ + +def save_mil_tile_predictions( + weights: str, + dataset: "sf.Dataset", + bags: Union[str, np.ndarray, List[str]], + config: Optional[TrainerConfig] = None, + outcomes: Union[str, List[str]] = None, + dest: str = 'mil_tile_preds.parquet', +) -> pd.DataFrame: + return get_mil_tile_predictions( + weights, + dataset, + bags, + config=config, + outcomes=outcomes, + dest=dest + ) + +# ----------------------------------------------------------------------------- +# Feature extraction and attention heatmaps. + +
[docs]def generate_mil_features( + weights: str, + dataset: "sf.Dataset", + bags: Union[str, np.ndarray, List[str]], + *, + config: Optional[TrainerConfig] = None, +) -> "MILFeatures": + """Generate last-layer activations from an MIL model. + + Returns a MILFeatures object. + + Args: + weights (str): Path to model weights to load. + dataset (:class:`slideflow.Dataset`): Dataset. + bags (str, list(str)): Path to bags, or list of bag file paths. + Each bag should contain a PyTorch array of features from all tiles in + a slide, with the shape ``(n_tiles, n_features)``. + + Keyword Args: + config (:class:`slideflow.mil.TrainerConfig`): + Configuration for building model. If ``weights`` is a path to a + model directory, will attempt to read ``mil_params.json`` from this + location and load saved configuration. Defaults to None. + """ + from .features import MILFeatures + + # Load model weights. + model, config = utils.load_model_weights(weights, config) + + # Ensure the model is valid for generating features. + if not hasattr(model, 'get_last_layer_activations'): + raise errors.ModelError( + f"Model {model.__class__.__name__} is not supported; could not " + "find method 'get_last_layer_activations'") + + # Prepare bags and targets. + slides = dataset.slides() + if isinstance(bags, str): + bags = dataset.get_bags(bags) + else: + bags = np.array([b for b in bags if path_to_name(b) in slides]) + + # Ensure slide names are sorted according to the bags. + slides = [path_to_name(b) for b in bags] + + # Calculate and return last-layer features. + return MILFeatures(model, bags, slides=slides, config=config, dataset=dataset)
+ + +
[docs]def generate_attention_heatmaps( + outdir: str, + dataset: "sf.Dataset", + bags: Union[List[str], np.ndarray], + attention: Union[np.ndarray, List[np.ndarray]], + **kwargs +) -> None: + """Generate and save attention heatmaps for a dataset. + + Args: + outdir (str): Path at which to save heatmap images. + dataset (sf.Dataset): Dataset. + bags (str, list(str)): List of bag file paths. + Each bag should contain PyTorch array of features from all tiles in + a slide, with the shape ``(n_tiles, n_features)``. + attention (list(np.ndarray)): Attention scores for each slide. + Length of ``attention`` should equal the length of ``bags``. + + Keyword args: + interpolation (str, optional): Interpolation strategy for smoothing + heatmap. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. 
+ + + """ + assert len(bags) == len(attention) + if not exists(outdir): + os.makedirs(outdir) + pb = Progress(transient=True) + task = pb.add_task('Generating heatmaps...', total=len(bags)) + pb.start() + with sf.util.cleanup_progress(pb): + for i, bag in enumerate(bags): + pb.advance(task) + slidename = sf.util.path_to_name(bag) + slide_path = dataset.find_slide(slide=slidename) + locations_file = join(dirname(bag), f'{slidename}.index.npz') + npy_loc_file = locations_file[:-1] + 'y' + if slide_path is None: + log.info(f"Unable to find slide {slidename}") + continue + if exists(locations_file): + locations = np.load(locations_file)['arr_0'] + elif exists(npy_loc_file): + locations = np.load(npy_loc_file) + else: + log.info( + f"Unable to find locations index file for {slidename}" + ) + continue + + # Handle the case of multiple attention values at each tile location. + heatmap_kwargs = dict( + locations=locations, + slide=slide_path, + tile_px=dataset.tile_px, + tile_um=dataset.tile_um, + **kwargs + ) + if (len(attention[i].shape) < 2) or (attention[i].shape[0] == 1): + # If there is a single attention value, create a single map. + sf.util.location_heatmap( + values=attention[i], + filename=join(outdir, f'{sf.util.path_to_name(slide_path)}_attn.png'), + **heatmap_kwargs + ) + else: + # Otherwise, create a separate heatmap for each value, + # as well as a heatmap reduced by mean. + # The attention values are assumed to have the shape (n_attention, n_tiles). + for att_idx in range(attention[i].shape[0]): + sf.util.location_heatmap( + values=attention[i][att_idx, :], + filename=join(outdir, f'{sf.util.path_to_name(slide_path)}_attn-{att_idx}.png'), + **heatmap_kwargs + ) + sf.util.location_heatmap( + values=np.mean(attention[i], axis=0), + filename=join(outdir, f'{sf.util.path_to_name(slide_path)}_attn-avg.png'), + **heatmap_kwargs + ) + + + log.info(f"Attention heatmaps saved to [green]{outdir}[/]")
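The shape handling in `generate_attention_heatmaps` decides how many images are written per slide. The sketch below reproduces only that dispatch, assuming attention arrays of shape `(n_tiles,)`, `(1, n_tiles)`, or `(n_attention, n_tiles)`. `plan_attention_maps` is a hypothetical helper introduced for illustration; it returns the values that would be passed to `sf.util.location_heatmap` instead of rendering anything.

```python
import numpy as np


def plan_attention_maps(attention, slide_name):
    """Map output filenames to the attention values each heatmap would show."""
    attention = np.asarray(attention)
    if attention.ndim < 2 or attention.shape[0] == 1:
        # Single attention value per tile: one heatmap.
        # Flattened here for convenience; the original passes the array as-is.
        return {f'{slide_name}_attn.png': attention.reshape(-1)}
    # Multiple attention values per tile: one heatmap per row, plus the mean.
    maps = {
        f'{slide_name}_attn-{idx}.png': attention[idx, :]
        for idx in range(attention.shape[0])
    }
    maps[f'{slide_name}_attn-avg.png'] = attention.mean(axis=0)
    return maps


single = plan_attention_maps(np.array([0.1, 0.7, 0.2]), 'slide1')
multi = plan_attention_maps(
    np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]), 'slide2'
)
```

A slide with three attention rows therefore yields four images: three per-row maps and one averaged map.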
+ +# ----------------------------------------------------------------------------- +
+ © Copyright 2023, James M Dolezal. + +

+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/mil/train/index.html b/docs/_modules/slideflow/mil/train/index.html new file mode 100644 index 000000000..24105bd10 --- /dev/null +++ b/docs/_modules/slideflow/mil/train/index.html @@ -0,0 +1,961 @@ + + + + + + + + + + + + slideflow.mil.train — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Source code for slideflow.mil.train

+"""Training functions for various multi-instance learning (MIL) models."""
+
+import os
+import numpy as np
+import slideflow as sf
+import pandas as pd
+from os.path import join, exists, isdir
+from typing import Union, List, Optional, TYPE_CHECKING
+from slideflow import Dataset, log
+from slideflow.util import path_to_name
+
+from .. import utils
+from ..eval import predict_mil, predict_multimodal_mil, generate_attention_heatmaps
+from .._params import TrainerConfig
+
+if TYPE_CHECKING:
+    from fastai.learner import Learner
+
+
+# -----------------------------------------------------------------------------
+
+
[docs]def train_mil( + config: TrainerConfig, + train_dataset: Dataset, + val_dataset: Optional[Dataset], + outcomes: Union[str, List[str]], + bags: Union[str, List[str]], + *, + outdir: str = 'mil', + exp_label: Optional[str] = None, + **kwargs +) -> "Learner": + """Train a multiple-instance learning (MIL) model. + + This high-level trainer facilitates training from a given MIL configuration, + using Datasets as input and with input features taken from a given directory + of bags. + + Args: + config (:class:`slideflow.mil.TrainerConfig`): + Trainer and model configuration. + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + bags (str): Either a path to directory with \*.pt files, or a list + of paths to individual \*.pt files. Each file should contain + exported feature vectors, with each file containing all tile + features for one patient. + + Keyword args: + outdir (str): Directory in which to save model and results. + exp_label (str): Experiment label, used for naming the subdirectory + in the ``{project root}/mil`` folder, where training history + and the model will be saved. + attention_heatmaps (bool): Generate attention heatmaps for slides. + Not available for multi-modal MIL models. Defaults to False. + interpolation (str, optional): Interpolation strategy for smoothing + attention heatmaps. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. 
+ + """ + if not isinstance(config, TrainerConfig): + raise ValueError(f"Unrecognized training configuration of type {type(config)}") + + return config.train( + train_dataset=train_dataset, + val_dataset=val_dataset, + outcomes=outcomes, + bags=bags, + outdir=outdir, + exp_label=exp_label, + **kwargs + )
+ +# ----------------------------------------------------------------------------- + +
[docs]def build_fastai_learner( + config: TrainerConfig, + train_dataset: Dataset, + val_dataset: Dataset, + outcomes: Union[str, List[str]], + bags: Union[str, np.ndarray, List[str]], + *, + outdir: str = 'mil', + return_shape: bool = False, + **kwargs +) -> "Learner": + """Build a FastAI Learner for training an MIL model. + + Does not execute training. Useful for customizing a Learner object + prior to training. + + Args: + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + bags (str): list of paths to individual \*.pt files. Each file should + contain exported feature vectors, with each file containing all tile + features for one patient. + + Keyword args: + outdir (str): Directory in which to save model and results. + return_shape (bool): Return the input and output shapes of the model. + Defaults to False. + exp_label (str): Experiment label, used for naming the subdirectory + in the ``outdir`` folder, where training history + and the model will be saved. + lr (float): Learning rate, or maximum learning rate if + ``fit_one_cycle=True``. + epochs (int): Maximum epochs. + **kwargs: Additional keyword arguments to pass to the FastAI learner. + + Returns: + fastai.learner.Learner, and optionally a tuple of input and output shapes + if ``return_shape=True``. + + """ + from . 
import _fastai + + labels, unique = utils.get_labels((train_dataset, val_dataset), outcomes, config.is_classification()) + + # Prepare bags + if isinstance(bags, str) or (isinstance(bags, list) and isdir(bags[0])): + train_bags = train_dataset.get_bags(bags) + if val_dataset is train_dataset: + bags = train_bags + else: + val_bags = val_dataset.get_bags(bags) + bags = np.concatenate((train_bags, val_bags)) + else: + bags = np.array(bags) + + train_slides = train_dataset.slides() + val_slides = val_dataset.slides() + + if config.aggregation_level == 'slide': + # Aggregate feature bags across slides. + bags, targets, train_idx, val_idx = utils.aggregate_trainval_bags_by_slide( + bags, # type: ignore + labels, + train_slides, + val_slides, + log_manifest=(join(outdir, 'slide_manifest.csv') if outdir else None) + ) + + elif config.aggregation_level == 'patient': + # Associate patients and their slides. + # This is a dictionary where each key is a slide name and each value + # is a patient code. Multiple slides can match to the same patient. + slide_to_patient = { **train_dataset.patients(), + **val_dataset.patients() } + + # Aggregate feature bags across patients. 
+ n_slide_bags = len(bags) + bags, targets, train_idx, val_idx = utils.aggregate_trainval_bags_by_patient( + bags, # type: ignore + labels, + train_slides, + val_slides, + slide_to_patient=slide_to_patient, + log_manifest=(join(outdir, 'slide_manifest.csv') if outdir else None) + ) + log.info(f"Aggregated {n_slide_bags} slide bags to {len(bags)} patient bags.") + + log.info("Training dataset: {} merged bags (from {} possible slides)".format( + len(train_idx), len(train_slides))) + log.info("Validation dataset: {} merged bags (from {} possible slides)".format( + len(val_idx), len(val_slides))) + + # Build FastAI Learner + learner, (n_in, n_out) = _fastai.build_learner( + config, + bags=bags, + targets=targets, + train_idx=train_idx, + val_idx=val_idx, + unique_categories=unique, + outdir=outdir, + **kwargs + ) + if return_shape: + return learner, (n_in, n_out) + else: + return learner
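Patient-level aggregation in `build_fastai_learner` relies on a merged slide-to-patient dictionary. The helper `utils.aggregate_trainval_bags_by_patient` is not shown in this page, so the following is only a sketch of the grouping idea under stated assumptions: `group_bags_by_patient` is a hypothetical function, and `splitext(basename(...))` stands in for `path_to_name`.

```python
from collections import defaultdict
from os.path import basename, splitext


def group_bags_by_patient(bags, slide_to_patient):
    """Group bag paths by patient code; bags without a known slide are skipped."""
    grouped = defaultdict(list)
    for bag in bags:
        slide = splitext(basename(bag))[0]  # stand-in for path_to_name()
        if slide in slide_to_patient:
            grouped[slide_to_patient[slide]].append(bag)
    return dict(grouped)


# Merge the training and validation mappings, as done with {**a, **b} above.
train_patients = {'slideA': 'P1', 'slideB': 'P1'}
val_patients = {'slideC': 'P2'}
slide_to_patient = {**train_patients, **val_patients}

bags = ['/bags/slideA.pt', '/bags/slideB.pt', '/bags/slideC.pt', '/bags/unknown.pt']
grouped = group_bags_by_patient(bags, slide_to_patient)
```

Here two slide bags collapse into one entry for patient `P1`, which is why the number of patient bags can be smaller than the number of slide bags.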
+ + +
[docs]def build_multimodal_learner( + config: TrainerConfig, + train_dataset: Dataset, + val_dataset: Dataset, + outcomes: Union[str, List[str]], + bags: Union[np.ndarray, List[str]], + *, + outdir: str = 'mil', + return_shape: bool = False, +) -> "Learner": + """Build a multi-magnification FastAI Learner for training an MIL model. + + Does not execute training. Useful for customizing a Learner object + prior to training. + + Args: + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + bags (list(str)): List of bag directories containing \*.pt files, one + directory for each mode. + + Keyword args: + outdir (str): Directory in which to save model and results. + return_shape (bool): Return the input and output shapes of the model. + Defaults to False. + exp_label (str): Experiment label, used for naming the subdirectory + in the ``outdir`` folder, where training history + and the model will be saved. + lr (float): Learning rate, or maximum learning rate if + ``fit_one_cycle=True``. + epochs (int): Maximum epochs. + **kwargs: Additional keyword arguments to pass to the FastAI learner. + + Returns: + fastai.learner.Learner, and optionally a tuple of input and output shapes + if ``return_shape=True``. + + """ + + from . import _fastai + + # Verify bags are in the correct format. 
+ if (not isinstance(bags, (tuple, list)) + or not all([isinstance(b, str) and isdir(b) for b in bags])): + raise ValueError("Expected bags to be a list of paths, got {}".format(type(bags))) + + num_modes = len(bags) + + # Prepare labels and slides + labels, unique = utils.get_labels((train_dataset, val_dataset), outcomes, config.is_classification()) + + # --- Prepare bags -------------------------------------------------------- + + train_bags, train_slides = utils._get_nested_bags(train_dataset, bags) + val_bags, val_slides = utils._get_nested_bags(val_dataset, bags) + + # --- Process bags and targets for training ------------------------------- + + # Note: we are skipping patient-level bag aggregation for now. + # TODO: implement patient-level bag aggregation for multi-modal MIL. + + # Concatenate training and validation bags. + all_bags = np.concatenate((train_bags, val_bags)) # shape: (num_slides, num_modes) + assert all_bags.shape[0] == len(train_slides) + len(val_slides) + all_slides = train_slides + val_slides + targets = np.array([labels[s] for s in all_slides]) + train_idx = np.arange(len(train_slides)) + val_idx = np.arange(len(train_slides), len(all_slides)) + + # Write the slide manifest + if outdir: + sf.util.log_manifest( + train_slides, + val_slides, + labels=labels, + filename=join(outdir, 'slide_manifest.csv'), + remove_extension=False + ) + + # Print a multi-modal dataset summary. 
+ log.info( + "[bold]Multi-modal MIL training summary:[/]" + + "\n - [blue]Modes[/]: {}".format(num_modes) + + "\n - [blue]Slides with bags[/]: {}".format(len(np.unique(all_slides))) + + "\n - [blue]Multi-modal bags[/]: {}".format(all_bags.shape[0]) + + "\n - [blue]Unique categories[/]: {}".format(len(unique)) + + "\n - [blue]Training multi-modal bags[/]: {}".format(len(train_idx)) + + "\n - [blue]Training slides[/]: {}".format(len(np.unique(train_slides))) + + "\n - [blue]Validation multi-modal bags[/]: {}".format(len(val_idx)) + + "\n - [blue]Validation slides[/]: {}".format(len(np.unique(val_slides))) + ) + + # Print a detailed summary of each mode. + for i, mode in enumerate(bags): + try: + bags_config = sf.util.load_json(join(mode, 'bags_config.json')) + except Exception: + log.info( + f"Mode {i+1}: " + + "\n - Bags: {}".format(mode) + ) + else: + log.info( + f"[bold]Mode {i+1}[/]: [green]{mode}[/]" + + "\n - Feature extractor: [purple]{}[/]".format(bags_config['extractor']['class'].split('.')[-1]) + + "\n - Tile size (px): {}".format(bags_config['tile_px']) + + "\n - Tile size (um): {}".format(bags_config['tile_um']) + + "\n - Normalizer: {}".format(bags_config['normalizer']) + ) + + # --- Build FastAI Learner ------------------------------------------------ + + # Build FastAI Learner + learner, (n_in, n_out) = _fastai.build_learner( + config, + all_bags, + targets, + train_idx, + val_idx, + unique_categories=unique, + outdir=outdir, + ) + if return_shape: + return learner, (n_in, n_out) + else: + return learner
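The way `build_multimodal_learner` combines training and validation data into one array with index ranges can be shown with toy inputs. This is a sketch under assumptions: `concat_train_val` is a hypothetical helper, and the bag paths and labels are invented for illustration. It mirrors only the concatenation and index arithmetic.

```python
import numpy as np


def concat_train_val(train_bags, val_bags, train_slides, val_slides, labels):
    """Stack train and validation bags and derive index ranges for each split."""
    all_bags = np.concatenate((train_bags, val_bags))  # (num_slides, num_modes)
    all_slides = list(train_slides) + list(val_slides)
    assert all_bags.shape[0] == len(all_slides)
    targets = np.array([labels[s] for s in all_slides])
    # Training rows come first, validation rows follow.
    train_idx = np.arange(len(train_slides))
    val_idx = np.arange(len(train_slides), len(all_slides))
    return all_bags, targets, train_idx, val_idx


# Toy data: two modes (e.g. two magnifications), one bag path per slide per mode.
train_slides = ['s1', 's2', 's3']
val_slides = ['s4', 's5']
train_bags = np.array([[f'10x/{s}.pt', f'40x/{s}.pt'] for s in train_slides])
val_bags = np.array([[f'10x/{s}.pt', f'40x/{s}.pt'] for s in val_slides])
labels = {'s1': 0, 's2': 1, 's3': 0, 's4': 1, 's5': 1}

all_bags, targets, train_idx, val_idx = concat_train_val(
    train_bags, val_bags, train_slides, val_slides, labels
)
```

Keeping a single array plus two index ranges lets the learner address both splits without duplicating the bag list.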
+ +# ------------------------------------------------------------------------------ +# Internal training functions. + +def _train_mil( + config: TrainerConfig, + train_dataset: Dataset, + val_dataset: Dataset, + outcomes: Union[str, List[str]], + bags: Union[str, List[str]], + *, + outdir: str = 'mil', + attention_heatmaps: bool = False, + uq: bool = False, + device: Optional[str] = None, + **heatmap_kwargs +) -> "Learner": + """Train an MIL model using FastAI. + + Args: + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + bags (str): Either a path to directory with \*.pt files, or a list + of paths to individual \*.pt files. Each file should contain + exported feature vectors, with each file containing all tile + features for one patient. + + Keyword args: + outdir (str): Directory in which to save model and results. + exp_label (str): Experiment label, used for naming the subdirectory + in the ``{project root}/mil`` folder, where training history + and the model will be saved. + lr (float): Learning rate, or maximum learning rate if + ``fit_one_cycle=True``. + epochs (int): Maximum epochs. + attention_heatmaps (bool): Generate attention heatmaps for slides. + Defaults to False. + interpolation (str, optional): Interpolation strategy for smoothing + attention heatmaps. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. + + Returns: + fastai.learner.Learner + """ + from . import _fastai + + # Prepare validation bags. 
+ if isinstance(bags, str) or (isinstance(bags, list) and isdir(bags[0])): + val_bags = val_dataset.get_bags(bags) + else: + val_bags = np.array([b for b in bags if sf.util.path_to_name(b) in val_dataset.slides()]) + + # Build learner. + learner, (n_in, n_out) = build_fastai_learner( + config, + train_dataset, + val_dataset, + outcomes, + bags=bags, + outdir=outdir, + device=device, + return_shape=True + ) + + # Save MIL settings. + # Attempt to read the unique categories from the learner. + if not hasattr(learner.dls.train_ds, 'encoder'): + unique = None + else: + encoder = learner.dls.train_ds.encoder + if encoder is not None: + unique = encoder.categories_[0].tolist() + else: + unique = None + _log_mil_params(config, outcomes, unique, bags, n_in, n_out, outdir) + + # Train. + _fastai.train(learner, config) + + # Generate validation predictions. + df, attention = predict_mil( + learner.model, + dataset=val_dataset, + config=config, + outcomes=outcomes, + bags=val_bags, + attention=True, + uq=uq, + ) + if outdir: + pred_out = join(outdir, 'predictions.parquet') + df.to_parquet(pred_out) + log.info(f"Predictions saved to [green]{pred_out}[/]") + + # Print classification metrics, including per-category accuracy + utils.rename_df_cols(df, outcomes, categorical=config.is_classification(), inplace=True) + config.run_metrics(df, level='slide', outdir=outdir) + + # Export attention to numpy arrays + if attention and outdir: + utils._export_attention( + join(outdir, 'attention'), + attention, + [path_to_name(b) for b in val_bags] + ) + + # Attention heatmaps. 
+ if attention and attention_heatmaps and outdir: + generate_attention_heatmaps( + outdir=join(outdir, 'heatmaps'), + dataset=val_dataset, + bags=val_bags, + attention=attention, + **heatmap_kwargs + ) + + return learner + + +def _train_multimodal_mil( + config: TrainerConfig, + train_dataset: Dataset, + val_dataset: Optional[Dataset], + outcomes: Union[str, List[str]], + bags: List[str], + *, + outdir: str = 'mil', + exp_label: Optional[str] = None, + attention_heatmaps: bool = False, +): + """Train a multi-modal (e.g. multi-magnification) MIL model.""" + + from . import _fastai + + # Export attention & heatmaps. + if attention_heatmaps: + raise ValueError( + "Attention heatmaps cannot yet be exported for multi-modal " + "models. Please use Slideflow Studio for visualization of " + "multi-modal attention." + ) + + # Build learner. + learner, (n_in, n_out) = build_multimodal_learner( + config, + train_dataset, + val_dataset, + outcomes, + bags=bags, + outdir=outdir, + return_shape=True + ) + + # Save MIL settings. + # Attempt to read the unique categories from the learner. + if not hasattr(learner.dls.train_ds, 'encoder'): + unique = None + else: + encoder = learner.dls.train_ds.encoder + if encoder is not None: + unique = encoder.categories_[0].tolist() + else: + unique = None + _log_mil_params(config, outcomes, unique, bags, n_in, n_out, outdir) + + # Execute training. + _fastai.train(learner, config) + + df, attention = predict_multimodal_mil( + learner.model, + dataset=val_dataset, + config=config, + outcomes=outcomes, + bags=bags, + attention=True + ) + + # Print classification metrics, including per-category accuracy + utils.rename_df_cols(df, outcomes, categorical=config.is_classification(), inplace=True) + config.run_metrics(df, level='slide', outdir=outdir) + + # Export predictions. + if outdir: + pred_out = join(outdir, 'predictions.parquet') + df.to_parquet(pred_out) + log.info(f"Predictions saved to [green]{pred_out}[/]") + + # Export attention. 
+ if attention and outdir: + utils._export_attention(join(outdir, 'attention'), attention, df.slide.values) + + return learner + +# ------------------------------------------------------------------------------ + +def _log_mil_params(config, outcomes, unique, bags, n_in, n_out, outdir=None): + """Log MIL parameters to JSON.""" + mil_params = config.json_dump() + mil_params['outcomes'] = outcomes + if unique is not None: + mil_params['outcome_labels'] = dict(zip(range(len(unique)), unique)) + else: + mil_params['outcome_labels'] = None + mil_params['bags'] = bags + mil_params['input_shape'] = n_in + mil_params['output_shape'] = n_out + if isinstance(bags, str) and exists(join(bags, 'bags_config.json')): + mil_params['bags_extractor'] = sf.util.load_json( + join(bags, 'bags_config.json') + ) + elif isinstance(bags, list): + mil_params['bags_extractor'] = {} + for b in bags: + if isdir(b) and exists(join(b, 'bags_config.json')): + mil_params['bags_extractor'][b] = sf.util.load_json( + join(b, 'bags_config.json') + ) + else: + mil_params['bags_extractor'][b] = None + else: + mil_params['bags_extractor'] = None + if outdir: + sf.util.write_json(mil_params, join(outdir, 'mil_params.json')) + return mil_params +
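One detail of `_log_mil_params` is worth noting when reading `mil_params.json` back: `outcome_labels` is built with integer keys, and JSON object keys are always strings. The sketch below assumes `sf.util.write_json` uses standard JSON serialization (an assumption, since its implementation is not shown here) and uses invented category names.

```python
import json

# Invented category names, for illustration only.
unique = ['negative', 'positive']

# Same construction as in _log_mil_params: integer index -> category name.
outcome_labels = dict(zip(range(len(unique)), unique))

# Round-trip through JSON, as would happen when the file is written and reloaded.
restored = json.loads(json.dumps({'outcome_labels': outcome_labels}))['outcome_labels']

# Integer keys come back as strings; convert them if integer lookup is needed.
as_int = {int(k): v for k, v in restored.items()}
```

Code that looks up a label by predicted class index after reloading the file would need this conversion, or would need to index with `str(i)`.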
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/model/extractors/_factory/index.html b/docs/_modules/slideflow/model/extractors/_factory/index.html new file mode 100644 index 000000000..d76e1e8f0 --- /dev/null +++ b/docs/_modules/slideflow/model/extractors/_factory/index.html @@ -0,0 +1,742 @@ + + + + + + + + + + + + slideflow.model.extractors._factory — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Source code for slideflow.model.extractors._factory

+"""Factory for building feature extractors."""
+
+import importlib
+import slideflow as sf
+from os.path import join, exists
+from typing import Optional, Tuple, Dict, Any, TYPE_CHECKING
+from slideflow import errors
+from slideflow.model import BaseFeatureExtractor
+
+from ._registry import (is_tensorflow_extractor, is_torch_extractor,
+                        _tf_extractors, _torch_extractors, _extras_extractors)
+from ._factory_tensorflow import build_tensorflow_feature_extractor
+from ._factory_torch import build_torch_feature_extractor
+
+if TYPE_CHECKING:
+    from slideflow.norm import StainNormalizer
+
+# -----------------------------------------------------------------------------
+
+
[docs]def build_feature_extractor( + name: str, + backend: Optional[str] = None, + **kwargs +) -> BaseFeatureExtractor: + """Build a feature extractor. + + The returned feature extractor is a callable object, which returns + features (often layer activations) for either a batch of images or a + :class:`slideflow.WSI` object. + + If generating features for a batch of images, images are expected to be in + (B, W, H, C) format and non-standardized (scaled 0-255) with dtype uint8. + The feature extractors perform all needed preprocessing on the fly. + + If generating features for a slide, the slide is expected to be a + :class:`slideflow.WSI` object. The feature extractor will generate features + for each tile in the slide, returning a numpy array of shape (W, H, F), + where F is the number of features. + + Args: + name (str): Name of the feature extractor to build. Available + feature extractors are listed with + :func:`slideflow.model.list_extractors()`. + + Keyword arguments: + tile_px (int): Tile size (input image size), in pixels. + **kwargs (Any): All remaining keyword arguments are passed + to the feature extractor factory function, and may be different + for each extractor. + + Returns: + A callable object which accepts a batch of images (B, W, H, C) of dtype + uint8 and returns a batch of features (dtype float32). + + Examples + Create an extractor that calculates post-convolutional layer activations + from an imagenet-pretrained Resnet50 model. + + .. code-block:: python + + import slideflow as sf + + extractor = sf.build_feature_extractor( + 'resnet50_imagenet' + ) + + Create an extractor that calculates 'conv4_block4_2_relu' activations + from an imagenet-pretrained Resnet50 model. + + .. code-block:: python + + extractor = sf.build_feature_extractor( + 'resnet50_imagenet', + layers='conv4_block4_2_relu' + ) + + Create a pretrained "CTransPath" extractor. + + .. 
code-block:: python + + extractor = sf.build_feature_extractor('ctranspath') + + Use an extractor to calculate layer activations for an entire dataset. + + .. code-block:: python + + import slideflow as sf + + # Load a project and dataset + P = sf.load_project(...) + dataset = P.dataset(...) + + # Create a feature extractor + resnet = sf.build_feature_extractor( + 'resnet50_imagenet' + ) + + # Calculate features for the entire dataset + features = sf.DatasetFeatures( + resnet, + dataset=dataset + ) + + Generate a map of features across a slide. + + .. code-block:: python + + import slideflow as sf + + # Load a slide + wsi = sf.WSI(...) + + # Create a feature extractor + retccl = sf.build_feature_extractor( + 'retccl', + resize=True + ) + + # Create a feature map, a 2D array of shape + # (W, H, F), where F is the number of features. + features = retccl(wsi) + + """ + # Build feature extractor according to manually specified backend + if backend is not None and backend not in ('tensorflow', 'torch'): + raise ValueError(f"Invalid backend: {backend}") + + # Build a feature extractor from a finetuned model + if sf.util.is_tensorflow_model_path(name): + model_config = sf.util.get_model_config(name) + if model_config['hp']['uq']: + from slideflow.model.tensorflow import UncertaintyInterface + return UncertaintyInterface(name, **kwargs) + else: + from slideflow.model.tensorflow import Features + return Features(name, **kwargs) + elif sf.util.is_torch_model_path(name): + model_config = sf.util.get_model_config(name) + if model_config['hp']['uq']: + from slideflow.model.torch import UncertaintyInterface + return UncertaintyInterface(name, **kwargs) + else: + from slideflow.model.torch import Features # noqa: F401 + return Features(name, **kwargs) + + # Build feature extractor with a specific backend + if backend == 'tensorflow': + if not is_tensorflow_extractor(name): + raise errors.InvalidFeatureExtractor( + f"Feature extractor {name} not available in Tensorflow backend") + 
return build_tensorflow_feature_extractor(name, **kwargs) + elif backend == 'torch': + if not is_torch_extractor(name): + raise errors.InvalidFeatureExtractor( + f"Feature extractor {name} not available in PyTorch backend") + return build_torch_feature_extractor(name, **kwargs) + + # Auto-build feature extractor according to available backends + if is_tensorflow_extractor(name) and is_torch_extractor(name): + sf.log.info( + f"Feature extractor {name} available in both Tensorflow and " + f"PyTorch backends; using active backend {sf.backend()}") + if sf.backend() == 'tensorflow': + return build_tensorflow_feature_extractor(name, **kwargs) + else: + return build_torch_feature_extractor(name, **kwargs) + if is_tensorflow_extractor(name): + return build_tensorflow_feature_extractor(name, **kwargs) + elif is_torch_extractor(name): + return build_torch_feature_extractor(name, **kwargs) + elif name in _extras_extractors: + raise errors.InvalidFeatureExtractor( + "{} requires the package {}, please install with 'pip install {}'".format( + name, _extras_extractors[name], _extras_extractors[name] + )) + else: + raise errors.InvalidFeatureExtractor(f"Unrecognized feature extractor: {name}")
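The backend selection in `build_feature_extractor` follows a fixed order: an explicit `backend` wins, then the active backend breaks ties, then whichever backend provides the extractor is used. The sketch below isolates that order. `resolve_backend` and the toy registries are hypothetical and do not reflect which backends actually provide these extractors; the branch for finetuned model paths is left out.

```python
def resolve_backend(name, requested=None, active='torch',
                    tf_names=frozenset(), torch_names=frozenset()):
    """Return 'tensorflow' or 'torch' for an extractor name, or raise."""
    if requested is not None and requested not in ('tensorflow', 'torch'):
        raise ValueError(f"Invalid backend: {requested}")
    in_tf, in_torch = name in tf_names, name in torch_names

    # 1. An explicitly requested backend must provide the extractor.
    if requested == 'tensorflow':
        if not in_tf:
            raise LookupError(f"{name} not available in Tensorflow backend")
        return 'tensorflow'
    if requested == 'torch':
        if not in_torch:
            raise LookupError(f"{name} not available in PyTorch backend")
        return 'torch'

    # 2. Available in both: defer to the active backend.
    if in_tf and in_torch:
        return 'tensorflow' if active == 'tensorflow' else 'torch'

    # 3. Available in exactly one backend: use it.
    if in_tf:
        return 'tensorflow'
    if in_torch:
        return 'torch'
    raise LookupError(f"Unrecognized feature extractor: {name}")


# Toy registries (hypothetical contents).
TF = frozenset({'extractor_a', 'extractor_b'})
TORCH = frozenset({'extractor_a', 'extractor_c'})

both_active_tf = resolve_backend('extractor_a', active='tensorflow',
                                 tf_names=TF, torch_names=TORCH)
only_torch = resolve_backend('extractor_c', active='tensorflow',
                             tf_names=TF, torch_names=TORCH)
forced_torch = resolve_backend('extractor_a', requested='torch',
                               active='tensorflow', tf_names=TF, torch_names=TORCH)
```

Note that an extractor available in only one backend is built with that backend even when a different backend is active.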
+ + +
[docs]def rebuild_extractor( + bags_or_model: str, + allow_errors: bool = False, + native_normalizer: bool = True +) -> Tuple[Optional["BaseFeatureExtractor"], Optional["StainNormalizer"]]: + """Recreate the extractor used to generate features stored in bags. + + Args: + bags_or_model (str): Either a path to directory containing feature bags, + or a path to a trained MIL model. If a path to a trained MIL model, + the extractor used to generate features will be recreated. + allow_errors (bool): If True, return None if the extractor + cannot be rebuilt. If False, raise an error. Defaults to False. + native_normalizer (bool, optional): Whether to use PyTorch/Tensorflow-native + stain normalization, if applicable. If False, will use the OpenCV/Numpy + implementations. Defaults to True. + + Returns: + Optional[BaseFeatureExtractor]: Extractor function, or None if ``allow_errors`` is + True and the extractor cannot be rebuilt. + + Optional[StainNormalizer]: Stain normalizer used when generating + feature bags, or None if no stain normalization was used. + + """ + # Load bags configuration + is_bag_config = bags_or_model.endswith('bags_config.json') + is_bag_dir = exists(join(bags_or_model, 'bags_config.json')) + is_model_dir = exists(join(bags_or_model, 'mil_params.json')) + if not (is_bag_dir or is_model_dir or is_bag_config): + if allow_errors: + return None, None + else: + raise ValueError( + 'Could not find bags or MIL model configuration at ' + f'{bags_or_model}.' + ) + if is_bag_config: + bags_config = sf.util.load_json(bags_or_model) + elif is_model_dir: + mil_config = sf.util.load_json(join(bags_or_model, 'mil_params.json')) + if 'bags_extractor' not in mil_config: + if allow_errors: + return None, None + else: + raise ValueError( + 'Could not rebuild extractor from configuration at ' + f'{bags_or_model}; missing "bags_extractor" key in ' + 'mil_params.json.' 
+ ) + bags_config = mil_config['bags_extractor'] + else: + bags_config = sf.util.load_json(join(bags_or_model, 'bags_config.json')) + if ('extractor' not in bags_config + or any(n not in bags_config['extractor'] for n in ['class', 'kwargs'])): + if allow_errors: + return None, None + else: + raise ValueError( + 'Could not rebuild extractor from configuration at ' + f'{bags_or_model}; missing "extractor" class or kwargs.' + ) + + # Rebuild extractor + extractor_name = bags_config['extractor']['class'].split('.') + extractor_class = extractor_name[-1] + extractor_kwargs = bags_config['extractor']['kwargs'] + try: + module = importlib.import_module('.'.join(extractor_name[:-1])) + extractor = getattr(module, extractor_class)(**extractor_kwargs) + except Exception: + submodule_name = extractor_name[-2] + if submodule_name in _extras_extractors: + raise errors.InvalidFeatureExtractor( + "{} requires the package {}, please install with 'pip install {}'".format( + submodule_name, + _extras_extractors[submodule_name], + _extras_extractors[submodule_name] + )) + if allow_errors: + # Return a 2-tuple so callers can always unpack the result + return None, None + else: + raise ValueError( + f'Could not rebuild extractor from configuration at {bags_or_model}.' + ) + + # Rebuild stain normalizer + if bags_config['normalizer'] is not None: + normalizer = sf.norm.autoselect( + bags_config['normalizer']['method'], + backend=(extractor.backend if native_normalizer else 'opencv') + ) + normalizer.set_fit(**bags_config['normalizer']['fit']) + else: + normalizer = None + if (hasattr(extractor, 'normalizer') + and extractor.normalizer is not None + and normalizer is not None): + sf.log.warning( + 'Extractor already has a stain normalizer. Overwriting with ' + 'normalizer from bags configuration.' + ) + extractor.normalizer = normalizer + elif hasattr(extractor, 'normalizer') and extractor.normalizer is not None: + normalizer = extractor.normalizer + + return extractor, normalizer
+ +# ----------------------------------------------------------------------------- + +def extractor_to_config(extractor: BaseFeatureExtractor) -> Dict[str, Any]: + """Return a dictionary of configuration parameters for the extractor. + + These configuration parameters can be used to reconstruct the + feature extractor, using ``build_extractor_from_cfg()``. + + Args: + extractor (BaseFeatureExtractor): Feature extractor. + + Returns: + Dict[str, Any]: Configuration dictionary. + + """ + cfg = extractor.dump_config() + if extractor.backend == 'torch': + cfg['mixed_precision'] = extractor.mixed_precision + cfg['channels_last'] = extractor.channels_last + return cfg + + +def build_extractor_from_cfg( + cfg: Dict[str, Any], + **kwargs: Any +) -> BaseFeatureExtractor: + """Rebuild an extractor from a configuration dictionary. + + Args: + cfg (Dict[str, Any]): Configuration dictionary. + **kwargs (Any): All remaining keyword arguments are passed + to the feature extractor factory function, and may be different + for each extractor. + + Returns: + BaseFeatureExtractor: The rebuilt feature extractor. + + """ + extractor_name = cfg['class'].split('.') + extractor_class = extractor_name[-1] + extractor_kwargs = cfg['kwargs'] + module = importlib.import_module('.'.join(extractor_name[:-1])) + extractor = getattr(module, extractor_class)(**extractor_kwargs, **kwargs) + for k, v in cfg.items(): + if k not in ['class', 'kwargs']: + setattr(extractor, k, v) + return extractor +
+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/model/extractors/_registry/index.html b/docs/_modules/slideflow/model/extractors/_registry/index.html new file mode 100644 index 000000000..994dd67d4 --- /dev/null +++ b/docs/_modules/slideflow/model/extractors/_registry/index.html @@ -0,0 +1,483 @@ + + + + + + + + + + + + slideflow.model.extractors._registry — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.model.extractors._registry

+"""Feature extractor registry."""
+
+_tf_extractors = dict()
+_torch_extractors = dict()
+_known_extras_packages = {
+     'slideflow-gpl': ['retccl', 'ctranspath'],
+     'slideflow-noncommercial': ['gigapath', 'gigapath.tile', 'gigapath.slide', 'histossl', 'plip']
+}
+_extras_extractors = {
+    extractor: package
+    for package, extractors in _known_extras_packages.items()
+    for extractor in extractors
+}
+
+__all__ = ['list_extractors', 'list_tensorflow_extractors', 'list_torch_extractors',
+           'is_extractor', 'is_tensorflow_extractor', 'is_torch_extractor']
+
+# -----------------------------------------------------------------------------
+
+
[docs]def list_extractors(): + """Return a list of all available feature extractors.""" + return list(set(list(_tf_extractors.keys()) + list(_torch_extractors.keys())))
+ +def list_tensorflow_extractors(): + """Return a list of all Tensorflow feature extractors.""" + return list(_tf_extractors.keys()) + +def list_torch_extractors(): + """Return a list of all PyTorch feature extractors.""" + return list(_torch_extractors.keys()) + +def is_extractor(name): + """Checks if a given name is a valid feature extractor.""" + _valid_extractors = list_extractors() + return (name in _valid_extractors or name+'_imagenet' in _valid_extractors) + +def is_tensorflow_extractor(name): + """Checks if a given name is a valid Tensorflow feature extractor.""" + return name in _tf_extractors or name+'_imagenet' in _tf_extractors + +def is_torch_extractor(name): + """Checks if a given name is a valid PyTorch feature extractor.""" + return name in _torch_extractors or name+'_imagenet' in _torch_extractors + +# ----------------------------------------------------------------------------- + +def register_torch(key_name=None): + """Decorator to register a PyTorch feature extractor.""" + + def decorator(fn): + # Use the custom key name if provided, otherwise use the function's name + name = key_name if isinstance(key_name, str) else fn.__name__ + _torch_extractors[name] = fn + return fn + + # If the decorator is used without arguments, the key_name will be the function itself + if callable(key_name): + return decorator(key_name) + + return decorator + +def register_tf(key_name=None): + """Decorator to register a Tensorflow feature extractor.""" + + def decorator(fn): + # Use the custom key name if provided, otherwise use the function's name + name = key_name if isinstance(key_name, str) else fn.__name__ + _tf_extractors[name] = fn + return fn + + # If the decorator is used without arguments, the key_name will be the function itself + if callable(key_name): + return decorator(key_name) + + return decorator +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/model/features/index.html b/docs/_modules/slideflow/model/features/index.html new file mode 100644 index 000000000..8adaa5f9a --- /dev/null +++ b/docs/_modules/slideflow/model/features/index.html @@ -0,0 +1,2220 @@ + + + + + + + + + + + + slideflow.model.features — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.model.features

+import csv
+import os
+import pickle
+import queue
+import sys
+import threading
+import time
+import warnings
+import multiprocessing as mp
+from collections import defaultdict
+from math import isnan
+from os.path import exists, join
+from typing import (
+    TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union, Iterable, Callable
+)
+
+import numpy as np
+import pandas as pd
+import scipy.stats as stats
+import slideflow as sf
+from rich.progress import track, Progress
+from slideflow import errors
+from slideflow.util import log, Labels, ImgBatchSpeedColumn, tfrecord2idx
+from .base import BaseFeatureExtractor
+
+
+if TYPE_CHECKING:
+    import tensorflow as tf
+    import torch
+
+
+# -----------------------------------------------------------------------------
+
+
[docs]class DatasetFeatures: + + """Loads annotations, saved layer activations / features, and prepares + output saving directories. Will also read/write processed features to a + PKL cache file to save time in future iterations. + + Note: + Storing predictions along with layer features is optional, to offer a + reduced memory footprint. For example, saving predictions for a 10,000 slide + dataset with 1000 categorical outcomes would require: + + 4 bytes/float32-logit + * 1000 predictions/tile + * 3000 tiles/slide + * 10000 slides + ~= 120 GB (~112 GiB) + """ + + def __init__( + self, + model: Union[str, "tf.keras.models.Model", "torch.nn.Module"], + dataset: "sf.Dataset", + *, + labels: Optional[Labels] = None, + cache: Optional[str] = None, + annotations: Optional[Labels] = None, + **kwargs: Any + ) -> None: + + """Calculate features / layer activations from model, storing to + internal parameters ``self.activations``, ``self.predictions``, and + ``self.locations``, dictionaries mapping slides to arrays of activations, + predictions, and locations for each slide's constituent tiles. + + Args: + model (str): Path to model from which to calculate activations. + dataset (:class:`slideflow.Dataset`): Dataset from which to + generate activations. + labels (dict, optional): Dict mapping slide names to outcome + categories. + cache (str, optional): File for PKL cache. + + Keyword Args: + augment (bool, str, optional): Whether to use data augmentation + during feature extraction. If True, will use default + augmentation. If str, will use augmentation specified by the + string. Defaults to None. + batch_size (int): Batch size for activations calculations. + Defaults to 32. + device (str, optional): Device to use for feature extraction. + Only used for PyTorch feature extractors. Defaults to None. + include_preds (bool): Calculate and store predictions. + Defaults to True. + include_uncertainty (bool, optional): Whether to include model + uncertainty in the output. 
Only used if the feature generator + is a UQ-enabled model. Defaults to True. + layers (str, list(str)): Layers to extract features from. May be + the name of a single layer (str) or a list of layers (list). + Only used if model is a str. Defaults to 'postconv'. + normalizer ((str or :class:`slideflow.norm.StainNormalizer`), optional): + Stain normalization strategy to use on image tiles prior to + feature extraction. This argument is invalid if ``model`` is a + feature extractor built from a trained model, as stain + normalization will be specified by the model configuration. + Defaults to None. + normalizer_source (str, optional): Stain normalization preset + or path to a source image. Valid presets include 'v1', 'v2', + and 'v3'. If None, will use the default preset ('v3'). + This argument is invalid if ``model`` is a feature extractor + built from a trained model. Defaults to None. + num_workers (int, optional): Number of workers to use for feature + extraction. Only used for PyTorch feature extractors. Defaults + to None. + pool_sort (bool): Use multiprocessing pools to perform final + sorting. Defaults to True. + progress (bool): Show a progress bar during feature calculation. + Defaults to True. + transform (Callable, optional): Custom transform to apply to + images. Applied before standardization. If the feature extractor + is a PyTorch model, the transform should be a torchvision + transform. + verbose (bool): Show verbose logging output. Defaults to True. + + Examples + Calculate features using a feature extractor. + + .. code-block:: python + + import slideflow as sf + + # Create a feature extractor + retccl = sf.build_feature_extractor('retccl', resize=True) + + # Load a dataset + P = sf.load_project(...) + dataset = P.dataset(...) + + # Calculate features + dts_ftrs = sf.DatasetFeatures(retccl, dataset) + + Calculate features using a trained model (preferred). + + .. 
code-block:: python + + import slideflow as sf + + # Create a feature extractor from the saved model. + extractor = sf.build_feature_extractor( + '/path/to/trained_model.zip', + layers=['postconv'] + ) + + # Calculate features across the dataset + dts_ftrs = sf.DatasetFeatures(extractor, dataset) + + Calculate features using a trained model (legacy). + + .. code-block:: python + + # This method is deprecated, and will be removed in a + # future release. Please use the method above instead. + dts_ftrs = sf.DatasetFeatures( + '/path/to/trained_model.zip', + dataset=dataset, + layers=['postconv'] + ) + + Calculate features from a loaded model. + + .. code-block:: python + + import tensorflow as tf + import slideflow as sf + + # Load a model + model = tf.keras.models.load_model('/path/to/model.h5') + + # Calculate features + dts_ftrs = sf.DatasetFeatures( + model, + dataset=dataset, + layers=['postconv'] + ) + + """ + self.activations = defaultdict(list) # type: Dict[str, Any] + self.predictions = defaultdict(list) # type: Dict[str, Any] + self.uncertainty = defaultdict(list) # type: Dict[str, Any] + self.locations = defaultdict(list) # type: Dict[str, Any] + self.num_features = 0 + self.num_classes = 0 + self.model = model + self.dataset = dataset + self.feature_generator = None + if dataset is not None: + self.tile_px = dataset.tile_px + self.manifest = dataset.manifest() + self.tfrecords = np.array(dataset.tfrecords()) + else: + # Used when creating via DatasetFeatures.from_df(), + # otherwise dataset should not be None. + self.tile_px = None + self.manifest = dict() + self.tfrecords = [] + self.slides = sorted([sf.util.path_to_name(t) for t in self.tfrecords]) + + if labels is not None and annotations is not None: + raise DeprecationWarning( + 'Cannot supply both "labels" and "annotations" to sf.DatasetFeatures. ' + '"annotations" is deprecated and has been replaced with "labels".' 
+ ) + elif annotations is not None: + warnings.warn( + 'The "annotations" argument to sf.DatasetFeatures is deprecated. ' + 'Please use the argument "labels" instead.', + DeprecationWarning + ) + self.labels = annotations + else: + self.labels = labels + + if self.labels: + self.categories = list(set(self.labels.values())) + if self.activations: + for slide in self.slides: + try: + if self.activations[slide]: + used = (self.used_categories + + [self.labels[slide]]) + self.used_categories = list(set(used)) # type: List[Union[str, int, List[float]]] + self.used_categories.sort() + except KeyError: + raise KeyError(f"Slide {slide} not in labels.") + total = len(self.used_categories) + cat_list = ", ".join([str(c) for c in self.used_categories]) + log.debug(f'Observed categories (total: {total}): {cat_list}') + else: + self.categories = [] + self.used_categories = [] + + # Load from PKL (cache) if present + if cache and exists(cache): + self.load_cache(cache) + + # Otherwise will need to generate new activations from a given model + elif model is not None: + self._generate_features(cache=cache, **kwargs) + + # Now delete slides not included in our filtered TFRecord list + loaded_slides = list(self.activations.keys()) + for loaded_slide in loaded_slides: + if loaded_slide not in self.slides: + log.debug( + f'Removing activations from slide {loaded_slide}: ' + 'slide not in the filtered tfrecords list' + ) + self.remove_slide(loaded_slide) + + # Now screen for missing slides in activations + missing = [] + for slide in self.slides: + if slide not in self.activations: + missing += [slide] + elif not len(self.activations[slide]): + missing += [slide] + num_loaded = len(self.slides)-len(missing) + log.debug( + f'Loaded activations from {num_loaded}/{len(self.slides)} ' + f'slides ({len(missing)} missing)' + ) + if missing: + log.warning(f'Activations missing for {len(missing)} slides') + + # Record which categories have been included in the specified tfrecords + if 
self.categories and self.labels: + self.used_categories = list(set([ + self.labels[slide] + for slide in self.slides + ])) + self.used_categories.sort() + + total = len(self.used_categories) + cat_list = ", ".join([str(c) for c in self.used_categories]) + log.debug(f'Observed categories (total: {total}): {cat_list}') + + # Show total number of features + if self.num_features is None: + self.num_features = self.activations[self.slides[0]].shape[-1] + log.debug(f'Number of activation features: {self.num_features}') + + @classmethod + def from_df(cls, df: "pd.core.frame.DataFrame") -> "DatasetFeatures": + """Load DataFrame of features, as exported by :meth:`DatasetFeatures.to_df()` + + Args: + df (:class:`pandas.DataFrame`): DataFrame of features, as exported by + :meth:`DatasetFeatures.to_df()` + + Returns: + :class:`DatasetFeatures`: DatasetFeatures object + + Examples + Recreate DatasetFeatures after export to a DataFrame. + + >>> df = features.to_df() + >>> new_features = DatasetFeatures.from_df(df) + + """ + obj = cls(None, None) # type: ignore + obj.slides = df.slide.unique().tolist() + if 'activations' in df.columns: + obj.activations = { + s: np.stack(df.loc[df.slide==s].activations.values) + for s in obj.slides + } + obj.num_features = next(df.iterrows())[1].activations.shape[0] + if 'locations' in df.columns: + obj.locations = { + s: np.stack(df.loc[df.slide==s].locations.values) + for s in obj.slides + } + if 'uncertainty' in df.columns: + obj.uncertainty = { + s: np.stack(df.loc[df.slide==s].uncertainty.values) + for s in obj.slides + } + if 'predictions' in df.columns: + obj.predictions = { + s: np.stack(df.loc[df.slide==s].predictions.values) + for s in obj.slides + } + obj.num_classes = next(df.iterrows())[1].predictions.shape[0] + return obj + + @classmethod + def from_bags(cls, bags: str) -> "DatasetFeatures": + """Load a DatasetFeatures object from a directory of bags. 
+ + Args: + bags (str): Path to bags, as exported by :meth:`DatasetFeatures.to_torch()` + + Returns: + :class:`DatasetFeatures`: DatasetFeatures object + + """ + import torch + slides = [sf.util.path_to_name(b) for b in os.listdir(bags) if b.endswith('.pt')] + obj = cls(None, None) + obj.slides = slides + for slide in slides: + activations = torch.load(join(bags, f'{slide}.pt')) + obj.activations[slide] = activations.numpy() + obj.locations[slide] = tfrecord2idx.load_index(join(bags, f'{slide}.index')) + return obj + + @classmethod + def concat( + cls, + args: Iterable["DatasetFeatures"], + ) -> "DatasetFeatures": + """Concatenate activations from multiple DatasetFeatures together. + + For example, if ``df1`` is a DatasetFeatures object with 2048 features + and ``df2`` is a DatasetFeatures object with 1024 features, + then ``sf.DatasetFeatures.concat([df1, df2])`` would return an object + with 3072. + + Vectors from DatasetFeatures objects are concatenated in the given order. + During concatenation, predictions and uncertainty are dropped. + + If there are any tiles that do not have calculated features in both + dataframes, these will be dropped. + + Args: + args (Iterable[:class:`DatasetFeatures`]): DatasetFeatures objects + to concatenate. + + Returns: + :class:`DatasetFeatures`: DatasetFeatures object with concatenated + features. + + Examples + Concatenate two DatasetFeatures objects. 
+ + >>> df1 = DatasetFeatures(model, dataset, layers='postconv') + >>> df2 = DatasetFeatures(model, dataset, layers='sepconv_3') + >>> df = DatasetFeatures.concat([df1, df2]) + + """ + assert len(args) > 1 + dfs = [] + for f, ftrs in enumerate(args): + log.debug(f"Creating dataframe {f} from features...") + dfs.append(ftrs.to_df()) + if not all([len(df) == len(dfs[0]) for df in dfs]): + raise ValueError( + "Unable to concatenate DatasetFeatures of different lengths " + f"(got: {', '.join([str(len(_df)) for _df in dfs])})" + ) + log.debug(f"Created {len(dfs)} dataframes") + for i in range(len(dfs)): + log.debug(f"Mapping tuples for df {i}") + dfs[i]['locations'] = dfs[i]['locations'].map(tuple) + for i in range(1, len(dfs)): + log.debug(f"Merging dataframe {i}") + dfs[0] = pd.merge( + dfs[0], + dfs[i], + how='inner', + left_on=['slide', 'locations', 'tfr_index'], + right_on=['slide', 'locations', 'tfr_index'], + suffixes=['_1', '_2'] + ) + log.debug("Dropping merged columns") + to_drop = [c for c in dfs[0].columns + if ('predictions' in c or 'uncertainty' in c)] + dfs[0].drop(columns=to_drop, inplace=True) + log.debug("Concatenating activations") + act1 = np.stack(dfs[0]['activations_1'].values) + act2 = np.stack(dfs[0]['activations_2'].values) + log.debug(f"Act 1 shape: {act1.shape}") + log.debug(f"Act 2 shape: {act2.shape}") + concatenated = np.concatenate((act1, act2), axis=1) + as_list = [_c for _c in concatenated] + dfs[0]['activations'] = as_list + log.debug("Dropping old columns") + dfs[0].drop(columns=['activations_1', 'activations_2'], inplace=True) + log.debug("Sorting by TFRecord index") + dfs[0].sort_values('tfr_index', inplace=True) + log.debug("Creating DatasetFeatures object") + return DatasetFeatures.from_df(dfs[0]) + + @property + def uq(self) -> bool: + if self.feature_generator is None: + return None + else: + return self.feature_generator.uq + + @property + def normalizer(self): + if self.feature_generator is None: + return None + else: + return 
self.feature_generator.normalizer + + def _generate_features( + self, + cache: Optional[str] = None, + progress: bool = True, + verbose: bool = True, + pool_sort: bool = True, + pb: Optional[Progress] = None, + **kwargs + ) -> None: + + """Calculates activations from a given model, saving to self.activations + + Args: + model (str): Path to Tensorflow model from which to calculate final + layer activations. + layers (str, optional): Layers from which to generate activations. + Defaults to 'postconv'. + include_preds (bool, optional): Include logit predictions. + Defaults to True. + include_uncertainty (bool, optional): Include uncertainty + estimation if UQ enabled. Defaults to True. + batch_size (int, optional): Batch size to use during activations + calculations. Defaults to 32. + progress (bool): Show a progress bar during feature calculation. + Defaults to True. + verbose (bool): Show verbose logging output. Defaults to True. + pool_sort (bool): Use multiprocessing pools to perform final + sorting. Defaults to True. + cache (str, optional): File in which to store PKL cache. 
+ """ + + fg = self.feature_generator = _FeatureGenerator( + self.model, + self.dataset, + **kwargs + ) + self.num_features = fg.num_features + self.num_classes = fg.num_classes + + # Calculate final layer activations for each tfrecord + fla_start_time = time.time() + + activations, predictions, locations, uncertainty = fg.generate( + progress=progress, pb=pb, verbose=verbose + ) + + self.activations = {s: np.stack(v) for s, v in activations.items()} + self.predictions = {s: np.stack(v) for s, v in predictions.items()} + self.locations = {s: np.stack(v) for s, v in locations.items()} + self.uncertainty = {s: np.stack(v) for s, v in uncertainty.items()} + + # Sort using TFRecord location information, + # to ensure dictionary indices reflect TFRecord indices + if fg.tfrecords_have_loc: + slides_to_sort = [ + s for s in self.slides + if (self.activations[s].size + or not self.predictions[s].size + or not self.locations[s].size + or not self.uncertainty[s].size) + ] + if pool_sort and len(slides_to_sort) > 1: + pool = mp.Pool(sf.util.num_cpu()) + imap_iterable = pool.imap( + self.dataset.get_tfrecord_locations, slides_to_sort + ) + else: + pool = None + imap_iterable = map( + self.dataset.get_tfrecord_locations, slides_to_sort + ) + if progress and not pb: + iterable = track( + imap_iterable, + transient=False, + total=len(slides_to_sort), + description="Sorting...") + else: + iterable = imap_iterable + + for i, true_locs in enumerate(iterable): + slide = slides_to_sort[i] + # Get the order of locations stored in TFRecords, + # and the corresponding indices for sorting + cur_locs = self.locations[slide] + idx = [true_locs.index(tuple(cur_locs[i])) for i in range(cur_locs.shape[0])] + + # Make sure that the TFRecord indices are continuous, otherwise + # our sorted indices will be inaccurate + assert max(idx)+1 == len(idx) + + # Final sorting + sorted_idx = np.argsort(idx) + if slide in self.activations: + self.activations[slide] = self.activations[slide][sorted_idx] + 
if slide in self.predictions: + self.predictions[slide] = self.predictions[slide][sorted_idx] + if slide in self.uncertainty: + self.uncertainty[slide] = self.uncertainty[slide][sorted_idx] + self.locations[slide] = self.locations[slide][sorted_idx] + if pool is not None: + pool.close() + + fla_calc_time = time.time() + log.debug(f'Calculation time: {fla_calc_time-fla_start_time:.0f} sec') + log.debug(f'Number of activation features: {self.num_features}') + + if cache: + self.save_cache(cache) + + def activations_by_category( + self, + idx: int + ) -> Dict[Union[str, int, List[float]], np.ndarray]: + """For each outcome category, calculates activations of a given + feature across all tiles in the category. Requires annotations to + have been provided. + + Args: + idx (int): Index of activations layer to return, stratified by + outcome category. + + Returns: + dict: Dict mapping categories to feature activations for all + tiles in the category. + """ + + if not self.categories: + raise errors.FeaturesError( + 'Unable to calculate by category; annotations not provided.' + ) + + def act_by_cat(c): + return np.concatenate([ + self.activations[pt][:, idx] + for pt in self.slides + if self.labels[pt] == c + ]) + return {c: act_by_cat(c) for c in self.used_categories} + + def box_plots(self, features: List[int], outdir: str) -> None: + """Generates plots comparing node activations at slide- and tile-level. + + Args: + features (list(int)): List of feature indices for which to + generate box plots. + outdir (str): Path to directory in which to save box plots. 
+ """ + import matplotlib.pyplot as plt + import seaborn as sns + + if not isinstance(features, list): + raise ValueError("'features' must be a list of int.") + if not self.categories: + log.warning('Unable to generate box plots; no annotations loaded.') + return + if not os.path.exists(outdir): + os.makedirs(outdir) + + _, _, category_stats = self.stats() + + log.info('Generating box plots...') + for f in features: + # Display tile-level box plots & stats + plt.clf() + boxplot_data = list(self.activations_by_category(f).values()) + snsbox = sns.boxplot(data=boxplot_data) + title = f'{f} (tile-level)' + snsbox.set_title(title) + snsbox.set(xlabel='Category', ylabel='Activation') + plt.xticks(plt.xticks()[0], self.used_categories) + boxplot_filename = join(outdir, f'boxplot_{title}.png') + plt.gcf().canvas.start_event_loop(sys.float_info.min) + plt.savefig(boxplot_filename, bbox_inches='tight') + + # Print slide_level box plots & stats + plt.clf() + snsbox = sns.boxplot(data=[c[:, f] for c in category_stats]) + title = f'{f} (slide-level)' + snsbox.set_title(title) + snsbox.set(xlabel='Category', ylabel='Average tile activation') + plt.xticks(plt.xticks()[0], self.used_categories) + boxplot_filename = join(outdir, f'boxplot_{title}.png') + plt.gcf().canvas.start_event_loop(sys.float_info.min) + plt.savefig(boxplot_filename, bbox_inches='tight') + + def dump_config(self): + """Return a dictionary of the feature extraction configuration.""" + if self.normalizer: + norm_dict = dict( + method=self.normalizer.method, + fit=self.normalizer.get_fit(as_list=True), + ) + else: + norm_dict = None + config = dict( + extractor=self.feature_generator.generator.dump_config(), + normalizer=norm_dict, + num_features=self.num_features, + tile_px=self.dataset.tile_px, + tile_um=self.dataset.tile_um + ) + return config + + def export_to_torch(self, *args, **kwargs): + """Deprecated function; please use `.to_torch()`""" + warnings.warn( + "Deprecation warning: 
DatasetFeatures.export_to_torch() will" + " be removed in a future version. Use .to_torch() instead.", + DeprecationWarning + ) + self.to_torch(*args, **kwargs) + + def save_cache(self, path: str): + """Cache calculated activations to file. + + Args: + path (str): Path to pkl. + """ + with open(path, 'wb') as pt_pkl_file: + pickle.dump( + [self.activations, + self.predictions, + self.uncertainty, + self.locations], + pt_pkl_file + ) + log.info(f'Data cached to [green]{path}') + + def to_csv( + self, + filename: str, + level: str = 'tile', + method: str = 'mean', + slides: Optional[List[str]] = None + ): + """Exports calculated activations to csv. + + Args: + filename (str): Path to CSV file for export. + level (str): 'tile' or 'slide'. Indicates whether tile or + slide-level activations are saved. Defaults to 'tile'. + method (str): Method of summarizing slide-level results. Either + 'mean' or 'median'. Defaults to 'mean'. + slides (list(str)): Slides to export. If None, exports all slides. + Defaults to None. 
+ """ + if level not in ('tile', 'slide'): + raise errors.FeaturesError(f"Export error: unknown level {level}") + + meth_fn = {'mean': np.mean, 'median': np.median} + slides = self.slides if not slides else slides + + with open(filename, 'w') as outfile: + csvwriter = csv.writer(outfile) + logit_header = [f'Class_{log}' for log in range(self.num_classes)] + feature_header = [f'Feature_{f}' for f in range(self.num_features)] + header = ['Slide'] + logit_header + feature_header + csvwriter.writerow(header) + for slide in track(slides): + if level == 'tile': + for i, tile_act in enumerate(self.activations[slide]): + if self.num_classes and self.predictions[slide] != []: + csvwriter.writerow( + [slide] + + self.predictions[slide][i].tolist() + + tile_act.tolist() + ) + else: + csvwriter.writerow([slide] + tile_act.tolist()) + else: + act = meth_fn[method]( + self.activations[slide], + axis=0 + ).tolist() + if self.num_classes and self.predictions[slide] != []: + logit = meth_fn[method]( + self.predictions[slide], + axis=0 + ).tolist() + csvwriter.writerow([slide] + logit + act) + else: + csvwriter.writerow([slide] + act) + log.debug(f'Activations saved to [green]{filename}') + + def to_torch( + self, + outdir: str, + slides: Optional[List[str]] = None, + verbose: bool = True + ) -> None: + """Export activations in torch format to .pt files in the directory. + + Used for training MIL models. + + Args: + outdir (str): Path to directory in which to save .pt files. + verbose (bool): Verbose logging output. Defaults to True. 
+ + """ + import torch + + if not exists(outdir): + os.makedirs(outdir) + slides = self.slides if not slides else slides + for slide in (slides if not verbose else track(slides)): + if not len(self.activations[slide]): + log.info(f'Skipping empty slide [green]{slide}') + continue + slide_activations = torch.from_numpy( + self.activations[slide].astype(np.float32) + ) + torch.save(slide_activations, join(outdir, f'{slide}.pt')) + tfrecord2idx.save_index( + self.locations[slide], + join(outdir, f'{slide}.index') + ) + + # Log the feature extraction configuration + config = self.dump_config() + if exists(join(outdir, 'bags_config.json')): + old_config = sf.util.load_json(join(outdir, 'bags_config.json')) + if old_config != config: + log.warning( + "Feature extraction configuration does not match the " + "configuration used to generate the existing bags at " + f"{outdir}. Current configuration will not be saved." + ) + else: + sf.util.write_json(config, join(outdir, 'bags_config.json')) + + log_fn = log.info if verbose else log.debug + log_fn(f'Activations exported in Torch format to {outdir}') + + def to_df( + self + ) -> pd.core.frame.DataFrame: + """Export activations, predictions, uncertainty, and locations to + a pandas DataFrame. + + Returns: + pd.core.frame.DataFrame: Dataframe with columns 'activations', + 'predictions', 'uncertainty', and 'locations'. 
+ """ + + index = [s for s in self.slides + for _ in range(len(self.locations[s]))] + df_dict = dict() + df_dict.update({ + 'locations': pd.Series([ + self.locations[s][i] + for s in self.slides + for i in range(len(self.locations[s]))], index=index) + }) + df_dict.update({ + 'tfr_index': pd.Series([ + i + for s in self.slides + for i in range(len(self.locations[s]))], index=index) + }) + if self.activations: + df_dict.update({ + 'activations': pd.Series([ + self.activations[s][i] + for s in self.slides + for i in range(len(self.activations[s]))], index=index) + }) + if self.predictions: + df_dict.update({ + 'predictions': pd.Series([ + self.predictions[s][i] + for s in self.slides + for i in range(len(self.predictions[s]))], index=index) + }) + if self.uncertainty: + df_dict.update({ + 'uncertainty': pd.Series([ + self.uncertainty[s][i] + for s in self.slides + for i in range(len(self.uncertainty[s]))], index=index) + }) + df = pd.DataFrame(df_dict) + df['slide'] = df.index + return df + + def load_cache(self, path: str): + """Load cached activations from PKL. + + Args: + path (str): Path to pkl cache. + """ + log.info(f'Loading from cache [green]{path}...') + with open(path, 'rb') as pt_pkl_file: + loaded_pkl = pickle.load(pt_pkl_file) + self.activations = loaded_pkl[0] + self.predictions = loaded_pkl[1] + self.uncertainty = loaded_pkl[2] + self.locations = loaded_pkl[3] + if self.activations: + self.num_features = self.activations[self.slides[0]].shape[-1] + if self.predictions: + self.num_classes = self.predictions[self.slides[0]].shape[-1] + + def stats( + self, + outdir: Optional[str] = None, + method: str = 'mean', + threshold: float = 0.5 + ) -> Tuple[Dict[int, Dict[str, float]], + Dict[int, Dict[str, float]], + List[np.ndarray]]: + """Calculates activation averages across categories, as well as + tile-level and patient-level statistics, using ANOVA, exporting to + CSV if desired. 
+ + Args: + outdir (str, optional): Path to directory in which CSV file will + be saved. Defaults to None. + method (str, optional): Indicates method of aggregating tile-level + data into slide-level data. Either 'mean' (default) or + 'threshold'. If mean, slide-level feature data is calculated by + averaging feature activations across all tiles. If threshold, + slide-level feature data is calculated by counting the number + of tiles with feature activations > threshold and dividing by + the total number of tiles. Defaults to 'mean'. + threshold (float, optional): Threshold if using 'threshold' method. + + Returns: + A tuple containing + + dict: Dict mapping slides to dict of slide-level features; + + dict: Dict mapping features to tile-level statistics ('p', 'f'); + + dict: Dict mapping features to slide-level statistics ('p', 'f'); + """ + + if not self.categories: + raise errors.FeaturesError('No annotations loaded') + if method not in ('mean', 'threshold'): + raise errors.FeaturesError(f"Stats method {method} unknown") + if not self.labels: + raise errors.FeaturesError("No annotations provided, unable" + "to calculate feature stats.") + + log.info('Calculating activation averages & stats across features...') + + tile_stats = {} + pt_stats = {} + category_stats = [] + activation_stats = {} + for slide in self.slides: + if method == 'mean': + # Mean of each feature across tiles + summarized = np.mean(self.activations[slide], axis=0) + elif method == 'threshold': + # For each feature, count number of tiles with value above + # threshold, divided by number of tiles + act_sum = np.sum((self.activations[slide] > threshold), axis=0) + summarized = act_sum / self.activations[slide].shape[-1] + activation_stats[slide] = summarized + for c in self.used_categories: + category_stats += [np.array([ + activation_stats[slide] + for slide in self.slides + if self.labels[slide] == c + ])] + + for f in range(self.num_features): + # Tile-level ANOVA + stats_vals = 
list(self.activations_by_category(f).values()) + with warnings.catch_warnings(): + if hasattr(stats, "F_onewayConstantInputWarning"): + warnings.simplefilter( + "ignore", + category=stats.F_onewayConstantInputWarning) + elif hasattr(stats, "ConstantInputWarning"): + warnings.simplefilter( + "ignore", + category=stats.ConstantInputWarning) + fvalue, pvalue = stats.f_oneway(*stats_vals) + if not isnan(fvalue) and not isnan(pvalue): + tile_stats.update({f: {'f': fvalue, + 'p': pvalue}}) + else: + tile_stats.update({f: {'f': -1, + 'p': 1}}) + # Patient-level ANOVA + fvalue, pvalue = stats.f_oneway(*[c[:, f] for c in category_stats]) + if not isnan(fvalue) and not isnan(pvalue): + pt_stats.update({f: {'f': fvalue, + 'p': pvalue}}) + else: + pt_stats.update({f: {'f': -1, + 'p': 1}}) + try: + pt_sorted_ft = sorted( + range(self.num_features), + key=lambda f: pt_stats[f]['p'] + ) + except Exception: + log.warning('No stats calculated; unable to sort features.') + + for f in range(self.num_features): + try: + log.debug(f"Tile-level P-value ({f}): {tile_stats[f]['p']}") + log.debug(f"Patient-level P-value: ({f}): {pt_stats[f]['p']}") + except Exception: + log.warning(f'No stats calculated for feature {f}') + + # Export results + if outdir: + if not exists(outdir): + os.makedirs(outdir) + filename = join(outdir, 'slide_level_summary.csv') + log.info(f'Writing results to [green]{filename}[/]...') + with open(filename, 'w') as outfile: + csv_writer = csv.writer(outfile) + header = (['slide', 'category'] + + [f'Feature_{n}' for n in pt_sorted_ft]) + csv_writer.writerow(header) + for slide in self.slides: + category = self.labels[slide] + row = ([slide, category] + + list(activation_stats[slide][pt_sorted_ft])) + csv_writer.writerow(row) + if tile_stats: + csv_writer.writerow( + ['Tile statistic', 'ANOVA P-value'] + + [tile_stats[n]['p'] for n in pt_sorted_ft] + ) + csv_writer.writerow( + ['Tile statistic', 'ANOVA F-value'] + + [tile_stats[n]['f'] for n in pt_sorted_ft] + ) + if 
pt_stats: + csv_writer.writerow( + ['Slide statistic', 'ANOVA P-value'] + + [pt_stats[n]['p'] for n in pt_sorted_ft] + ) + csv_writer.writerow( + ['Slide statistic', 'ANOVA F-value'] + + [pt_stats[n]['f'] for n in pt_sorted_ft] + ) + return tile_stats, pt_stats, category_stats + + def softmax_mean(self) -> Dict[str, np.ndarray]: + """Calculates the mean prediction vector (post-softmax) across + all tiles in each slide. + + Returns: + dict: This is a dictionary mapping slides to the mean logits + array for all tiles in each slide. + """ + + return {s: np.mean(v, axis=0) for s, v in self.predictions.items()} + + def softmax_percent( + self, + prediction_filter: Optional[List[int]] = None + ) -> Dict[str, np.ndarray]: + """Returns dictionary mapping slides to a vector of length num_classes + with the percent of tiles in each slide predicted to be each outcome. + + Args: + prediction_filter: (optional) List of int. If provided, will + restrict predictions to only these categories, with final + prediction being based on highest logit among these + categories. 
+ + Returns: + dict: This is a dictionary mapping slides to an array of + percentages for each logit, of length num_classes + """ + + if prediction_filter: + assert isinstance(prediction_filter, list) and all([ + isinstance(i, int) + for i in prediction_filter + ]) + assert max(prediction_filter) <= self.num_classes + else: + prediction_filter = list(range(self.num_classes)) + + slide_percentages = {} + for slide in self.predictions: + # Find the index of the highest prediction for each tile, only for + # logits within prediction_filter + tile_pred = np.argmax( + self.predictions[slide][:, prediction_filter], + axis=1 + ) + slide_perc = np.array([ + np.count_nonzero(tile_pred == logit) / len(tile_pred) + for logit in range(self.num_classes) + ]) + slide_percentages.update({slide: slide_perc}) + return slide_percentages + + def softmax_predict( + self, + prediction_filter: Optional[List[int]] = None + ) -> Dict[str, int]: + """Returns slide-level predictions, assuming the model is predicting a + categorical outcome, by generating a prediction for each individual + tile, and making a slide-level prediction by finding the most + frequently predicted outcome among its constituent tiles. + + Args: + prediction_filter: (optional) List of int. If provided, will + restrict predictions to only these categories, with final + prediction based based on highest logit among these categories. + + Returns: + dict: Dictionary mapping slide names to slide-level predictions. 
+ """ + if prediction_filter: + assert isinstance(prediction_filter, list) + assert all([isinstance(i, int) for i in prediction_filter]) + assert max(prediction_filter) <= self.num_classes + else: + prediction_filter = list(range(self.num_classes)) + + slide_predictions = {} + for slide in self.predictions: + # Find the index of the highest prediction for each tile, only for + # logits within prediction_filter + tile_pred = np.argmax( + self.predictions[slide][:, prediction_filter], + axis=1 + ) + slide_perc = np.array([ + np.count_nonzero(tile_pred == logit) / len(tile_pred) + for logit in range(self.num_classes) + ]) + slide_predictions.update({slide: int(np.argmax(slide_perc))}) + return slide_predictions + + def map_activations(self, **kwargs) -> "sf.SlideMap": + """Map activations with UMAP. + + Keyword args: + ... + + Returns: + sf.SlideMap + + """ + return sf.SlideMap.from_features(self, **kwargs) + + def map_predictions( + self, + x: int = 0, + y: int = 0, + **kwargs + ) -> "sf.SlideMap": + """Map tile predictions onto x/y coordinate space. + + Args: + x (int, optional): Outcome category id for which predictions will + be mapped to the X-axis. Defaults to 0. + y (int, optional): Outcome category id for which predictions will + be mapped to the Y-axis. Defaults to 0. + + Keyword args: + cache (str, optional): Path to parquet file to cache coordinates. + Defaults to None (caching disabled). 
+ + Returns: + sf.SlideMap + + """ + all_x, all_y, all_slides, all_tfr_idx = [], [], [], [] + for slide in self.slides: + all_x.append(self.predictions[slide].values[:, x]) + all_y.append(self.predictions[slide].values[:, y]) + all_slides.append([slide for _ in range(self.predictions[slide].shape[0])]) + all_tfr_idx.append(np.arange(self.predictions[slide].shape[0])) + all_x = np.concatenate(all_x) + all_y = np.concatenate(all_y) + all_slides = np.concatenate(all_slides) + all_tfr_idx = np.concatenate(all_tfr_idx) + + return sf.SlideMap.from_xy( + x=all_x, + y=all_y, + slides=all_slides, + tfr_index=all_tfr_idx, + **kwargs + ) + + def merge(self, df: "DatasetFeatures") -> None: + '''Merges with another DatasetFeatures. + + Args: + df (slideflow.DatasetFeatures): TargetDatasetFeatures + to merge with. + + Returns: + None + ''' + + self.activations.update(df.activations) + self.predictions.update(df.predictions) + self.uncertainty.update(df.uncertainty) + self.locations.update(df.locations) + self.tfrecords = np.concatenate([self.tfrecords, df.tfrecords]) + self.slides = list(self.activations.keys()) + + def remove_slide(self, slide: str) -> None: + """Removes slide from calculated features.""" + if slide in self.activations: + del self.activations[slide] + if slide in self.predictions: + del self.predictions[slide] + if slide in self.uncertainty: + del self.uncertainty[slide] + if slide in self.locations: + del self.locations[slide] + self.tfrecords = np.array([ + t for t in self.tfrecords + if sf.util.path_to_name(t) != slide + ]) + if slide in self.slides: + self.slides.remove(slide) + + def save_example_tiles( + self, + features: List[int], + outdir: str, + slides: Optional[List[str]] = None, + tiles_per_feature: int = 100 + ) -> None: + """For a set of activation features, saves image tiles named according + to their corresponding activations. + + Duplicate image tiles will be saved for each feature, organized into + subfolders named according to feature. 
+ + Args: + features (list(int)): Features to evaluate. + outdir (str): Path to folder in which to save examples tiles. + slides (list, optional): List of slide names. If provided, will + only include tiles from these slides. Defaults to None. + tiles_per_feature (int, optional): Number of tiles to include as + examples for each feature. Defaults to 100. Will evenly sample + this many tiles across the activation gradient. + """ + + if not isinstance(features, list): + raise ValueError("'features' must be a list of int.") + + if not slides: + slides = self.slides + for f in features: + if not exists(join(outdir, str(f))): + os.makedirs(join(outdir, str(f))) + + gradient_list = [] + for slide in slides: + for i, val in enumerate(self.activations[slide][:, f]): + gradient_list += [{ + 'val': val, + 'slide': slide, + 'index': i + }] + gradient = np.array(sorted(gradient_list, key=lambda k: k['val'])) + sample_idx = np.linspace( + 0, + gradient.shape[0]-1, + num=tiles_per_feature, + dtype=int + ) + for i, g in track(enumerate(gradient[sample_idx]), + total=tiles_per_feature, + description=f"Feature {f}"): + for tfr in self.tfrecords: + if sf.util.path_to_name(tfr) == g['slide']: + tfr_dir = tfr + if not tfr_dir: + log.warning("TFRecord location not found for " + f"slide {g['slide']}") + slide, image = sf.io.get_tfrecord_by_index(tfr_dir, g['index']) + tile_filename = (f"{i}-tfrecord{g['slide']}-{g['index']}" + + f"-{g['val']:.2f}.jpg") + image_string = open(join(outdir, str(f), tile_filename), 'wb') + image_string.write(image.numpy()) + image_string.close() + + # --- Deprecated functions ---------------------------------------------------- + + def logits_mean(self): + warnings.warn( + "DatasetFeatures.logits_mean() is deprecated. Please use " + "DatasetFeatures.softmax_mean()", DeprecationWarning + ) + return self.softmax_mean() + + def logits_percent(self, *args, **kwargs): + warnings.warn( + "DatasetFeatures.logits_percent() is deprecated. 
Please use " + "DatasetFeatures.softmax_percent()", DeprecationWarning + ) + return self.softmax_percent(*args, **kwargs) + + def logits_predict(self, *args, **kwargs): + warnings.warn( + "DatasetFeatures.logits_predict() is deprecated. Please use " + "DatasetFeatures.softmax_predict()", DeprecationWarning + ) + return self.softmax_predict(*args, **kwargs)
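The slide-level voting performed by `softmax_percent()` and `softmax_predict()` above can be sketched without NumPy. This is a minimal illustration, not the library's implementation: `slide_vote` and its arguments are hypothetical names, and the sketch assumes that a filtered prediction should be reported as an original class index rather than as a position within `prediction_filter`.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def slide_vote(
    tile_preds: Dict[str, Sequence[Sequence[float]]],
    num_classes: int,
    prediction_filter: Optional[List[int]] = None,
) -> Dict[str, int]:
    """Majority vote over tiles; returns one class index per slide.

    Hypothetical helper for illustration only (not part of slideflow).
    """
    allowed = prediction_filter if prediction_filter else list(range(num_classes))
    if any(c < 0 or c >= num_classes for c in allowed):
        raise ValueError("prediction_filter contains an out-of-range class")

    result: Dict[str, int] = {}
    for slide, tiles in tile_preds.items():
        if len(tiles) == 0:
            continue  # no tiles, so there is nothing to vote on
        # For each tile, pick the allowed class with the highest score.
        votes = Counter(max(allowed, key=lambda c: tile[c]) for tile in tiles)
        # Most frequently predicted class; ties go to the lowest class index.
        result[slide] = min(votes, key=lambda c: (-votes[c], c))
    return result


preds = {
    "slide_a": [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]],
    "slide_b": [[0.1, 0.2, 0.7]],
}
print(slide_vote(preds, num_classes=3))
print(slide_vote(preds, num_classes=3, prediction_filter=[1, 2]))
```

Validating the filter with a strict upper bound avoids indexing one class past the end of each prediction vector.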
+ +# ----------------------------------------------------------------------------- + +class _FeatureGenerator: + """Provides common API for feature generator interfaces.""" + + def __init__( + self, + model: Union[str, "BaseFeatureExtractor", "tf.keras.models.Model", "torch.nn.Module"], + dataset: "sf.Dataset", + *, + layers: Union[str, List[str]] = 'postconv', + include_preds: Optional[bool] = None, + include_uncertainty: bool = True, + batch_size: int = 32, + device: Optional[str] = None, + num_workers: Optional[int] = None, + augment: Optional[Union[bool, str]] = None, + transform: Optional[Callable] = None, + **kwargs + ) -> None: + """Initializes FeatureGenerator. + + Args: + model (str, BaseFeatureExtractor, tf.keras.models.Model, torch.nn.Module): + Model to use for feature extraction. If str, must be a path to + a saved model. + dataset (sf.Dataset): Dataset to use for feature extraction. + + Keyword Args: + augment (bool, str, optional): Whether to use data augmentation + during feature extraction. If True, will use default + augmentation. If str, will use augmentation specified by the + string. Defaults to None. + batch_size (int, optional): Batch size to use for feature + extraction. Defaults to 32. + device (str, optional): Device to use for feature extraction. + Only used for PyTorch feature extractors. Defaults to None. + include_preds (bool, optional): Whether to include model + predictions. If None, will be set to True if + model has a num_classes attribute. Defaults to None. + include_uncertainty (bool, optional): Whether to include model + uncertainty in the output. Only used if the feature generator + is a UQ-enabled model. Defaults to True. + layers (str, list(str)): Layers to extract features from. May be + the name of a single layer (str) or a list of layers (list). + Only used if model is a str. Defaults to 'postconv'. 
+ normalizer ((str or :class:`slideflow.norm.StainNormalizer`), optional): + Stain normalization strategy to use on image tiles prior to + feature extraction. This argument is invalid if ``model`` is a + feature extractor built from a trained model, as stain + normalization will be specified by the model configuration. + Defaults to None. + normalizer_source (str, optional): Stain normalization preset + or path to a source image. Valid presets include 'v1', 'v2', + and 'v3'. If None, will use the default present ('v3'). + This argument is invalid if ``model`` is a feature extractor + built from a trained model. Defaults to None. + num_workers (int, optional): Number of workers to use for feature + extraction. Only used for PyTorch feature extractors. Defaults + to None. + transform (Callable, optional): Custom transform to apply to + images. Applied before standardization. If the feature extractor + is a PyTorch model, the transform should be a torchvision + transform. + + """ + self.model = model + self.dataset = dataset + self.layers = sf.util.as_list(layers) + self.batch_size = batch_size + self.simclr_args = None + self.num_workers = num_workers + self.augment = augment + self.transform = transform + + # Check if location information is stored in TFRecords + self.tfrecords_have_loc = self.dataset.tfrecords_have_locations() + if not self.tfrecords_have_loc: + log.warning( + "Some TFRecords do not have tile location information; " + "dataset iteration speed may be affected." + ) + + if self.is_extractor() and include_preds is None: + include_preds = self.model.num_classes > 0 # type: ignore + elif include_preds is None: + include_preds = True + self.include_preds = include_preds + self.include_uncertainty = include_uncertainty + + # Determine UQ and stain normalization. 
+ # If the `model` is a feature extractor, stain normalization + # will be determined via keyword arguments by self._prepare_generator() + self._determine_uq_and_normalizer() + self.generator = self._prepare_generator(**kwargs) + + self.num_features = self.generator.num_features + self.num_classes = 0 if not include_preds else self.generator.num_classes + if self.is_torch() and hasattr(self.model, 'device'): + from slideflow.model import torch_utils + self.device = self.model.device or torch_utils.get_device(device) + elif self.is_torch(): + from slideflow.model import torch_utils + self.device = torch_utils.get_device(device) + else: + self.device = None + self._prepare_dataset_kwargs() + + # Move the normalizer to the appropriate device, if this is + # a pytorch GPU normalizer. + if self.has_torch_gpu_normalizer(): + log.debug("Moving normalizer to device: {}".format(self.device)) + self.normalizer.device = self.device + + def _calculate_feature_batch(self, batch_img): + """Calculate features from a batch of images.""" + + # If a PyTorch generator, wrap in inference_mode() and perform on CUDA + if self.is_torch(): + import torch + with torch.inference_mode(): + batch_img = batch_img.to(self.device) + if self.has_torch_gpu_normalizer(): + batch_img = self.normalizer.preprocess( + batch_img.to(self.normalizer.device), + standardize=self.standardize + ).to(self.device) + return self.generator(batch_img) + else: + if self.has_torch_gpu_normalizer(): + import torch + import tensorflow as tf + batch_img = batch_img.numpy() + batch_img = torch.from_numpy(batch_img) + batch_img = self.normalizer.transform( + batch_img.to(self.normalizer.device) + ) + batch_img = batch_img.cpu().numpy() + batch_img = tf.convert_to_tensor(batch_img) + if self.standardize: + batch_img = tf.image.per_image_standardization(batch_img) + return self.generator(batch_img) + + def _process_out(self, model_out, batch_slides, batch_loc): + model_out = sf.util.as_list(model_out) + + # Process data if 
the output is Tensorflow (SimCLR or Tensorflow model) + if self.is_tf(): + slides = [ + bs.decode('utf-8') + for bs in batch_slides.numpy() + ] + model_out = [ + m.numpy() if not isinstance(m, (list, tuple)) else m + for m in model_out + ] + if batch_loc[0] is not None: + loc = np.stack([ + batch_loc[0].numpy(), + batch_loc[1].numpy() + ], axis=1) + else: + loc = None + + # Process data if the output is PyTorch + elif self.is_torch(): + slides = batch_slides + model_out = [ + m.cpu().numpy() if not isinstance(m, list) else m + for m in model_out + ] + if batch_loc[0] is not None: + loc = np.stack([batch_loc[0], batch_loc[1]], axis=1) + else: + loc = None + + # Final processing. + # Order of return is features, predictions, uncertainty. + if self.uq and self.include_uncertainty: + uncertainty = model_out[-1] + model_out = model_out[:-1] + else: + uncertainty = None + if self.include_preds: + predictions = model_out[-1] + features = model_out[:-1] + else: + predictions = None + features = model_out + + # Concatenate features if we have features from >1 layer + if isinstance(features, list): + features = np.concatenate(features, axis=1) + + return features, predictions, uncertainty, slides, loc + + def _prepare_dataset_kwargs(self): + """Prepare keyword arguments for Dataset.tensorflow() or .torch().""" + + dts_kw = { + 'infinite': False, + 'batch_size': self.batch_size, + 'augment': self.augment, + 'transform': self.transform, + 'incl_slidenames': True, + 'incl_loc': True, + } + + # If this is a Feature Extractor, update the dataset kwargs + # with any preprocessing instructions specified by the extractor + if self.is_extractor(): + dts_kw.update(self.model.preprocess_kwargs) + + # Establish standardization. + self.standardize = ('standardize' not in dts_kw or dts_kw['standardize']) + + # Check if normalization is happening on GPU with PyTorch. + # If so, we will handle normalization and standardization + # in the feature generation loop. 
+ if self.has_torch_gpu_normalizer(): + log.debug("Using GPU for stain normalization") + dts_kw['standardize'] = False + else: + # Otherwise, let the dataset handle normalization/standardization. + dts_kw['normalizer'] = self.normalizer + + # This is not used by SimCLR feature extractors. + self.dts_kw = dts_kw + + def _determine_uq_and_normalizer(self): + """Determines whether the model uses UQ and its stain normalizer.""" + + # Load configuration if model is path to a saved model + if isinstance(self.model, BaseFeatureExtractor): + self.uq = self.model.num_uncertainty > 0 + # If the feature extractor has a normalizer, use it. + # This will be overridden by keyword arguments if the + # feature extractor is not an instance of slideflow.model.Features. + self.normalizer = self.model.normalizer + elif isinstance(self.model, str): + model_config = sf.util.get_model_config(self.model) + hp = sf.ModelParams.from_dict(model_config['hp']) + self.uq = hp.uq + self.normalizer = hp.get_normalizer() + if self.normalizer: + log.debug(f'Using realtime {self.normalizer.method} normalization') + if 'norm_fit' in model_config: + self.normalizer.set_fit(**model_config['norm_fit']) + else: + self.normalizer = None + self.uq = False + + def _norm_from_kwargs(self, kwargs): + """Parse the stain normalizer from keyword arguments.""" + if 'normalizer' in kwargs and kwargs['normalizer'] is not None: + norm = kwargs['normalizer'] + del kwargs['normalizer'] + if 'normalizer_source' in kwargs: + norm_src = kwargs['normalizer_source'] + del kwargs['normalizer_source'] + else: + norm_src = None + if isinstance(norm, str): + normalizer = sf.norm.autoselect( + norm, + source=norm_src, + backend='tensorflow' if self.is_tf() else 'torch' + ) + else: + normalizer = norm + log.debug(f"Normalizing with {normalizer.method}") + return normalizer, kwargs + if 'normalizer' in kwargs: + del kwargs['normalizer'] + if 'normalizer_source' in kwargs: + del kwargs['normalizer_source'] + return None, kwargs + 
+ def _prepare_generator(self, **kwargs) -> Callable: + """Prepare the feature generator.""" + + # Generator is a Feature Extractor + if self.is_extractor(): + + # Handle the case where the extractor is built from a trained model + if self.is_tf(): + from slideflow.model.tensorflow import Features as TFFeatures + is_tf_model_extractor = isinstance(self.model, TFFeatures) + is_torch_model_extractor = False + elif self.is_torch(): + from slideflow.model.torch import Features as TorchFeatures + is_torch_model_extractor = isinstance(self.model, TorchFeatures) + is_tf_model_extractor = False + else: + is_tf_model_extractor = False + is_torch_model_extractor = False + if (is_tf_model_extractor or is_torch_model_extractor) and 'normalizer' in kwargs: + raise ValueError( + "Cannot specify a normalizer when using a feature extractor " + "created from a trained model. Stain normalization is auto-detected " + "from the model configuration." + ) + elif (is_tf_model_extractor or is_torch_model_extractor) and kwargs: + raise ValueError( + f"Invalid keyword arguments: {', '.join(list(kwargs.keys()))}" + ) + elif (is_tf_model_extractor or is_torch_model_extractor): + # Stain normalization has already been determined + # from the model configuration. + return self.model + + # For all other feature extractors, stain normalization + # is determined from keyword arguments. 
+ self.normalizer, kwargs = self._norm_from_kwargs(kwargs) + if kwargs: + raise ValueError( + f"Invalid keyword arguments: {', '.join(list(kwargs.keys()))}" + ) + return self.model + + # Generator is a path to a trained model, and we're using UQ + elif self.is_model_path() and (self.uq and self.include_uncertainty): + if self.include_preds is False: + raise ValueError( + "include_preds must be True if include_uncertainty is True" + ) + return sf.model.UncertaintyInterface( + self.model, + layers=self.layers, + **kwargs + ) + + # Generator is a path to a trained Slideflow model + elif self.is_model_path(): + return sf.model.Features( + self.model, + layers=self.layers, + include_preds=self.include_preds, + **kwargs + ) + + # Generator is a loaded Tensorflow model + elif self.is_tf(): + return sf.model.Features.from_model( + self.model, + layers=self.layers, + include_preds=self.include_preds, + **kwargs + ) + + # Generator is a loaded torch.nn.Module + elif self.is_torch(): + return sf.model.Features.from_model( + self.model.to(self.device), + tile_px=self.tile_px, + layers=self.layers, + include_preds=self.include_preds, + **kwargs + ) + + # Unrecognized feature extractor + else: + raise ValueError(f'Unrecognized feature extractor {self.model}') + + def is_model_path(self): + return isinstance(self.model, str) and (self.is_tf() or self.is_torch()) + + def is_extractor(self): + return isinstance(self.model, BaseFeatureExtractor) + + def is_torch(self): + if self.is_extractor(): + return self.model.is_torch() + else: + return sf.model.is_torch_model(self.model) + + def is_tf(self): + if self.is_extractor(): + return self.model.is_tensorflow() + else: + return sf.model.is_tensorflow_model(self.model) + + def has_torch_gpu_normalizer(self): + return ( + isinstance(self.normalizer, sf.norm.StainNormalizer) + and self.normalizer.__class__.__name__ == 'TorchStainNormalizer' + and self.normalizer.device != 'cpu' + ) + + def build_dataset(self): + """Build a dataloader.""" 
+ + # Generator is a Tensorflow model. + if self.is_tf(): + log.debug( + "Setting up Tensorflow dataset iterator (num_parallel_reads=" + f"None, deterministic={not self.tfrecords_have_loc})" + ) + # Disable parallel reads if we're using tfrecords without location + # information, as we would need to read and receive data in order. + if not self.tfrecords_have_loc: + par_kw = dict(num_parallel_reads=None) + else: + par_kw = dict() + return self.dataset.tensorflow( + None, + deterministic=(not self.tfrecords_have_loc), + **par_kw, + **self.dts_kw # type: ignore + ) + + # Generator is a PyTorch model. + elif self.is_torch(): + if self.num_workers is None: + n_workers = (4 if self.tfrecords_have_loc else 1) + else: + n_workers = self.num_workers + log.debug( + "Setting up PyTorch dataset iterator (num_workers=" + f"{n_workers}, chunk_size=8)" + ) + return self.dataset.torch( + None, + num_workers=n_workers, + chunk_size=8, + **self.dts_kw # type: ignore + ) + + # Unrecognized feature generator. 
+ else: + raise ValueError(f"Unrecognized model type: {type(self.model)}") + + def generate( + self, + *, + verbose: bool = True, + progress: bool = True, + pb: Optional[Progress] = None, + ): + + # Get the dataloader for iterating through tfrecords + dataset = self.build_dataset() + + # Rename tfrecord_array to tfrecords + log_fn = log.info if verbose else log.debug + log_fn(f'Calculating activations for {len(self.dataset.tfrecords())} ' + 'tfrecords') + log_fn(f'Generating from [green]{self.model}') + + # Interleave tfrecord datasets + estimated_tiles = self.dataset.num_tiles + + activations = defaultdict(list) # type: Dict[str, Any] + predictions = defaultdict(list) # type: Dict[str, Any] + uncertainty = defaultdict(list) # type: Dict[str, Any] + locations = defaultdict(list) # type: Dict[str, Any] + + # Worker to process activations/predictions, for more efficient throughput + q = queue.Queue() # type: queue.Queue + + def batch_worker(): + while True: + model_out, batch_slides, batch_loc = q.get() + if model_out is None: + return + features, preds, unc, slides, loc = self._process_out( + model_out, batch_slides, batch_loc + ) + + for d, slide in enumerate(slides): + if self.layers: + activations[slide].append(features[d]) + if self.include_preds and preds is not None: + predictions[slide].append(preds[d]) + if self.uq and self.include_uncertainty: + uncertainty[slide].append(unc[d]) + if loc is not None: + locations[slide].append(loc[d]) + + batch_proc_thread = threading.Thread(target=batch_worker, daemon=True) + batch_proc_thread.start() + + if progress and not pb: + pb = Progress(*Progress.get_default_columns(), + ImgBatchSpeedColumn(), + transient=sf.getLoggingLevel()>20) + task = pb.add_task("Generating...", total=estimated_tiles) + pb.start() + elif pb: + task = 0 + progress = False + else: + pb = None + with sf.util.cleanup_progress((pb if progress else None)): + for batch_img, _, batch_slides, batch_loc_x, batch_loc_y in dataset: + model_output = 
self._calculate_feature_batch(batch_img) + q.put((model_output, batch_slides, (batch_loc_x, batch_loc_y))) + if pb: + pb.advance(task, self.batch_size) + q.put((None, None, None)) + batch_proc_thread.join() + if hasattr(dataset, 'close'): + dataset.close() + + return activations, predictions, locations, uncertainty + + +# ----------------------------------------------------------------------------- + +def _export_bags( + model: Union[Callable, Dict], + dataset: "sf.Dataset", + slides: List[str], + slide_batch_size: int, + pb: Any, + outdir: str, + slide_task: int = 0, + **dts_kwargs +) -> None: + """Export bags for a given feature extractor.""" + for slide_batch in sf.util.batch(slides, slide_batch_size): + try: + _dataset = dataset.remove_filter(filters='slide') + except errors.DatasetFilterError: + _dataset = dataset + _dataset = _dataset.filter(filters={'slide': slide_batch}) + if not len(_dataset.tfrecords()): + continue + df = sf.DatasetFeatures(model, _dataset, pb=pb, **dts_kwargs) + df.to_torch(outdir, verbose=False) + pb.advance(slide_task, len(slide_batch)) + +def _distributed_export( + device: int, + model_cfg: Dict, + dataset: "sf.Dataset", + slides: List[List[str]], + slide_batch_size: int, + pb: Any, + outdir: str, + slide_task: int = 0, + dts_kwargs: Any = None, + mixed_precision: Optional[bool] = None, + channels_last: Optional[bool] = None +) -> None: + """Distributed export across multiple GPUs.""" + model = sf.model.extractors.build_extractor_from_cfg(model_cfg, device=f'cuda:{device}') + if mixed_precision is not None: + model.mixed_precision = mixed_precision + if channels_last is not None: + model.channels_last = channels_last + return _export_bags( + model, + dataset, + list(slides[device]), + slide_batch_size, + pb, + outdir, + slide_task, + **(dts_kwargs or {}) + ) +
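The `generate()` method above passes each batch of model output to a background thread through a queue and stops that thread with a `None` sentinel. The pattern can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: `collect_by_slide` is a hypothetical name, and plain floats stand in for the feature arrays.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def collect_by_slide(
    batches: Iterable[Tuple[List[float], List[str]]]
) -> Dict[str, List[float]]:
    """Group per-tile outputs by slide using a background worker thread.

    Hypothetical helper for illustration only (not part of slideflow).
    """
    results: Dict[str, List[float]] = defaultdict(list)
    q: queue.Queue = queue.Queue()

    def worker() -> None:
        while True:
            outputs, slides = q.get()
            if outputs is None:  # sentinel: the producer is finished
                return
            for value, slide in zip(outputs, slides):
                results[slide].append(value)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    for outputs, slides in batches:
        # In generate(), the model forward pass would run here.
        q.put((outputs, slides))
    q.put((None, None))  # signal completion
    thread.join()        # wait until every queued batch has been processed
    return dict(results)


batches = [([1.0, 2.0], ["a", "b"]), ([3.0], ["a"])]
print(collect_by_slide(batches))
```

A single consumer on a FIFO queue keeps per-slide results in the order the batches were produced, which matters when tile order has to line up with stored tile locations.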
+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
\ No newline at end of file
diff --git a/docs/_modules/slideflow/model/index.html b/docs/_modules/slideflow/model/index.html
new file mode 100644
index 000000000..58053caa9
--- /dev/null
+++ b/docs/_modules/slideflow/model/index.html
@@ -0,0 +1,601 @@
+ slideflow.model — slideflow 3.0.0 documentation

Source code for slideflow.model

+'''Submodule that includes tools for intermediate layer activations.
+
+Supports both PyTorch and Tensorflow backends, importing either model.tensorflow
+or model.torch based on the environment variable SF_BACKEND.
+'''
+
+import warnings
+from typing import Any, Dict, List
+
+import slideflow as sf
+from slideflow import errors
+from .base import BaseFeatureExtractor
+from .features import DatasetFeatures
+from .extractors import (
+    list_extractors, list_torch_extractors, list_tensorflow_extractors,
+    is_extractor, is_torch_extractor, is_tensorflow_extractor,
+    build_feature_extractor, build_torch_feature_extractor,
+    build_tensorflow_feature_extractor, rebuild_extractor
+)
+
+# --- Backend-specific imports ------------------------------------------------
+
+if sf.backend() == 'tensorflow':
+    from slideflow.model.tensorflow import (SurvivalTrainer, Features, load, # noqa F401
+                                            RegressionTrainer, ModelParams,
+                                            Trainer, UncertaintyInterface)
+elif sf.backend() == 'torch':
+    from slideflow.model.torch import (SurvivalTrainer, Features, load, # noqa F401
+                                       RegressionTrainer, ModelParams,
+                                       Trainer, UncertaintyInterface)
+else:
+    raise errors.UnrecognizedBackendError
+
+# -----------------------------------------------------------------------------
+
+
+
[docs]def is_tensorflow_tensor(arg: Any) -> bool: + """Checks if the given object is a Tensorflow Tensor.""" + if sf.util.tf_available: + import tensorflow as tf + return isinstance(arg, tf.Tensor) + else: + return False
+ + +
[docs]def is_torch_tensor(arg: Any) -> bool: + """Checks if the given object is a PyTorch Tensor.""" + if sf.util.torch_available: + import torch + return isinstance(arg, torch.Tensor) + else: + return False
+ + +
[docs]def is_tensorflow_model(arg: Any) -> bool: + """Checks if the object is a Tensorflow Model or path to Tensorflow model.""" + if isinstance(arg, str): + return sf.util.is_tensorflow_model_path(arg) + elif sf.util.tf_available: + import tensorflow as tf + return isinstance(arg, tf.keras.models.Model) + else: + return False
+ + +
[docs]def is_torch_model(arg: Any) -> bool: + """Checks if the object is a PyTorch Module or path to PyTorch model.""" + if isinstance(arg, str): + return sf.util.is_torch_model_path(arg) + elif sf.util.torch_available: + import torch + return isinstance(arg, torch.nn.Module) + else: + return False
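The four checks above share one pattern: test whether an optional backend is installed, import it only if so, and otherwise report that the object cannot be of that type. A minimal standalone sketch of that pattern follows, using only the standard library. `module_available` and `is_instance_of_optional` are hypothetical names and are not part of Slideflow; how `sf.util.tf_available` and `sf.util.torch_available` are actually computed is not shown in this listing and may differ.

```python
import importlib
import importlib.util
from typing import Any


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def is_instance_of_optional(obj: Any, module_name: str, class_name: str) -> bool:
    """Check `obj` against a class that lives in an optional dependency."""
    if not module_available(module_name):
        # Dependency missing: the object cannot be an instance of its classes.
        return False
    # Import lazily, only once we know the module exists.
    module = importlib.import_module(module_name)
    return isinstance(obj, getattr(module, class_name))
```

Deferring the import keeps the check cheap when the backend is absent, and avoids loading a heavy framework merely to answer a type question.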
+ + +def trainer_from_hp(*args, **kwargs): + warnings.warn( + "sf.model.trainer_from_hp() is deprecated. Please use " + "sf.model.build_trainer().", + DeprecationWarning + ) + return build_trainer(*args, **kwargs) + + +
[docs]def build_trainer( + hp: "ModelParams", + outdir: str, + labels: Dict[str, Any], + **kwargs +) -> Trainer: + """From the given :class:`slideflow.ModelParams` object, returns + the appropriate instance of :class:`slideflow.model.Trainer`. + + Args: + hp (:class:`slideflow.ModelParams`): ModelParams object. + outdir (str): Path for event logs and checkpoints. + labels (dict): Dict mapping slide names to outcome labels (int or + float format). + + Keyword Args: + slide_input (dict): Dict mapping slide names to additional + slide-level input, concatenated after post-conv. + name (str, optional): Optional name describing the model, used for + model saving. Defaults to 'Trainer'. + feature_sizes (list, optional): List of sizes of input features. + Required if providing additional input features as input to + the model. + feature_names (list, optional): List of names for input features. + Used when permuting feature importance. + outcome_names (list, optional): Name of each outcome. Defaults to + "Outcome {X}" for each outcome. + mixed_precision (bool, optional): Use FP16 mixed precision (rather + than FP32). Defaults to True. + allow_tf32 (bool): Allow internal use of Tensorfloat-32 format. + Defaults to False. + config (dict, optional): Training configuration dictionary, used + for logging. Defaults to None. + use_neptune (bool, optional): Use Neptune API logging. + Defaults to False + neptune_api (str, optional): Neptune API token, used for logging. + Defaults to None. + neptune_workspace (str, optional): Neptune workspace. + Defaults to None. + load_method (str): Either 'full' or 'weights'. Method to use + when loading a Tensorflow model. If 'full', loads the model with + ``tf.keras.models.load_model()``. If 'weights', will read the + ``params.json`` configuration file, build the model architecture, + and then load weights from the given model with + ``Model.load_weights()``. Loading with 'full' may improve + compatibility across Slideflow versions. 
Loading with 'weights' + may improve compatibility across hardware & environments. + custom_objects (dict, Optional): Dictionary mapping names + (strings) to custom classes or functions. Defaults to None. + num_workers (int): Number of dataloader workers. Only used for PyTorch. + Defaults to 4. + + """ + if hp.model_type() == 'classification': + return Trainer(hp, outdir, labels, **kwargs) + if hp.model_type() == 'regression': + return RegressionTrainer(hp, outdir, labels, **kwargs) + if hp.model_type() == 'survival': + return SurvivalTrainer(hp, outdir, labels, **kwargs) + else: + raise ValueError(f"Unknown model type: {hp.model_type()}")
+ + +
[docs]def read_hp_sweep( + filename: str, + models: List[str] = None +) -> Dict[str, "ModelParams"]: + """Organizes a list of hyperparameter objects and associated model names. + + Args: + filename (str): Path to hyperparameter sweep JSON file. + models (list(str)): List of model names. Defaults to None. + If not supplied, returns all valid models from batch file. + + Returns: + Dict mapping each model name to its ModelParams object. + """ + if models is not None and not isinstance(models, list): + raise ValueError("If supplying models, must be list(str) " + "with model names.") + if isinstance(models, list) and len(set(models)) != len(models): + raise ValueError("Duplicate model names provided.") + + hp_list = sf.util.load_json(filename) + + # First, ensure all indicated models are in the batch train file + if models: + valid_models = [] + for hp_dict in hp_list: + model_name = list(hp_dict.keys())[0] + if ((not models) + or (isinstance(models, str) and model_name == models) + or model_name in models): + valid_models += [model_name] + missing = [m for m in models if m not in valid_models] + if missing: + raise ValueError(f"Unable to find models {', '.join(missing)}") + else: + valid_models = [list(hp_dict.keys())[0] for hp_dict in hp_list] + + # Read the batch train file and generate ModelParams objects + # from the given configurations + loaded = {} + for hp_dict in hp_list: + name = list(hp_dict.keys())[0] + if name in valid_models: + loaded.update({ + name: ModelParams.from_dict(hp_dict[name]) + }) + return loaded # type: ignore
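The function above expects the sweep file to hold a list of single-key objects, each mapping a model name to its hyperparameter dictionary. The following is a minimal sketch of the same selection logic on plain dictionaries, assuming that layout. `select_sweep_entries`, `SWEEP_JSON`, and the `epochs` key are illustrative placeholders rather than Slideflow names, and the sketch returns raw dicts instead of `ModelParams` objects.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical sweep file contents; the hyperparameter keys are placeholders.
SWEEP_JSON = '[{"HP0": {"epochs": 1}}, {"HP1": {"epochs": 3}}]'


def select_sweep_entries(
    hp_list: List[Dict[str, Dict[str, Any]]],
    models: Optional[List[str]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Return {model_name: config} for the requested models, in file order."""
    if models is not None and not isinstance(models, list):
        raise ValueError("If supplying models, must be a list of model names.")
    if models and len(set(models)) != len(models):
        raise ValueError("Duplicate model names provided.")

    # Each entry holds exactly one key: the model name.
    available = {}
    for entry in hp_list:
        name = next(iter(entry))
        available[name] = entry[name]

    if not models:
        return available

    missing = [m for m in models if m not in available]
    if missing:
        raise ValueError(f"Unable to find models {', '.join(missing)}")
    return {name: cfg for name, cfg in available.items() if name in models}


sweep = json.loads(SWEEP_JSON)
print(select_sweep_entries(sweep, models=["HP1"]))
```

Validating the requested names before building any objects means a misspelled model name fails early, before any configuration is parsed.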
\ No newline at end of file
diff --git a/docs/_modules/slideflow/model/tensorflow/index.html b/docs/_modules/slideflow/model/tensorflow/index.html
new file mode 100644
index 000000000..053d75b67
--- /dev/null
+++ b/docs/_modules/slideflow/model/tensorflow/index.html
@@ -0,0 +1,3053 @@
+ slideflow.model.tensorflow — slideflow 3.0.0 documentation

Source code for slideflow.model.tensorflow

+'''Tensorflow backend for the slideflow.model submodule.'''
+
+from __future__ import absolute_import, division, print_function
+
+import atexit
+import inspect
+import json
+import logging
+import os
+import shutil
+import numpy as np
+import multiprocessing as mp
+import tensorflow as tf
+from packaging import version
+from os.path import dirname, exists, join
+from types import SimpleNamespace
+from typing import (
+    TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union, Callable, Iterable
+)
+from tensorflow.keras import applications as kapps
+
+import slideflow as sf
+import slideflow.model.base as _base
+import slideflow.util.neptune_utils
+from slideflow import errors
+from slideflow.util import log, NormFit, no_scope
+
+from . import tensorflow_utils as tf_utils
+from .base import log_manifest, BaseFeatureExtractor
+from .tensorflow_utils import unwrap, flatten, eval_from_model, build_uq_model  # type: ignore
+
+# Set the tensorflow logger
+if sf.getLoggingLevel() == logging.DEBUG:
+    logging.getLogger('tensorflow').setLevel(logging.DEBUG)
+    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
+else:
+    logging.getLogger('tensorflow').setLevel(logging.ERROR)
+    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
+
+sf.util.allow_gpu_memory_growth()
+
+if TYPE_CHECKING:
+    import pandas as pd
+    from slideflow.norm import StainNormalizer
+
+
+class StaticDropout(tf.keras.layers.Dropout):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+    def call(self, inputs, **kwargs):
+        return super().call(inputs, training=True)
+
+
+class ModelParams(_base._ModelParams):
+    """Build a set of hyperparameters."""
+
+    ModelDict = {
+        'xception': kapps.Xception,
+        'vgg16': kapps.VGG16,
+        'vgg19': kapps.VGG19,
+        'resnet50': kapps.ResNet50,
+        'resnet101': kapps.ResNet101,
+        'resnet152': kapps.ResNet152,
+        'resnet50_v2': kapps.ResNet50V2,
+        'resnet101_v2': kapps.ResNet101V2,
+        'resnet152_v2': kapps.ResNet152V2,
+        'inception': kapps.InceptionV3,
+        'nasnet_large': kapps.NASNetLarge,
+        'inception_resnet_v2': kapps.InceptionResNetV2,
+        'mobilenet': kapps.MobileNet,
+        'mobilenet_v2': kapps.MobileNetV2,
+        'densenet_121': kapps.DenseNet121,
+        'densenet_169': kapps.DenseNet169,
+        'densenet_201': kapps.DenseNet201,
+        # 'ResNeXt50': kapps.ResNeXt50,
+        # 'ResNeXt101': kapps.ResNeXt101,
+        # 'NASNet': kapps.NASNet
+    }
+    OptDict = {
+        'Adam': tf.keras.optimizers.Adam,
+        'SGD': tf.keras.optimizers.SGD,
+        'RMSprop': tf.keras.optimizers.RMSprop,
+        'Adagrad': tf.keras.optimizers.Adagrad,
+        'Adadelta': tf.keras.optimizers.Adadelta,
+        'Adamax': tf.keras.optimizers.Adamax,
+        'Nadam': tf.keras.optimizers.Nadam
+    }
+    if hasattr(kapps, 'EfficientNetV2B0'):
+        ModelDict.update({'efficientnet_v2b0': kapps.EfficientNetV2B0})
+    if hasattr(kapps, 'EfficientNetV2B1'):
+        ModelDict.update({'efficientnet_v2b1': kapps.EfficientNetV2B1})
+    if hasattr(kapps, 'EfficientNetV2B2'):
+        ModelDict.update({'efficientnet_v2b2': kapps.EfficientNetV2B2})
+    if hasattr(kapps, 'EfficientNetV2B3'):
+        ModelDict.update({'efficientnet_v2b3': kapps.EfficientNetV2B3})
+    if hasattr(kapps, 'EfficientNetV2S'):
+        ModelDict.update({'efficientnet_v2s': kapps.EfficientNetV2S})
+    if hasattr(kapps, 'EfficientNetV2M'):
+        ModelDict.update({'efficientnet_v2m': kapps.EfficientNetV2M})
+    if hasattr(kapps, 'EfficientNetV2L'):
+        ModelDict.update({'efficientnet_v2l': kapps.EfficientNetV2L})
+    RegressionLossDict = {
+        loss: getattr(tf.keras.losses, loss)
+        for loss in [
+            'mean_squared_error',
+            'mean_absolute_error',
+            'mean_absolute_percentage_error',
+            'mean_squared_logarithmic_error',
+            'squared_hinge',
+            'hinge',
+            'logcosh'
+        ]
+    }
+    RegressionLossDict.update({
+        'negative_log_likelihood': tf_utils.negative_log_likelihood
+    })
+    AllLossDict = {
+        loss: getattr(tf.keras.losses, loss)
+        for loss in [
+            'mean_squared_error',
+            'mean_absolute_error',
+            'mean_absolute_percentage_error',
+            'mean_squared_logarithmic_error',
+            'squared_hinge',
+            'hinge',
+            'categorical_hinge',
+            'logcosh',
+            'huber',
+            'categorical_crossentropy',
+            'sparse_categorical_crossentropy',
+            'binary_crossentropy',
+            'kullback_leibler_divergence',
+            'poisson'
+        ]
+    }
+    AllLossDict.update({
+        'batch_loss_crossentropy': tf_utils.batch_loss_crossentropy,
+        'negative_log_likelihood': tf_utils.negative_log_likelihood
+    })
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        assert self.model in self.ModelDict.keys()
+        assert self.optimizer in self.OptDict.keys()
+        assert self.loss in self.AllLossDict.keys()
+
+    def _add_hidden_layers(
+        self,
+        model: tf.keras.Model,
+        regularizer: tf.keras.layers.Layer
+    ) -> Tuple[tf.keras.Model, tf.keras.layers.Layer]:
+        """Adds hidden layer(s) to a model.
+
+        Args:
+            model (tf.keras.Model): Tensorflow model.
+            regularizer (tf.keras.layers.Layer): Regularization for hidden layers.
+
+        Returns:
+            A tuple containing
+
+                tf.keras.Model: Model with hidden layers added.
+
+                tf.keras.layers.Layer: Last linear layer.
+        """
+        log.debug("Using Batch normalization")
+        last_linear = None
+        for i in range(self.hidden_layers):
+            model = tf.keras.layers.Dense(self.hidden_layer_width,
+                                          name=f'hidden_{i}',
+                                          activation='relu',
+                                          kernel_regularizer=regularizer)(model)
+            model = tf.keras.layers.BatchNormalization()(model)
+            last_linear = model
+            if self.uq:
+                model = StaticDropout(self.dropout)(model)
+            elif self.dropout:
+                model = tf.keras.layers.Dropout(self.dropout)(model)
+        return model, last_linear
+
+    def _get_dense_regularizer(self) -> Optional[tf.keras.layers.Layer]:
+        """Return regularizer for dense (hidden) layers."""
+
+        if self.l2_dense and not self.l1_dense:
+            log.debug(f"Using L2 regularization for dense layers (weight={self.l2_dense})")
+            return tf.keras.regularizers.l2(self.l2_dense)
+        elif self.l1_dense and not self.l2_dense:
+            log.debug(f"Using L1 regularization for dense layers (weight={self.l1_dense})")
+            return tf.keras.regularizers.l1(self.l1_dense)
+        elif self.l1_dense and self.l2_dense:
+            log.debug(f"Using L1 (weight={self.l1_dense}) and L2 (weight={self.l2_dense}) reg for dense layers")
+            return tf.keras.regularizers.l1_l2(l1=self.l1_dense, l2=self.l2_dense)
+        else:
+            log.debug("Not using regularization for dense layers")
+            return None
+
+    def _add_regularization(self, model: tf.keras.Model) -> tf.keras.Model:
+        """Add non-hidden layer regularization.
+
+        Args:
+            model (tf.keras.Model): Tensorflow model.
+
+        Returns:
+            tf.keras.Model: Tensorflow model with regularization added.
+        """
+        if self.l2 and not self.l1:
+            log.debug(f"Using L2 regularization for base model (weight={self.l2})")
+            regularizer = tf.keras.regularizers.l2(self.l2)
+        elif self.l1 and not self.l2:
+            log.debug(f"Using L1 regularization for base model (weight={self.l1})")
+            regularizer = tf.keras.regularizers.l1(self.l1)
+        elif self.l1 and self.l2:
+            log.debug(f"Using L1 (weight={self.l1}) and L2 (weight={self.l2}) regularization for base model")
+            regularizer = tf.keras.regularizers.l1_l2(l1=self.l1, l2=self.l2)
+        else:
+            log.debug("Not using regularization for base model")
+            regularizer = None
+        if regularizer is not None:
+            model = tf_utils.add_regularization(model, regularizer)
+        return model
+
+    def _freeze_layers(self, model: tf.keras.Model) -> tf.keras.Model:
+        """Freeze all layers except the last X, where X = self.trainable_layers.
+
+        Args:
+            model (tf.keras.Model): Tensorflow model.
+
+        Returns:
+            tf.keras.Model: Tensorflow model with frozen layers.
+        """
+        freezeIndex = int(len(model.layers) - (self.trainable_layers - 1))  # - self.hp.hidden_layers - 1))
+        log.info(f'Only training on last {self.trainable_layers} layers (of {len(model.layers)} total)')
+        for layer in model.layers[:freezeIndex]:
+            layer.trainable = False
+        return model
+
+    def _get_core(self, weights: Optional[str] = None) -> tf.keras.Model:
+        """Returns a Keras model of the appropriate architecture, input shape,
+        pooling, and initial weights.
+
+        Args:
+            weights (Optional[str], optional): Pretrained weights to use.
+                Defaults to None.
+
+        Returns:
+            tf.keras.Model: Core model.
+        """
+        input_shape = (self.tile_px, self.tile_px, 3)
+        model_fn = self.ModelDict[self.model]
+        model_kwargs = {
+            'input_shape': input_shape,
+            'include_top': self.include_top,
+            'pooling': self.pooling,
+            'weights': weights
+        }
+        # Only pass kwargs accepted by model function
+        model_fn_sig = inspect.signature(model_fn)
+        model_kw = [
+            param.name
+            for param in model_fn_sig.parameters.values()
+            if param.kind == param.POSITIONAL_OR_KEYWORD
+        ]
+        model_kwargs = {key: model_kwargs[key] for key in model_kw if key in model_kwargs}
+        return model_fn(**model_kwargs)
+
+    def _build_base(
+        self,
+        pretrain: Optional[str] = 'imagenet',
+        load_method: str = 'weights'
+    ) -> Tuple[tf.keras.Model, List[Any]]:
+        """Builds the base image model, from a Keras model core, with the
+        appropriate input tensors and identity layers.
+
+        Args:
+            pretrain (str, optional): Pretrained weights to load.
+                Defaults to 'imagenet'.
+            load_method (str): Either 'full' or 'weights'. Method to use
+                when loading a Tensorflow model. If 'full', loads the model with
+                ``tf.keras.models.load_model()``. If 'weights', will read the
+                ``params.json`` configuration file, build the model architecture,
+                and then load weights from the given model with
+                ``Model.load_weights()``. Loading with 'full' may improve
+                compatibility across Slideflow versions. Loading with 'weights'
+                may improve compatibility across hardware & environments.
+
+        Returns:
+            A tuple containing
+
+                tf.keras.Model: Base model.
+
+                list: Model inputs.
+        """
+        image_shape = (self.tile_px, self.tile_px, 3)
+        tile_input_tensor = tf.keras.Input(shape=image_shape, name='tile_image')
+        if pretrain:
+            log.debug(f'Using pretraining from [magenta]{pretrain}')
+        if pretrain and pretrain != 'imagenet':
+            pretrained_model = load(pretrain, method=load_method, training=True)
+            try:
+                # This is the tile_image input
+                pretrained_input = pretrained_model.get_layer(name='tile_image').input
+                # Name of the pretrained model core, which should be at layer 1
+                pretrained_name = pretrained_model.get_layer(index=1).name
+                # This is the post-convolution layer
+                pretrained_output = pretrained_model.get_layer(name='post_convolution').output
+                base_model = tf.keras.Model(inputs=pretrained_input,
+                                            outputs=pretrained_output,
+                                            name=f'pretrained_{pretrained_name}').layers[1]
+            except ValueError:
+                log.warning('Unable to automatically read pretrained model, will try legacy format')
+                base_model = pretrained_model.get_layer(index=0)
+        else:
+            base_model = self._get_core(weights=pretrain)
+            if self.include_top:
+                base_model = tf.keras.Model(
+                    inputs=base_model.input,
+                    outputs=base_model.layers[-2].output,
+                    name=base_model.name
+                )
+        # Add regularization
+        base_model = self._add_regularization(base_model)
+
+        # Allow only a subset of layers in the base model to be trainable
+        if self.trainable_layers != 0:
+            base_model = self._freeze_layers(base_model)
+
+        # This is an identity layer that simply returns the last layer, allowing us to name and access this layer later
+        post_convolution_identity_layer = tf.keras.layers.Activation('linear', name='post_convolution')
+        layers = [tile_input_tensor, base_model]
+        if not self.pooling:
+            layers += [tf.keras.layers.Flatten()]
+        layers += [post_convolution_identity_layer]
+        if self.uq:
+            layers += [StaticDropout(self.dropout)]
+        elif self.dropout:
+            layers += [tf.keras.layers.Dropout(self.dropout)]
+        tile_image_model = tf.keras.Sequential(layers)
+        model_inputs = [tile_image_model.input]
+        return tile_image_model, model_inputs
+
+    def _build_classification_or_regression_model(
+        self,
+        num_classes: Union[int, Dict[Any, int]],
+        num_slide_features: int = 0,
+        activation: str = 'softmax',
+        pretrain: str = 'imagenet',
+        checkpoint: Optional[str] = None,
+        load_method: str = 'weights'
+    ) -> tf.keras.Model:
+        """Assembles classification or regression model, using pretraining (imagenet)
+        or the base layers of a supplied model.
+
+        Args:
+            num_classes (int or dict): Either int (single categorical outcome,
+                indicating number of classes) or dict (dict mapping categorical
+                outcome names to number of unique categories in each outcome).
+            num_slide_features (int): Number of slide-level features separate
+                from image input. Defaults to 0.
+            activation (str): Type of final layer activation to use.
+                Defaults to softmax.
+            pretrain (str): Either 'imagenet' or path to model to use as
+                pretraining. Defaults to 'imagenet'.
+            checkpoint (str): Path to checkpoint from which to resume model
+                training. Defaults to None.
+            load_method (str): Either 'full' or 'weights'. Method to use
+                when loading a Tensorflow model. If 'full', loads the model with
+                ``tf.keras.models.load_model()``. If 'weights', will read the
+                ``params.json`` configuration file, build the model architecture,
+                and then load weights from the given model with
+                ``Model.load_weights()``. Loading with 'full' may improve
+                compatibility across Slideflow versions. Loading with 'weights'
+                may improve compatibility across hardware & environments.
+        """
+        tile_image_model, model_inputs = self._build_base(pretrain, load_method)
+        if num_slide_features:
+            log.debug(f'Model has {num_slide_features} slide input features')
+            slide_feature_input_tensor = tf.keras.Input(
+                shape=(num_slide_features,),
+                name='slide_feature_input'
+            )
+        else:
+            log.debug('Not using any slide-level input features.')
+
+        # Merge layers
+        if num_slide_features and ((self.tile_px == 0) or self.drop_images):
+            log.info('Generating model with only slide-level input - no images')
+            merged_model = slide_feature_input_tensor
+            model_inputs += [slide_feature_input_tensor]
+        elif num_slide_features:
+            # Add slide feature input tensors
+            merged_model = tf.keras.layers.Concatenate(name='input_merge')(
+                [slide_feature_input_tensor, tile_image_model.output]
+            )
+            model_inputs += [slide_feature_input_tensor]
+        else:
+            merged_model = tile_image_model.output
+
+        # Add hidden layers
+        regularizer = self._get_dense_regularizer()
+        merged_model, last_linear = self._add_hidden_layers(
+            merged_model, regularizer
+        )
+
+        # Multi-categorical outcomes
+        if isinstance(num_classes, dict):
+            outputs = []
+            for c in num_classes:
+                final_dense_layer = tf.keras.layers.Dense(
+                    num_classes[c],
+                    kernel_regularizer=regularizer,
+                    name=f'logits-{c}'
+                )(merged_model)
+                outputs += [
+                    tf.keras.layers.Activation(
+                        activation,
+                        dtype='float32',
+                        name=f'out-{c}'
+                    )(final_dense_layer)
+                ]
+        else:
+            final_dense_layer = tf.keras.layers.Dense(
+                num_classes,
+                kernel_regularizer=regularizer,
+                name='logits'
+            )(merged_model)
+            outputs = [
+                tf.keras.layers.Activation(
+                    activation,
+                    dtype='float32',
+                    name='output'
+                )(final_dense_layer)
+            ]
+        # Assemble final model
+        log.debug(f'Using {activation} activation')
+        model = tf.keras.Model(inputs=model_inputs, outputs=outputs)
+        # Disable experimental batch loss
+        if False:
+            model.add_loss(tf_utils.batch_loss_crossentropy(last_linear))
+
+        if checkpoint:
+            log.info(f'Loading checkpoint weights from [green]{checkpoint}')
+            model.load_weights(checkpoint)
+
+        return model
+
+    def _build_survival_model(
+        self,
+        num_classes: Union[int, Dict[Any, int]],
+        num_slide_features: int = 1,
+        pretrain: Optional[str] = None,
+        checkpoint: Optional[str] = None,
+        load_method: str = 'weights',
+        training: bool = True
+    ) -> tf.keras.Model:
+        """Assembles a survival model, using pretraining (imagenet)
+        or the base layers of a supplied model.
+
+        Args:
+            num_classes (int or dict): Either int (single categorical outcome,
+                indicating number of classes) or dict (dict mapping categorical
+                outcome names to number of unique categories in each outcome).
+            num_slide_features (int): Number of slide-level features separate
+                from image input, including the event input. Defaults to 1.
+            pretrain (str, optional): Either 'imagenet' or path to model to
+                use as pretraining. Defaults to None.
+            checkpoint (str): Path to checkpoint from which to resume model
+                training. Defaults to None.
+            load_method (str): Either 'full' or 'weights'. Method to use
+                when loading a Tensorflow model. If 'full', loads the model with
+                ``tf.keras.models.load_model()``. If 'weights', will read the
+                ``params.json`` configuration file, build the model architecture,
+                and then load weights from the given model with
+                ``Model.load_weights()``. Loading with 'full' may improve
+                compatibility across Slideflow versions. Loading with 'weights'
+                may improve compatibility across hardware & environments.
+        """
+        activation = 'linear'
+        tile_image_model, model_inputs = self._build_base(pretrain, load_method)
+
+        # Add slide feature input tensors, if there are more slide features
+        # than just the event input tensor for survival models
+        if training:
+            event_input_tensor = tf.keras.Input(shape=(1,), name='event_input')
+        if not (num_slide_features == 1):
+            slide_feature_input_tensor = tf.keras.Input(
+                shape=(num_slide_features - 1,),
+                name='slide_feature_input'
+            )
+        # Merge layers
+        if num_slide_features and ((self.tile_px == 0) or self.drop_images):
+            # Slide-level input only; images are not used
+            log.info('Generating model with only slide-level input - no images')
+            merged_model = slide_feature_input_tensor
+            model_inputs += [slide_feature_input_tensor]
+            if training:
+                model_inputs += [event_input_tensor]
+        elif num_slide_features and num_slide_features > 1:
+            # Add slide feature input tensors, if there are more slide features
+            # than just the event input tensor for survival models
+            merged_model = tf.keras.layers.Concatenate(name='input_merge')(
+                [slide_feature_input_tensor, tile_image_model.output]
+            )
+            model_inputs += [slide_feature_input_tensor]
+            if training:
+                model_inputs += [event_input_tensor]
+        else:
+            merged_model = tile_image_model.output
+            if training:
+                model_inputs += [event_input_tensor]
+
+        # Add hidden layers
+        regularizer = self._get_dense_regularizer()
+        merged_model, last_linear = self._add_hidden_layers(
+            merged_model, regularizer
+        )
+        log.debug(f'Using {activation} activation')
+
+        # Multi-categorical outcomes
+        if isinstance(num_classes, dict):
+            outputs = []
+            for c in num_classes:
+                final_dense_layer = tf.keras.layers.Dense(
+                    num_classes[c],
+                    kernel_regularizer=regularizer,
+                    name=f'logits-{c}'
+                )(merged_model)
+                outputs += [tf.keras.layers.Activation(
+                    activation,
+                    dtype='float32',
+                    name=f'out-{c}'
+                )(final_dense_layer)]
+        else:
+            final_dense_layer = tf.keras.layers.Dense(
+                num_classes,
+                kernel_regularizer=regularizer,
+                name='logits'
+            )(merged_model)
+            outputs = [tf.keras.layers.Activation(
+                activation,
+                dtype='float32',
+                name='output'
+            )(final_dense_layer)]
+        if training:
+            outputs[0] = tf.keras.layers.Concatenate(
+                name='output_merge_survival',
+                dtype='float32'
+            )([outputs[0], event_input_tensor])
+
+        # Assemble final model
+        model = tf.keras.Model(inputs=model_inputs, outputs=outputs)
+
+        if checkpoint:
+            log.info(f'Loading checkpoint weights from [green]{checkpoint}')
+            model.load_weights(checkpoint)
+
+        return model
+
+    def build_model(
+        self,
+        labels: Optional[Dict] = None,
+        num_classes: Optional[Union[int, Dict[Any, int]]] = None,
+        **kwargs
+    ) -> tf.keras.Model:
+        """Auto-detects model type (classification, regression, survival) from parameters
+        and builds, using pretraining or the base layers of a supplied model.
+
+        Args:
+            labels (dict, optional): Dict mapping slide names to outcomes.
+                Used to detect number of outcome categories.
+            num_classes (int or dict, optional): Either int (single categorical
+                outcome, indicating number of classes) or dict (dict mapping
+                categorical outcome names to number of unique categories in
+                each outcome). Must supply either `num_classes` or `labels`
+                (number of classes can be detected from labels).
+            num_slide_features (int, optional): Number of slide-level features
+                separate from image input. Defaults to 0.
+            activation (str, optional): Type of final layer activation to use.
+                Defaults to 'softmax' (classification models) or 'linear'
+                (regression or survival models).
+            pretrain (str, optional): Either 'imagenet' or path to model to use
+                as pretraining. Defaults to 'imagenet'.
+            checkpoint (str, optional): Path to checkpoint from which to resume
+                model training. Defaults to None.
+            load_method (str): Either 'full' or 'weights'. Method to use
+                when loading a Tensorflow model. If 'full', loads the model with
+                ``tf.keras.models.load_model()``. If 'weights', will read the
+                ``params.json`` configuration file, build the model architecture,
+                and then load weights from the given model with
+                ``Model.load_weights()``. Loading with 'full' may improve
+                compatibility across Slideflow versions. Loading with 'weights'
+                may improve compatibility across hardware & environments.
+        """
+
+        assert num_classes is not None or labels is not None
+        if num_classes is None:
+            num_classes = self._detect_classes_from_labels(labels)  # type: ignore
+
+        if self.model_type() == 'classification':
+            return self._build_classification_or_regression_model(
+                num_classes, **kwargs, activation='softmax'
+            )
+        elif self.model_type() == 'regression':
+            return self._build_classification_or_regression_model(
+                num_classes, **kwargs, activation='linear'
+            )
+        elif self.model_type() == 'survival':
+            return self._build_survival_model(num_classes, **kwargs)
+        else:
+            raise errors.ModelError(f'Unknown model type: {self.model_type()}')
+
+    def get_loss(self) -> tf.keras.losses.Loss:
+        """Returns the loss function registered under ``self.loss``."""
+        return self.AllLossDict[self.loss]
+
+    def get_opt(self) -> tf.keras.optimizers.Optimizer:
+        """Returns optimizer with appropriate learning rate."""
+        if self.learning_rate_decay not in (0, 1):
+            initial_learning_rate = self.learning_rate
+            lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
+                initial_learning_rate,
+                decay_steps=self.learning_rate_decay_steps,
+                decay_rate=self.learning_rate_decay,
+                staircase=True
+            )
+            return self.OptDict[self.optimizer](learning_rate=lr_schedule)
+        else:
+            return self.OptDict[self.optimizer](learning_rate=self.learning_rate)
+
+    def model_type(self) -> str:
+        """Returns 'regression', 'classification', or 'survival', reflecting the loss."""
+        # If the loss is named custom_[type], return [type]
+        if self.loss.startswith('custom'):
+            return self.loss[7:]
+        elif self.loss == 'negative_log_likelihood':
+            return 'survival'
+        elif self.loss in self.RegressionLossDict:
+            return 'regression'
+        else:
+            return 'classification'
+
+
+class _PredictionAndEvaluationCallback(tf.keras.callbacks.Callback):
+
+    """Prediction and Evaluation Callback used during model training."""
+
+    def __init__(self, parent: "Trainer", cb_args: SimpleNamespace) -> None:
+        super().__init__()
+        self.parent = parent
+        self.hp = parent.hp
+        self.cb_args = cb_args
+        self.early_stop = False
+        self.early_stop_batch = 0
+        self.early_stop_epoch = 0
+        self.last_ema = -1  # type: float
+        self.moving_average = []  # type: List
+        self.ema_two_checks_prior = -1  # type: float
+        self.ema_one_check_prior = -1  # type: float
+        self.epoch_count = cb_args.starting_epoch
+        self.model_type = self.hp.model_type()
+        self.results = {'epochs': {}}  # type: Dict[str, Dict]
+        self.neptune_run = self.parent.neptune_run
+        self.global_step = 0
+        self.train_summary_writer = tf.summary.create_file_writer(
+            join(self.parent.outdir, 'train'))
+        self.val_summary_writer = tf.summary.create_file_writer(
+            join(self.parent.outdir, 'validation'))
+
+        # Circumvents buffer overflow error with Python 3.10.
+        # Without this, a buffer overflow error will be encountered when
+        # attempting to make a matplotlib figure (with the tkagg backend)
+        # during model evaluation. I have not yet been able to track down
+        # the root cause.
+        if self.cb_args.using_validation:
+            import matplotlib.pyplot as plt
+            plt.figure()
+            plt.close()
+
+    def _log_training_metrics(self, logs):
+        """Log training metrics to Tensorboard/Neptune."""
+        # Log to Tensorboard.
+        with self.train_summary_writer.as_default():
+            for _log in logs:
+                tf.summary.scalar(
+                    f'batch_{_log}',
+                    data=logs[_log],
+                    step=self.global_step)
+        # Log to neptune.
+        if self.neptune_run:
+            self.neptune_run['metrics/train/batch/loss'].log(
+                logs['loss'],
+                step=self.global_step)
+            sf.util.neptune_utils.list_log(
+                self.neptune_run,
+                'metrics/train/batch/accuracy',
+                logs['accuracy'],
+                step=self.global_step)
+
+    def _log_validation_metrics(self, metrics):
+        """Log validation metrics to Tensorboard/Neptune."""
+        # Tensorboard logging for validation metrics
+        with self.val_summary_writer.as_default():
+            for _log in metrics:
+                tf.summary.scalar(
+                    f'batch_{_log}',
+                    data=metrics[_log],
+                    step=self.global_step)
+        # Log to neptune
+        if self.neptune_run:
+            for v in metrics:
+                self.neptune_run[f"metrics/val/batch/{v}"].log(
+                    round(metrics[v], 3),
+                    step=self.global_step
+                )
+            if self.last_ema != -1:
+                self.neptune_run["metrics/val/batch/exp_moving_avg"].log(
+                    round(self.last_ema, 3),
+                    step=self.global_step
+                )
+            self.neptune_run["early_stop/stopped_early"] = False
+
+    def _log_epoch_evaluation(self, epoch_results, metrics, accuracy, loss, logs={}):
+        """Log the end-of-epoch evaluation to CSV, Tensorboard, and Neptune."""
+        epoch = self.epoch_count
+        run = self.neptune_run
+        sf.util.update_results_log(
+            self.cb_args.results_log,
+            'trained_model',
+            {f'epoch{epoch}': epoch_results}
+        )
+        with self.val_summary_writer.as_default():
+            # Note: Tensorboard epoch logging starts with index=0,
+            # whereas all other logging starts with index=1
+            if isinstance(accuracy, (list, tuple, np.ndarray)):
+                for i in range(len(accuracy)):
+                    tf.summary.scalar(f'epoch_accuracy-{i}', data=accuracy[i], step=epoch-1)
+            elif accuracy is not None:
+                tf.summary.scalar('epoch_accuracy', data=accuracy, step=epoch-1)
+            if isinstance(loss, (list, tuple, np.ndarray)):
+                for i in range(len(loss)):
+                    tf.summary.scalar(f'epoch_loss-{i}', data=loss[i], step=epoch-1)
+            else:
+                tf.summary.scalar('epoch_loss', data=loss, step=epoch-1)
+
+        # Log epoch results to Neptune
+        if run:
+            # Training epoch metrics
+            run['metrics/train/epoch/loss'].log(logs['loss'], step=epoch)
+            sf.util.neptune_utils.list_log(
+                run,
+                'metrics/train/epoch/accuracy',
+                logs['accuracy'],
+                step=epoch
+            )
+            # Validation epoch metrics
+            run['metrics/val/epoch/loss'].log(loss, step=epoch)
+            sf.util.neptune_utils.list_log(
+                run,
+                'metrics/val/epoch/accuracy',
+                accuracy,
+                step=epoch
+            )
+            for metric in metrics:
+                if metrics[metric]['tile'] is None:
+                    continue
+                for outcome in metrics[metric]['tile']:
+                    # If only one outcome, log to metrics/val/epoch/[metric].
+                    # If more than one outcome, log to
+                    # metrics/val/epoch/[metric]/[outcome_name]
+                    def metric_label(s):
+                        if len(metrics[metric]['tile']) == 1:
+                            return f'metrics/val/epoch/{s}_{metric}'
+                        else:
+                            return f'metrics/val/epoch/{s}_{metric}/{outcome}'
+
+                    tile_metric = metrics[metric]['tile'][outcome]
+                    slide_metric = metrics[metric]['slide'][outcome]
+                    patient_metric = metrics[metric]['patient'][outcome]
+
+                    # If only one value for a metric, log to .../[metric]
+                    # If more than one value for a metric (e.g. AUC for each
+                    # category), log to .../[metric]/[i]
+                    sf.util.neptune_utils.list_log(
+                        run,
+                        metric_label('tile'),
+                        tile_metric,
+                        step=epoch
+                    )
+                    sf.util.neptune_utils.list_log(
+                        run,
+                        metric_label('slide'),
+                        slide_metric,
+                        step=epoch
+                    )
+                    sf.util.neptune_utils.list_log(
+                        run,
+                        metric_label('patient'),
+                        patient_metric,
+                        step=epoch
+                    )
+
+    def _metrics_from_dataset(
+        self,
+        epoch_label: str,
+    ) -> Tuple[Dict, float, float]:
+        return sf.stats.metrics_from_dataset(
+            self.model,
+            model_type=self.hp.model_type(),
+            patients=self.parent.patients,
+            dataset=self.cb_args.validation_data,
+            outcome_names=self.parent.outcome_names,
+            label=epoch_label,
+            data_dir=self.parent.outdir,
+            num_tiles=self.cb_args.num_val_tiles,
+            save_predictions=self.cb_args.save_predictions,
+            reduce_method=self.cb_args.reduce_method,
+            loss=self.hp.get_loss(),
+            uq=bool(self.hp.uq),
+        )
+
+    def on_epoch_end(self, epoch: int, logs={}) -> None:
+        if sf.getLoggingLevel() <= 20:
+            print('\r\033[K', end='')
+        self.epoch_count += 1
+        if self.epoch_count in self.hp.epochs or self.early_stop:
+            if self.parent.name:
+                model_name = self.parent.name
+            else:
+                model_name = 'trained_model'
+            model_path = os.path.join(
+                self.parent.outdir,
+                f'{model_name}_epoch{self.epoch_count}'
+            )
+            if self.cb_args.save_model:
+                self.model.save(model_path)
+                log.info(f'Trained model saved to [green]{model_path}')
+
+                # Try to copy model settings/hyperparameters file
+                # into the model folder
+                params_dest = join(model_path, 'params.json')
+                if not exists(params_dest):
+                    try:
+                        config_path = join(dirname(model_path), 'params.json')
+                        if self.neptune_run:
+                            config = sf.util.load_json(config_path)
+                            config['neptune_id'] = self.neptune_run['sys/id'].fetch()
+                            sf.util.write_json(config, config_path)
+
+                        shutil.copy(config_path, params_dest)
+                        shutil.copy(
+                            join(dirname(model_path), 'slide_manifest.csv'),
+                            join(model_path, 'slide_manifest.csv')
+                        )
+                    except Exception as e:
+                        log.warning(e)
+                        log.warning('Unable to copy params.json/slide_manifest'
+                                    '.csv files into model folder.')
+
+            if self.cb_args.using_validation:
+                self.evaluate_model(logs)
+        elif self.early_stop:
+            self.evaluate_model(logs)
+        self.model.stop_training = self.early_stop
+
+    def on_train_batch_end(self, batch: int, logs={}) -> None:
+        # Log training metrics (Tensorboard/Neptune)
+        if batch > 0 and batch % self.cb_args.log_frequency == 0:
+            self._log_training_metrics(logs)
+
+        # Check if manual early stopping has been triggered
+        if (self.hp.early_stop
+           and self.hp.early_stop_method == 'manual'):
+
+            assert self.hp.manual_early_stop_batch is not None
+            assert self.hp.manual_early_stop_epoch is not None
+
+            if (self.hp.manual_early_stop_epoch <= (self.epoch_count+1)
+               and self.hp.manual_early_stop_batch <= batch):
+
+                log.info('Manual early stop triggered: epoch '
+                         f'{self.epoch_count+1}, batch {batch}')
+                self.model.stop_training = True
+                self.early_stop = True
+                self.early_stop_batch = batch
+                self.early_stop_epoch = self.epoch_count + 1
+
+        # Validation metrics
+        if (self.cb_args.using_validation and self.cb_args.validate_on_batch
+           and (batch > 0)
+           and (batch % self.cb_args.validate_on_batch == 0)):
+            _, acc, loss = eval_from_model(
+                self.model,
+                self.cb_args.mid_train_validation_data,
+                model_type=self.hp.model_type(),
+                uq=False,
+                loss=self.hp.get_loss(),
+                steps=self.cb_args.validation_steps,
+                verbosity='quiet',
+            )
+            val_metrics = {'loss': loss}
+            val_log_metrics = {'loss': loss}
+            if isinstance(acc, float):
+                val_metrics['accuracy'] = acc
+                val_log_metrics['accuracy'] = acc
+            elif acc is not None:
+                val_metrics.update({f'accuracy-{i+1}': acc[i] for i in range(len(acc))})
+                val_log_metrics.update({f'out-{i}_accuracy': acc[i] for i in range(len(acc))})
+
+            val_loss = val_metrics['loss']
+            self.model.stop_training = False
+            if (self.hp.early_stop_method == 'accuracy'
+               and 'accuracy' in val_metrics):
+                early_stop_value = val_metrics['accuracy']
+                val_acc = f"{val_metrics['accuracy']:.3f}"
+            else:
+                early_stop_value = val_loss
+                val_acc = ', '.join([
+                    f'{val_metrics[v]:.3f}'
+                    for v in val_metrics
+                    if 'accuracy' in v
+                ])
+            if 'accuracy' in logs:
+                train_acc = f"{logs['accuracy']:.3f}"
+            else:
+                train_acc = ', '.join([
+                    f'{logs[v]:.3f}'
+                    for v in logs
+                    if 'accuracy' in v
+                ])
+            if sf.getLoggingLevel() <= 20:
+                print('\r\033[K', end='')
+            self.moving_average += [early_stop_value]
+
+            self._log_validation_metrics(val_log_metrics)
+            # Log training metrics if not already logged this batch
+            if batch % self.cb_args.log_frequency > 0:
+                self._log_training_metrics(logs)
+
+            # Base logging message
+            batch_msg = f'[blue]Batch {batch:<5}[/]'
+            loss_msg = f"[green]loss[/]: {logs['loss']:.3f}"
+            val_loss_msg = f"[magenta]val_loss[/]: {val_loss:.3f}"
+            if self.model_type == 'classification':
+                acc_msg = f"[green]acc[/]: {train_acc}"
+                val_acc_msg = f"[magenta]val_acc[/]: {val_acc}"
+                log_message = f"{batch_msg} {loss_msg}, {acc_msg} | "
+                log_message += f"{val_loss_msg}, {val_acc_msg}"
+            else:
+                log_message = f"{batch_msg} {loss_msg} | {val_loss_msg}"
+
+            # Calculate exponential moving average of the early stop value
+            # (validation accuracy or validation loss)
+            if len(self.moving_average) <= self.cb_args.ema_observations:
+                log.info(log_message)
+            else:
+                # Only keep track of the last [ema_observations] values
+                self.moving_average.pop(0)
+                if self.last_ema == -1:
+                    # Calculate simple moving average
+                    self.last_ema = (sum(self.moving_average)
+                                     / len(self.moving_average))
+                    log.info(log_message + f' (SMA: {self.last_ema:.3f})')
+                else:
+                    # Update exponential moving average
+                    sm = self.cb_args.ema_smoothing
+                    obs = self.cb_args.ema_observations
+                    self.last_ema = ((early_stop_value * (sm / (1 + obs)))
+                                     + (self.last_ema * (1 - (sm / (1 + obs)))))
+                    log.info(log_message + f' (EMA: {self.last_ema:.3f})')
+
+            # If early stopping is enabled and the patience criterion has been
+            #   met, check whether the monitored metric is still improving
+            steps_per_epoch = self.cb_args.steps_per_epoch
+            if (self.hp.early_stop
+               and self.hp.early_stop_method in ('loss', 'accuracy')
+               and self.last_ema != -1
+               and ((float(batch) / steps_per_epoch) + self.epoch_count)
+                    > self.hp.early_stop_patience):
+
+                if (self.ema_two_checks_prior != -1
+                    and ((self.hp.early_stop_method == 'accuracy'
+                          and self.last_ema <= self.ema_two_checks_prior)
+                         or (self.hp.early_stop_method == 'loss'
+                             and self.last_ema >= self.ema_two_checks_prior))):
+
+                    log.info(f'Early stop: epoch {self.epoch_count+1}, batch '
+                             f'{batch}')
+                    self.model.stop_training = True
+                    self.early_stop = True
+                    self.early_stop_batch = batch
+                    self.early_stop_epoch = self.epoch_count + 1
+
+                    # Log early stop to neptune
+                    if self.neptune_run:
+                        self.neptune_run["early_stop/early_stop_epoch"] = self.epoch_count
+                        self.neptune_run["early_stop/early_stop_batch"] = batch
+                        self.neptune_run["early_stop/method"] = self.hp.early_stop_method
+                        self.neptune_run["early_stop/stopped_early"] = self.early_stop
+                        self.neptune_run["sys/tags"].add("early_stopped")
+                else:
+                    self.ema_two_checks_prior = self.ema_one_check_prior
+                    self.ema_one_check_prior = self.last_ema
+
+        # Update global step (for tracking metrics across epochs)
+        self.global_step += 1
+
+    def on_train_end(self, logs={}) -> None:
+        if sf.getLoggingLevel() <= 20:
+            print('\r\033[K')
+        if self.neptune_run:
+            self.neptune_run['sys/tags'].add('training_complete')
+
+    def evaluate_model(self, logs={}) -> None:
+        log.debug("Evaluating model from evaluation callback")
+        epoch = self.epoch_count
+        metrics, acc, loss = self._metrics_from_dataset(f'val_epoch{epoch}')
+
+        # Note that Keras loss during training includes regularization losses,
+        # so this loss will not match validation loss calculated during training
+        val_metrics = {'accuracy': acc, 'loss': loss}
+        log.info('Validation metrics: ' + json.dumps(val_metrics, indent=4))
+        self.results['epochs'][f'epoch{epoch}'] = {
+            'train_metrics': {k: v for k, v in logs.items() if k[:3] != 'val'},
+            'val_metrics': val_metrics
+        }
+        if self.early_stop:
+            self.results['epochs'][f'epoch{epoch}'].update({
+                'early_stop_epoch': self.early_stop_epoch,
+                'early_stop_batch': self.early_stop_batch,
+            })
+        for m in metrics:
+            if metrics[m]['tile'] is None:
+                continue
+            self.results['epochs'][f'epoch{epoch}'][f'tile_{m}'] = metrics[m]['tile']
+            self.results['epochs'][f'epoch{epoch}'][f'slide_{m}'] = metrics[m]['slide']
+            self.results['epochs'][f'epoch{epoch}'][f'patient_{m}'] = metrics[m]['patient']
+
+        epoch_results = self.results['epochs'][f'epoch{epoch}']
+        self._log_epoch_evaluation(
+            epoch_results, metrics=metrics, accuracy=acc, loss=loss, logs=logs
+        )
+
+
+class Trainer:
+    """Base trainer class containing functionality for model building, input
+    processing, training, and evaluation.
+
+    This base class requires categorical outcome(s). Additional outcome types
+    are supported by :class:`slideflow.model.RegressionTrainer` and
+    :class:`slideflow.model.SurvivalTrainer`.
+
+    Slide-level (e.g. clinical) features can be used as additional model input
+    by providing slide labels in the slide annotations dictionary, under the
+    key 'input'.
+    """
+
+    _model_type = 'classification'
+
+    def __init__(
+        self,
+        hp: ModelParams,
+        outdir: str,
+        labels: Dict[str, Any],
+        *,
+        slide_input: Optional[Dict[str, Any]] = None,
+        name: str = 'Trainer',
+        feature_sizes: Optional[List[int]] = None,
+        feature_names: Optional[List[str]] = None,
+        outcome_names: Optional[List[str]] = None,
+        mixed_precision: bool = True,
+        allow_tf32: bool = False,
+        config: Optional[Dict[str, Any]] = None,
+        use_neptune: bool = False,
+        neptune_api: Optional[str] = None,
+        neptune_workspace: Optional[str] = None,
+        load_method: str = 'weights',
+        custom_objects: Optional[Dict[str, Any]] = None,
+        transform: Optional[Union[Callable, Dict[str, Callable]]] = None,
+    ) -> None:
+
+        """Sets base configuration, preparing model inputs and outputs.
+
+        Args:
+            hp (:class:`slideflow.ModelParams`): ModelParams object.
+            outdir (str): Path for event logs and checkpoints.
+            labels (dict): Dict mapping slide names to outcome labels (int or
+                float format).
+            slide_input (dict, optional): Dict mapping slide names to
+                additional slide-level input, concatenated after post-conv.
+            name (str, optional): Optional name describing the model, used for
+                model saving. Defaults to 'Trainer'.
+            feature_sizes (list, optional): List of sizes of input features.
+                Required if providing additional input features as input to
+                the model.
+            feature_names (list, optional): List of names for input features.
+                Used when permuting feature importance.
+            outcome_names (list, optional): Name of each outcome. Defaults to
+                "Outcome {X}" for each outcome.
+            mixed_precision (bool, optional): Use FP16 mixed precision (rather
+                than FP32). Defaults to True.
+            allow_tf32 (bool): Allow internal use of Tensorfloat-32 format.
+                Defaults to False.
+            load_method (str): Either 'full' or 'weights'. Method to use
+                when loading a Tensorflow model. If 'full', loads the model with
+                ``tf.keras.models.load_model()``. If 'weights', will read the
+                ``params.json`` configuration file, build the model architecture,
+                and then load weights from the given model with
+                ``Model.load_weights()``. Loading with 'full' may improve
+                compatibility across Slideflow versions. Loading with 'weights'
+                may improve compatibility across hardware & environments.
+            config (dict, optional): Training configuration dictionary, used
+                for logging and image format verification. Defaults to None.
+            use_neptune (bool, optional): Use Neptune API logging.
+                Defaults to False.
+            neptune_api (str, optional): Neptune API token, used for logging.
+                Defaults to None.
+            neptune_workspace (str, optional): Neptune workspace.
+                Defaults to None.
+            custom_objects (dict, optional): Dictionary mapping names
+                (strings) to custom classes or functions. Defaults to None.
+            transform (callable or dict, optional): Optional transform to
+                apply to input images. If dict, must have the keys 'train'
+                and/or 'val', mapping to callables that take a single
+                image Tensor as input and returns a single image Tensor.
+                If None, no transform is applied. If a single callable is
+                provided, it will be applied to both training and validation
+                data. If a dict is provided, the 'train' transform will be
+                applied to training data and the 'val' transform will be
+                applied to validation data. If a dict is provided and either
+                'train' or 'val' is None, no transform will be applied to
+                that data. Defaults to None.
+        """
+
+        if load_method not in ('full', 'weights'):
+            raise ValueError("Unrecognized value for load_method, must be "
+                             "either 'full' or 'weights'.")
+
+        self.outdir = outdir
+        self.tile_px = hp.tile_px
+        self.labels = labels
+        self.hp = hp
+        self.slides = list(labels.keys())
+        self.slide_input = slide_input
+        self.feature_names = feature_names
+        self.feature_sizes = feature_sizes
+        self.num_slide_features = 0 if not feature_sizes else sum(feature_sizes)
+        self.mixed_precision = mixed_precision
+        self._allow_tf32 = allow_tf32
+        self.name = name
+        self.neptune_run = None
+        self.annotations_tables = []
+        self.eval_callback = _PredictionAndEvaluationCallback  # type: tf.keras.callbacks.Callback
+        self.load_method = load_method
+        self.custom_objects = custom_objects
+        self.patients = dict()
+
+        os.makedirs(outdir, exist_ok=True)
+
+        # Format outcome labels (ensures compatibility with single
+        # and multi-outcome models)
+        outcome_labels = np.array(list(labels.values()))
+        if len(outcome_labels.shape) == 1:
+            outcome_labels = np.expand_dims(outcome_labels, axis=1)
+        if not outcome_names:
+            outcome_names = [
+                f'Outcome {i}'
+                for i in range(outcome_labels.shape[1])
+            ]
+        outcome_names = sf.util.as_list(outcome_names)
+        if labels and (len(outcome_names) != outcome_labels.shape[1]):
+            num_names = len(outcome_names)
+            num_outcomes = outcome_labels.shape[1]
+            raise errors.ModelError(f'Size of outcome_names ({num_names}) != '
+                                    f'number of outcomes ({num_outcomes})')
+        self.outcome_names = outcome_names
+        self._setup_inputs()
+        if labels:
+            self.num_classes = self.hp._detect_classes_from_labels(labels)
+            with tf.device('/cpu'):
+                for oi in range(outcome_labels.shape[1]):
+                    self.annotations_tables += [tf.lookup.StaticHashTable(
+                        tf.lookup.KeyValueTensorInitializer(
+                            self.slides,
+                            outcome_labels[:, oi]
+                        ), -1
+                    )]
+        else:
+            self.num_classes = None  # type: ignore
+
+        # Normalization setup
+        self.normalizer = self.hp.get_normalizer()
+        if self.normalizer:
+            log.info(f'Using realtime {self.hp.normalizer} normalization')
+
+        # Mixed precision and Tensorfloat-32
+        if self.mixed_precision:
+            _policy = 'mixed_float16'
+            log.debug(f'Enabling mixed precision ({_policy})')
+            if version.parse(tf.__version__) > version.parse("2.8"):
+                tf.keras.mixed_precision.set_global_policy(_policy)
+            else:
+                policy = tf.keras.mixed_precision.experimental.Policy(_policy)
+                tf.keras.mixed_precision.experimental.set_policy(policy)
+        tf.config.experimental.enable_tensor_float_32_execution(allow_tf32)
+
+        # Custom transforms
+        self._process_transforms(transform)
+
+        # Log parameters
+        if config is None:
+            config = {
+                'slideflow_version': sf.__version__,
+                'backend': sf.backend(),
+                'git_commit': sf.__gitcommit__,
+                'model_name': self.name,
+                'full_model_name': self.name,
+                'outcomes': self.outcome_names,
+                'model_type': self.hp.model_type(),
+                'img_format': None,
+                'tile_px': self.hp.tile_px,
+                'tile_um': self.hp.tile_um,
+                'input_features': None,
+                'input_feature_sizes': None,
+                'input_feature_labels': None,
+                'hp': self.hp.to_dict()
+            }
+        sf.util.write_json(config, join(self.outdir, 'params.json'))
+        self.config = config
+        self.img_format = config.get('img_format')
+
+        # Initialize Neptune
+        self.use_neptune = use_neptune
+        if self.use_neptune:
+            if neptune_api is None or neptune_workspace is None:
+                raise ValueError("If using Neptune, must supply values "
+                                 "neptune_api and neptune_workspace.")
+            self.neptune_logger = sf.util.neptune_utils.NeptuneLog(
+                neptune_api,
+                neptune_workspace
+            )
+
+    def _process_transforms(
+        self,
+        transform: Optional[Union[Callable, Dict[str, Callable]]] = None
+    ) -> None:
+        """Process custom transformations for training and/or validation."""
+        if not isinstance(transform, dict):
+            transform = {'train': transform, 'val': transform}
+        if any([t not in ('train', 'val') for t in transform]):
+            raise ValueError("transform must be a callable or dict with keys "
+                             "'train' and/or 'val'")
+        if 'train' not in transform:
+            transform['train'] = None
+        if 'val' not in transform:
+            transform['val'] = None
+        self.transform = transform
+
+    def _setup_inputs(self) -> None:
+        """Setup slide-level input."""
+        if self.num_slide_features:
+            assert self.slide_input is not None
+            try:
+                if self.num_slide_features:
+                    log.info(f'Training with both images and '
+                             f'{self.num_slide_features} slide-level input '
+                             'features')
+            except KeyError:
+                raise errors.ModelError("Unable to find slide-level input at "
+                                        "'input' key in annotations")
+            for slide in self.slides:
+                if len(self.slide_input[slide]) != self.num_slide_features:
+                    num_in_feature_table = len(self.slide_input[slide])
+                    raise errors.ModelError(
+                        f'Length of input for slide {slide} does not match '
+                        f'feature_sizes; expected {self.num_slide_features}, '
+                        f'got {num_in_feature_table}'
+                    )
+
+    def _compile_model(self) -> None:
+        """Compile keras model."""
+        self.model.compile(
+            optimizer=self.hp.get_opt(),
+            loss=self.hp.get_loss(),
+            metrics=['accuracy']
+        )
+
+    def _fit_normalizer(self, norm_fit: Optional[NormFit]) -> None:
+        """Fit the Trainer normalizer using the specified fit, if applicable.
+
+        Args:
+            norm_fit (Optional[Dict[str, np.ndarray]]): Normalizer fit.
+        """
+        if norm_fit is not None and not self.normalizer:
+            raise ValueError("norm_fit supplied, but model params do not "
+                             "specify a normalizer.")
+        if self.normalizer and norm_fit is not None:
+            self.normalizer.set_fit(**norm_fit)  # type: ignore
+        elif (self.normalizer
+              and 'norm_fit' in self.config
+              and self.config['norm_fit'] is not None):
+            log.debug("Detecting normalizer fit from model config")
+            self.normalizer.set_fit(**self.config['norm_fit'])
+
+    def _parse_tfrecord_labels(
+        self,
+        image: tf.Tensor,
+        slide: tf.Tensor
+    ) -> Tuple[Dict[str, tf.Tensor], tf.Tensor]:
+        """Parses raw entry read from TFRecord."""
+
+        image_dict = {'tile_image': image}
+
+        if self.num_classes is None:
+            label = None
+        elif len(self.num_classes) > 1:  # type: ignore
+            label = {
+                f'out-{oi}': self.annotations_tables[oi].lookup(slide)
+                for oi in range(len(self.num_classes))  # type: ignore
+            }
+        else:
+            label = self.annotations_tables[0].lookup(slide)
+
+        # Add additional non-image feature inputs if indicated,
+        #     excluding the event feature used for survival models
+        if self.num_slide_features:
+
+            def slide_lookup(s):
+                return self.slide_input[s.numpy().decode('utf-8')]
+
+            num_features = self.num_slide_features
+            slide_feature_input_val = tf.py_function(
+                func=slide_lookup,
+                inp=[slide],
+                Tout=[tf.float32] * num_features
+            )
+            image_dict.update({'slide_feature_input': slide_feature_input_val})
+
+        return image_dict, label
+
+    def _retrain_top_layers(
+        self,
+        train_data: tf.data.Dataset,
+        steps_per_epoch: int,
+        callbacks: tf.keras.callbacks.Callback = None,
+        epochs: int = 1
+    ) -> Dict:
+        """Retrain only the top layer, leaving all other layers frozen."""
+        log.info('Retraining top layer')
+        # Freeze the base layer
+        self.model.layers[0].trainable = False
+        #val_steps = 200 if validation_data else None
+        self._compile_model()
+
+        toplayer_model = self.model.fit(
+            train_data,
+            epochs=epochs,
+            verbose=(sf.getLoggingLevel() <= 20),
+            steps_per_epoch=steps_per_epoch,
+            callbacks=callbacks
+        )
+        # Unfreeze the base layer
+        self.model.layers[0].trainable = True
+        return toplayer_model.history
+
+    def _detect_patients(self, *args):
+        self.patients = dict()
+        for dataset in args:
+            if dataset is None:
+                continue
+            dataset_patients = dataset.patients()
+            if not dataset_patients:
+                self.patients.update({s: s for s in self.slides})
+            else:
+                self.patients.update(dataset_patients)
+
+    def _interleave_kwargs(self, **kwargs) -> Dict[str, Any]:
+        args = SimpleNamespace(
+            labels=self._parse_tfrecord_labels,
+            normalizer=self.normalizer,
+            **kwargs
+        )
+        return vars(args)
+
+    def _interleave_kwargs_val(self, **kwargs) -> Dict[str, Any]:
+        return self._interleave_kwargs(**kwargs)
+
+    def _metric_kwargs(self, **kwargs) -> Dict[str, Any]:
+        args = SimpleNamespace(
+            model=self.model,
+            model_type=self._model_type,
+            patients=self.patients,
+            outcome_names=self.outcome_names,
+            data_dir=self.outdir,
+            neptune_run=self.neptune_run,
+            **kwargs
+        )
+        return vars(args)
+
+    def _verify_img_format(
+        self,
+        dataset: "sf.Dataset",
+        *datasets: Optional["sf.Dataset"]
+    ) -> Optional[str]:
+        """Verify that the image format of the dataset matches the model config.
+
+        Args:
+            dataset (sf.Dataset): Dataset to check.
+            *datasets (sf.Dataset): Additional datasets to check. May be None.
+
+        Returns:
+            str: Image format, either 'png' or 'jpg', if a consistent image
+                format was found, otherwise None.
+
+        """
+        # First, verify all datasets (including the primary dataset)
+        # have the same image format
+        img_formats = set([
+            d.img_format for d in (dataset, *datasets) if d and d.img_format
+        ])
+        if len(img_formats) > 1:
+            log.error("Multiple image formats detected: {}.".format(
+                ', '.join(img_formats)
+            ))
+            return None
+        elif self.img_format and not dataset.img_format:
+            log.warning("Unable to verify image format (PNG/JPG) of dataset.")
+            return None
+        elif self.img_format and dataset.img_format != self.img_format:
+            log.error(
+                "Mismatched image formats. Expected '{}' per model config, "
+                "but dataset has format '{}'.".format(
+                    self.img_format,
+                    dataset.img_format))
+            return None
+        else:
+            return dataset.img_format
+
+    def load(self, model: str, **kwargs) -> tf.keras.Model:
+        self.model = load(
+            model,
+            method=self.load_method,
+            custom_objects=self.custom_objects,
+            **kwargs
+        )
+        return self.model
+
+    def predict(
+        self,
+        dataset: "sf.Dataset",
+        batch_size: Optional[int] = None,
+        norm_fit: Optional[NormFit] = None,
+        format: str = 'parquet',
+        from_wsi: bool = False,
+        roi_method: str = 'auto',
+        reduce_method: Union[str, Callable] = 'average',
+    ) -> Dict[str, "pd.DataFrame"]:
+        """Perform inference on a model, saving tile-level predictions.
+
+        Args:
+            dataset (:class:`slideflow.dataset.Dataset`): Dataset containing
+                TFRecords to evaluate.
+            batch_size (int, optional): Evaluation batch size. Defaults to the
+                same as training (per self.hp)
+            norm_fit (Dict[str, np.ndarray]): Normalizer fit, mapping fit
+                parameters (e.g. target_means, target_stds) to values
+                (np.ndarray). If not provided, will fit normalizer using
+                model params (if applicable). Defaults to None.
+            format (str, optional): Format in which to save predictions. Either
+                'csv', 'feather', or 'parquet'. Defaults to 'parquet'.
+            from_wsi (bool): Generate predictions from tiles dynamically
+                extracted from whole-slide images, rather than TFRecords.
+                Defaults to False (use TFRecords).
+            roi_method (str): ROI method to use if from_wsi=True (ignored if
+                from_wsi=False). Either 'inside', 'outside', 'auto', 'ignore'.
+                If 'inside' or 'outside', will extract tiles in/out of an ROI,
+                and raise errors.MissingROIError if an ROI is not available.
+                If 'auto', will extract tiles inside an ROI if available,
+                and across the whole-slide if no ROI is found.
+                If 'ignore', will extract tiles across the whole-slide
+                regardless of whether an ROI is available.
+                Defaults to 'auto'.
+            reduce_method (str, optional): Reduction method for calculating
+                slide-level and patient-level predictions for categorical
+                outcomes. Options include 'average', 'mean', 'proportion',
+                'median', 'sum', 'min', 'max', or a callable function.
+                'average' and 'mean' are synonymous, with both options kept
+                for backwards compatibility. If 'average' or 'mean', will
+                reduce with average of each logit across tiles. If
+                'proportion', will convert tile predictions into onehot encoding
+                then reduce by averaging these onehot values. For all other
+                values, will reduce with the specified function, applied via
+                the pandas ``DataFrame.agg()`` function. Defaults to 'average'.
+
+        Returns:
+            Dict[str, pd.DataFrame]: Dictionary with keys 'tile', 'slide', and
+            'patient', and values containing DataFrames with tile-, slide-,
+            and patient-level predictions.
+        """
+
+        if format not in ('csv', 'feather', 'parquet'):
+            raise ValueError(f"Unrecognized format {format}")
+
+        self._detect_patients(dataset)
+
+        # Verify image format
+        self._verify_img_format(dataset)
+
+        # Fit normalizer
+        self._fit_normalizer(norm_fit)
+
+        # Load and initialize model
+        if not self.model:
+            raise errors.ModelNotLoadedError
+        log_manifest(
+            None,
+            dataset.tfrecords(),
+            labels=self.labels,
+            filename=join(self.outdir, 'slide_manifest.csv')
+        )
+        if not batch_size:
+            batch_size = self.hp.batch_size
+        with tf.name_scope('input'):
+            interleave_kwargs = self._interleave_kwargs_val(
+                batch_size=batch_size,
+                infinite=False,
+                transform=self.transform['val'],
+                augment=False
+            )
+            tf_dts_w_slidenames = dataset.tensorflow(
+                incl_loc=True,
+                incl_slidenames=True,
+                from_wsi=from_wsi,
+                roi_method=roi_method,
+                **interleave_kwargs
+            )
+        # Generate predictions
+        log.info('Generating predictions...')
+        dfs = sf.stats.predict_dataset(
+            model=self.model,
+            dataset=tf_dts_w_slidenames,
+            model_type=self._model_type,
+            uq=bool(self.hp.uq),
+            num_tiles=dataset.num_tiles,
+            outcome_names=self.outcome_names,
+            patients=self.patients,
+            reduce_method=reduce_method,
+        )
+        # Save predictions
+        sf.stats.metrics.save_dfs(dfs, format=format, outdir=self.outdir)
+        return dfs
+
+    def evaluate(
+        self,
+        dataset: "sf.Dataset",
+        batch_size: Optional[int] = None,
+        save_predictions: Union[bool, str] = 'parquet',
+        reduce_method: Union[str, Callable] = 'average',
+        norm_fit: Optional[NormFit] = None,
+        uq: Union[bool, str] = 'auto',
+        from_wsi: bool = False,
+        roi_method: str = 'auto',
+    ) -> Dict[str, Any]:
+        """Evaluate model, saving metrics and predictions.
+
+        Args:
+            dataset (:class:`slideflow.dataset.Dataset`): Dataset containing
+                TFRecords to evaluate.
+            batch_size (int, optional): Evaluation batch size. Defaults to the
+                same as training (per self.hp)
+            save_predictions (bool or str, optional): Save tile, slide, and
+                patient-level predictions at each evaluation. May be 'csv',
+                'feather', or 'parquet'. If False, will not save predictions.
+                Defaults to 'parquet'.
+            reduce_method (str, optional): Reduction method for calculating
+                slide-level and patient-level predictions for categorical
+                outcomes. Options include 'average', 'mean', 'proportion',
+                'median', 'sum', 'min', 'max', or a callable function.
+                'average' and 'mean' are synonymous, with both options kept
+                for backwards compatibility. If 'average' or 'mean', will
+                reduce with average of each logit across tiles. If
+                'proportion', will convert tile predictions into onehot encoding
+                then reduce by averaging these onehot values. For all other
+                values, will reduce with the specified function, applied via
+                the pandas ``DataFrame.agg()`` function. Defaults to 'average'.
+            norm_fit (Dict[str, np.ndarray]): Normalizer fit, mapping fit
+                parameters (e.g. target_means, target_stds) to values
+                (np.ndarray). If not provided, will fit normalizer using
+                model params (if applicable). Defaults to None.
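+            uq (bool or str, optional): Enable UQ estimation (for
+                applicable models). Defaults to 'auto'.
+            from_wsi (bool): Generate predictions from tiles dynamically
+                extracted from whole-slide images, rather than TFRecords.
+                Defaults to False (use TFRecords).
+            roi_method (str): ROI method to use if from_wsi=True (ignored if
+                from_wsi=False). Either 'inside', 'outside', 'auto', 'ignore'.
+                If 'inside' or 'outside', will extract tiles in/out of an ROI,
+                and raise errors.MissingROIError if an ROI is not available.
+                If 'auto', will extract tiles inside an ROI if available,
+                and across the whole-slide if no ROI is found.
+                If 'ignore', will extract tiles across the whole-slide
+                regardless of whether an ROI is available.
+                Defaults to 'auto'.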
+
+        Returns:
+            Dictionary of evaluation metrics.
+        """
+        if uq != 'auto':
+            if not isinstance(uq, bool):
+                raise ValueError(f"Unrecognized value {uq} for uq")
+            self.hp.uq = uq
+
+        self._detect_patients(dataset)
+
+        # Verify image format
+        self._verify_img_format(dataset)
+
+        # Perform evaluation
+        if from_wsi:
+            _unit_type, _n_units = 'slides', len(dataset.slide_paths())
+        else:
+            _unit_type, _n_units = 'tfrecords', len(dataset.tfrecords())
+        log.info(f'Evaluating {_n_units} {_unit_type}')
+
+        # Fit normalizer
+        self._fit_normalizer(norm_fit)
+
+        # Load and initialize model
+        if not self.model:
+            raise errors.ModelNotLoadedError
+        log_manifest(
+            None,
+            dataset.tfrecords(),
+            labels=self.labels,
+            filename=join(self.outdir, 'slide_manifest.csv')
+        )
+        # Neptune logging
+        if self.use_neptune:
+            self.neptune_run = self.neptune_logger.start_run(
+                self.name,
+                self.config['project'],
+                dataset,
+                tags=['eval']
+            )
+            assert self.neptune_run is not None
+            self.neptune_logger.log_config(self.config, 'eval')
+            self.neptune_run['data/slide_manifest'].upload(
+                join(self.outdir, 'slide_manifest.csv')
+            )
+
+        if not batch_size:
+            batch_size = self.hp.batch_size
+        with tf.name_scope('input'):
+            interleave_kwargs = self._interleave_kwargs_val(
+                batch_size=batch_size,
+                infinite=False,
+                transform=self.transform['val'],
+                augment=False
+            )
+            tf_dts_w_slidenames = dataset.tensorflow(
+                incl_slidenames=True,
+                incl_loc=True,
+                from_wsi=from_wsi,
+                roi_method=roi_method,
+                **interleave_kwargs
+            )
+        # Generate performance metrics
+        log.info('Calculating performance metrics...')
+        metric_kwargs = self._metric_kwargs(
+            dataset=tf_dts_w_slidenames,
+            num_tiles=dataset.num_tiles,
+            label='eval'
+        )
+        metrics, acc, loss = sf.stats.metrics_from_dataset(
+            save_predictions=save_predictions,
+            reduce_method=reduce_method,
+            loss=self.hp.get_loss(),
+            uq=bool(self.hp.uq),
+            **metric_kwargs
+        )
+        results = {'eval': {}}  # type: Dict[str, Dict[str, float]]
+        for metric in metrics:
+            if metrics[metric]:
+                log.info(f"Tile {metric}: {metrics[metric]['tile']}")
+                log.info(f"Slide {metric}: {metrics[metric]['slide']}")
+                log.info(f"Patient {metric}: {metrics[metric]['patient']}")
+                results['eval'].update({
+                    f'tile_{metric}': metrics[metric]['tile'],
+                    f'slide_{metric}': metrics[metric]['slide'],
+                    f'patient_{metric}': metrics[metric]['patient']
+                })
+
+        # Note that Keras loss during training includes regularization losses,
+        # so this loss will not match validation loss calculated during training
+        val_metrics = {'accuracy': acc, 'loss': loss}
+        results_log = os.path.join(self.outdir, 'results_log.csv')
+        log.info('Evaluation metrics:')
+        for m in val_metrics:
+            log.info(f'{m}: {val_metrics[m]}')
+        results['eval'].update(val_metrics)
+        sf.util.update_results_log(results_log, 'eval_model', results)
+
+        # Update neptune log
+        if self.neptune_run:
+            self.neptune_run['eval/results'] = val_metrics
+            self.neptune_run.stop()
+
+        return results
+
+    def train(
+        self,
+        train_dts: "sf.Dataset",
+        val_dts: Optional["sf.Dataset"],
+        log_frequency: int = 100,
+        validate_on_batch: int = 0,
+        validation_batch_size: Optional[int] = None,
+        validation_steps: int = 200,
+        starting_epoch: int = 0,
+        ema_observations: int = 20,
+        ema_smoothing: int = 2,
+        use_tensorboard: bool = True,
+        steps_per_epoch_override: int = 0,
+        save_predictions: Union[bool, str] = 'parquet',
+        save_model: bool = True,
+        resume_training: Optional[str] = None,
+        pretrain: Optional[str] = 'imagenet',
+        checkpoint: Optional[str] = None,
+        save_checkpoints: bool = True,
+        multi_gpu: bool = False,
+        reduce_method: Union[str, Callable] = 'average',
+        norm_fit: Optional[NormFit] = None,
+        from_wsi: bool = False,
+        roi_method: str = 'auto',
+    ) -> Dict[str, Any]:
+        """Builds and trains a model from hyperparameters.
+
+        Args:
+            train_dts (:class:`slideflow.Dataset`): Training dataset. Will call
+                the `.tensorflow()` method to retrieve the tf.data.Dataset
+                used for model fitting.
+            val_dts (:class:`slideflow.Dataset`, optional): Validation
+                dataset. Will call the `.tensorflow()` method to retrieve the
+                tf.data.Dataset used for validation. If None, validation is
+                skipped.
+            log_frequency (int, optional): How frequently to update Tensorboard
+                logs, in batches. Defaults to 100.
+            validate_on_batch (int, optional): Validation will also be performed
+                every N batches. Defaults to 0.
+            validation_batch_size (int, optional): Validation batch size.
+                Defaults to same as training (per self.hp).
+            validation_steps (int, optional): Number of batches to use for each
+                instance of validation. Defaults to 200.
+            starting_epoch (int, optional): Starts training at the specified
+                epoch. Defaults to 0.
+            ema_observations (int, optional): Number of observations over which
+                to perform exponential moving average smoothing. Defaults to 20.
+            ema_smoothing (int, optional): Exponential average smoothing value.
+                Defaults to 2.
+            use_tensorboard (bool, optional): Enable tensorboard callbacks.
+                Defaults to True.
+            steps_per_epoch_override (int, optional): Manually set the number
+                of steps per epoch. Defaults to 0 (automatic).
+            save_predictions (bool or str, optional): Save tile, slide, and
+                patient-level predictions at each evaluation. May be 'csv',
+                'feather', or 'parquet'. If False, will not save predictions.
+                Defaults to 'parquet'.
+            save_model (bool, optional): Save models when evaluating at
+                specified epochs. Defaults to True.
+            resume_training (str, optional): Path to model to continue training.
+                Only valid in Tensorflow backend. Defaults to None.
+            pretrain (str, optional): Either 'imagenet' or path to Tensorflow
+                model from which to load weights. Defaults to 'imagenet'.
+            checkpoint (str, optional): Path to cp.ckpt from which to load
+                weights. Defaults to None.
+            save_checkpoints (bool, optional): Save checkpoints at each epoch.
+                Defaults to True.
+            multi_gpu (bool, optional): Enable multi-GPU training using
+                Tensorflow/Keras MirroredStrategy. Defaults to False.
+            reduce_method (str, optional): Reduction method for calculating
+                slide-level and patient-level predictions for categorical
+                outcomes. Options include 'average', 'mean', 'proportion',
+                'median', 'sum', 'min', 'max', or a callable function.
+                'average' and 'mean' are synonymous, with both options kept
+                for backwards compatibility. If 'average' or 'mean', will
+                reduce with average of each logit across tiles. If
+                'proportion', will convert tile predictions into onehot encoding
+                then reduce by averaging these onehot values. For all other
+                values, will reduce with the specified function, applied via
+                the pandas ``DataFrame.agg()`` function. Defaults to 'average'.
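+            norm_fit (Dict[str, np.ndarray]): Normalizer fit, mapping fit
+                parameters (e.g. target_means, target_stds) to values
+                (np.ndarray). If not provided, will fit normalizer using
+                model params (if applicable). Defaults to None.
+            from_wsi (bool): Train and validate on tiles dynamically
+                extracted from whole-slide images, rather than TFRecords.
+                Defaults to False (use TFRecords).
+            roi_method (str): ROI method to use if from_wsi=True (ignored if
+                from_wsi=False). Either 'inside', 'outside', 'auto', 'ignore'.
+                If 'inside' or 'outside', will extract tiles in/out of an ROI,
+                and raise errors.MissingROIError if an ROI is not available.
+                If 'auto', will extract tiles inside an ROI if available,
+                and across the whole-slide if no ROI is found.
+                If 'ignore', will extract tiles across the whole-slide
+                regardless of whether an ROI is available.
+                Defaults to 'auto'.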
+
+        Returns:
+            dict: Nested results dict with metrics for each evaluated epoch.
+        """
+
+        if self.hp.model_type() != self._model_type:
+            hp_model = self.hp.model_type()
+            raise errors.ModelError(f"Incompatible models: {hp_model} (hp) and "
+                                    f"{self._model_type} (model)")
+
+        self._detect_patients(train_dts, val_dts)
+
+        # Verify image format across datasets.
+        img_format = self._verify_img_format(train_dts, val_dts)
+        if img_format and self.config.get('img_format') is None:
+            self.config['img_format'] = img_format
+            sf.util.write_json(self.config, join(self.outdir, 'params.json'))
+
+        # Clear prior Tensorflow graph to free memory
+        tf.keras.backend.clear_session()
+        results_log = os.path.join(self.outdir, 'results_log.csv')
+
+        # Fit the normalizer to the training data and log the source mean/stddev
+        if self.normalizer and self.hp.normalizer_source == 'dataset':
+            self.normalizer.fit(train_dts)
+        else:
+            self._fit_normalizer(norm_fit)
+
+        if self.normalizer:
+            config_path = join(self.outdir, 'params.json')
+            if not exists(config_path):
+                config = {
+                    'slideflow_version': sf.__version__,
+                    'hp': self.hp.to_dict(),
+                    'backend': sf.backend()
+                }
+            else:
+                config = sf.util.load_json(config_path)
+            config['norm_fit'] = self.normalizer.get_fit(as_list=True)
+            sf.util.write_json(config, config_path)
+
+        # Prepare multiprocessing pool if from_wsi=True
+        if from_wsi:
+            pool = mp.Pool(
+                sf.util.num_cpu(default=8),
+                initializer=sf.util.set_ignore_sigint
+            )
+        else:
+            pool = None
+
+        # Save training / validation manifest
+        if val_dts is None:
+            val_paths = None
+        elif from_wsi:
+            val_paths = val_dts.slide_paths()
+        else:
+            val_paths = val_dts.tfrecords()
+        log_manifest(
+            train_dts.tfrecords(),
+            val_paths,
+            labels=self.labels,
+            filename=join(self.outdir, 'slide_manifest.csv')
+        )
+
+        # Neptune logging
+        if self.use_neptune:
+            tags = ['train']
+            if 'k-fold' in self.config['validation_strategy']:
+                tags += [f'k-fold{self.config["k_fold_i"]}']
+            self.neptune_run = self.neptune_logger.start_run(
+                self.name,
+                self.config['project'],
+                train_dts,
+                tags=tags
+            )
+            self.neptune_logger.log_config(self.config, 'train')
+            self.neptune_run['data/slide_manifest'].upload(  # type: ignore
+                os.path.join(self.outdir, 'slide_manifest.csv')
+            )
+
+        # Set up multi-GPU strategy
+        if multi_gpu:
+            strategy = tf.distribute.MirroredStrategy()
+            log.info('Multi-GPU training with '
+                     f'{strategy.num_replicas_in_sync} devices')
+            # Fixes "OSError: [Errno 9] Bad file descriptor" after training
+            atexit.register(strategy._extended._collective_ops._pool.close)
+        else:
+            strategy = None
+
+        with strategy.scope() if strategy else no_scope():
+            # Build model from ModelParams
+            if resume_training:
+                self.model = load(resume_training, method='weights', training=True)
+            else:
+                model = self.hp.build_model(
+                    labels=self.labels,
+                    num_slide_features=self.num_slide_features,
+                    pretrain=pretrain,
+                    checkpoint=checkpoint,
+                    load_method=self.load_method
+                )
+                self.model = model
+                tf_utils.log_summary(model, self.neptune_run)
+
+            with tf.name_scope('input'):
+                t_kwargs = self._interleave_kwargs(
+                    batch_size=self.hp.batch_size,
+                    infinite=True,
+                    augment=self.hp.augment,
+                    transform=self.transform['train'],
+                    from_wsi=from_wsi,
+                    pool=pool,
+                    roi_method=roi_method
+                )
+                train_data = train_dts.tensorflow(drop_last=True, **t_kwargs)
+                log.debug(f"Training: {train_dts.num_tiles} total tiles.")
+
+            # Set up validation data
+            using_validation = (val_dts
+                                and (len(val_dts.tfrecords()) if not from_wsi
+                                     else len(val_dts.slide_paths())))
+            if using_validation:
+                assert val_dts is not None
+                with tf.name_scope('input'):
+                    if not validation_batch_size:
+                        validation_batch_size = self.hp.batch_size
+                    v_kwargs = self._interleave_kwargs_val(
+                        batch_size=validation_batch_size,
+                        infinite=False,
+                        augment=False,
+                        transform=self.transform['val'],
+                        from_wsi=from_wsi,
+                        pool=pool,
+                        roi_method=roi_method
+                    )
+                    validation_data = val_dts.tensorflow(
+                        incl_slidenames=True,
+                        incl_loc=True,
+                        drop_last=True,
+                        **v_kwargs
+                    )
+                    log.debug(f"Validation: {val_dts.num_tiles} total tiles.")
+                if validate_on_batch:
+                    log.debug('Validation during training: every '
+                              f'{validate_on_batch} steps and at epoch end')
+                    mid_v_kwargs = v_kwargs.copy()
+                    mid_v_kwargs['infinite'] = True
+                    mid_train_validation_data = iter(val_dts.tensorflow(
+                        incl_slidenames=True,
+                        incl_loc=True,
+                        drop_last=True,
+                        **mid_v_kwargs
+                    ))
+                else:
+                    log.debug('Validation during training: at epoch end')
+                    mid_train_validation_data = None
+                if validation_steps:
+                    num_samples = validation_steps * validation_batch_size
+                    log.debug(f'Using {validation_steps} batches ({num_samples}'
+                              ' samples) each validation check')
+                else:
+                    log.debug('Using entire validation set each val check')
+            else:
+                log.debug('Validation during training: None')
+                validation_data = None
+                mid_train_validation_data = None
+                validation_steps = 0
+
+            # Calculate parameters
+            if from_wsi:
+                train_tiles = train_data.est_num_tiles
+                val_tiles = (validation_data.est_num_tiles
+                             if validation_data is not None else 0)
+            else:
+                train_tiles = train_dts.num_tiles
+                val_tiles = 0 if val_dts is None else val_dts.num_tiles
+            if max(self.hp.epochs) <= starting_epoch:
+                max_epoch = max(self.hp.epochs)
+                log.error(f'Starting epoch ({starting_epoch}) must be less '
+                          f'than max target epoch ({max_epoch})')
+            if (self.hp.early_stop and self.hp.early_stop_method == 'accuracy'
+               and self._model_type != 'classification'):
+                log.error("Unable to use 'accuracy' early stopping with model "
+                          f"type '{self.hp.model_type()}'")
+            if starting_epoch != 0:
+                log.info(f'Starting training at epoch {starting_epoch}')
+            if steps_per_epoch_override:
+                steps_per_epoch = steps_per_epoch_override
+            else:
+                steps_per_epoch = round(train_tiles / self.hp.batch_size)
+
+            cb_args = SimpleNamespace(
+                starting_epoch=starting_epoch,
+                using_validation=using_validation,
+                validate_on_batch=validate_on_batch,
+                validation_steps=validation_steps,
+                ema_observations=ema_observations,
+                ema_smoothing=ema_smoothing,
+                steps_per_epoch=steps_per_epoch,
+                validation_data=validation_data,
+                mid_train_validation_data=mid_train_validation_data,
+                num_val_tiles=val_tiles,
+                save_predictions=save_predictions,
+                save_model=save_model,
+                results_log=results_log,
+                reduce_method=reduce_method,
+                log_frequency=log_frequency
+            )
+
+            # Create callbacks for early stopping, checkpoint saving,
+            # summaries, and history
+            val_callback = self.eval_callback(self, cb_args)
+            callbacks = [tf.keras.callbacks.History(), val_callback]
+            if save_checkpoints:
+                cp_callback = tf.keras.callbacks.ModelCheckpoint(
+                    os.path.join(self.outdir, 'cp.ckpt'),
+                    save_weights_only=True,
+                    verbose=(sf.getLoggingLevel() <= 20)
+                )
+                callbacks += [cp_callback]
+            if use_tensorboard:
+                log.debug(
+                    "Logging with Tensorboard to {} every {} batches.".format(
+                        self.outdir, log_frequency
+                    ))
+                tensorboard_callback = tf.keras.callbacks.TensorBoard(
+                    log_dir=self.outdir,
+                    histogram_freq=0,
+                    write_graph=False,
+                    update_freq='batch'
+                )
+                callbacks += [tensorboard_callback]
+
+            # Retrain top layer only, if using transfer learning and
+            # not resuming training
+            total_epochs = (self.hp.toplayer_epochs
+                            + (max(self.hp.epochs) - starting_epoch))
+            if self.hp.toplayer_epochs:
+                self._retrain_top_layers(
+                    train_data,
+                    steps_per_epoch,
+                    callbacks=None,
+                    epochs=self.hp.toplayer_epochs
+                )
+            self._compile_model()
+
+            # Train the model
+            log.info('Beginning training')
+            try:
+                self.model.fit(
+                    train_data,
+                    steps_per_epoch=steps_per_epoch,
+                    epochs=total_epochs,
+                    verbose=(sf.getLoggingLevel() <= 20),
+                    initial_epoch=self.hp.toplayer_epochs,
+                    callbacks=callbacks
+                )
+            except tf.errors.ResourceExhaustedError as e:
+                log.error(f"Training failed for [bold]{self.name}[/]. "
+                          f"Error: \n {e}")
+            results = val_callback.results
+            if self.use_neptune and self.neptune_run is not None:
+                self.neptune_run['results'] = results['epochs']
+                self.neptune_run.stop()
+
+            # Cleanup
+            if pool is not None:
+                pool.close()
+            del mid_train_validation_data
+
+            return results
+
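The epoch bookkeeping in the training routine above can be traced with plain integers. The following is a minimal sketch, not part of Slideflow: `epoch_schedule` is a hypothetical helper, and its argument names are assumptions chosen to mirror the variables used before `model.fit()` is called.

```python
from typing import List, Optional, Tuple


def epoch_schedule(
    train_tiles: int,
    batch_size: int,
    target_epochs: List[int],
    toplayer_epochs: int = 0,
    starting_epoch: int = 0,
    steps_per_epoch_override: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Hypothetical helper: return (steps_per_epoch, initial_epoch, total_epochs)."""
    # An explicit override takes priority over the tile-based estimate.
    if steps_per_epoch_override:
        steps_per_epoch = steps_per_epoch_override
    else:
        steps_per_epoch = round(train_tiles / batch_size)
    # Top-layer epochs are counted first, followed by the remaining epochs.
    total_epochs = toplayer_epochs + (max(target_epochs) - starting_epoch)
    initial_epoch = toplayer_epochs
    return steps_per_epoch, initial_epoch, total_epochs


schedule = epoch_schedule(10_240, 32, [1, 3, 5], toplayer_epochs=2)
print(schedule)
```

With two top-layer epochs and a maximum target epoch of 5, full-model training would start at epoch index 2 and stop at 7, so the full model still trains for five epochs.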
+
+class RegressionTrainer(Trainer):
+
+    """Extends the base :class:`slideflow.model.Trainer` class to add support
+    for regression models. Requires that all outcomes be continuous and that
+    an appropriate regression loss function be used. Uses R-squared as the
+    evaluation metric, rather than AUROC."""
+
+    _model_type = 'regression'
+
+    def __init__(self, *args, **kwargs) -> None:
+        super().__init__(*args, **kwargs)
+
+    def _compile_model(self) -> None:
+        self.model.compile(optimizer=self.hp.get_opt(),
+                           loss=self.hp.get_loss(),
+                           metrics=[self.hp.get_loss()])
+
+    def _parse_tfrecord_labels(
+        self,
+        image: Union[Dict[str, tf.Tensor], tf.Tensor],
+        slide: tf.Tensor
+    ) -> Tuple[Union[Dict[str, tf.Tensor], tf.Tensor], tf.Tensor]:
+        image_dict = {'tile_image': image}
+        if self.num_classes is None:
+            label = None
+        else:
+            label = [
+                self.annotations_tables[oi].lookup(slide)
+                for oi in range(self.num_classes)  # type: ignore
+            ]
+
+        # Add additional non-image feature inputs if indicated
+        if self.num_slide_features:
+
+            def slide_lookup(s):
+                return self.slide_input[s.numpy().decode('utf-8')]
+
+            num_features = self.num_slide_features
+            slide_feature_input_val = tf.py_function(
+                func=slide_lookup,
+                inp=[slide],
+                Tout=[tf.float32] * num_features
+            )
+            image_dict.update({'slide_feature_input': slide_feature_input_val})
+
+        return image_dict, label
+
+
+class SurvivalTrainer(RegressionTrainer):
+
+    """Cox Proportional Hazards model. Requires that the user provide event
+    data as the first input feature, and time to outcome as the continuous outcome.
+    Uses concordance index as the evaluation metric."""
+
+    _model_type = 'survival'
+
+    def __init__(self, *args, **kwargs) -> None:
+        super().__init__(*args, **kwargs)
+        if not self.num_slide_features:
+            raise errors.ModelError('Model error - survival models must '
+                                    'include event input')
+
+    def _setup_inputs(self) -> None:
+        # Setup slide-level input
+        try:
+            num_features = self.num_slide_features - 1
+            if num_features:
+                log.info(f'Training with both images and {num_features} '
+                         'categories of slide-level input')
+                log.info('Interpreting first feature as event for survival model')
+            else:
+                log.info('Training with images alone. Interpreting first '
+                         'feature as event for survival model')
+        except KeyError:
+            raise errors.ModelError("Unable to find slide-level input at "
+                                    "'input' key in annotations")
+        assert self.slide_input is not None
+        for slide in self.slides:
+            if len(self.slide_input[slide]) != self.num_slide_features:
+                num_in_feature_table = len(self.slide_input[slide])
+                raise errors.ModelError(
+                    f'Length of input for slide {slide} does not match '
+                    f'feature_sizes; expected {self.num_slide_features}, got '
+                    f'{num_in_feature_table}'
+                )
+
+    def load(self, model: str, **kwargs) -> tf.keras.Model:
+        if self.load_method == 'full':
+            custom_objects = {
+                'negative_log_likelihood': tf_utils.negative_log_likelihood,
+                'concordance_index': tf_utils.concordance_index
+            }
+            self.model = tf.keras.models.load_model(
+                model,
+                custom_objects=custom_objects
+            )
+            self.model.compile(
+                loss=tf_utils.negative_log_likelihood,
+                metrics=tf_utils.concordance_index
+            )
+        else:
+            self.model = load(model, method=self.load_method, **kwargs)
+
+    def _compile_model(self) -> None:
+        self.model.compile(optimizer=self.hp.get_opt(),
+                           loss=tf_utils.negative_log_likelihood,
+                           metrics=tf_utils.concordance_index)
+
+    def _parse_tfrecord_labels(
+        self,
+        image: Union[Dict[str, tf.Tensor], tf.Tensor],
+        slide: tf.Tensor
+    ) -> Tuple[Union[Dict[str, tf.Tensor], tf.Tensor], tf.Tensor]:
+        image_dict = {'tile_image': image}
+        if self.num_classes is None:
+            label = None
+        else:
+            label = [
+                self.annotations_tables[oi].lookup(slide)
+                for oi in range(self.num_classes)  # type: ignore
+            ]
+
+        # Add additional non-image feature inputs if indicated,
+        #     excluding the event feature used for survival models
+        if self.num_slide_features:
+            # Event data must be added as a separate input
+
+            def slide_lookup(s):
+                return self.slide_input[s.numpy().decode('utf-8')][1:]
+
+            def event_lookup(s):
+                return self.slide_input[s.numpy().decode('utf-8')][0]
+
+            num_features = self.num_slide_features - 1
+            event_input_val = tf.py_function(
+                func=event_lookup,
+                inp=[slide],
+                Tout=[tf.float32]
+            )
+            image_dict.update({'event_input': event_input_val})
+            slide_feature_input_val = tf.py_function(
+                func=slide_lookup,
+                inp=[slide],
+                Tout=[tf.float32] * num_features
+            )
+            # Add slide input features, excluding the event feature
+            # used for survival models
+            if not (self.num_slide_features == 1):
+                image_dict.update(
+                    {'slide_feature_input': slide_feature_input_val}
+                )
+        return image_dict, label
+
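The survival label parser above treats the first slide-level value as the event indicator and the remaining values as ordinary slide features. A minimal sketch of that split follows; `split_event_and_features` and the example dictionary are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Dict, List, Tuple


def split_event_and_features(
    slide_input: Dict[str, List[float]],
    slide: str,
) -> Tuple[float, List[float]]:
    """Hypothetical helper: separate the event value from other features."""
    values = slide_input[slide]
    event = values[0]       # first entry: event indicator
    features = values[1:]   # remaining entries: slide-level features
    return event, features


# Made-up slide-level inputs, for illustration only.
slide_input = {
    'slide_a': [1.0, 0.25, 0.75],  # event plus two extra features
    'slide_b': [0.0],              # event only
}
print(split_event_and_features(slide_input, 'slide_a'))
print(split_event_and_features(slide_input, 'slide_b'))
```

When only the event value is present, the feature list is empty, which corresponds to the case where the parser skips adding `slide_feature_input`.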
+
+class Features(BaseFeatureExtractor):
+    """Interface for obtaining predictions and features from intermediate layer
+    activations from Slideflow models.
+
+    Use by calling on either a batch of images (returning outputs for a single
+    batch), or by calling on a :class:`slideflow.WSI` object, which will
+    generate an array of spatially-mapped activations matching the slide.
+
+    Examples
+        *Calling on batch of images:*
+
+        .. code-block:: python
+
+            interface = Features('/model/path', layers='postconv')
+            for image_batch in train_data:
+                # Return shape: (batch_size, num_features)
+                batch_features = interface(image_batch)
+
+        *Calling on a slide:*
+
+        .. code-block:: python
+
+            slide = sf.WSI(...)
+            interface = Features('/model/path', layers='postconv')
+            # Returns shape:
+            # (slide.grid.shape[0], slide.grid.shape[1], num_features)
+            activations_grid = interface(slide)
+
+    Note:
+        When this interface is called on a batch of images, no image processing
+        or stain normalization will be performed, as it is assumed that
+        normalization will occur during data loader image processing. When the
+        interface is called on a `slideflow.WSI`, the normalization strategy
+        will be read from the model configuration file, and normalization will
+        be performed on image tiles extracted from the WSI. If this interface
+        was created from an existing model and there is no model configuration
+        file to read, a slideflow.norm.StainNormalizer object may be passed
+        during initialization via the argument `wsi_normalizer`.
+    """
+
+    def __init__(
+        self,
+        path: Optional[str],
+        layers: Optional[Union[str, List[str]]] = 'postconv',
+        include_preds: bool = False,
+        load_method: str = 'weights',
+        pooling: Optional[Any] = None,
+        device: Optional[str] = None,
+    ) -> None:
+        """Creates a features interface from a saved slideflow model which
+        outputs feature activations at the designated layers.
+
+        Intermediate layer activations are returned in the order given by
+        ``layers``. Predictions are returned last.
+
+        Args:
+            path (str): Path to saved Slideflow model.
+            layers (list(str), optional): Layers from which to generate
+                activations.  The post-convolution activation layer is accessed
+                via 'postconv'. Defaults to 'postconv'.
+            include_preds (bool, optional): Include predictions in output. Will be
+                returned last. Defaults to False.
+            load_method (str): Either 'full' or 'weights'. Method to use
+                when loading a Tensorflow model. If 'full', loads the model with
+                ``tf.keras.models.load_model()``. If 'weights', will read the
+                ``params.json`` configuration file, build the model architecture,
+                and then load weights from the given model with
+                ``Model.load_weights()``. Loading with 'full' may improve
+                compatibility across Slideflow versions. Loading with 'weights'
+                may improve compatibility across hardware & environments.
+            pooling (str or Keras layer, optional): Pooling to apply to
+                4D layer outputs. Either 'avg', 'max', or a Keras pooling
+                layer class. Defaults to None.
+            device (str, optional): Device on which to run the model.
+                'cuda' is replaced with 'gpu'. Defaults to None.
+        """
+        super().__init__('tensorflow', include_preds=include_preds)
+        if layers and isinstance(layers, str):
+            layers = [layers]
+        self.layers = layers
+        self.path = path
+        self.device = device
+        if isinstance(device, str):
+            self.device = device.replace('cuda', 'gpu')
+        self._pooling = None
+        self._include_preds = None
+        if path is not None:
+            self._model = load(self.path, method=load_method)  # type: ignore
+            config = sf.util.get_model_config(path)
+            if 'img_format' in config:
+                self.img_format = config['img_format']
+            self.hp = sf.ModelParams()
+            self.hp.load_dict(config['hp'])
+            self.wsi_normalizer = self.hp.get_normalizer()
+            if 'norm_fit' in config and config['norm_fit'] is not None:
+                if self.wsi_normalizer is None:
+                    log.warning('norm_fit found in model config file, but '
+                                'model params does not use a normalizer. '
+                                'Ignoring.')
+                else:
+                    self.wsi_normalizer.set_fit(**config['norm_fit'])
+            self._build(
+                layers=layers, include_preds=include_preds, pooling=pooling  # type: ignore
+            )
+
+    @classmethod
+    def from_model(
+        cls,
+        model: tf.keras.Model,
+        layers: Optional[Union[str, List[str]]] = 'postconv',
+        include_preds: bool = False,
+        wsi_normalizer: Optional["StainNormalizer"] = None,
+        pooling: Optional[Any] = None,
+        device: Optional[str] = None
+    ):
+        """Creates a features interface from a loaded slideflow model which
+        outputs feature activations at the designated layers.
+
+        Intermediate layer activations are returned in the order given by
+        ``layers``. Predictions are returned last.
+
+        Args:
+            model (:class:`tensorflow.keras.models.Model`): Loaded model.
+            layers (list(str), optional): Layers from which to generate
+                activations.  The post-convolution activation layer is accessed
+                via 'postconv'. Defaults to 'postconv'.
+            include_preds (bool, optional): Include predictions in output. Will be
+                returned last. Defaults to False.
+            wsi_normalizer (:class:`slideflow.norm.StainNormalizer`): Stain
+                normalizer to use on whole-slide images. Not used on
+                individual tile datasets via __call__. Defaults to None.
+        """
+        obj = cls(None, layers, include_preds, device=device)
+        if isinstance(model, tf.keras.models.Model):
+            obj._model = model
+        else:
+            raise errors.ModelError(f"Model {model} is not a valid Tensorflow "
+                                    "model.")
+        obj._build(
+            layers=layers, include_preds=include_preds, pooling=pooling  # type: ignore
+        )
+        obj.wsi_normalizer = wsi_normalizer
+        return obj
+
+    def __repr__(self):
+        return ("{}(\n".format(self.__class__.__name__) +
+                "    path={!r},\n".format(self.path) +
+                "    layers={!r},\n".format(self.layers) +
+                "    include_preds={!r},\n".format(self._include_preds) +
+                "    pooling={!r},\n".format(self._pooling) +
+                ")")
+
+    def __call__(
+        self,
+        inp: Union[tf.Tensor, "sf.WSI"],
+        **kwargs
+    ) -> Optional[Union[np.ndarray, tf.Tensor]]:
+        """Process a given input and return features and/or predictions.
+        Expects either a batch of images or a :class:`slideflow.WSI`.
+
+        When calling on a `WSI` object, keyword arguments are passed to
+        :meth:`slideflow.WSI.build_generator()`.
+
+        """
+        if isinstance(inp, sf.WSI):
+            return self._predict_slide(inp, **kwargs)
+        else:
+            return self._predict(inp)
+
+    def _predict_slide(
+        self,
+        slide: "sf.WSI",
+        *,
+        img_format: str = 'auto',
+        batch_size: int = 32,
+        dtype: type = np.float16,
+        grid: Optional[np.ndarray] = None,
+        shuffle: bool = False,
+        show_progress: bool = True,
+        callback: Optional[Callable] = None,
+        normalizer: Optional[Union[str, "sf.norm.StainNormalizer"]] = None,
+        normalizer_source: Optional[str] = None,
+        **kwargs
+    ) -> Optional[np.ndarray]:
+        """Generate activations from slide => activation grid array."""
+
+        # Check image format
+        if img_format == 'auto' and self.img_format is None:
+            raise ValueError(
+                'Unable to auto-detect image format (png or jpg). Set the '
+                'format by passing img_format=... to the call function.'
+            )
+        elif img_format == 'auto':
+            assert self.img_format is not None
+            img_format = self.img_format
+
+        return sf.model.extractors.features_from_slide(
+            self,
+            slide,
+            img_format=img_format,
+            batch_size=batch_size,
+            dtype=dtype,
+            grid=grid,
+            shuffle=shuffle,
+            show_progress=show_progress,
+            callback=callback,
+            normalizer=(normalizer if normalizer else self.wsi_normalizer),
+            normalizer_source=normalizer_source,
+            **kwargs
+        )
+
+    @tf.function
+    def _predict(self, inp: tf.Tensor) -> tf.Tensor:
+        """Return activations for a single batch of images."""
+        with tf.device(self.device) if self.device else no_scope():
+            return self.model(inp, training=False)
+
+    def _build(
+        self,
+        layers: Optional[Union[str, List[str]]],
+        include_preds: bool = True,
+        pooling: Optional[Any] = None
+    ) -> None:
+        """Builds the interface model that outputs feature activations at the
+        designated layers and/or predictions. Intermediate layer activations
+        are returned in the order given by ``layers``. Predictions are
+        returned last."""
+
+        self._pooling = pooling
+        self._include_preds = include_preds
+
+        if isinstance(pooling, str):
+            if pooling == 'avg':
+                pooling = tf.keras.layers.GlobalAveragePooling2D
+            elif pooling == 'max':
+                pooling = tf.keras.layers.GlobalMaxPool2D
+            else:
+                raise ValueError(f"Unrecognized pooling value {pooling}. "
+                                 "Expected 'avg', 'max', or Keras layer.")
+
+        if layers and not isinstance(layers, list):
+            layers = [layers]
+        if layers:
+            if 'postconv' in layers:
+                layers[layers.index('postconv')] = 'post_convolution'  # type: ignore
+            log.debug(f"Setting up interface to return activations from layers "
+                      f"{', '.join(layers)}")
+        else:
+            layers = []
+
+        def pool_if_3d(tensor):
+            if pooling is not None and len(tensor.shape) == 4:
+                return pooling()(tensor)
+            else:
+                return tensor
+
+        # Find the desired layers
+        outputs = {}
+        outer_layer_outputs = {
+            self._model.layers[i].name: self._model.layers[i].output
+            for i in range(len(self._model.layers))
+        }
+        core_layer_outputs = {}
+        inner_layers = [la for la in layers if la not in outer_layer_outputs]
+        if inner_layers:
+            intermediate_core = tf.keras.models.Model(
+                inputs=self._model.layers[1].input,
+                outputs=[
+                    pool_if_3d(self._model.layers[1].get_layer(il).output)
+                    for il in inner_layers
+                ]
+            )
+            if len(inner_layers) > 1:
+                int_out = intermediate_core(self._model.input)
+                for la, layer in enumerate(inner_layers):
+                    core_layer_outputs[layer] = int_out[la]
+            else:
+                outputs[inner_layers[0]] = intermediate_core(self._model.input)
+        for layer in layers:
+            if layer in outer_layer_outputs:
+                outputs[layer] = outer_layer_outputs[layer]
+            elif layer in core_layer_outputs:
+                outputs[layer] = core_layer_outputs[layer]
+
+        # Build a model that outputs the given layers
+        outputs_list = [] if not layers else [outputs[la] for la in layers]
+        if include_preds:
+            outputs_list += [self._model.output]
+        self.model = tf.keras.models.Model(
+            inputs=self._model.input,
+            outputs=outputs_list
+        )
+        self.num_features = sum([outputs[o].shape[1] for o in outputs])
+        self.num_outputs = len(outputs_list)
+        if isinstance(self._model.output, list) and include_preds:
+            log.warning("Multi-categorical outcomes is experimental "
+                        "for this interface.")
+            self.num_classes = sum(o.shape[1] for o in self._model.output)
+        elif include_preds:
+            self.num_classes = self._model.output.shape[1]
+        else:
+            self.num_classes = 0
+
+        if include_preds:
+            log.debug(f'Number of classes: {self.num_classes}')
+        log.debug(f'Number of activation features: {self.num_features}')
+
+    def dump_config(self):
+        return {
+            'class': 'slideflow.model.tensorflow.Features',
+            'kwargs': {
+                'path': self.path,
+                'layers': self.layers,
+                'include_preds': self._include_preds,
+                'pooling': self._pooling
+            }
+        }
+
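The output ordering established in `_build()` above (requested layers first, predictions last, with `'postconv'` mapped to the `'post_convolution'` layer name) can be sketched without TensorFlow. `resolve_output_names` is a hypothetical helper, and the `'predictions'` label is an assumption used only to mark where the model output would sit.

```python
from typing import List, Optional, Union


def resolve_output_names(
    layers: Optional[Union[str, List[str]]],
    include_preds: bool = False,
) -> List[str]:
    """Hypothetical helper: list output names in the order they are returned."""
    # Accept a single layer name as shorthand for a one-item list.
    if layers and not isinstance(layers, list):
        layers = [layers]
    names = list(layers) if layers else []
    # 'postconv' is an alias for the post-convolution layer.
    names = ['post_convolution' if n == 'postconv' else n for n in names]
    # Predictions, when requested, always come last.
    if include_preds:
        names.append('predictions')
    return names


print(resolve_output_names('postconv'))
print(resolve_output_names(['dense_1', 'postconv'], include_preds=True))
```

Unlike the in-place substitution in `_build()`, this sketch copies the list, so the caller's `layers` argument is left unchanged.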
+class UncertaintyInterface(Features):
+    def __init__(
+        self,
+        path: Optional[str],
+        layers: Optional[Union[str, List[str]]] = 'postconv',
+        load_method: str = 'weights',
+        pooling: Optional[Any] = None
+    ) -> None:
+        super().__init__(
+            path,
+            layers=layers,
+            include_preds=True,
+            load_method=load_method,
+            pooling=pooling
+        )
+        # TODO: Update for multi-class models (see the related to-do
+        # in _predict)
+        self.num_uncertainty = 1
+        if self.num_classes > 2:
+            log.warning("UncertaintyInterface not yet implemented for "
+                        "multi-class models")
+
+    @classmethod
+    def from_model(  # type: ignore
+        cls,
+        model: tf.keras.Model,
+        layers: Optional[Union[str, List[str]]] = None,
+        wsi_normalizer: Optional["StainNormalizer"] = None,
+        pooling: Optional[Any] = None
+    ):
+        obj = cls(None, layers)
+        if isinstance(model, tf.keras.models.Model):
+            obj._model = model
+        else:
+            raise errors.ModelError(f"Model {model} is not a valid Tensorflow "
+                                    "model.")
+        obj._build(
+            layers=layers, include_preds=True, pooling=pooling  # type: ignore
+        )
+        obj.wsi_normalizer = wsi_normalizer
+        return obj
+
+    def __repr__(self):
+        return ("{}(\n".format(self.__class__.__name__) +
+                "    path={!r},\n".format(self.path) +
+                "    layers={!r},\n".format(self.layers) +
+                "    pooling={!r},\n".format(self._pooling) +
+                ")")
+
+    @tf.function
+    def _predict(self, inp):
+        """Return activations (mean), predictions (mean), and uncertainty
+        (stdev) for a single batch of images."""
+
+        out_drop = [[] for _ in range(self.num_outputs)]
+        for _ in range(30):
+            yp = self.model(inp, training=False)
+            for n in range(self.num_outputs):
+                out_drop[n] += [(yp[n] if self.num_outputs > 1 else yp)]
+        for n in range(self.num_outputs):
+            out_drop[n] = tf.stack(out_drop[n], axis=0)
+        predictions = tf.math.reduce_mean(out_drop[-1], axis=0)
+
+        # TODO: Only takes STDEV from first outcome category which works for
+        # outcomes with 2 categories, but a better solution is needed
+        # for num_categories > 2
+        uncertainty = tf.math.reduce_std(out_drop[-1], axis=0)[:, 0]
+        uncertainty = tf.expand_dims(uncertainty, axis=-1)
+
+        if self.num_outputs > 1:
+            out = [
+                tf.math.reduce_mean(out_drop[n], axis=0)
+                for n in range(self.num_outputs-1)
+            ]
+            return out + [predictions, uncertainty]
+        else:
+            return predictions, uncertainty
+
+    def dump_config(self):
+        return {
+            'class': 'slideflow.model.tensorflow.UncertaintyInterface',
+            'kwargs': {
+                'path': self.path,
+                'layers': self.layers,
+                'pooling': self._pooling
+            }
+        }
+
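The aggregation in `UncertaintyInterface._predict()` above (30 forward passes, mean prediction, standard deviation of the first outcome category) can be illustrated for a single sample in plain Python. This is a sketch under stated assumptions: `mc_predict` and `noisy_forward` are hypothetical, and the random noise merely stands in for a stochastic model. Population standard deviation is used, which matches the behavior of `tf.math.reduce_std`.

```python
import random
from statistics import fmean, pstdev
from typing import Callable, List, Tuple


def mc_predict(
    forward: Callable[[], List[float]],
    n_passes: int = 30,
) -> Tuple[List[float], float]:
    """Hypothetical helper: mean prediction and uncertainty for one sample."""
    passes = [forward() for _ in range(n_passes)]
    # Mean across passes, computed per outcome category.
    mean_pred = [fmean(column) for column in zip(*passes)]
    # Uncertainty: standard deviation of the first category only.
    uncertainty = pstdev([p[0] for p in passes])
    return mean_pred, uncertainty


rng = random.Random(0)


def noisy_forward() -> List[float]:
    # Stand-in for a stochastic forward pass of a two-category model.
    p = min(max(0.7 + rng.gauss(0, 0.05), 0.0), 1.0)
    return [p, 1.0 - p]


mean_pred, uncertainty = mc_predict(noisy_forward)
print(mean_pred, uncertainty)
```

For a two-category model the two outputs are complementary, so the standard deviation of the first category carries the same information as the second. That no longer holds with more than two categories, which is the limitation the to-do comments point at.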
+
+def load(
+    path: str,
+    method: str = 'weights',
+    custom_objects: Optional[Dict[str, Any]] = None,
+    training: bool = False
+) -> tf.keras.models.Model:
+    """Load a model trained with Slideflow.
+
+    Args:
+        path (str): Path to saved model. Must be a model trained in Slideflow.
+        method (str): Method to use when loading the model; either 'full' or
+            'weights'. If 'full', will load the saved model with
+            ``tf.keras.models.load_model()``. If 'weights', will read the
+            ``params.json`` configuration file, build the model architecture,
+            and then load weights from the given model with
+            ``Model.load_weights()``. Loading with 'full' may improve
+            compatibility across Slideflow versions. Loading with 'weights'
+            may improve compatibility across hardware & environments.
+        custom_objects (dict, Optional): Dictionary mapping names
+            (strings) to custom classes or functions. Defaults to None.
+
+    Returns:
+        tf.keras.models.Model: Loaded model.
+    """
+    if method not in ('full', 'weights'):
+        raise ValueError(f"Unrecognized method {method}, expected "
+                         "either 'full' or 'weights'")
+    log.debug(f"Loading model with method='{method}'")
+    if method == 'full':
+        return tf.keras.models.load_model(path, custom_objects=custom_objects)
+    else:
+        config = sf.util.get_model_config(path)
+        hp = ModelParams.from_dict(config['hp'])
+        if len(config['outcomes']) == 1 or config['model_type'] == 'regression':
+            num_classes = len(list(config['outcome_labels'].keys()))
+        else:
+            num_classes = {
+                outcome: len(list(config['outcome_labels'][outcome].keys()))
+                for outcome in config['outcomes']
+            }  # type: ignore
+        if config['model_type'] == 'survival':
+            survival_kw = dict(training=training)
+        else:
+            survival_kw = dict()
+        model = hp.build_model(  # type: ignore
+            num_classes=num_classes,
+            num_slide_features=0 if not config['input_feature_sizes'] else sum(config['input_feature_sizes']),
+            pretrain=None,
+            **survival_kw
+        )
+        model.load_weights(join(path, 'variables/variables'))
+        return model
+
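When loading with `method='weights'`, the `load()` function above derives the number of classes from the saved model configuration. A minimal sketch of that branch follows; `num_classes_from_config` is a hypothetical helper, and the example configurations are made up, with their layout assumed from the keys the function reads (`outcomes`, `model_type`, `outcome_labels`).

```python
from typing import Any, Dict, Union


def num_classes_from_config(config: Dict[str, Any]) -> Union[int, Dict[str, int]]:
    """Hypothetical helper: class count(s) implied by a model config."""
    single = len(config['outcomes']) == 1
    if single or config['model_type'] == 'regression':
        # One flat label mapping: count its entries.
        return len(config['outcome_labels'])
    # Several categorical outcomes: one count per outcome.
    return {
        outcome: len(config['outcome_labels'][outcome])
        for outcome in config['outcomes']
    }


# Made-up configurations, for illustration only.
single_cfg = {
    'outcomes': ['subtype'],
    'model_type': 'classification',
    'outcome_labels': {'0': 'negative', '1': 'positive'},
}
multi_cfg = {
    'outcomes': ['grade', 'stage'],
    'model_type': 'classification',
    'outcome_labels': {
        'grade': {'0': 'low', '1': 'high'},
        'stage': {'0': 'I', '1': 'II', '2': 'III'},
    },
}
print(num_classes_from_config(single_cfg))
print(num_classes_from_config(multi_cfg))
```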

+ © Copyright 2023, James M Dolezal. + +

\ No newline at end of file
diff --git a/docs/_modules/slideflow/model/tensorflow_utils/index.html b/docs/_modules/slideflow/model/tensorflow_utils/index.html
new file mode 100644
index 000000000..888446040
--- /dev/null
+++ b/docs/_modules/slideflow/model/tensorflow_utils/index.html
@@ -0,0 +1,1056 @@
+slideflow.model.tensorflow_utils — slideflow 3.0.0 documentation

Source code for slideflow.model.tensorflow_utils

+"""Tensorflow model utility functions."""
+
+import os
+import tempfile
+from typing import (TYPE_CHECKING, Any, Dict, List, Tuple, Union, Optional,
+                    Callable)
+
+import numpy as np
+import slideflow as sf
+from pandas.core.frame import DataFrame
+from slideflow.stats import df_from_pred
+from slideflow.util import log, ImgBatchSpeedColumn
+from rich.progress import Progress, TimeElapsedColumn, SpinnerColumn
+
+import tensorflow as tf
+
+if TYPE_CHECKING:
+    import neptune.new as neptune
+
+# -----------------------------------------------------------------------------
+
+def log_summary(
+    model: tf.keras.Model,
+    neptune_run: "neptune.Run" = None
+) -> None:
+    """Log the model summary.
+
+    Args:
+        model (tf.keras.Model): Tensorflow/Keras model.
+        neptune_run (neptune.Run, optional): Neptune run. Defaults to None.
+    """
+    if sf.getLoggingLevel() <= 20:
+        print()
+        model.summary()
+    if neptune_run:
+        summary_string = []
+        model.summary(print_fn=lambda x: summary_string.append(x))
+        neptune_run['summary'] = "\n".join(summary_string)
+
+
+def get_layer_index_by_name(model: tf.keras.Model, name: str) -> int:
+    for i, layer in enumerate(model.layers):
+        if layer.name == name:
+            return i
+    raise IndexError(f"Layer {name} not found.")
+
+
+def batch_loss_crossentropy(
+    features: tf.Tensor,
+    diff: float = 0.5,
+    eps: float = 1e-5
+) -> tf.Tensor:
+    split = tf.split(features, 8, axis=0)
+
+    def tstat(first, rest):
+        first_mean = tf.math.reduce_mean(first, axis=0)
+        rest_mean = tf.math.reduce_mean(rest, axis=0)
+
+        # Number of samples (not features) in each group
+        n_first = tf.cast(tf.shape(first)[0], first.dtype)
+        n_rest = tf.cast(tf.shape(rest)[0], rest.dtype)
+
+        # Per-feature sample variance
+        A = tf.math.reduce_sum(tf.math.square(first - first_mean), axis=0) / (n_first - 1)
+        B = tf.math.reduce_sum(tf.math.square(rest - rest_mean), axis=0) / (n_rest - 1)
+
+        # Standard error of the difference in means
+        se = tf.math.sqrt((A / n_first) + (B / n_rest))
+        t_square = tf.math.square((first_mean - rest_mean - diff) / se)
+        return tf.math.reduce_mean(t_square)
+
+    stats = [
+        tstat(
+            split[n],
+            tf.concat([
+                sp for i, sp in enumerate(split)
+                if i != n
+            ], axis=0))
+        for n in range(len(split))
+    ]
+    return tf.math.reduce_mean(tf.stack(stats)) * eps
+
+
+def negative_log_likelihood(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
+    """Negative log likelihood loss.
+
+    Implemented by Fred Howard, adapted from
+    https://github.com/havakv/pycox/blob/master/pycox/models/loss.py
+
+    Args:
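+        y_true (tf.Tensor): True labels (time to event).
+        y_pred (tf.Tensor): Predictions. The first column holds the predicted
+            hazard; the last column holds the event indicator.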
+
+    Returns:
+        tf.Tensor: Loss.
+    """
+    events = tf.reshape(y_pred[:, -1], [-1])  # E
+    pred_hr = tf.reshape(y_pred[:, 0], [-1])  # y_pred
+    time = tf.reshape(y_true, [-1])           # y_true
+
+    order = tf.argsort(time)  # direction='DESCENDING'
+    sorted_events = tf.gather(events, order)            # pylint: disable=no-value-for-parameter
+    sorted_predictions = tf.gather(pred_hr, order)      # pylint: disable=no-value-for-parameter
+
+    # Finds maximum HR in predictions
+    gamma = tf.math.reduce_max(sorted_predictions)
+
+    # Small constant value
+    eps = tf.constant(1e-7, dtype=tf.float32)
+
+    log_cumsum_h = tf.math.add(
+                    tf.math.log(
+                        tf.math.add(
+                            tf.math.cumsum(             # pylint: disable=no-value-for-parameter
+                                tf.math.exp(
+                                    tf.math.subtract(sorted_predictions, gamma))),
+                            eps)),
+                    gamma)
+
+    neg_likelihood = -tf.math.divide(
+                        tf.reduce_sum(
+                            tf.math.multiply(
+                                tf.subtract(sorted_predictions, log_cumsum_h),
+                                sorted_events)),
+                        tf.reduce_sum(sorted_events))
+
+    return neg_likelihood
+
+
+def negative_log_likelihood_breslow(
+    y_true: tf.Tensor,
+    y_pred: tf.Tensor
+) -> tf.Tensor:
+    """Negative log likelihood loss, Breslow approximation.
+
+    Args:
+        y_true (tf.Tensor): True labels.
+        y_pred (tf.Tensor): Predictions.
+
+    Returns:
+        tf.Tensor: Breslow loss.
+    """
+    events = tf.reshape(y_pred[:, -1], [-1])
+    pred = tf.reshape(y_pred[:, 0], [-1])
+    time = tf.reshape(y_true, [-1])
+
+    order = tf.argsort(time, direction='DESCENDING')
+    sorted_time = tf.gather(time, order)                # pylint: disable=no-value-for-parameter
+    sorted_events = tf.gather(events, order)            # pylint: disable=no-value-for-parameter
+    sorted_pred = tf.gather(pred, order)                # pylint: disable=no-value-for-parameter
+
+    Y_hat_c = sorted_pred
+    Y_label_T = sorted_time
+    Y_label_E = sorted_events
+    Obs = tf.reduce_sum(Y_label_E)
+
+    # numerical stability
+    amax = tf.reduce_max(Y_hat_c)
+    Y_hat_c_shift = tf.subtract(Y_hat_c, amax)
+    # Y_hat_c_shift = tf.debugging.check_numerics(Y_hat_c_shift, message="checking y_hat_c_shift")
+    Y_hat_hr = tf.exp(Y_hat_c_shift)
+    Y_hat_cumsum = tf.math.log(tf.cumsum(Y_hat_hr)) + amax  # pylint: disable=no-value-for-parameter
+
+    unique_values, segment_ids = tf.unique(Y_label_T)
+    loss_s2_v = tf.math.segment_max(Y_hat_cumsum, segment_ids)
+    loss_s2_count = tf.math.segment_sum(Y_label_E, segment_ids)
+
+    loss_s2 = tf.reduce_sum(tf.multiply(loss_s2_v, loss_s2_count))
+    loss_s1 = tf.reduce_sum(tf.multiply(Y_hat_c, Y_label_E))
+    loss_breslow = tf.divide(tf.subtract(loss_s2, loss_s1), Obs)
+    return loss_breslow
+
+
+def concordance_index(y_true: tf.Tensor, y_pred: tf.Tensor) -> tf.Tensor:
+    """Calculate concordance index (C-index).
+
+    Args:
+        y_true (tf.Tensor): True labels.
+        y_pred (tf.Tensor): Predictions.
+
+    Returns:
+        tf.Tensor: Concordance index.
+    """
+    E = y_pred[:, -1]
+    y_pred = y_pred[:, :-1]
+    E = tf.reshape(E, [-1])
+    y_pred = tf.reshape(y_pred, [-1])
+    y_pred = -y_pred  # negative of log hazard ratio to have correct relationship with survival
+    g = tf.subtract(tf.expand_dims(y_pred, -1), y_pred)
+    g = tf.cast(g == 0.0, tf.float32) * 0.5 + tf.cast(g > 0.0, tf.float32)
+    f = tf.subtract(tf.expand_dims(y_true, -1), y_true) > 0.0
+    event = tf.multiply(tf.transpose(E), E)
+    f = tf.multiply(tf.cast(f, tf.float32), event)
+    f = tf.compat.v1.matrix_band_part(tf.cast(f, tf.float32), -1, 0)
+    g = tf.reduce_sum(tf.multiply(g, f))
+    f = tf.reduce_sum(f)
+    return tf.where(tf.equal(f, 0), 0.0, g/f)
+
+
+def add_regularization(
+    model: tf.keras.Model,
+    regularizer: tf.keras.layers.Layer
+) -> tf.keras.Model:
+    '''Adds regularization (e.g. L2) to all eligible layers of a model.
+    This function is from "https://sthalles.github.io/keras-regularizer/" '''
+
+    if not isinstance(regularizer, tf.keras.regularizers.Regularizer):
+        print('Regularizer must be a subclass of tf.keras.regularizers.Regularizer')
+        return model
+
+    for layer in model.layers:
+        for attr in ['kernel_regularizer']:
+            if hasattr(layer, attr):
+                setattr(layer, attr, regularizer)
+
+    # Changing layer attributes only updates the model config, so the model is rebuilt from it
+    model_json = model.to_json()
+
+    # Save the weights before reloading the model.
+    tmp_weights_path = os.path.join(tempfile.gettempdir(), 'tmp_weights.h5')
+    model.save_weights(tmp_weights_path)
+
+    # load the model from the config
+    model = tf.keras.models.model_from_json(model_json)
+
+    # Reload the model weights
+    model.load_weights(tmp_weights_path, by_name=True)
+    return model
+
+
+def get_uq_predictions(
+    img: tf.Tensor,
+    pred_fn: tf.keras.Model,
+    num_outcomes: Optional[int] = None,
+    uq_n: int = 30
+) -> Tuple[tf.Tensor, tf.Tensor, int]:
+    if not num_outcomes:
+        yp_drop = {}  # type: Union[List[Any], Dict[int, List]]
+    else:
+        yp_drop = {n: [] for n in range(num_outcomes)}
+    for _ in range(uq_n):
+        yp = pred_fn(img, training=False)
+        if not num_outcomes:
+            num_outcomes = 1 if not isinstance(yp, list) else len(yp)
+            yp_drop = {n: [] for n in range(num_outcomes)}
+        if num_outcomes > 1:
+            for o in range(num_outcomes):
+                yp_drop[o] += [yp[o]]
+        else:
+            yp_drop[0] += [yp]
+    if num_outcomes > 1:
+        yp_drop = [tf.stack(yp_drop[n], axis=0) for n in range(num_outcomes)]
+        yp_mean = [tf.math.reduce_mean(yp_drop[n], axis=0) for n in range(num_outcomes)]
+        yp_std = [tf.math.reduce_std(yp_drop[n], axis=0) for n in range(num_outcomes)]
+    else:
+        yp_drop = tf.stack(yp_drop[0], axis=0)
+        yp_mean = tf.math.reduce_mean(yp_drop, axis=0)
+        yp_std = tf.math.reduce_std(yp_drop, axis=0)
+    return yp_mean, yp_std, num_outcomes
+
+
+
+def unwrap(
+    model: tf.keras.models.Model
+) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]:
+    """Unwraps a Tensorflow model built in Slideflow, returning the
+    input tensor, post-convolutional output tensor, and final model output
+    tensor.
+
+    Args:
+        model (tf.keras.models.Model): Model built with Slideflow.
+
+    Returns:
+        A tuple containing
+
+            tf.Tensor: Input tensor.
+
+            tf.Tensor: Post-convolutional layer output tensor.
+
+            tf.Tensor: Final model output tensor.
+    """
+    submodel = model.layers[1]
+    x = submodel.outputs[0]
+    postconv = x
+    for layer_index in range(2, len(model.layers)):
+        extracted_layer = model.layers[layer_index]
+        x = extracted_layer(x)
+
+    return submodel.inputs, postconv, x
+
+
+def flatten(
+    model: tf.keras.models.Model
+) -> tf.keras.models.Model:
+    """Unwraps, then flattens a Tensorflow model."""
+
+    inputs, _, outputs = unwrap(model)
+    return tf.keras.models.Model(inputs=inputs, outputs=outputs)
+
+
+def eval_from_model(
+    model: "tf.keras.Model",
+    dataset: "tf.data.Dataset",
+    model_type: Optional[str],
+    loss: Optional[Callable],
+    num_tiles: int = 0,
+    uq: bool = False,
+    uq_n: int = 30,
+    steps: Optional[int] = None,
+    predict_only: bool = False,
+    pb_label: str = "Evaluating...",
+    verbosity: str = 'full',
+) -> Tuple[DataFrame, float, float]:
+    """Evaluates predictions (y_true, y_pred, tile_to_slide) from a given
+    Tensorflow model and dataset, generating predictions, accuracy, and loss.
+
+    Args:
+        model (tf.keras.Model): Tensorflow model.
+        dataset (tf.data.Dataset): Tensorflow dataset.
+        model_type (str, optional): 'classification', 'regression', or 'survival'.
+            Will not attempt to calculate accuracy for non-classification models.
+            Defaults to 'classification'.
+        loss (Callable, optional): Loss function which accepts (y_true, y_pred).
+
+    Keyword args:
+        num_tiles (int, optional): Used for progress bar. Defaults to 0.
+        uq (bool, optional): Perform uncertainty quantification with dropout.
+            Defaults to False.
+        uq_n (int, optional): Number of per-tile inferences to perform if
+            calculating uncertainty via dropout.
+        steps (int, optional): Number of steps (batches) of evaluation to
+            perform. If None, uses the full dataset. Defaults to None.
+        predict_only (bool, optional): Only generate predictions without
+            comparisons to y_true. Defaults to False.
+        pb_label (str, optional): Progress bar label.
+            Defaults to "Evaluating..."
+        verbosity (str, optional): Either 'full', 'quiet', or 'silent'.
+            Verbosity for progress bars.
+
+    Returns:
+        pd.DataFrame, accuracy, loss
+    """
+
+    if verbosity not in ('silent', 'quiet', 'full'):
+        raise ValueError(f"Invalid value '{verbosity}' for argument 'verbosity'")
+
+    @tf.function
+    def get_predictions(img, training=False):
+        return model(img, training=training)
+
+    y_true, y_pred, tile_to_slides, locations, y_std = [], [], [], [], []
+    num_vals, num_batches, num_outcomes, running_loss = 0, 0, 0, 0
+    batch_size = 0
+    loc_missing = False
+
+    is_cat = (model_type == 'classification')
+    if not is_cat:
+        acc = None
+
+    if verbosity != 'silent':
+        pb = Progress(SpinnerColumn(), transient=True)
+        pb.add_task(pb_label, total=None)
+        pb.start()
+    else:
+        pb = None
+    try:
+        for step, batch in enumerate(dataset):
+            if steps is not None and step >= steps:
+                break
+
+            # --- Detect data structure, if this is the first batch ---------------
+            if not batch_size:
+                if len(batch) not in (3, 5):
+                    raise IndexError(
+                        "Unexpected number of items returned from dataset batch. "
+                        f"Expected either '3' or '5', got: {len(batch)}")
+
+                incl_loc = (len(batch) == 5)
+                batch_size = batch[2].shape[0]
+                if verbosity != 'silent':
+                    pb.stop()
+                    pb = Progress(
+                        SpinnerColumn(),
+                        *Progress.get_default_columns(),
+                        TimeElapsedColumn(),
+                        ImgBatchSpeedColumn(),
+                        transient=sf.getLoggingLevel()>20 or verbosity == 'quiet')
+                    task = pb.add_task(
+                        pb_label,
+                        total=num_tiles if not steps else steps*batch_size)
+                    pb.start()
+            # ---------------------------------------------------------------------
+
+            if incl_loc:
+                img, yt, slide, loc_x, loc_y = batch
+                if not loc_missing and loc_x is None:
+                    log.warning("TFrecord location information not found.")
+                    loc_missing = True
+                elif not loc_missing:
+                    locations += [tf.stack([loc_x, loc_y], axis=-1).numpy()]  # type: ignore
+            else:
+                img, yt, slide = batch
+
+            if verbosity != 'silent':
+                pb.advance(task, slide.shape[0])
+            tile_to_slides += [_byte.decode('utf-8') for _byte in slide.numpy()]
+            num_vals += slide.shape[0]
+            num_batches += 1
+
+            if uq:
+                yp, yp_std, num_outcomes = get_uq_predictions(
+                    img, get_predictions, num_outcomes, uq_n
+                )
+                y_pred += [yp]
+                y_std += [yp_std]  # type: ignore
+            else:
+                yp = get_predictions(img, training=False)
+                y_pred += [yp]
+
+            if not predict_only:
+                if isinstance(yt, dict):
+                    y_true += [[yt[f'out-{o}'].numpy() for o in range(len(yt))]]
+                    yt = [yt[f'out-{o}'] for o in range(len(yt))]
+                    if loss is not None:
+                        loss_val = [loss(yt[i], yp[i]) for i in range(len(yt))]
+                        loss_val = [tf.boolean_mask(l, tf.math.is_finite(l)) for l in loss_val]
+                        batch_loss = tf.math.reduce_sum(loss_val).numpy()
+                        running_loss = (((num_vals - slide.shape[0]) * running_loss) + batch_loss) / num_vals
+                else:
+                    y_true += [yt.numpy()]
+                    if loss is not None:
+                        loss_val = loss(yt, yp)
+                        if tf.rank(loss_val):
+                            # Loss is a vector
+                            is_finite = tf.math.is_finite(loss_val)
+                            batch_loss = tf.math.reduce_sum(tf.boolean_mask(loss_val, is_finite)).numpy()
+                        else:
+                            # Loss is a scalar
+                            batch_loss = loss_val.numpy()  # type: ignore
+                        running_loss = (((num_vals - slide.shape[0]) * running_loss) + batch_loss) / num_vals
+    except KeyboardInterrupt:
+        if pb is not None:
+            pb.stop()
+        raise
+
+    if verbosity != 'silent':
+        pb.stop()
+
+    if y_pred == []:
+        raise ValueError("Insufficient data for evaluation.")
+
+    if isinstance(y_pred[0], list):
+        # Concatenate predictions for each outcome
+        y_pred = [np.concatenate(yp) for yp in zip(*y_pred)]
+        if uq:
+            y_std = [np.concatenate(ys) for ys in zip(*y_std)]  # type: ignore
+    else:
+        y_pred = [np.concatenate(y_pred)]
+        if uq:
+            y_std = [np.concatenate(y_std)]
+
+    if not predict_only and isinstance(y_true[0], list):
+        # Concatenate y_true for each outcome
+        y_true = [np.concatenate(yt) for yt in zip(*y_true)]
+        if is_cat:
+            acc = [
+                np.sum(y_true[i] == np.argmax(y_pred[i], axis=1)) / num_vals
+                for i in range(len(y_true))
+            ]
+    elif not predict_only:
+        y_true = [np.concatenate(y_true)]
+        if is_cat:
+            acc = np.sum(y_true[0] == np.argmax(y_pred[0], axis=1)) / num_vals
+    else:
+        y_true = None  # type: ignore
+
+    if locations != []:
+        locations = np.concatenate(locations)
+    else:
+        locations = None  # type: ignore
+    if not uq:
+        y_std = None  # type: ignore
+
+    # Create pandas DataFrame from arrays
+    df = df_from_pred(y_true, y_pred, y_std, tile_to_slides, locations)
+
+    # Note that Keras loss during training includes regularization losses,
+    # so this loss will not match validation loss calculated during training
+    log.debug("Evaluation complete.")
+    return df, acc, running_loss  # type: ignore
+
+
+def predict_from_model(
+    model: "tf.keras.Model",
+    dataset: "tf.data.Dataset",
+    pb_label: str = "Predicting...",
+    **kwargs
+) -> DataFrame:
+    """Generate a DataFrame of predictions from a model.
+
+    Args:
+        model (tf.keras.Model): Tensorflow model.
+        dataset (tf.data.Dataset): Tensorflow dataset.
+
+    Keyword args:
+        num_tiles (int, optional): Used for progress bar. Defaults to 0.
+        uq (bool, optional): Perform uncertainty quantification with dropout.
+            Defaults to False.
+        uq_n (int, optional): Number of per-tile inferences to perform if
+            calculating uncertainty via dropout.
+        steps (int, optional): Number of steps (batches) of evaluation to
+            perform. If None, uses the full dataset. Defaults to None.
+        pb_label (str, optional): Progress bar label.
+            Defaults to "Predicting..."
+        verbosity (str, optional): Either 'full', 'quiet', or 'silent'.
+            Verbosity for progress bars.
+
+    Returns:
+        pd.DataFrame
+    """
+    df, _, _ = eval_from_model(
+        model,
+        dataset,
+        model_type=None,
+        loss=None,
+        predict_only=True,
+        pb_label=pb_label,
+        **kwargs
+    )
+    return df
+
+# -----------------------------------------------------------------------------
+
+class CosineAnnealer:
+
+    def __init__(self, start, end, steps):
+        self.start = start
+        self.end = end
+        self.steps = steps
+        self.n = 0
+
+    def step(self):
+        self.n += 1
+        cos = np.cos(np.pi * (self.n / self.steps)) + 1
+        return self.end + (self.start - self.end) / 2. * cos
+
+
+class OneCycleScheduler(tf.keras.callbacks.Callback):
+    """ `Callback` that schedules the learning rate on a 1cycle policy as per Leslie Smith's paper (https://arxiv.org/pdf/1803.09820.pdf).
+    If the model supports a momentum parameter, it will also be adapted by the schedule.
+    The implementation adopts additional improvements as per the fastai library: https://docs.fast.ai/callbacks.one_cycle.html, where
+    only two phases are used and the adaptation is done using cosine annealing.
+    In phase 1 the LR increases from `lr_max / div_factor` to `lr_max` and momentum decreases from `mom_max` to `mom_min`.
+    In the second phase the LR decreases from `lr_max` to `lr_max / (div_factor * 1e4)` and momentum increases from `mom_min` to `mom_max`.
+    By default the phases are not of equal length, with the phase 1 percentage controlled by the parameter `phase_1_pct`.
+    """
+
+    def __init__(self, lr_max, steps, mom_min=0.85, mom_max=0.95, phase_1_pct=0.3, div_factor=25.):
+        super(OneCycleScheduler, self).__init__()
+        lr_min = lr_max / div_factor
+        final_lr = lr_max / (div_factor * 1e4)
+        phase_1_steps = steps * phase_1_pct
+        phase_2_steps = steps - phase_1_steps
+
+        self.phase_1_steps = phase_1_steps
+        self.phase_2_steps = phase_2_steps
+        self.phase = 0
+        self.step = 0
+
+        self.phases = [[CosineAnnealer(lr_min, lr_max, phase_1_steps), CosineAnnealer(mom_max, mom_min, phase_1_steps)],
+                       [CosineAnnealer(lr_max, final_lr, phase_2_steps), CosineAnnealer(mom_min, mom_max, phase_2_steps)]]
+
+        self.lrs = []
+        self.moms = []
+
+    def on_train_begin(self, logs=None):
+        self.phase = 0
+        self.step = 0
+
+        self.set_lr(self.lr_schedule().start)
+        self.set_momentum(self.mom_schedule().start)
+
+    def on_train_batch_begin(self, batch, logs=None):
+        self.lrs.append(self.get_lr())
+        self.moms.append(self.get_momentum())
+
+    def on_train_batch_end(self, batch, logs=None):
+        self.step += 1
+        if self.step >= self.phase_1_steps:
+            self.phase = 1
+
+        self.set_lr(self.lr_schedule().step())
+        self.set_momentum(self.mom_schedule().step())
+
+    def get_lr(self):
+        try:
+            return tf.keras.backend.get_value(self.model.optimizer.lr)
+        except AttributeError:
+            return None
+
+    def get_momentum(self):
+        try:
+            return tf.keras.backend.get_value(self.model.optimizer.momentum)
+        except AttributeError:
+            return None
+
+    def set_lr(self, lr):
+        try:
+            tf.keras.backend.set_value(self.model.optimizer.lr, lr)
+        except AttributeError:
+            pass  # ignore
+
+    def set_momentum(self, mom):
+        try:
+            tf.keras.backend.set_value(self.model.optimizer.momentum, mom)
+        except AttributeError:
+            pass  # ignore
+
+    def lr_schedule(self):
+        return self.phases[self.phase][0]
+
+    def mom_schedule(self):
+        return self.phases[self.phase][1]
+
+    def plot(self):
+        import matplotlib.pyplot as plt
+        ax = plt.subplot(1, 2, 1)
+        ax.plot(self.lrs)
+        ax.set_title('Learning Rate')
+        ax = plt.subplot(1, 2, 2)
+        ax.plot(self.moms)
+        ax.set_title('Momentum')
+
+# -----------------------------------------------------------------------------
+
+def build_uq_model(model, n_repeat=30):
+    """Rebuild a dropout-based UQ model to return predictions and uncertainties."""
+    layers = [l for l in model.layers]
+    n_dim = model.layers[2].output.shape[1]
+    n_out = model.output.shape[1]
+    log.info("Building UQ model with n_repeat={} (n_dim={}, n_out={})".format(
+        n_repeat, n_dim, n_out
+    ))
+    new_layers = (layers[0:3]
+                  + [tf.keras.layers.RepeatVector(n_repeat),
+                     tf.keras.layers.Lambda(lambda x: tf.reshape(x, (-1, n_dim)))]
+                  + layers[3:]
+                  + [tf.keras.layers.Lambda(lambda x: tf.reshape(x, (-1, n_repeat, n_out)))])
+    new_core = tf.keras.models.Sequential(new_layers)
+    yp_mean = tf.math.reduce_mean(new_core.output, axis=1)
+    yp_std = tf.math.reduce_std(new_core.output, axis=1)
+    uq_model = tf.keras.models.Model(inputs=new_core.input, outputs=[yp_mean, yp_std])
+    return uq_model

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/model/torch/index.html b/docs/_modules/slideflow/model/torch/index.html new file mode 100644 index 000000000..51cc1fa53 --- /dev/null +++ b/docs/_modules/slideflow/model/torch/index.html @@ -0,0 +1,3097 @@ + + + + + + + + + + + + slideflow.model.torch — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.model.torch

+'''PyTorch backend for the slideflow.model submodule.'''
+
+import inspect
+import json
+import os
+import types
+import numpy as np
+import multiprocessing as mp
+import pandas as pd
+import torch
+import torchvision
+
+from torch import Tensor
+from torch.nn.functional import softmax
+from packaging import version
+from rich.progress import Progress, TimeElapsedColumn
+from collections import defaultdict
+from os.path import join
+from pandas.api.types import is_float_dtype, is_integer_dtype
+from typing import (TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Tuple,
+                    Union, Callable)
+
+import slideflow as sf
+import slideflow.util.neptune_utils
+from slideflow import errors
+from slideflow.model import base as _base
+from slideflow.model import torch_utils
+from slideflow.model.torch_utils import autocast
+from slideflow.model.base import log_manifest, BaseFeatureExtractor
+from slideflow.util import log, NormFit, ImgBatchSpeedColumn, no_scope
+
+if TYPE_CHECKING:
+    import pandas as pd
+    from slideflow.norm import StainNormalizer
+
+
+class LinearBlock(torch.nn.Module):
+    '''Block module that includes a linear layer -> ReLU -> BatchNorm'''
+
+    def __init__(
+        self,
+        in_ftrs: int,
+        out_ftrs: int,
+        dropout: Optional[float] = None
+    ) -> None:
+        super().__init__()
+        self.in_ftrs = in_ftrs
+        self.out_ftrs = out_ftrs
+        self.linear = torch.nn.Linear(in_ftrs, out_ftrs)
+        self.relu = torch.nn.ReLU(inplace=True)
+        self.bn = torch.nn.BatchNorm1d(out_ftrs)
+        if dropout:
+            self.dropout = torch.nn.Dropout(dropout)
+        else:
+            self.dropout = None  # type: ignore
+
+    def forward(self, x: Tensor) -> Tensor:
+        x = self.linear(x)
+        x = self.relu(x)
+        x = self.bn(x)
+        if self.dropout is not None:
+            x = self.dropout(x)
+        return x
+
+
+class ModelWrapper(torch.nn.Module):
+    '''Wrapper for PyTorch modules to support multiple outcomes, clinical
+    (patient-level) inputs, and additional hidden layers.'''
+
+    def __init__(
+        self,
+        model: Any,
+        n_classes: List[int],
+        num_slide_features: int = 0,
+        hidden_layers: Optional[List[int]] = None,
+        drop_images: bool = False,
+        dropout: Optional[float] = None,
+        include_top: bool = True
+    ) -> None:
+        super().__init__()
+        self.model = model
+        self.n_classes = len(n_classes)
+        self.drop_images = drop_images
+        self.num_slide_features = num_slide_features
+        self.num_hidden_layers = 0 if not hidden_layers else len(hidden_layers)
+        self.has_aux = False
+        log.debug(f'Model class name: {model.__class__.__name__}')
+        if not drop_images:
+            # Check for auxiliary classifier
+            if model.__class__.__name__ in ('Inception3',):
+                log.debug("Auxiliary classifier detected")
+                self.has_aux = True
+
+            # Get the last linear layer prior to the logits layer
+            if model.__class__.__name__ in ('Xception', 'NASNetALarge'):
+                num_ftrs = self.model.last_linear.in_features
+                self.model.last_linear = torch.nn.Identity()
+            elif model.__class__.__name__ in ('SqueezeNet',):
+                num_ftrs = 1000
+            elif hasattr(self.model, 'classifier'):
+                children = list(self.model.classifier.named_children())
+                if len(children):
+                    # VGG, AlexNet
+                    if include_top:
+                        log.debug("Including existing fully-connected "
+                                  "top classifier layers")
+                        last_linear_name, last_linear = children[-1]
+                        num_ftrs = last_linear.in_features
+                        setattr(
+                            self.model.classifier,
+                            last_linear_name,
+                            torch.nn.Identity()
+                        )
+                    elif model.__class__.__name__ in ('AlexNet',
+                                                      'MobileNetV2',
+                                                      'MNASNet'):
+                        log.debug("Removing fully-connected classifier layers")
+                        _, first_classifier = children[1]
+                        num_ftrs = first_classifier.in_features
+                        self.model.classifier = torch.nn.Identity()
+                    elif model.__class__.__name__ in ('VGG', 'MobileNetV3'):
+                        log.debug("Removing fully-connected classifier layers")
+                        _, first_classifier = children[0]
+                        num_ftrs = first_classifier.in_features
+                        self.model.classifier = torch.nn.Identity()
+                else:
+                    num_ftrs = self.model.classifier.in_features
+                    self.model.classifier = torch.nn.Identity()
+            elif hasattr(self.model, 'fc'):
+                num_ftrs = self.model.fc.in_features
+                self.model.fc = torch.nn.Identity()
+            elif hasattr(self.model, 'out_features'):
+                num_ftrs = self.model.out_features
+            elif hasattr(self.model, 'head'):
+                num_ftrs = self.model.head.out_features
+            else:
+                print(self.model)
+                raise errors.ModelError("Unable to find last linear layer for "
+                                        f"model {model.__class__.__name__}")
+        else:
+            num_ftrs = 0
+
+        # Add slide-level features
+        num_ftrs += num_slide_features
+
+        # Add hidden layers
+        if hidden_layers:
+            hl_ftrs = [num_ftrs] + hidden_layers
+            for i in range(len(hidden_layers)):
+                setattr(self, f'h{i}', LinearBlock(hl_ftrs[i],
+                                                   hl_ftrs[i+1],
+                                                   dropout=dropout))
+            num_ftrs = hidden_layers[-1]
+
+        # Add the outcome/logits layers for each outcome, if multiple outcomes
+        for i, n in enumerate(n_classes):
+            setattr(self, f'fc{i}', torch.nn.Linear(num_ftrs, n))
+
+    def __getattr__(self, name: str) -> Any:
+        try:
+            return super().__getattr__(name)
+        except AttributeError as e:
+            if name == 'model':
+                raise e
+            return getattr(self.model, name)
+
+    def forward(
+        self,
+        img: Tensor,
+        slide_features: Optional[Tensor] = None
+    ):
+        if slide_features is None and self.num_slide_features:
+            raise ValueError("Expected 2 inputs, got 1")
+
+        # Last linear of core convolutional model
+        if not self.drop_images:
+            x = self.model(img)
+
+        # Discard auxiliary classifier
+        if self.has_aux and self.training:
+            x = x.logits
+
+        # Merging image data with any slide-level input data
+        if self.num_slide_features and not self.drop_images:
+            assert slide_features is not None
+            x = torch.cat([x, slide_features], dim=1)
+        elif self.num_slide_features:
+            x = slide_features
+
+        # Hidden layers
+        if self.num_hidden_layers:
+            x = self.h0(x)
+        if self.num_hidden_layers > 1:
+            for h in range(1, self.num_hidden_layers):
+                x = getattr(self, f'h{h}')(x)
+
+        # Return a list of outputs if we have multiple outcomes
+        if self.n_classes > 1:
+            out = [getattr(self, f'fc{i}')(x) for i in range(self.n_classes)]
+
+        # Otherwise, return the single output
+        else:
+            out = self.fc0(x)
+
+        return out  # , x
+
+
+
[docs]class ModelParams(_base._ModelParams): + """Build a set of hyperparameters.""" + + ModelDict = { + 'resnet18': torchvision.models.resnet18, + 'resnet50': torchvision.models.resnet50, + 'alexnet': torchvision.models.alexnet, + 'squeezenet': torchvision.models.squeezenet.squeezenet1_1, + 'densenet': torchvision.models.densenet161, + 'inception': torchvision.models.inception_v3, + 'googlenet': torchvision.models.googlenet, + 'shufflenet': torchvision.models.shufflenet_v2_x1_0, + 'resnext50_32x4d': torchvision.models.resnext50_32x4d, + 'vgg16': torchvision.models.vgg16, # needs support added + 'mobilenet_v2': torchvision.models.mobilenet_v2, + 'mobilenet_v3_small': torchvision.models.mobilenet_v3_small, + 'mobilenet_v3_large': torchvision.models.mobilenet_v3_large, + 'wide_resnet50_2': torchvision.models.wide_resnet50_2, + 'mnasnet': torchvision.models.mnasnet1_0, + 'xception': torch_utils.xception, + 'nasnet_large': torch_utils.nasnetalarge + } + + def __init__(self, *, loss: str = 'CrossEntropy', **kwargs) -> None: + self.OptDict = { + 'Adadelta': torch.optim.Adadelta, + 'Adagrad': torch.optim.Adagrad, + 'Adam': torch.optim.Adam, + 'AdamW': torch.optim.AdamW, + 'SparseAdam': torch.optim.SparseAdam, + 'Adamax': torch.optim.Adamax, + 'ASGD': torch.optim.ASGD, + 'LBFGS': torch.optim.LBFGS, + 'RMSprop': torch.optim.RMSprop, + 'Rprop': torch.optim.Rprop, + 'SGD': torch.optim.SGD + } + self.RegressionLossDict = { + 'L1': torch.nn.L1Loss, + 'MSE': torch.nn.MSELoss, + 'NLL': torch.nn.NLLLoss, # negative log likelihood + 'HingeEmbedding': torch.nn.HingeEmbeddingLoss, + 'SmoothL1': torch.nn.SmoothL1Loss, + 'CosineEmbedding': torch.nn.CosineEmbeddingLoss, + } + self.AllLossDict = { + 'CrossEntropy': torch.nn.CrossEntropyLoss, + 'CTC': torch.nn.CTCLoss, + 'PoissonNLL': torch.nn.PoissonNLLLoss, + 'GaussianNLL': torch.nn.GaussianNLLLoss, + 'KLDiv': torch.nn.KLDivLoss, + 'BCE': torch.nn.BCELoss, + 'BCEWithLogits': torch.nn.BCEWithLogitsLoss, + 'MarginRanking': 
torch.nn.MarginRankingLoss, + 'MultiLabelMargin': torch.nn.MultiLabelMarginLoss, + 'Huber': torch.nn.HuberLoss, + 'SoftMargin': torch.nn.SoftMarginLoss, + 'MultiLabelSoftMargin': torch.nn.MultiLabelSoftMarginLoss, + 'MultiMargin': torch.nn.MultiMarginLoss, + 'TripletMargin': torch.nn.TripletMarginLoss, + 'TripletMarginWithDistance': torch.nn.TripletMarginWithDistanceLoss, + 'L1': torch.nn.L1Loss, + 'MSE': torch.nn.MSELoss, + 'NLL': torch.nn.NLLLoss, # negative log likelihood + 'HingeEmbedding': torch.nn.HingeEmbeddingLoss, + 'SmoothL1': torch.nn.SmoothL1Loss, + 'CosineEmbedding': torch.nn.CosineEmbeddingLoss, + } + super().__init__(loss=loss, **kwargs) + assert self.model in self.ModelDict.keys() or self.model.startswith('timm_') + assert self.optimizer in self.OptDict.keys() + assert self.loss in self.AllLossDict + if self.model == 'inception': + log.warn("Model 'inception' has an auxillary classifier, which " + "is currently ignored during training. Auxillary " + "classifier loss will be included during training " + "starting in version 1.3") + + + def get_opt(self, params_to_update: Iterable) -> torch.optim.Optimizer: + return self.OptDict[self.optimizer]( + params_to_update, + lr=self.learning_rate, + weight_decay=self.l2 + ) + + def get_loss(self) -> torch.nn.modules.loss._Loss: + return self.AllLossDict[self.loss]() + + def get_model_loader(self, model: str) -> Callable: + if model in self.ModelDict: + return self.ModelDict[model] + elif model.startswith('timm_'): + + def loader(**kwargs): + try: + import timm + except ImportError: + raise ImportError(f"Unable to load model {model}; " + "timm package not installed.") + return timm.create_model(model[5:], **kwargs) + + return loader + else: + raise ValueError(f"Model {model} not found.") + + def build_model( + self, + labels: Optional[Dict] = None, + num_classes: Optional[Union[int, Dict[Any, int]]] = None, + num_slide_features: int = 0, + pretrain: Optional[str] = None, + checkpoint: Optional[str] = None + ) 
-> torch.nn.Module: + + assert num_classes is not None or labels is not None + if num_classes is None: + assert labels is not None + num_classes = self._detect_classes_from_labels(labels) + if not isinstance(num_classes, dict): + num_classes = {'out-0': num_classes} + + # Prepare custom model pretraining + if pretrain: + log.debug(f"Using pretraining: [green]{pretrain}") + if (isinstance(pretrain, str) + and sf.util.path_to_ext(pretrain).lower() == 'zip'): + _pretrained = pretrain + pretrain = None + else: + _pretrained = None + + # Build base model + if self.model in ('xception', 'nasnet_large'): + _model = self.get_model_loader(self.model)( + num_classes=1000, + pretrained=pretrain + ) + else: + # Compatibility logic for prior versions of PyTorch + model_fn = self.get_model_loader(self.model) + model_fn_sig = inspect.signature(model_fn) + model_kw = [ + param.name + for param in model_fn_sig.parameters.values() + if param.kind == param.POSITIONAL_OR_KEYWORD + ] + call_kw = {} + if 'image_size' in model_kw: + call_kw.update(dict(image_size=self.tile_px)) + if (version.parse(torchvision.__version__) >= version.parse("0.13") + and not self.model.startswith('timm_')): + # New Torchvision API + w = 'DEFAULT' if pretrain == 'imagenet' else pretrain + call_kw.update(dict(weights=w)) # type: ignore + else: + call_kw.update(dict(pretrained=pretrain)) # type: ignore + _model = model_fn(**call_kw) + + # Add final layers to models + hidden_layers = [ + self.hidden_layer_width + for _ in range(self.hidden_layers) + ] + model = ModelWrapper( + _model, + list(num_classes.values()), + num_slide_features, + hidden_layers, + self.drop_images, + dropout=self.dropout, + include_top=self.include_top + ) + if _pretrained is not None: + lazy_load_pretrained(model, _pretrained) + if checkpoint is not None: + model.load_state_dict(torch.load(checkpoint)) + return model + + def model_type(self) -> str: + """Returns 'regression', 'classification', or 'survival', reflecting the loss.""" + 
# Check whether the loss is named custom_[type]; if so, return [type] + if self.loss.startswith('custom'): + return self.loss[7:] + elif self.loss == 'NLL': + return 'survival' + elif self.loss in self.RegressionLossDict: + return 'regression' + else: + return 'classification'
+ + +
class Trainer: + """Base trainer class containing functionality for model building, input + processing, training, and evaluation. + + This base class requires categorical outcome(s). Additional outcome types + are supported by :class:`slideflow.model.RegressionTrainer` and + :class:`slideflow.model.SurvivalTrainer`. + + Slide-level (e.g. clinical) features can be used as additional model input + by providing slide labels in the slide annotations dictionary, under + the key 'input'. + """ + + _model_type = 'classification' + + def __init__( + self, + hp: ModelParams, + outdir: str, + labels: Dict[str, Any], + *, + slide_input: Optional[Dict[str, Any]] = None, + name: str = 'Trainer', + feature_sizes: Optional[List[int]] = None, + feature_names: Optional[List[str]] = None, + outcome_names: Optional[List[str]] = None, + mixed_precision: bool = True, + allow_tf32: bool = False, + config: Dict[str, Any] = None, + use_neptune: bool = False, + neptune_api: Optional[str] = None, + neptune_workspace: Optional[str] = None, + load_method: str = 'weights', + custom_objects: Optional[Dict[str, Any]] = None, + device: Optional[str] = None, + transform: Optional[Union[Callable, Dict[str, Callable]]] = None, + pin_memory: bool = True, + num_workers: int = 4, + chunk_size: int = 8 + ): + """Sets base configuration, preparing model inputs and outputs. + + Args: + hp (:class:`slideflow.ModelParams`): ModelParams object. + outdir (str): Destination for event logs and checkpoints. + labels (dict): Dict mapping slide names to outcome labels (int or + float format). + slide_input (dict): Dict mapping slide names to additional + slide-level input, concatenated after post-conv. + name (str, optional): Optional name describing the model, used for + model saving. Defaults to 'Trainer'. + feature_sizes (list, optional): List of sizes of input features. + Required if providing additional input features as model input. + feature_names (list, optional): List of names for input features. 
+ Used when permuting feature importance. + outcome_names (list, optional): Name of each outcome. Defaults to + "Outcome {X}" for each outcome. + mixed_precision (bool, optional): Use FP16 mixed precision (rather + than FP32). Defaults to True. + allow_tf32 (bool): Allow internal use of TensorFloat-32 format. + Defaults to False. + config (dict, optional): Training configuration dictionary, used + for logging and image format verification. Defaults to None. + use_neptune (bool, optional): Use Neptune API logging. + Defaults to False. + neptune_api (str, optional): Neptune API token, used for logging. + Defaults to None. + neptune_workspace (str, optional): Neptune workspace. + Defaults to None. + load_method (str): Loading method to use when reading model. + This argument is ignored in the PyTorch backend, as all models + are loaded by first building the model with hyperparameters + detected in ``params.json``, then loading weights with + ``torch.nn.Module.load_state_dict()``. Defaults to + 'weights' (ignored). + transform (callable or dict, optional): Optional transform to + apply to input images. If dict, must have the keys 'train' + and/or 'val', mapping to callables that take a single + image Tensor as input and return a single image Tensor. + If None, no transform is applied. If a single callable is + provided, it will be applied to both training and validation + data. If a dict is provided, the 'train' transform will be + applied to training data and the 'val' transform will be + applied to validation data. If a dict is provided and either + 'train' or 'val' is None, no transform will be applied to + that data. Defaults to None. + pin_memory (bool): Set the ``pin_memory`` attribute for dataloaders. + Defaults to True. + num_workers (int): Set the number of workers for dataloaders. + Defaults to 4. + chunk_size (int): Set the chunk size for TFRecord reading. + Defaults to 8. 
+ """ + self.hp = hp + self.outdir = outdir + self.labels = labels + self.patients = dict() # type: Dict[str, str] + self.name = name + self.model = None # type: Optional[torch.nn.Module] + self.inference_model = None # type: Optional[torch.nn.Module] + self.mixed_precision = mixed_precision + self.device = torch_utils.get_device(device) + self.mid_train_val_dts: Optional[Iterable] = None + self.loss_fn: torch.nn.modules.loss._Loss + self.use_tensorboard: bool + self.writer = None # type: Optional[torch.utils.tensorboard.SummaryWriter] + self.pin_memory = pin_memory + self.num_workers = num_workers + self.chunk_size = chunk_size + self._reset_training_params() + + if custom_objects is not None: + log.warn("custom_objects argument ignored in PyTorch backend.") + + # Enable or disable TensorFloat-32 (TF32) + # Allows PyTorch to internally use TF32 for matmul and convolutions + torch.backends.cuda.matmul.allow_tf32 = allow_tf32 + torch.backends.cudnn.allow_tf32 = allow_tf32 # type: ignore + self._allow_tf32 = allow_tf32 + + # Slide-level input args + if slide_input: + self.slide_input = { + k: [float(vi) for vi in v] + for k, v in slide_input.items() + } + else: + self.slide_input = None # type: ignore + self.feature_names = feature_names + self.feature_sizes = feature_sizes + self.num_slide_features = 0 if not feature_sizes else sum(feature_sizes) + + self.normalizer = self.hp.get_normalizer() + if self.normalizer: + log.info(f'Using realtime {self.hp.normalizer} normalization') + + if not os.path.exists(outdir): + os.makedirs(outdir) + + self._process_transforms(transform) + self._process_outcome_labels(outcome_names) + if isinstance(labels, pd.DataFrame): + cat_assign = self._process_category_assignments() + + # Log parameters + if config is None: + config = { + 'slideflow_version': sf.__version__, + 'backend': sf.backend(), + 'git_commit': sf.__gitcommit__, + 'model_name': self.name, + 'full_model_name': self.name, + 'outcomes': self.outcome_names, + 'model_type': 
self.hp.model_type(), + 'img_format': None, + 'tile_px': self.hp.tile_px, + 'tile_um': self.hp.tile_um, + 'input_features': None, + 'input_feature_sizes': None, + 'input_feature_labels': None, + 'hp': self.hp.to_dict(), + } + if isinstance(labels, pd.DataFrame): + config['outcome_labels'] = {str(k): v for k,v in cat_assign.items()} + + sf.util.write_json(config, join(self.outdir, 'params.json')) + + # Neptune logging + self.config = config + self.img_format = config['img_format'] if 'img_format' in config else None + self.use_neptune = use_neptune + self.neptune_run = None + if self.use_neptune: + if neptune_api is None or neptune_workspace is None: + raise ValueError("If using Neptune, must supply neptune_api" + " and neptune_workspace.") + self.neptune_logger = sf.util.neptune_utils.NeptuneLog( + neptune_api, + neptune_workspace + ) + + @property + def num_outcomes(self) -> int: + if self.hp.model_type() == 'classification': + assert self.outcome_names is not None + return len(self.outcome_names) + else: + return 1 + + @property + def multi_outcome(self) -> bool: + return (self.num_outcomes > 1) + + def _process_category_assignments(self) -> Dict[int, str]: + """Get category assignments for categorical outcome labels. + + Dataframes with integer labels are assumed to be categorical if + hp.model_type is 'classification'. + Dataframes with float labels are assumed to be continuous. + Dataframes with other labels are assumed to be categorical, and will + be assigned an integer label based on the order of unique values. 
+ + """ + if not isinstance(self.labels, pd.DataFrame): + raise ValueError("Expected DataFrame with 'label' column.") + if 'label' not in self.labels.columns: + raise ValueError("Expected DataFrame with 'label' column.") + if self.hp.model_type() == 'classification': + if is_integer_dtype(self.labels['label']) or is_float_dtype(self.labels['label']): + return {i: str(i) for i in sorted(self.labels['label'].unique())} + else: + int_to_str = dict(enumerate(sorted(self.labels['label'].unique()))) + str_to_int = {v: k for k, v in int_to_str.items()} + log.info("Assigned integer labels to categories:") + log.info(str_to_int) + self.labels['label'] = self.labels['label'].map(str_to_int) + return int_to_str + else: + return {} + + def _process_transforms( + self, + transform: Optional[Union[Callable, Dict[str, Callable]]] = None + ) -> None: + """Process custom transformations for training and/or validation.""" + if not isinstance(transform, dict): + transform = {'train': transform, 'val': transform} + if any([t not in ('train', 'val') for t in transform]): + raise ValueError("transform must be a callable or dict with keys " + "'train' and/or 'val'") + if 'train' not in transform: + transform['train'] = None + if 'val' not in transform: + transform['val'] = None + self.transform = transform + + def _process_outcome_labels( + self, + outcome_names: Optional[List[str]] = None, + ) -> None: + """Process outcome labels to determine number of outcomes and names. + + Supports experimental tile-level labels provided via pandas DataFrame. + + Args: + labels (dict): Dict mapping slide names to outcome labels (int or + float format). Experimental functionality: if labels is a + pandas DataFrame, the 'label' column will be used as the + outcome labels. + outcome_names (list, optional): Name of each outcome. Defaults to + "Outcome {X}" for each outcome. 
+ + """ + # Process DataFrame tile-level labels + if isinstance(self.labels, pd.DataFrame): + if 'label' not in self.labels.columns: + raise errors.ModelError("Expected DataFrame with 'label' " + "column.") + if outcome_names and len(outcome_names) > 1: + raise errors.ModelError( + "Expected single outcome name for labels from a pandas dataframe." + ) + self.outcome_names = outcome_names or ['Outcome 0'] + return + + # Process dictionary slide-level labels + outcome_labels = np.array(list(self.labels.values())) + if len(outcome_labels.shape) == 1: + outcome_labels = np.expand_dims(outcome_labels, axis=1) + if not outcome_names: + self.outcome_names = [ + f'Outcome {i}' + for i in range(outcome_labels.shape[1]) + ] + else: + self.outcome_names = outcome_names + if not len(self.outcome_names) == outcome_labels.shape[1]: + n_names = len(self.outcome_names) + n_out = outcome_labels.shape[1] + raise errors.ModelError(f"Number of outcome names ({n_names}) does" + f" not match number of outcomes ({n_out})") + + def _reset_training_params(self) -> None: + self.global_step = 0 + self.epoch = 0 # type: int + self.step = 0 # type: int + self.log_frequency = 0 # type: int + self.early_stop = False # type: bool + self.moving_average = [] # type: List + self.dataloaders = {} # type: Dict[str, Any] + self.validation_batch_size = None # type: Optional[int] + self.validate_on_batch = 0 + self.validation_steps = 0 + self.ema_observations = 0 # type: int + self.ema_smoothing = 0 + self.last_ema = -1 # type: float + self.ema_one_check_prior = -1 # type: float + self.ema_two_checks_prior = -1 # type: float + self.epoch_records = 0 # type: int + self.running_loss = 0.0 + self.running_corrects = {} # type: Union[Tensor, Dict[str, Tensor]] + + def _accuracy_as_numpy( + self, + acc: Union[Tensor, float, List[Tensor], List[float]] + ) -> Union[float, List[float]]: + if isinstance(acc, list): + return [t.item() if isinstance(t, Tensor) else t for t in acc] + else: + return (acc.item() if 
isinstance(acc, Tensor) else acc) + + def _build_model( + self, + checkpoint: Optional[str] = None, + pretrain: Optional[str] = None + ) -> None: + if checkpoint: + log.info(f"Loading checkpoint at [green]{checkpoint}") + self.load(checkpoint) + else: + self.model = self.hp.build_model( + labels=self.labels, + pretrain=pretrain, + num_slide_features=self.num_slide_features + ) + # Create an inference model before any multi-GPU parallelization + # is applied to the self.model parameter + self.inference_model = self.model + + def _calculate_accuracy( + self, + running_corrects: Union[Tensor, Dict[Any, Tensor]], + num_records: int = 1 + ) -> Tuple[Union[Tensor, List[Tensor]], str]: + '''Reports accuracy of each outcome.''' + assert self.hp.model_type() == 'classification' + if self.num_outcomes > 1: + if not isinstance(running_corrects, dict): + raise ValueError("Expected running_corrects to be a dict:" + " num_outcomes is > 1") + acc_desc = '' + acc_list = [running_corrects[r] / num_records + for r in running_corrects] + for o in range(len(running_corrects)): + _acc = running_corrects[f'out-{o}'] / num_records + acc_desc += f"out-{o} acc: {_acc:.4f} " + return acc_list, acc_desc + else: + assert not isinstance(running_corrects, dict) + _acc = running_corrects / num_records + return _acc, f'acc: {_acc:.4f}' + + def _calculate_loss( + self, + outputs: Union[Tensor, List[Tensor]], + labels: Union[Tensor, Dict[Any, Tensor]], + loss_fn: torch.nn.modules.loss._Loss + ) -> Tensor: + '''Calculates loss in a manner compatible with multiple outcomes.''' + if self.num_outcomes > 1: + if not isinstance(labels, dict): + raise ValueError("Expected labels to be a dict: num_outcomes" + " is > 1") + loss = sum([ + loss_fn(out, labels[f'out-{o}']) + for o, out in enumerate(outputs) + ]) + else: + loss = loss_fn(outputs, labels) + return loss # type: ignore + + def _check_early_stopping( + self, + val_acc: Optional[Union[float, List[float]]] = None, + val_loss: Optional[float] = None + 
) -> str: + if val_acc is None and val_loss is None: + if (self.hp.early_stop + and self.hp.early_stop_method == 'manual' + and self.hp.manual_early_stop_epoch <= self.epoch # type: ignore + and self.hp.manual_early_stop_batch <= self.step): # type: ignore + log.info(f'Manual early stop triggered: epoch {self.epoch}, ' + f'batch {self.step}') + if self.epoch not in self.hp.epochs: + self.hp.epochs += [self.epoch] + self.early_stop = True + else: + if self.hp.early_stop_method == 'accuracy': + if self.num_outcomes > 1: + raise errors.ModelError( + "Early stopping method 'accuracy' not supported with" + " multiple outcomes; use 'loss'.") + early_stop_val = val_acc + else: + early_stop_val = val_loss + assert early_stop_val is not None + assert isinstance(early_stop_val, float) + + self.moving_average += [early_stop_val] + if len(self.moving_average) >= self.ema_observations: + # Only keep track of the last [ema_observations] + self.moving_average.pop(0) + if self.last_ema == -1: + # Simple moving average + self.last_ema = (sum(self.moving_average) + / len(self.moving_average)) # type: ignore + log_msg = f' (SMA: {self.last_ema:.3f})' + else: + alpha = (self.ema_smoothing / (1 + self.ema_observations)) + self.last_ema = (early_stop_val * alpha + + (self.last_ema * (1 - alpha))) + log_msg = f' (EMA: {self.last_ema:.3f})' + if self.neptune_run and self.last_ema != -1: + neptune_dest = "metrics/val/batch/exp_moving_avg" + self.neptune_run[neptune_dest].log(self.last_ema) + + if (self.hp.early_stop + and self.ema_two_checks_prior != -1 + and self.epoch > self.hp.early_stop_patience): + + if ((self.hp.early_stop_method == 'accuracy' + and self.last_ema <= self.ema_two_checks_prior) + or (self.hp.early_stop_method == 'loss' + and self.last_ema >= self.ema_two_checks_prior)): + + log.info(f'Early stop triggered: epoch {self.epoch}, ' + f'step {self.step}') + self._log_early_stop_to_neptune() + if self.epoch not in self.hp.epochs: + self.hp.epochs += [self.epoch] + 
self.early_stop = True + return log_msg + + self.ema_two_checks_prior = self.ema_one_check_prior + self.ema_one_check_prior = self.last_ema + return '' + + def _detect_patients(self, *args): + self.patients = dict() + for dataset in args: + if dataset is None: + continue + dataset_patients = dataset.patients() + if not dataset_patients: + self.patients.update({s: s for s in self.slides}) + else: + self.patients.update(dataset_patients) + + def _empty_corrects(self) -> Union[int, Dict[str, int]]: + if self.multi_outcome: + return { + f'out-{o}': 0 + for o in range(self.num_outcomes) + } + else: + return 0 + + def _epoch_metrics( + self, + acc: Union[float, List[float]], + loss: float, + label: str + ) -> Dict[str, Dict[str, Union[float, List[float]]]]: + epoch_metrics = {'loss': loss} # type: Dict + if self.hp.model_type() == 'classification': + epoch_metrics.update({'accuracy': acc}) + return {f'{label}_metrics': epoch_metrics} + + def _val_metrics(self, **kwargs) -> Dict[str, Dict[str, float]]: + """Evaluate model and calculate metrics. + + Returns: + Dict[str, Dict[str, float]]: Dict with validation metrics. + Returns metrics in the form: + ``` + { + 'val_metrics': { + 'loss': ..., + 'accuracy': ..., + }, + 'tile_auc': ..., + 'slide_auc': ..., + ... 
+ } + ``` + """ + if hasattr(self, 'optimizer'): + self.optimizer.zero_grad() + assert self.model is not None + self.model.eval() + results_log = os.path.join(self.outdir, 'results_log.csv') + epoch_results = {} + + # Preparations for calculating accuracy/loss in metrics_from_dataset() + def update_corrects(pred, labels, running_corrects=None): + if running_corrects is None: + running_corrects = self._empty_corrects() + if self.hp.model_type() == 'classification': + labels = self._labels_to_device(labels, self.device) + return self._update_corrects(pred, labels, running_corrects) + else: + return 0 + + def update_loss(pred, labels, running_loss, size): + labels = self._labels_to_device(labels, self.device) + loss = self._calculate_loss(pred, labels, self.loss_fn) + return running_loss + (loss.item() * size) + + torch_args = types.SimpleNamespace( + update_corrects=update_corrects, + update_loss=update_loss, + num_slide_features=self.num_slide_features, + slide_input=self.slide_input, + normalizer=(self.normalizer if self._has_gpu_normalizer() else None), + ) + # Calculate patient/slide/tile metrics (AUC, R-squared, C-index, etc) + metrics, acc, loss = sf.stats.metrics_from_dataset( + self.inference_model, + model_type=self.hp.model_type(), + patients=self.patients, + dataset=self.dataloaders['val'], + data_dir=self.outdir, + outcome_names=self.outcome_names, + neptune_run=self.neptune_run, + torch_args=torch_args, + uq=bool(self.hp.uq), + **kwargs + ) + loss_and_acc = {'loss': loss} + if self.hp.model_type() == 'classification': + loss_and_acc.update({'accuracy': acc}) + self._log_epoch( + 'val', + self.epoch, + loss, + self._calculate_accuracy(acc)[1] # type: ignore + ) + epoch_metrics = {'val_metrics': loss_and_acc} + + for metric in metrics: + if metrics[metric]['tile'] is None: + continue + epoch_results[f'tile_{metric}'] = metrics[metric]['tile'] + epoch_results[f'slide_{metric}'] = metrics[metric]['slide'] + epoch_results[f'patient_{metric}'] = 
metrics[metric]['patient'] + epoch_metrics.update(epoch_results) + sf.util.update_results_log( + results_log, + 'trained_model', + {f'epoch{self.epoch}': epoch_metrics} + ) + self._log_eval_to_neptune(loss, acc, metrics, epoch_metrics) + return epoch_metrics + + def _fit_normalizer(self, norm_fit: Optional[NormFit]) -> None: + """Fit the Trainer normalizer using the specified fit, if applicable. + + Args: + norm_fit (Optional[Dict[str, np.ndarray]]): Normalizer fit. + """ + if norm_fit is not None and not self.normalizer: + raise ValueError("norm_fit supplied, but model params do not " + "specify a normalizer.") + if self.normalizer and norm_fit is not None: + self.normalizer.set_fit(**norm_fit) # type: ignore + elif (self.normalizer + and 'norm_fit' in self.config + and self.config['norm_fit'] is not None): + log.debug("Detecting normalizer fit from model config") + self.normalizer.set_fit(**self.config['norm_fit']) + + def _has_gpu_normalizer(self) -> bool: + import slideflow.norm.torch + return (isinstance(self.normalizer, sf.norm.torch.TorchStainNormalizer) + and self.normalizer.device != "cpu") + + def _labels_to_device( + self, + labels: Union[Dict[Any, Tensor], Tensor], + device: torch.device + ) -> Union[Dict[Any, Tensor], Tensor]: + '''Moves a set of outcome labels to the given device.''' + if self.num_outcomes > 1: + if not isinstance(labels, dict): + raise ValueError("Expected labels to be a dict: num_outcomes" + " is > 1") + labels = { + k: la.to(device, non_blocking=True) for k, la in labels.items() + } + elif isinstance(labels, dict): + labels = torch.stack(list(labels.values()), dim=1) + return labels.to(device, non_blocking=True) + else: + labels = labels.to(device, non_blocking=True) + return labels + + def _log_epoch( + self, + phase: str, + epoch: int, + loss: float, + accuracy_desc: str, + ) -> None: + """Logs epoch description.""" + log.info(f'[bold blue]{phase}[/] Epoch {epoch} | loss:' + f' {loss:.4f} {accuracy_desc}') + + def _log_manifest( + 
self, + train_dts: Optional["sf.Dataset"], + val_dts: Optional["sf.Dataset"], + labels: Optional[Union[str, Dict]] = 'auto' + ) -> None: + """Log the tfrecord and label manifest to slide_manifest.csv + + Args: + train_dts (sf.Dataset): Training dataset. May be None. + val_dts (sf.Dataset): Validation dataset. May be None. + labels (dict, optional): Labels dictionary. May be None. + Defaults to 'auto' (read from self.labels). + """ + if labels == 'auto': + _labels = self.labels + elif labels is None: + _labels = None + else: + assert isinstance(labels, dict) + _labels = labels + log_manifest( + (train_dts.tfrecords() if train_dts else None), + (val_dts.tfrecords() if val_dts else None), + labels=_labels, + filename=join(self.outdir, 'slide_manifest.csv') + ) + + def _log_to_tensorboard( + self, + loss: float, + acc: Union[float, List[float]], + label: str + ) -> None: + self.writer.add_scalar(f'Loss/{label}', loss, self.global_step) + if self.hp.model_type() == 'classification': + if self.num_outcomes > 1: + assert isinstance(acc, list) + for o, _acc in enumerate(acc): + self.writer.add_scalar( + f'Accuracy-{o}/{label}', _acc, self.global_step + ) + else: + self.writer.add_scalar( + f'Accuracy/{label}', acc, self.global_step + ) + + def _log_to_neptune( + self, + loss: float, + acc: Union[Tensor, List[Tensor]], + label: str, + phase: str + ) -> None: + """Logs epoch loss/accuracy to Neptune.""" + assert phase in ('batch', 'epoch') + step = self.epoch if phase == 'epoch' else self.global_step + if self.neptune_run: + self.neptune_run[f"metrics/{label}/{phase}/loss"].log(loss, + step=step) + acc = self._accuracy_as_numpy(acc) + if isinstance(acc, list): + for a, _acc in enumerate(acc): + sf.util.neptune_utils.list_log( + run=self.neptune_run, + label=f'metrics/{label}/{phase}/accuracy-{a}', + val=_acc, + step=step + ) + else: + sf.util.neptune_utils.list_log( + run=self.neptune_run, + label=f'metrics/{label}/{phase}/accuracy', + val=acc, + step=step + ) + + def 
_log_early_stop_to_neptune(self) -> None: + # Log early stop to neptune + if self.neptune_run: + self.neptune_run["early_stop/early_stop_epoch"] = self.epoch + self.neptune_run["early_stop/early_stop_batch"] = self.step + self.neptune_run["early_stop/method"] = self.hp.early_stop_method + self.neptune_run["sys/tags"].add("early_stopped") + + def _log_eval_to_neptune( + self, + loss: float, + acc: float, + metrics: Dict[str, Any], + epoch_results: Dict[str, Any] + ) -> None: + if self.use_neptune: + assert self.neptune_run is not None + self.neptune_run['results'] = epoch_results + + # Validation epoch metrics + self.neptune_run['metrics/val/epoch/loss'].log(loss, + step=self.epoch) + sf.util.neptune_utils.list_log( + self.neptune_run, + 'metrics/val/epoch/accuracy', + acc, + step=self.epoch + ) + for metric in metrics: + if metrics[metric]['tile'] is None: + continue + for outcome in metrics[metric]['tile']: + # If only one outcome, + # log to metrics/val/epoch/[metric]. + # If more than one outcome, + # log to metrics/val/epoch/[metric]/[outcome_name] + def metric_label(s): + if len(metrics[metric]['tile']) == 1: + return f'metrics/val/epoch/{s}_{metric}' + else: + return f'metrics/val/epoch/{s}_{metric}/{outcome}' + + tile_metric = metrics[metric]['tile'][outcome] + slide_metric = metrics[metric]['slide'][outcome] + patient_metric = metrics[metric]['patient'][outcome] + + # If only one value for a metric, log to .../[metric] + # If more than one value for a metric + # (e.g. 
AUC for each category), + # log to .../[metric]/[i] + sf.util.neptune_utils.list_log( + self.neptune_run, + metric_label('tile'), + tile_metric, + step=self.epoch + ) + sf.util.neptune_utils.list_log( + self.neptune_run, + metric_label('slide'), + slide_metric, + step=self.epoch + ) + sf.util.neptune_utils.list_log( + self.neptune_run, + metric_label('patient'), + patient_metric, + step=self.epoch + ) + + def _mid_training_validation(self) -> None: + """Perform mid-epoch validation, if appropriate.""" + + if not self.validate_on_batch: + return + elif not ( + 'val' in self.dataloaders + and self.step > 0 + and self.step % self.validate_on_batch == 0 + ): + return + + if self.model is None or self.inference_model is None: + raise errors.ModelError("Model not yet initialized.") + self.model.eval() + running_val_loss = 0 + num_val = 0 + running_val_correct = self._empty_corrects() + + for _ in range(self.validation_steps): + val_img, val_label, slides, *_ = next(self.mid_train_val_dts) # type:ignore + val_img = val_img.to(self.device) + val_img = val_img.to(memory_format=torch.channels_last) + + with torch.inference_mode(): + _mp = (self.mixed_precision and self.device.type in ('cuda', 'cpu')) + with autocast(self.device.type, mixed_precision=_mp): # type: ignore + + # GPU normalization, if specified. 
+ if self._has_gpu_normalizer(): + val_img = self.normalizer.preprocess(val_img) + + if self.num_slide_features: + _slide_in = [self.slide_input[s] for s in slides] + inp = (val_img, Tensor(_slide_in).to(self.device)) + else: + inp = (val_img,) # type: ignore + val_outputs = self.inference_model(*inp) + val_label = self._labels_to_device(val_label, self.device) + val_batch_loss = self._calculate_loss( + val_outputs, val_label, self.loss_fn + ) + + running_val_loss += val_batch_loss.item() * val_img.size(0) + if self.hp.model_type() == 'classification': + running_val_correct = self._update_corrects( + val_outputs, val_label, running_val_correct # type: ignore + ) + num_val += val_img.size(0) + val_loss = running_val_loss / num_val + if self.hp.model_type() == 'classification': + val_acc, val_acc_desc = self._calculate_accuracy( + running_val_correct, num_val # type: ignore + ) + else: + val_acc, val_acc_desc = 0, '' # type: ignore + log_msg = f'Batch {self.step}: val loss: {val_loss:.4f} {val_acc_desc}' + + # Log validation metrics to neptune & check early stopping + self._log_to_neptune(val_loss, val_acc, 'val', phase='batch') + log_msg += self._check_early_stopping( + self._accuracy_as_numpy(val_acc), + val_loss + ) + log.info(log_msg) + + # Log to tensorboard + if self.use_tensorboard: + if self.num_outcomes > 1: + assert isinstance(running_val_correct, dict) + _val_acc = [ + running_val_correct[f'out-{o}'] / num_val + for o in range(len(val_outputs)) + ] + else: + assert not isinstance(running_val_correct, dict) + _val_acc = running_val_correct / num_val # type: ignore + self._log_to_tensorboard( + val_loss, + self._accuracy_as_numpy(_val_acc), + 'test' + ) # type: ignore + + # Put model back in training mode + self.model.train() + + def _prepare_optimizers_and_loss(self) -> None: + if self.model is None: + raise ValueError("Model has not yet been initialized.") + self.optimizer = self.hp.get_opt(self.model.parameters()) + if self.hp.learning_rate_decay: + 
self.scheduler = torch.optim.lr_scheduler.ExponentialLR( + self.optimizer, + gamma=self.hp.learning_rate_decay + ) + log.debug("Using exponentially decaying learning rate") + else: + self.scheduler = None # type: ignore + self.loss_fn = self.hp.get_loss() + if self.mixed_precision and self.device.type == 'cuda': + self.scaler = torch.cuda.amp.GradScaler() + + def _prepare_neptune_run(self, dataset: "sf.Dataset", label: str) -> None: + if self.use_neptune: + tags = [label] + if 'k-fold' in self.config['validation_strategy']: + tags += [f'k-fold{self.config["k_fold_i"]}'] + self.neptune_run = self.neptune_logger.start_run( + self.name, + self.config['project'], + dataset, + tags=tags + ) + assert self.neptune_run is not None + self.neptune_logger.log_config(self.config, label) + self.neptune_run['data/slide_manifest'].upload( + os.path.join(self.outdir, 'slide_manifest.csv') + ) + try: + config_path = join(self.outdir, 'params.json') + config = sf.util.load_json(config_path) + config['neptune_id'] = self.neptune_run['sys/id'].fetch() + except Exception: + log.info("Unable to log params (params.json) with Neptune.") + + def _print_model_summary(self, train_dts) -> None: + """Prints model summary and logs to neptune.""" + if self.model is None: + raise ValueError("Model has not yet been initialized.") + empty_inp = [torch.empty( + [self.hp.batch_size, 3, train_dts.tile_px, train_dts.tile_px] + )] + if self.num_slide_features: + empty_inp += [ + torch.empty([self.hp.batch_size, self.num_slide_features]) + ] + if sf.getLoggingLevel() <= 20: + model_summary = torch_utils.print_module_summary( + self.model, empty_inp + ) + if self.neptune_run: + self.neptune_run['summary'] = model_summary + + def _save_model(self) -> None: + assert self.model is not None + name = self.name if self.name else 'trained_model' + save_path = os.path.join(self.outdir, f'{name}_epoch{self.epoch}.zip') + torch.save(self.model.state_dict(), save_path) + log.info(f"Model saved to 
[green]{save_path}") + + def _close_dataloaders(self): + """Close dataloaders, ensuring threads have joined.""" + del self.mid_train_val_dts + for name, d in self.dataloaders.items(): + if '_dataset' in dir(d): + log.debug(f"Closing dataloader {name} via _dataset.close()") + d._dataset.close() + elif 'dataset' in dir(d): + log.debug(f"Closing dataloader {name} via dataset.close()") + d.dataset.close() + + def _setup_dataloaders( + self, + train_dts: Optional["sf.Dataset"], + val_dts: Optional["sf.Dataset"], + mid_train_val: bool = False, + incl_labels: bool = True, + from_wsi: bool = False, + **kwargs + ) -> None: + """Prepare dataloaders from training and validation.""" + interleave_args = types.SimpleNamespace( + rank=0, + num_replicas=1, + labels=(self.labels if incl_labels else None), + chunk_size=self.chunk_size, + pin_memory=self.pin_memory, + num_workers=self.num_workers if not from_wsi else 0, + onehot=False, + incl_slidenames=True, + from_wsi=from_wsi, + **kwargs + ) + # Use GPU stain normalization for PyTorch normalizers, if supported + _augment_str = self.hp.augment + if self._has_gpu_normalizer(): + log.info("Using GPU for stain normalization") + interleave_args.standardize = False + if isinstance(_augment_str, str): + _augment_str = _augment_str.replace('n', '') + else: + interleave_args.normalizer = self.normalizer + + if train_dts is not None: + self.dataloaders = { + 'train': iter(train_dts.torch( + infinite=True, + batch_size=self.hp.batch_size, + augment=_augment_str, + transform=self.transform['train'], + drop_last=True, + **vars(interleave_args) + )) + } + else: + self.dataloaders = {} + if val_dts is not None: + # Fall back to the training batch size if none was set + if not self.validation_batch_size: + validation_batch_size = self.hp.batch_size + else: + validation_batch_size = self.validation_batch_size + self.dataloaders['val'] = val_dts.torch( + infinite=False, + batch_size=validation_batch_size, + augment=False, + transform=self.transform['val'], + 
incl_loc=True, + **vars(interleave_args) + ) + # Mid-training validation dataset + if mid_train_val: + 
self.mid_train_val_dts = torch_utils.cycle( + self.dataloaders['val'] + ) + if not self.validate_on_batch: + val_log_msg = '' + else: + val_log_msg = f'every {str(self.validate_on_batch)} steps and ' + log.debug(f'Validation during training: {val_log_msg}at epoch end') + if self.validation_steps: + num_samples = self.validation_steps * self.hp.batch_size + log.debug( + f'Using {self.validation_steps} batches ({num_samples} ' + 'samples) each validation check' + ) + else: + log.debug('Using entire validation set each validation check') + else: + log.debug('Validation during training: None') + + def _training_step(self, pb: Progress) -> None: + assert self.model is not None + images, labels, slides = next(self.dataloaders['train']) + images = images.to(self.device, non_blocking=True) + images = images.to(memory_format=torch.channels_last) + labels = self._labels_to_device(labels, self.device) + self.optimizer.zero_grad() + with torch.set_grad_enabled(True): + _mp = (self.mixed_precision and self.device.type in ('cuda', 'cpu')) + with autocast(self.device.type, mixed_precision=_mp): # type: ignore + + # GPU normalization, if specified. 
+ if self._has_gpu_normalizer(): + images = self.normalizer.preprocess( + images, + augment=(isinstance(self.hp.augment, str) + and 'n' in self.hp.augment) + ) + + # Slide-level features + if self.num_slide_features: + _slide_in = [self.slide_input[s] for s in slides] + inp = (images, Tensor(_slide_in).to(self.device)) + else: + inp = (images,) # type: ignore + outputs = self.model(*inp) + loss = self._calculate_loss(outputs, labels, self.loss_fn) + + # Update weights + if self.mixed_precision and self.device.type == 'cuda': + self.scaler.scale(loss).backward() + self.scaler.step(self.optimizer) + self.scaler.update() + else: + loss.backward() + self.optimizer.step() + + # Update learning rate if using a scheduler + _lr_decay_steps = self.hp.learning_rate_decay_steps + if self.scheduler and (self.global_step+1) % _lr_decay_steps == 0: + log.debug("Stepping learning rate decay") + self.scheduler.step() + + # Record accuracy and loss + self.epoch_records += images.size(0) + if self.hp.model_type() == 'classification': + self.running_corrects = self._update_corrects( + outputs, labels, self.running_corrects + ) + train_acc, acc_desc = self._calculate_accuracy( + self.running_corrects, self.epoch_records + ) + else: + train_acc, acc_desc = 0, '' # type: ignore + self.running_loss += loss.item() * images.size(0) + _loss = self.running_loss / self.epoch_records + pb.update(task_id=0, # type: ignore + description=(f'[bold blue]train[/] ' + f'loss: {_loss:.4f} {acc_desc}')) + pb.advance(task_id=0) # type: ignore + + # Log to tensorboard + if self.use_tensorboard and self.global_step % self.log_frequency == 0: + if self.num_outcomes > 1: + _train_acc = [ + (self.running_corrects[f'out-{o}'] # type: ignore + / self.epoch_records) + for o in range(len(outputs)) + ] + else: + _train_acc = (self.running_corrects # type: ignore + / self.epoch_records) + self._log_to_tensorboard( + loss.item(), + self._accuracy_as_numpy(_train_acc), + 'train' + ) + # Log to neptune & check early 
stopping + self._log_to_neptune(loss.item(), train_acc, 'train', phase='batch') + self._check_early_stopping(None, None) + + def _update_corrects( + self, + outputs: Union[Tensor, Dict[Any, Tensor]], + labels: Union[Tensor, Dict[str, Tensor]], + running_corrects: Union[Tensor, Dict[str, Tensor]], + ) -> Union[Tensor, Dict[str, Tensor]]: + '''Updates running accuracy in a manner compatible with >1 outcomes.''' + assert self.hp.model_type() == 'classification' + if self.num_outcomes > 1: + for o, out in enumerate(outputs): + _, preds = torch.max(out, 1) + running_corrects[f'out-{o}'] += torch.sum( # type: ignore + preds == labels[f'out-{o}'].data # type: ignore + ) + else: + _, preds = torch.max(outputs, 1) # type: ignore + running_corrects += torch.sum(preds == labels.data) # type: ignore + return running_corrects + + def _validate_early_stop(self) -> None: + """Validates early stopping parameters.""" + + if (self.hp.early_stop and self.hp.early_stop_method == 'accuracy' and + self.hp.model_type() == 'classification' and self.num_outcomes > 1): + raise errors.ModelError("Cannot combine 'accuracy' early stopping " + "with multiple categorical outcomes.") + if (self.hp.early_stop_method == 'manual' + and (self.hp.manual_early_stop_epoch is None + or self.hp.manual_early_stop_batch is None)): + raise errors.ModelError( + "Early stopping method 'manual' requires that both " + "manual_early_stop_epoch and manual_early_stop_batch are set " + "in model params." + ) + + def _verify_img_format(self, dataset, *datasets: Optional["sf.Dataset"]) -> str: + """Verify that the image format of the dataset matches the model config. + + Args: + dataset (sf.Dataset): Dataset to check. + *datasets (sf.Dataset): Additional datasets to check. May be None. + + Returns: + str: Image format, either 'png' or 'jpg', if a consistent image + format was found, otherwise None. 
+ + """ + # First, verify all datasets have the same image format + img_formats = set([ + d.img_format for d in (dataset, *datasets) + if d and d.img_format + ]) + if len(img_formats) > 1: + log.error("Multiple image formats detected: {}.".format( + ', '.join(img_formats) + )) + return None + elif self.img_format and not dataset.img_format: + log.warning("Unable to verify image format (PNG/JPG) of dataset.") + return None + elif self.img_format and dataset.img_format != self.img_format: + log.error( + "Mismatched image formats. Expected '{}' per model config, " + "but dataset has format '{}'.".format( + self.img_format, + dataset.img_format)) + return None + else: + return dataset.img_format + + def load(self, model: str, training=True) -> None: + """Loads a state dict at the given model location. Requires that the + Trainer's hyperparameters (Trainer.hp) + match the hyperparameters of the model to be loaded.""" + + if self.labels is not None: + self.model = self.hp.build_model( + labels=self.labels, + num_slide_features=self.num_slide_features + ) + else: + self.model = self.hp.build_model( + num_classes=len(self.outcome_names), + num_slide_features=self.num_slide_features + ) + self.model.load_state_dict(torch.load(model)) + self.inference_model = self.model + + def predict( + self, + dataset: "sf.Dataset", + batch_size: Optional[int] = None, + norm_fit: Optional[NormFit] = None, + format: str = 'parquet', + from_wsi: bool = False, + roi_method: str = 'auto', + reduce_method: Union[str, Callable] = 'average', + ) -> Dict[str, "pd.DataFrame"]: + """Perform inference on a model, saving predictions. + + Args: + dataset (:class:`slideflow.dataset.Dataset`): Dataset containing + TFRecords to evaluate. + batch_size (int, optional): Evaluation batch size. Defaults to the + same as training (per self.hp) + norm_fit (Dict[str, np.ndarray]): Normalizer fit, mapping fit + parameters (e.g. target_means, target_stds) to values + (np.ndarray).
If not provided, will fit normalizer using + model params (if applicable). Defaults to None. + format (str, optional): Format in which to save predictions. Either + 'csv', 'feather', or 'parquet'. Defaults to 'parquet'. + from_wsi (bool): Generate predictions from tiles dynamically + extracted from whole-slide images, rather than TFRecords. + Defaults to False (use TFRecords). + roi_method (str): ROI method to use if from_wsi=True (ignored if + from_wsi=False). Either 'inside', 'outside', 'auto', 'ignore'. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + + Returns: + Dict[str, pd.DataFrame]: Dictionary with keys 'tile', 'slide', and + 'patient', and values containing DataFrames with tile-, slide-, + and patient-level predictions. 
+ """ + if format not in ('csv', 'feather', 'parquet'): + raise ValueError(f"Unrecognized format {format}") + + self._detect_patients(dataset) + + # Verify image format + self._verify_img_format(dataset) + + # Fit normalizer + self._fit_normalizer(norm_fit) + + # Load and initialize model + if not self.model: + raise errors.ModelNotLoadedError + self.model.to(self.device) + self.model.eval() + self._log_manifest(None, dataset, labels=None) + + if from_wsi and sf.slide_backend() == 'libvips': + pool = mp.Pool( + sf.util.num_cpu(default=8), + initializer=sf.util.set_ignore_sigint + ) + elif from_wsi: + pool = mp.dummy.Pool(sf.util.num_cpu(default=8)) + else: + pool = None + if not batch_size: + batch_size = self.hp.batch_size + + self._setup_dataloaders( + train_dts=None, + val_dts=dataset, + incl_labels=False, + from_wsi=from_wsi, + roi_method=roi_method, + pool=pool) + + log.info('Generating predictions...') + torch_args = types.SimpleNamespace( + num_slide_features=self.num_slide_features, + slide_input=self.slide_input, + normalizer=(self.normalizer if self._has_gpu_normalizer() else None), + ) + dfs = sf.stats.predict_dataset( + model=self.model, + dataset=self.dataloaders['val'], + model_type=self._model_type, + torch_args=torch_args, + outcome_names=self.outcome_names, + uq=bool(self.hp.uq), + patients=self.patients, + reduce_method=reduce_method + ) + # Save predictions + sf.stats.metrics.save_dfs(dfs, format=format, outdir=self.outdir) + self._close_dataloaders() + if pool is not None: + pool.close() + return dfs + + def evaluate( + self, + dataset: "sf.Dataset", + batch_size: Optional[int] = None, + save_predictions: Union[bool, str] = 'parquet', + reduce_method: Union[str, Callable] = 'average', + norm_fit: Optional[NormFit] = None, + uq: Union[bool, str] = 'auto', + from_wsi: bool = False, + roi_method: str = 'auto', + ): + """Evaluate model, saving metrics and predictions. + + Args: + dataset (:class:`slideflow.dataset.Dataset`): Dataset to evaluate. 
+ batch_size (int, optional): Evaluation batch size. Defaults to the + same as training (per self.hp) + save_predictions (bool or str, optional): Save tile, slide, and + patient-level predictions at each evaluation. May be 'csv', + 'feather', or 'parquet'. If False, will not save predictions. + Defaults to 'parquet'. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + norm_fit (Dict[str, np.ndarray]): Normalizer fit, mapping fit + parameters (e.g. target_means, target_stds) to values + (np.ndarray). If not provided, will fit normalizer using + model params (if applicable). Defaults to None. + uq (bool or str, optional): Enable UQ estimation (for + applicable models). Defaults to 'auto'. + from_wsi (bool): Generate predictions from tiles dynamically + extracted from whole-slide images, rather than TFRecords. + Defaults to False (use TFRecords). + roi_method (str): ROI method to use if from_wsi=True (ignored if + from_wsi=False). Either 'inside', 'outside', 'auto', 'ignore'. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. 
+ + Returns: + Dictionary of evaluation metrics. + """ + if uq != 'auto': + if not isinstance(uq, bool): + raise ValueError(f"Unrecognized value {uq} for uq") + self.hp.uq = uq + if batch_size: + self.validation_batch_size = batch_size + if not self.model: + raise errors.ModelNotLoadedError + if from_wsi and sf.slide_backend() == 'libvips': + pool = mp.Pool( + sf.util.num_cpu(default=8), + initializer=sf.util.set_ignore_sigint + ) + elif from_wsi: + pool = mp.dummy.Pool(sf.util.num_cpu(default=8)) + else: + pool = None + + self._detect_patients(dataset) + self._verify_img_format(dataset) + self._fit_normalizer(norm_fit) + self.model.to(self.device) + self.model.eval() + self.loss_fn = self.hp.get_loss() + self._log_manifest(None, dataset) + self._prepare_neptune_run(dataset, 'eval') + self._setup_dataloaders( + train_dts=None, + val_dts=dataset, + from_wsi=from_wsi, + roi_method=roi_method, + pool=pool) + + # Generate performance metrics + log.info('Performing evaluation...') + metrics = self._val_metrics( + label='eval', + reduce_method=reduce_method, + save_predictions=save_predictions + ) + results = {'eval': { + k: v for k, v in metrics.items() if k != 'val_metrics' + }} + results['eval'].update(metrics['val_metrics']) # type: ignore + results_str = json.dumps(results['eval'], indent=2, sort_keys=True) + log.info(f"Evaluation metrics: {results_str}") + results_log = os.path.join(self.outdir, 'results_log.csv') + sf.util.update_results_log(results_log, 'eval_model', results) + + if self.neptune_run: + self.neptune_run['eval/results'] = results['eval'] + self.neptune_run.stop() + self._close_dataloaders() + if pool is not None: + pool.close() + return results + + def train( + self, + train_dts: "sf.Dataset", + val_dts: "sf.Dataset", + log_frequency: int = 20, + validate_on_batch: int = 0, + validation_batch_size: Optional[int] = None, + validation_steps: int = 50, + starting_epoch: int = 0, + ema_observations: int = 20, + ema_smoothing: int = 2, + 
use_tensorboard: bool = True, + steps_per_epoch_override: int = 0, + save_predictions: Union[bool, str] = 'parquet', + save_model: bool = True, + resume_training: Optional[str] = None, + pretrain: Optional[str] = 'imagenet', + checkpoint: Optional[str] = None, + save_checkpoints: bool = False, + multi_gpu: bool = False, + norm_fit: Optional[NormFit] = None, + reduce_method: Union[str, Callable] = 'average', + seed: int = 0, + from_wsi: bool = False, + roi_method: str = 'auto', + ) -> Dict[str, Any]: + """Builds and trains a model from hyperparameters. + + Args: + train_dts (:class:`slideflow.dataset.Dataset`): Training dataset. + val_dts (:class:`slideflow.dataset.Dataset`): Validation dataset. + log_frequency (int, optional): How frequently to update Tensorboard + logs, in batches. Defaults to 20. + validate_on_batch (int, optional): Validation will be performed + every N batches. Defaults to 0. + validation_batch_size (int, optional): Validation batch size. + Defaults to same as training (per self.hp). + validation_steps (int, optional): Number of batches to use for each + instance of validation. Defaults to 50. + starting_epoch (int, optional): Starts training at this epoch. + Defaults to 0. + ema_observations (int, optional): Number of observations over which + to perform exponential moving average smoothing. + Defaults to 20. + ema_smoothing (int, optional): Exponential average smoothing value. + Defaults to 2. + use_tensorboard (bool, optional): Enable tensorboard callbacks. + Defaults to True. + steps_per_epoch_override (int, optional): Manually set the number + of steps per epoch. Defaults to 0 (calculated from the number + of training tiles and the batch size). + save_predictions (bool or str, optional): Save tile, slide, and + patient-level predictions at each evaluation. May be 'csv', + 'feather', or 'parquet'. If False, will not save predictions. + Defaults to 'parquet'. + save_model (bool, optional): Save models when evaluating at + specified epochs. Defaults to True.
+ resume_training (str, optional): Not applicable to PyTorch backend. + Included as argument for compatibility with Tensorflow backend. + Will raise NotImplementedError if supplied. + pretrain (str, optional): Either 'imagenet' or path to Tensorflow + model from which to load weights. Defaults to 'imagenet'. + checkpoint (str, optional): Path to cp.ckpt from which to load + weights. Defaults to None. + norm_fit (Dict[str, np.ndarray]): Normalizer fit, mapping fit + parameters (e.g. target_means, target_stds) to values + (np.ndarray). If not provided, will fit normalizer using + model params (if applicable). Defaults to None. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + seed (int): Set numpy random seed. Defaults to 0. + from_wsi (bool): Generate predictions from tiles dynamically + extracted from whole-slide images, rather than TFRecords. + Defaults to False (use TFRecords). + roi_method (str): ROI method to use if from_wsi=True (ignored if + from_wsi=False). Either 'inside', 'outside', 'auto', 'ignore'. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. 
+ Defaults to 'auto'. + + Returns: + Dict: Nested dict containing metrics for each evaluated epoch. + """ + if resume_training is not None: + raise NotImplementedError( + "PyTorch backend does not support `resume_training`; " + "please use `checkpoint`" + ) + if save_checkpoints: + log.warning( + "The argument save_checkpoints is ignored when training models " + "in the PyTorch backend. To save a model throughout training, " + "use the `epochs` hyperparameter." + ) + results = {'epochs': defaultdict(dict)} # type: Dict[str, Any] + starting_epoch = max(starting_epoch, 1) + self._detect_patients(train_dts, val_dts) + self._reset_training_params() + self.validation_batch_size = validation_batch_size + self.validate_on_batch = validate_on_batch + self.validation_steps = validation_steps + self.ema_observations = ema_observations + self.ema_smoothing = ema_smoothing + self.log_frequency = log_frequency + self.use_tensorboard = use_tensorboard + + # Verify image format across datasets. + img_format = self._verify_img_format(train_dts, val_dts) + if img_format and self.config['img_format'] is None: + self.config['img_format'] = img_format + sf.util.write_json(self.config, join(self.outdir, 'params.json')) + + if self.use_tensorboard: + from google.protobuf import __version__ as protobuf_version + if version.parse(protobuf_version) >= version.parse('3.21'): + log.warning( + "Tensorboard is incompatible with protobuf >= 3.21. " + "Downgrade protobuf to enable tensorboard logging."
+ ) + self.use_tensorboard = False + + if from_wsi and sf.slide_backend() == 'libvips': + pool = mp.Pool( + sf.util.num_cpu(default=8), + initializer=sf.util.set_ignore_sigint + ) + elif from_wsi: + pool = mp.dummy.Pool(sf.util.num_cpu(default=8)) + else: + pool = None + + # Validate early stopping parameters + self._validate_early_stop() + + # Fit normalizer to dataset, if applicable + self._fit_normalizer(norm_fit) + if self.normalizer and self.hp.normalizer_source == 'dataset': + self.normalizer.fit(train_dts) + + if self.normalizer: + config_path = join(self.outdir, 'params.json') + if not os.path.exists(config_path): + config = { + 'slideflow_version': sf.__version__, + 'hp': self.hp.to_dict(), + 'backend': sf.backend() + } + else: + config = sf.util.load_json(config_path) + config['norm_fit'] = self.normalizer.get_fit(as_list=True) + sf.util.write_json(config, config_path) + + # Training preparation + if steps_per_epoch_override: + self.steps_per_epoch = steps_per_epoch_override + log.info(f"Setting steps per epoch = {steps_per_epoch_override}") + else: + self.steps_per_epoch = train_dts.num_tiles // self.hp.batch_size + log.info(f"Steps per epoch = {self.steps_per_epoch}") + if self.use_tensorboard: + # Delayed import due to protobuf version conflicts. 
+ + from torch.utils.tensorboard import SummaryWriter + self.writer = SummaryWriter(self.outdir, flush_secs=60) + self._log_manifest(train_dts, val_dts) + + # Prepare neptune run + self._prepare_neptune_run(train_dts, 'train') + + # Build model + self._build_model(checkpoint, pretrain) + assert self.model is not None + + # Print model summary + self._print_model_summary(train_dts) + + # Multi-GPU + if multi_gpu: + self.model = torch.nn.DataParallel(self.model) + self.model = self.model.to(self.device) + + # Setup dataloaders + self._setup_dataloaders( + train_dts=train_dts, + val_dts=val_dts, + mid_train_val=True, + roi_method=roi_method, + from_wsi=from_wsi, + pool=pool) + + # Model parameters and optimizer + self._prepare_optimizers_and_loss() + + # === Epoch loop ====================================================== + for self.epoch in range(starting_epoch, max(self.hp.epochs)+1): + np.random.seed(seed+self.epoch) + log.info(f'[bold]Epoch {self.epoch}/{max(self.hp.epochs)}') + + # Training loop --------------------------------------------------- + self.epoch_records = 0 + self.running_loss = 0.0 + self.step = 1 + self.running_corrects = self._empty_corrects() # type: ignore + self.model.train() + pb = Progress( + *Progress.get_default_columns(), + TimeElapsedColumn(), + ImgBatchSpeedColumn(self.hp.batch_size), + transient=sf.getLoggingLevel()>20 + ) + task = pb.add_task("Training...", total=self.steps_per_epoch) + pb.start() + with sf.util.cleanup_progress(pb): + while self.step <= self.steps_per_epoch: + self._training_step(pb) + if self.early_stop: + break + self._mid_training_validation() + self.step += 1 + self.global_step += 1 + + # Update and log epoch metrics ------------------------------------ + loss = self.running_loss / self.epoch_records + epoch_metrics = {'train_metrics': {'loss': loss}} + if self.hp.model_type() == 'classification': + acc, acc_desc = self._calculate_accuracy( + self.running_corrects, self.epoch_records + ) + 
epoch_metrics['train_metrics'].update({ + 'accuracy': self._accuracy_as_numpy(acc) # type: ignore + }) + else: + acc, acc_desc = 0, '' # type: ignore + results['epochs'][f'epoch{self.epoch}'].update(epoch_metrics) + self._log_epoch('train', self.epoch, loss, acc_desc) + self._log_to_neptune(loss, acc, 'train', 'epoch') + if save_model and (self.epoch in self.hp.epochs or self.early_stop): + self._save_model() + + # Full evaluation ------------------------------------------------- + # Perform full evaluation if the epoch is one of the + # predetermined epochs at which to save/eval a model + if 'val' in self.dataloaders and self.epoch in self.hp.epochs: + epoch_res = self._val_metrics( + save_predictions=save_predictions, + reduce_method=reduce_method, + label=f'val_epoch{self.epoch}', + ) + results['epochs'][f'epoch{self.epoch}'].update(epoch_res) + + # Early stopping -------------------------------------------------- + if self.early_stop: + break + + # === [end epoch loop] ================================================ + + if self.neptune_run: + self.neptune_run['sys/tags'].add('training_complete') + self.neptune_run.stop() + self._close_dataloaders() + if pool is not None: + pool.close() + return results
+ + +
class RegressionTrainer(Trainer): + + """Extends the base :class:`slideflow.model.Trainer` class to add support + for continuous outcomes. Requires that all outcomes be continuous, with an + appropriate regression loss function. Uses R-squared as the evaluation + metric, rather than AUROC. + + In the PyTorch backend, support for continuous outcomes is already built + into the base Trainer class, so no additional modifications are required. + This class inherits from Trainer without modification to maintain + consistency with the Tensorflow backend. + """ + + _model_type = 'regression' + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs)
+ + +
class SurvivalTrainer(Trainer): + + """Cox proportional hazards (CPH) models are not yet implemented, but are + planned for a future update.""" + + def __init__(self, *args, **kwargs): + raise NotImplementedError
+ +# ----------------------------------------------------------------------------- + +
class Features(BaseFeatureExtractor): + """Interface for obtaining predictions and features from intermediate layer + activations from Slideflow models. + + Use by calling on either a batch of images (returning outputs for a single + batch), or by calling on a :class:`slideflow.WSI` object, which will + generate an array of spatially-mapped activations matching the slide. + + Examples + *Calling on a batch of images:* + + .. code-block:: python + + interface = Features('/model/path', layers='postconv') + for image_batch in train_data: + # Return shape: (batch_size, num_features) + batch_features = interface(image_batch) + + *Calling on a slide:* + + .. code-block:: python + + slide = sf.slide.WSI(...) + interface = Features('/model/path', layers='postconv') + # Return shape: + # (slide.grid.shape[0], slide.grid.shape[1], num_features) + activations_grid = interface(slide) + + Note: + When this interface is called on a batch of images, no image processing + or stain normalization will be performed, as it is assumed that + normalization will occur during data loader image processing. When the + interface is called on a `slideflow.WSI`, the normalization strategy + will be read from the model configuration file, and normalization will + be performed on image tiles extracted from the WSI. If this interface + was created from an existing model and there is no model configuration + file to read, a slideflow.norm.StainNormalizer object may be passed + during initialization via the argument `wsi_normalizer`.
+ + """ + + def __init__( + self, + path: Optional[str], + layers: Optional[Union[str, List[str]]] = 'postconv', + *, + include_preds: bool = False, + mixed_precision: bool = True, + channels_last: bool = True, + device: Optional[torch.device] = None, + apply_softmax: Optional[bool] = None, + pooling: Optional[Any] = None, + load_method: str = 'weights', + ): + """Creates an activations interface from a saved slideflow model which + outputs feature activations at the designated layers. + + Intermediate layers are returned in the order of layers. + Predictions are returned last. + + Args: + path (str): Path to saved Slideflow model. + layers (list(str), optional): Layers from which to generate + activations. The post-convolution activation layer is accessed + via 'postconv'. Defaults to 'postconv'. + include_preds (bool, optional): Include predictions in output. Will be + returned last. Defaults to False. + mixed_precision (bool, optional): Use mixed precision. + Defaults to True. + channels_last (bool, optional): Convert input batches to the + channels-last memory format. Defaults to True. + device (:class:`torch.device`, optional): Device for model. + Defaults to torch.device('cuda') + apply_softmax (bool): Apply softmax transformation to model output. + Defaults to True for classification models, False for regression models. + pooling (Callable or str, optional): PyTorch pooling function to use + on feature layers. May be a string ('avg' or 'max') or a + callable PyTorch function. + load_method (str): Loading method to use when reading model. + This argument is ignored in the PyTorch backend, as all models + are loaded by first building the model with hyperparameters + detected in ``params.json``, then loading weights with + ``torch.nn.Module.load_state_dict()``. Defaults to + 'weights' (ignored).
+ """ + super().__init__('torch', include_preds=include_preds) + if layers and isinstance(layers, str): + layers = [layers] + self.layers = layers + self.path = path + self.apply_softmax = apply_softmax + self.mixed_precision = mixed_precision + self.channels_last = channels_last + self._model = None + self._pooling = None + self._include_preds = None + + # Transformation for standardizing uint8 images to float32 + self.transform = torchvision.transforms.Lambda(lambda x: x / 127.5 - 1) + + # Hook for storing layer activations during model inference + self.activation = {} # type: Dict[Any, Tensor] + + # Configure device + self.device = torch_utils.get_device(device) + + if path is not None: + config = sf.util.get_model_config(path) + if 'img_format' in config: + self.img_format = config['img_format'] + self.hp = ModelParams() # type: Optional[ModelParams] + self.hp.load_dict(config['hp']) + self.wsi_normalizer = self.hp.get_normalizer() + if 'norm_fit' in config and config['norm_fit'] is not None: + self.wsi_normalizer.set_fit(**config['norm_fit']) # type: ignore + self.tile_px = self.hp.tile_px + self._model = self.hp.build_model( + num_classes=len(config['outcome_labels']) + ) + if apply_softmax is None: + self.apply_softmax = True if config['model_type'] == 'classification' else False + log.debug(f"Using apply_softmax={self.apply_softmax}") + self._model.load_state_dict(torch.load(path)) + self._model.to(self.device) + self._model.eval() + if self._model.__class__.__name__ == 'ModelWrapper': + self.model_type = self._model.model.__class__.__name__ + else: + self.model_type = self._model.__class__.__name__ + self._build(pooling=pooling) + + @classmethod + def from_model( + cls, + model: torch.nn.Module, + tile_px: int, + layers: Optional[Union[str, List[str]]] = 'postconv', + *, + include_preds: bool = False, + mixed_precision: bool = True, + channels_last: bool = True, + wsi_normalizer: Optional["StainNormalizer"] = None, + apply_softmax: bool = True, + pooling: 
Optional[Any] = None + ): + """Creates an activations interface from a loaded slideflow model which + outputs feature activations at the designated layers. + + Intermediate layers are returned in the order of layers. + Predictions are returned last. + + Args: + model (:class:`torch.nn.Module`): Loaded model. + tile_px (int): Width/height of input images, in pixels. + layers (list(str), optional): Layers from which to generate + activations. The post-convolution activation layer is accessed + via 'postconv'. Defaults to 'postconv'. + include_preds (bool, optional): Include predictions in output. Will be + returned last. Defaults to False. + mixed_precision (bool, optional): Use mixed precision. + Defaults to True. + channels_last (bool, optional): Convert input batches to the + channels-last memory format. Defaults to True. + wsi_normalizer (:class:`slideflow.norm.StainNormalizer`): Stain + normalizer to use on whole-slide images. Is not used on + individual tile datasets via __call__. Defaults to None. + apply_softmax (bool): Apply softmax transformation to model output. + Defaults to True. + pooling (Callable or str, optional): PyTorch pooling function to use + on feature layers. May be a string ('avg' or 'max') or a + callable PyTorch function.
+ """ + device = next(model.parameters()).device + if include_preds is not None: + kw = dict(include_preds=include_preds) + else: + kw = dict() + obj = cls( + None, + layers, + mixed_precision=mixed_precision, + channels_last=channels_last, + device=device, + **kw + ) + if isinstance(model, torch.nn.Module): + obj._model = model + obj._model.eval() + else: + raise errors.ModelError("Model is not a valid PyTorch model.") + obj.hp = None + if obj._model.__class__.__name__ == 'ModelWrapper': + obj.model_type = obj._model.model.__class__.__name__ + else: + obj.model_type = obj._model.__class__.__name__ + obj.tile_px = tile_px + obj.wsi_normalizer = wsi_normalizer + obj.apply_softmax = apply_softmax + obj._build(pooling=pooling) + return obj + + def __call__( + self, + inp: Union[Tensor, "sf.WSI"], + **kwargs + ) -> Optional[Union[List[Tensor], np.ndarray]]: + """Process a given input and return activations and/or predictions. Expects + either a batch of images or a :class:`slideflow.slide.WSI` object. + + When calling on a `WSI` object, keyword arguments are passed to + :meth:`slideflow.WSI.build_generator()`. 
+ + """ + if isinstance(inp, sf.slide.WSI): + return self._predict_slide(inp, **kwargs) + else: + return self._predict(inp, **kwargs) + + def __repr__(self): + return ("{}(\n".format(self.__class__.__name__) + + " path={!r},\n".format(self.path) + + " layers={!r},\n".format(self.layers) + + " include_preds={!r},\n".format(self.include_preds) + + " apply_softmax={!r},\n".format(self.apply_softmax) + + " pooling={!r},\n".format(self._pooling) + + ")") + + def _predict_slide( + self, + slide: "sf.WSI", + *, + img_format: str = 'auto', + batch_size: int = 32, + dtype: type = np.float16, + grid: Optional[np.ndarray] = None, + shuffle: bool = False, + show_progress: bool = True, + callback: Optional[Callable] = None, + normalizer: Optional[Union[str, "StainNormalizer"]] = None, + normalizer_source: Optional[str] = None, + **kwargs + ) -> Optional[np.ndarray]: + """Generate activations from slide => activation grid array.""" + + # Check image format + if img_format == 'auto' and self.img_format is None: + raise ValueError( + 'Unable to auto-detect image format (png or jpg). Set the ' + 'format by passing img_format=... to the call function.' 
+ ) + elif img_format == 'auto': + assert self.img_format is not None + img_format = self.img_format + + return sf.model.extractors.features_from_slide( + self, + slide, + img_format=img_format, + batch_size=batch_size, + dtype=dtype, + grid=grid, + shuffle=shuffle, + show_progress=show_progress, + callback=callback, + normalizer=(normalizer if normalizer else self.wsi_normalizer), + normalizer_source=normalizer_source, + preprocess_fn=self.transform, + **kwargs + ) + + def _predict(self, inp: Tensor, no_grad: bool = True) -> List[Tensor]: + """Return activations for a single batch of images.""" + assert torch.is_floating_point(inp), "Input tensor must be float" + _mp = (self.mixed_precision and self.device.type in ('cuda', 'cpu')) + with autocast(self.device.type, mixed_precision=_mp): # type: ignore + with torch.inference_mode() if no_grad else no_scope(): + inp = inp.to(self.device) + if self.channels_last: + inp = inp.to(memory_format=torch.channels_last) + logits = self._model(inp) + if isinstance(logits, (tuple, list)) and self.apply_softmax: + logits = [softmax(l, dim=1) for l in logits] + elif self.apply_softmax: + logits = softmax(logits, dim=1) + + layer_activations = [] + if self.layers: + for la in self.layers: + act = self.activation[la] + if la == 'postconv': + act = self._postconv_processing(act) + layer_activations.append(act) + if self.include_preds: + layer_activations += [logits] + self.activation = {} + return layer_activations + + def _get_postconv(self): + """Returns post-convolutional layer.""" + + if self.model_type == 'ViT': + return self._model.to_latent + if self.model_type in ('ResNet', 'Inception3', 'GoogLeNet'): + return self._model.avgpool + if self.model_type in ('AlexNet', 'SqueezeNet', 'VGG', 'MobileNetV2', + 'MobileNetV3', 'MNASNet'): + if self._model.classifier.__class__.__name__ == 'Identity': + return self._model.classifier + else: + return next(self._model.classifier.children()) + if self.model_type == 'DenseNet': + return 
self._model.features.norm5 + if self.model_type == 'ShuffleNetV2': + return list(self._model.conv5.children())[1] + if self.model_type == 'Xception': + return self._model.bn4 + raise errors.FeaturesError(f"'postconv' layer not configured for " + f"model type {self.model_type}") + + def _postconv_processing(self, output: Tensor) -> Tensor: + """Applies processing (pooling, resizing) to post-conv outputs, + to convert output to the shape (batch_size, num_features)""" + + def pool(x): + return torch.nn.functional.adaptive_avg_pool2d(x, (1, 1)) + + def squeeze(x): + return x.view(x.size(0), -1) + + if self.model_type in ('ViT', 'AlexNet', 'VGG', 'MobileNetV2', + 'MobileNetV3', 'MNASNet'): + return output + if self.model_type in ('ResNet', 'Inception3', 'GoogLeNet'): + return squeeze(output) + if self.model_type in ('SqueezeNet', 'DenseNet', 'ShuffleNetV2', + 'Xception'): + return squeeze(pool(output)) + return output + + def _build(self, pooling: Optional[Any] = None) -> None: + """Builds the interface model that outputs feature activations at the + designated layers and/or predictions. Intermediate layers are returned in + the order of layers. predictions are returned last. + + Args: + pooling (Callable or str, optional): PyTorch pooling function to use + on feature layers. May be a string ('avg' or 'max') or a + callable PyTorch function. + """ + + self._pooling = pooling + + if isinstance(pooling, str): + if pooling == 'avg': + pooling = lambda x: torch.nn.functional.adaptive_avg_pool2d(x, (1, 1)) + elif pooling == 'max': + pooling = lambda x: torch.nn.functional.adaptive_max_pool2d(x, (1, 1)) + else: + raise ValueError(f"Unrecognized pooling value {pooling}. 
" + "Expected 'avg', 'max', or custom Tensor op.") + + self.activation = {} + + def squeeze(x): + return x.view(x.size(0), -1) + + def get_activation(name): + def hook(model, input, output): + if len(output.shape) == 4 and pooling is not None: + self.activation[name] = squeeze(pooling(output)).detach() + else: + self.activation[name] = output.detach() + return hook + + if isinstance(self.layers, list): + for la in self.layers: + if la == 'postconv': + self._get_postconv().register_forward_hook( + get_activation('postconv') + ) + else: + la_out = torch_utils.get_module_by_name(self._model, la) + la_out.register_forward_hook( + get_activation(la) + ) + elif self.layers is not None: + raise errors.FeaturesError(f"Unrecognized type {type(self.layers)}" + " for self.layers") + + # Calculate output and layer sizes + rand_data = torch.rand(1, 3, self.tile_px, self.tile_px) + output = self._model(rand_data.to(self.device)) + if isinstance(output, (tuple, list)) and self.include_preds: + log.warning("Multi-categorical outcomes is experimental " + "for this interface.") + self.num_classes = sum(o.shape[1] for o in output) + self.num_outputs = len(output) + elif self.include_preds: + self.num_classes = output.shape[1] + self.num_outputs = 1 + else: + self.num_classes = 0 + self.num_outputs = 0 + self.num_features = sum([f.shape[1] for f in self.activation.values()]) + + if self.include_preds: + log.debug(f'Number of classes: {self.num_classes}') + log.debug(f'Number of activation features: {self.num_features}') + + def dump_config(self): + return { + 'class': 'slideflow.model.torch.Features', + 'kwargs': { + 'path': self.path, + 'layers': self.layers, + 'include_preds': self.include_preds, + 'apply_softmax': self.apply_softmax, + 'pooling': self._pooling + } + }
+
+
+class UncertaintyInterface(Features):
+
+    def __init__(
+        self,
+        path: Optional[str],
+        layers: Optional[Union[str, List[str]]] = 'postconv',
+        *,
+        mixed_precision: bool = True,
+        channels_last: bool = True,
+        device: Optional[torch.device] = None,
+        apply_softmax: Optional[bool] = None,
+        pooling: Optional[Any] = None,
+        load_method: str = 'weights',
+    ) -> None:
+        super().__init__(
+            path,
+            layers=layers,
+            mixed_precision=mixed_precision,
+            channels_last=channels_last,
+            device=device,
+            apply_softmax=apply_softmax,
+            pooling=pooling,
+            load_method=load_method,
+            include_preds=True
+        )
+        if self._model is not None:
+            torch_utils.enable_dropout(self._model)
+        # TODO: As the below to-do suggests, this should be updated
+        # for multi-class
+        self.num_uncertainty = 1
+        if self.num_classes > 2:
+            log.warning("UncertaintyInterface not yet implemented for "
+                        "multi-class models")
+
+    @classmethod
+    def from_model(cls, *args, **kwargs):
+        if 'include_preds' in kwargs and not kwargs['include_preds']:
+            raise ValueError("UncertaintyInterface requires include_preds=True")
+        kwargs['include_preds'] = None
+        obj = super().from_model(*args, **kwargs)
+        torch_utils.enable_dropout(obj._model)
+        return obj
+
+    def __repr__(self):
+        return ("{}(\n".format(self.__class__.__name__) +
+                "    path={!r},\n".format(self.path) +
+                "    layers={!r},\n".format(self.layers) +
+                "    apply_softmax={!r},\n".format(self.apply_softmax) +
+                "    pooling={!r},\n".format(self._pooling) +
+                ")")
+
+    def _predict(self, inp: Tensor, no_grad: bool = True) -> List[Tensor]:
+        """Return activations (mean), predictions (mean), and uncertainty
+        (stdev) for a single batch of images."""
+
+        assert torch.is_floating_point(inp), "Input tensor must be float"
+        _mp = (self.mixed_precision and self.device.type in ('cuda', 'cpu'))
+
+        out_pred_drop = [[] for _ in range(self.num_outputs)]
+        if self.layers:
+            out_act_drop = [[] for _ in range(len(self.layers))]
+        for _ in range(30):
+            with autocast(self.device.type, mixed_precision=_mp):  # type: ignore
+                with torch.inference_mode() if no_grad else no_scope():
+                    inp = inp.to(self.device)
+                    if self.channels_last:
+                        inp = inp.to(memory_format=torch.channels_last)
+                    logits = self._model(inp)
+                    if isinstance(logits, (tuple, list)) and self.apply_softmax:
+                        logits = [softmax(l, dim=1) for l in logits]
+                    elif self.apply_softmax:
+                        logits = softmax(logits, dim=1)
+                    for n in range(self.num_outputs):
+                        out_pred_drop[n] += [
+                            (logits[n] if self.num_outputs > 1 else logits)
+                        ]
+
+            layer_activations = []
+            if self.layers:
+                for la in self.layers:
+                    act = self.activation[la]
+                    if la == 'postconv':
+                        act = self._postconv_processing(act)
+                    layer_activations.append(act)
+                for n in range(len(self.layers)):
+                    out_act_drop[n].append(layer_activations[n])
+            self.activation = {}
+
+        for n in range(self.num_outputs):
+            out_pred_drop[n] = torch.stack(out_pred_drop[n], axis=0)
+        predictions = torch.mean(torch.cat(out_pred_drop), dim=0)
+
+        # TODO: Only takes STDEV from first outcome category which works for
+        # outcomes with 2 categories, but a better solution is needed
+        # for num_categories > 2
+        uncertainty = torch.std(torch.cat(out_pred_drop), dim=0)[:, 0]
+        uncertainty = torch.unsqueeze(uncertainty, axis=-1)
+
+        if self.layers:
+            # self.layers is a list, so iterate over its length
+            for n in range(len(self.layers)):
+                out_act_drop[n] = torch.stack(out_act_drop[n], axis=0)
+            reduced_activations = [
+                torch.mean(out_act_drop[n], dim=0)
+                for n in range(len(self.layers))
+            ]
+            return reduced_activations + [predictions, uncertainty]
+        else:
+            return predictions, uncertainty
+
+    def dump_config(self):
+        return {
+            'class': 'slideflow.model.torch.UncertaintyInterface',
+            'kwargs': {
+                'path': self.path,
+                'layers': self.layers,
+                'apply_softmax': self.apply_softmax,
+                'pooling': self._pooling
+            }
+        }
+
+# -----------------------------------------------------------------------------
+
+
+def load(path: str) -> torch.nn.Module:
+    """Load a model trained with Slideflow.
+
+    Args:
+        path (str): Path to saved model. Must be a model trained in Slideflow.
+
+    Returns:
+        torch.nn.Module: Loaded model.
+    """
+    config = sf.util.get_model_config(path)
+    hp = ModelParams.from_dict(config['hp'])
+    if len(config['outcomes']) == 1 or config['model_type'] == 'regression':
+        num_classes = len(list(config['outcome_labels'].keys()))
+    else:
+        num_classes = {
+            outcome: len(list(config['outcome_labels'][outcome].keys()))
+            for outcome in config['outcomes']
+        }
+    model = hp.build_model(
+        num_classes=num_classes,
+        num_slide_features=0 if not config['input_feature_sizes'] else sum(config['input_feature_sizes']),
+        pretrain=None
+    )
+    if not torch.cuda.is_available():
+        kw = dict(map_location=torch.device('cpu'))
+    else:
+        kw = dict()
+    model.load_state_dict(torch.load(path, **kw))
+    return model
+ + +
+def lazy_load_pretrained(
+    module: torch.nn.Module,
+    to_load: Union[str, torch.nn.Module]
+) -> None:
+    """Loads pretrained model weights into an existing module, ignoring
+    incompatible Tensors.
+
+    Args:
+        module (torch.nn.Module): Destination module for weights.
+        to_load (str, torch.nn.Module): Module with weights to load. Either
+            path to PyTorch Slideflow model, or an existing PyTorch module.
+
+    Returns:
+        None
+    """
+    # Get state dictionaries
+    current_model_dict = module.state_dict()
+    if isinstance(to_load, str):
+        loaded_state_dict = torch.load(to_load)
+    else:
+        loaded_state_dict = to_load.state_dict()
+
+    # Only transfer valid states
+    new_state_dict = {k: v if v.size() == current_model_dict[k].size()
+                      else current_model_dict[k]
+                      for k, v in zip(current_model_dict.keys(),
+                                      loaded_state_dict.values())}
+    n_states = len(list(new_state_dict.keys()))
+    log.info(f"Loaded {n_states} Tensor states from "
+             f"pretrained model [green] {to_load}")
+    module.load_state_dict(new_state_dict, strict=False)
+ © Copyright 2023, James M Dolezal.
+ Built with Sphinx using a theme provided by Read the Docs.
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/mosaic/index.html b/docs/_modules/slideflow/mosaic/index.html new file mode 100644 index 000000000..dc0693852 --- /dev/null +++ b/docs/_modules/slideflow/mosaic/index.html @@ -0,0 +1,1054 @@ + + + + + + + + + + + + slideflow.mosaic — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Source code for slideflow.mosaic

+from __future__ import absolute_import, division, print_function
+
+import csv
+import os
+import sys
+import time
+import warnings
+from functools import partial
+from multiprocessing.dummy import Pool as DPool
+from os.path import join
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
+
+import cv2
+import numpy as np
+from rich.progress import track
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.stats import SlideMap, get_centroid_index
+from slideflow.util import log
+
+if TYPE_CHECKING:
+    from slideflow.norm import StainNormalizer
+
+# -----------------------------------------------------------------------------
+
+def process_tile_image(args, decode_kwargs):
+    if args is None:
+        return None, None, None, None
+    point_index, x, y, display_size, alpha, image = args
+    if point_index is None:
+        return None, None, None, None
+    if isinstance(image, tuple):
+        tfr, tfr_idx = image
+        image = sf.io.get_tfrecord_by_index(tfr, tfr_idx)['image_raw']
+    if image is None:
+        return point_index, None, None, None
+    if sf.model.is_tensorflow_tensor(image):
+        image = image.numpy()
+    image = decode_image(image, **decode_kwargs)
+    extent = [
+        x - display_size/2,
+        x + display_size/2,
+        y - display_size/2,
+        y + display_size/2
+    ]
+    return point_index, image, extent, alpha
+
+def decode_image(
+    image: Union[str, np.ndarray],
+    normalizer: Optional["StainNormalizer"],
+    img_format: str
+) -> np.ndarray:
+    """Internal method to convert an image string (as stored in TFRecords)
+    to an RGB array."""
+
+    if normalizer:
+        try:
+            if isinstance(image, np.ndarray):
+                return normalizer.rgb_to_rgb(image)
+            elif img_format in ('jpg', 'jpeg'):
+                return normalizer.jpeg_to_rgb(image)
+            elif img_format == 'png':
+                return normalizer.png_to_rgb(image)
+            else:
+                return normalizer.transform(image)
+        except Exception as e:
+            log.error("Error encountered during image normalization, "
+                        f"displaying image tile non-normalized. {e}")
+    if isinstance(image, np.ndarray):
+        return image
+    else:
+        image_arr = np.frombuffer(image, np.uint8)
+        tile_image_bgr = cv2.imdecode(image_arr, cv2.IMREAD_COLOR)
+        return cv2.cvtColor(tile_image_bgr, cv2.COLOR_BGR2RGB)
+
+def find_corresponding_points(row, points):
+    return points.loc[((points.grid_x == row.x) & (points.grid_y == row.y))].index
+
+# -----------------------------------------------------------------------------
+
+
+class Mosaic:
+    """Visualization of plotted image tiles."""
+
+    def __init__(
+        self,
+        images: Union[SlideMap, List[np.ndarray], np.ndarray, List[Tuple[str, int]]],
+        coords: Optional[Union[Tuple[int, int], np.ndarray]] = None,
+        *,
+        tfrecords: List[str] = None,
+        normalizer: Optional[Union[str, "StainNormalizer"]] = None,
+        normalizer_source: Optional[str] = None,
+        **grid_kwargs
+    ) -> None:
+        """Generate a mosaic map, which visualizes plotted image tiles.
+
+        Creating a mosaic map requires two components: a set of images and
+        corresponding coordinates. Images and coordinates can either be manually
+        provided, or the mosaic can dynamically read images from TFRecords as
+        needed, reducing memory requirements.
+
+        The first argument provides the images, and may be any of the following:
+
+        - A list or array of images (np.ndarray, HxWxC)
+        - A list of tuples, containing ``(slide_name, tfrecord_index)``
+        - A ``slideflow.SlideMap`` object
+
+        The second argument provides the coordinates, and may be any of:
+
+        - A list or array of (x, y) coordinates for each image
+        - None (if the first argument is a ``SlideMap``, which has coordinates)
+
+        If images are to be read dynamically from tfrecords (with a ``SlideMap``,
+        or by providing tfrecord indices directly), the keyword argument
+        ``tfrecords`` must be specified with paths to tfrecords.
+
+        Published examples:
+
+        - Figure 4: https://doi.org/10.1038/s41379-020-00724-3
+        - Figure 6: https://doi.org/10.1038/s41467-022-34025-x
+
+        Examples
+            Generate a mosaic map from a list of images and coordinates.
+
+            .. code-block:: python
+
+                # Example data (images are HxWxC, np.ndarray)
+                images = [np.ndarray(...), ...]
+                coords = [(0.2, 0.9), ...]
+
+                # Generate the mosaic
+                mosaic = Mosaic(images, coords)
+
+            Generate a mosaic map from tuples of TFRecord paths and indices.
+
+            .. code-block:: python
+
+                # Example data
+                paths = ['/path/to/tfrecord.tfrecords', ...]
+                idx = [253, 112, ...]
+                coords = [(0.2, 0.9), ...]
+                tuples = [(tfr, i) for tfr, i in zip(paths, idx)]
+
+                # Generate mosaic map
+                mosaic = sf.Mosaic(tuples, coords)
+
+            Generate a mosaic map from a SlideMap and list of TFRecord paths.
+
+            .. code-block:: python
+
+                # Prepare a SlideMap from a project
+                P = sf.Project('/project/path')
+                ftrs = P.generate_features('/path/to/model')
+                slide_map = sf.SlideMap.from_features(ftrs)
+
+                # Generate mosaic
+                mosaic = Mosaic(slide_map, tfrecords=ftrs.tfrecords)
+
+        Args:
+            images (list(np.ndarray), tuple, :class:`slideflow.SlideMap`):
+                Images from which to generate the mosaic. May be a list or
+                array of images (np.ndarray, HxWxC), a list of tuples,
+                containing ``(slide_name, tfrecord_index)``, or a
+                ``slideflow.SlideMap`` object.
+            coords (list(tuple) or np.ndarray, optional): Coordinates for
+                images. May be a list or array of (x, y) coordinates for each
+                image (of same length as ``images``), or None (if ``images``
+                is a ``SlideMap`` object).
+
+        Keyword args:
+            tfrecords (list(str), optional): TFRecord paths. Required if
+                ``images`` is either a ``SlideMap`` object or a list of tuples
+                containing ``(slide_name, tfrecord_index)``. Defaults to None.
+            num_tiles_x (int, optional): Mosaic map grid size. Defaults to 50.
+            tile_select (str, optional): 'first', 'nearest', or 'centroid'.
+                Determines how to choose a tile for display on each grid space.
+                If 'first', will display the first valid tile in a grid space
+                (fastest; recommended). If 'nearest', will display tile nearest
+                to center of grid space. If 'centroid', for each grid, will
+                calculate which tile is nearest to centroid tile_meta.
+                Defaults to 'first'.
+            tile_meta (dict, optional): Tile metadata, used for tile_select.
+                Dictionary should have slide names as keys, mapped to list of
+                metadata (length of list = number of tiles in slide).
+                Defaults to None.
+            normalizer ((str or :class:`slideflow.norm.StainNormalizer`), optional):
+                Normalization strategy to use on image tiles. Defaults to None.
+            normalizer_source (str, optional): Stain normalization preset or
+                path to a source image. Valid presets include 'v1', 'v2', and
+                'v3'. If None, will use the default preset ('v3').
+                Defaults to None.
+        """
+        self.tile_point_distances = []  # type: List[Dict]
+        self.slide_map = None
+        self.tfrecords = tfrecords
+        self.grid_images = {}
+        self.grid_coords = []  # type: np.ndarray
+        self.grid_idx = []  # type: np.ndarray
+
+        if isinstance(images, SlideMap):
+            if tfrecords is None:
+                raise ValueError("If building a Mosaic from a SlideMap, must "
+                                 "provide paths to tfrecords via keyword arg "
+                                 "tfrecords=...")
+            elif isinstance(tfrecords, list) and not len(tfrecords):
+                raise errors.TFRecordsNotFoundError()
+            self._prepare_from_slidemap(images)
+        elif isinstance(images[0], (tuple, list)) and isinstance(images[0][0], str):
+            self._prepare_from_tuples(images, coords)  # type: ignore
+        else:
+            assert coords is not None
+            assert len(images) == len(coords)
+            self._prepare_from_coords(images, coords)  # type: ignore
+
+        # ---------------------------------------------------------------------
+
+        # Detect tfrecord image format
+        if self.tfrecords is not None:
+            _, self.img_format = sf.io.detect_tfrecord_format(self.tfrecords[0])
+        else:
+            self.img_format = 'numpy'
+
+        # Setup normalization
+        if isinstance(normalizer, str):
+            log.info(f'Using realtime {normalizer} normalization')
+            self.normalizer = sf.norm.autoselect(
+                method=normalizer,
+                source=normalizer_source
+            )  # type: Optional[StainNormalizer]
+        elif normalizer is not None:
+            self.normalizer = normalizer
+        else:
+            self.normalizer = None
+
+        self.generate_grid(**grid_kwargs)
+
+    def _prepare_from_coords(
+        self,
+        images: Union[List[np.ndarray], np.ndarray],
+        coords: List[Union[Tuple[int, int], np.ndarray]]
+    ) -> None:
+        """Prepare the Mosaic map from a set of images and coordinates."""
+        log.info('Loading coordinates and plotting points...')
+        self.images = images
+        self.mapped_tiles = []  # type: List[int]
+
self.points = [{ + 'coord': coords[i], + 'global_index': i, + 'category': 'none', + 'has_paired_tile': False, + } for i in range(len(coords))] + + def _prepare_from_slidemap( + self, + slide_map: SlideMap, + *, + tile_meta: Optional[Dict] = None, + ) -> None: + """Prepare the Mosaic map from a ``SlideMap`` object.""" + log.info('Loading coordinates from SlideMap and plotting points...') + self.slide_map = slide_map + self.mapped_tiles = {} # type: Dict[str, List[int]] + self.points = slide_map.data.copy() + self.points['has_paired_tile'] = False + self.points['points_index'] = self.points.index + self.points['alpha'] = 1. + if tile_meta: + self.points['meta'] = self.points.apply(lambda row: tile_meta[row.slide][row.tfr_index], axis=1) + log.debug("Loading complete.") + + def _prepare_from_tuples( + self, + images: List[Tuple[str, int]], + coords: List[Union[Tuple[int, int], np.ndarray]], + ) -> None: + """Prepare from a list of tuples with TFRecord names/indices.""" + log.info('Loading coordinates from SlideMap and plotting points...') + self.mapped_tiles = {} # type: Dict[str, List[int]] + self.points = [] + for i, (tfr, idx) in enumerate(images): + self.points.append({ + 'coord': np.array(coords[i]), + 'global_index': i, + 'category': 'none', + 'slide': (tfr if self.tfrecords is not None + else sf.util.path_to_name(tfr)), + 'tfrecord': (tfr if self.tfrecords is None + else self._get_tfrecords_from_slide(tfr)), + 'tfrecord_index': idx, + 'has_paired_tile': None, + }) + + def _get_image_from_point(self, index): + point = self.points.loc[index] + if 'tfr_index' in point: + tfr = self._get_tfrecords_from_slide(point.slide) + tfr_idx = point.tfr_index + if not tfr: + log.error(f"TFRecord {tfr} not found in slide_map") + return None + image = sf.io.get_tfrecord_by_index(tfr, tfr_idx)['image_raw'] + else: + image = self.images[index] + return image + + def _get_tfrecords_from_slide(self, slide: str) -> Optional[str]: + """Using the internal list of TFRecord paths, 
returns the path to a + TFRecord for a given corresponding slide.""" + for tfr in self.tfrecords: + if sf.util.path_to_name(tfr) == slide: + return tfr + log.error(f'Unable to find TFRecord path for slide [green]{slide}') + return None + + def _initialize_figure(self, figsize, background): + import matplotlib.pyplot as plt + fig = plt.figure(figsize=figsize) + self.ax = fig.add_subplot(111, aspect='equal') + self.ax.set_facecolor(background) + fig.tight_layout() + plt.subplots_adjust( + left=0.02, + bottom=0, + right=0.98, + top=1, + wspace=0.1, + hspace=0 + ) + self.ax.set_aspect('equal', 'box') + self.ax.set_xticklabels([]) + self.ax.set_yticklabels([]) + + def _plot_tile_image(self, image, extent, alpha=1): + return self.ax.imshow( + image, + aspect='equal', + origin='lower', + extent=extent, + zorder=99, + alpha=alpha + ) + + def _finalize_figure(self): + self.ax.autoscale(enable=True, tight=None) + + def _record_point(self, index): + point = self.points.loc[index] + if 'tfr_index' in point: + tfr = self._get_tfrecords_from_slide(point.slide) + if tfr is None: + return + if tfr in self.mapped_tiles: + self.mapped_tiles[tfr] += [point.tfr_index] + else: + self.mapped_tiles[tfr] = [point.tfr_index] + else: + self.mapped_tiles += [index] + + @property + def decode_kwargs(self): + return dict(normalizer=self.normalizer, img_format=self.img_format) + + def points_at_grid_index(self, x, y): + return self.points.loc[((self.points.grid_x == x) & (self.points.grid_y == y))] + + def selected_points(self): + return self.points.loc[self.points.selected] + + def generate_grid( + self, + num_tiles_x: int = 50, + tile_meta: Optional[Dict] = None, + tile_select: str = 'first', + max_dist: Optional[float] = None, + ): + """Generate the mosaic map grid. + + Args: + num_tiles_x (int, optional): Mosaic map grid size. Defaults to 50. + tile_meta (dict, optional): Tile metadata, used for tile_select. 
+                Dictionary should have slide names as keys, mapped to list of
+                metadata (length of list = number of tiles in slide).
+                Defaults to None.
+            tile_select (str, optional): 'first', 'nearest', or 'centroid'.
+                Determines how to choose a tile for display on each grid space.
+                If 'first', will display the first valid tile in a grid space
+                (fastest; recommended). If 'nearest', will display tile nearest
+                to center of grid space. If 'centroid', for each grid, will
+                calculate which tile is nearest to centroid tile_meta.
+                Defaults to 'first'.
+        """
+        # Initial validation checks
+        if tile_select not in ('nearest', 'centroid', 'first'):
+            raise TypeError(f'Unknown tile selection method {tile_select}')
+        else:
+            log.debug(f'Tile selection method: {tile_select}')
+        self.num_tiles_x = num_tiles_x
+        self.grid_images = {}
+
+        # Build the grid
+        x_points = self.points.x.values
+        y_points = self.points.y.values
+        max_x = x_points.max()
+        min_x = x_points.min()
+        max_y = y_points.max()
+        min_y = y_points.min()
+        log.debug(f'Loaded {len(self.points)} points.')
+
+        self.tile_size = (max_x - min_x) / self.num_tiles_x
+        self.num_tiles_y = int((max_y - min_y) / self.tile_size)
+
+        self.grid_idx = np.reshape(np.dstack(np.indices((self.num_tiles_x, self.num_tiles_y))), (self.num_tiles_x * self.num_tiles_y, 2))
+        _grid_offset = np.array([(self.tile_size/2) + min_x, (self.tile_size/2) + min_y])
+        self.grid_coords = (self.grid_idx * self.tile_size) + _grid_offset
+
+        points_added = 0
+        x_bins = np.arange(min_x, max_x, ((max_x - min_x) / self.num_tiles_x))[1:]
+        y_bins = np.arange(min_y, max_y, ((max_y - min_y) / self.num_tiles_y))[1:]
+        self.points['grid_x'] = np.digitize(self.points.x.values, x_bins, right=False)
+        self.points['grid_y'] = np.digitize(self.points.y.values, y_bins, right=False)
+        self.points['selected'] = False
+        log.debug(f'{points_added} points added to grid')
+
+        # Then, calculate distances from each point to each spot on the grid
+        def select_nearest_points(idx):
+            grid_x, grid_y = self.grid_idx[idx][0], self.grid_idx[idx][1]
+            grid_coords = self.grid_coords[idx]
+            # Calculate distance for each point within the grid tile from
+            # center of the grid tile
+            _points = self.points_at_grid_index(grid_x, grid_y)
+            if not _points.empty:
+                if tile_select == 'nearest':
+                    point_coords = np.stack([_points.x.values, _points.y.values], axis=-1)
+                    dist = np.linalg.norm(
+                        point_coords - grid_coords,
+                        ord=2,
+                        axis=1
+                    )
+                    if max_dist is not None:
+                        masked_dist = np.ma.masked_array(dist, (dist >= (max_dist * self.tile_size)))
+                        if masked_dist.count():
+                            self.points.loc[_points.index[np.argmin(masked_dist)], 'selected'] = True
+                    else:
+                        self.points.loc[_points.index[np.argmin(dist)], 'selected'] = True
+                elif not tile_meta:
+                    raise errors.MosaicError(
+                        'Mosaic centroid option requires tile_meta.'
+                    )
+                else:
+                    centroid_index = get_centroid_index(_points.meta.values)
+                    self.points.loc[_points.index[centroid_index], 'selected'] = True
+
+        start = time.time()
+
+        if tile_select == 'first':
+            grid_group = self.points.groupby(['grid_x', 'grid_y'])
+            first_indices = grid_group.nth(0).points_index.values
+            self.points.loc[first_indices, 'selected'] = True
+        elif tile_select in ('nearest', 'centroid'):
+            self.points['selected'] = False
+            dist_fn = partial(select_nearest_points)
+            pool = DPool(sf.util.num_cpu())
+            for i, _ in track(enumerate(pool.imap_unordered(dist_fn, range(len(self.grid_idx))), 1), total=len(self.grid_idx)):
+                pass
+            pool.close()
+            pool.join()
+        else:
+            raise ValueError(
+                f'Unrecognized value for tile_select: "{tile_select}"'
+            )
+        end = time.time()
+        if sf.getLoggingLevel() <= 20:
+            sys.stdout.write('\r\033[K')
+        log.debug(f'Tile image selection complete ({end - start:.1f} sec)')
+
+    def export(self, path: str) -> None:
+        """Export SlideMap and configuration for later loading.
+
+        Args:
+            path (str): Directory in which to save configuration.
+
+        """
+        if self.slide_map is None:
+            raise ValueError(
+                "Mosaic.export() requires a Mosaic built from a SlideMap."
+            )
+        self.slide_map.save(path)
+        if isinstance(self.tfrecords, list):
+            tfr = self.tfrecords
+        else:
+            tfr = list(self.tfrecords)
+        sf.util.write_json(tfr, join(path, 'tfrecords.json'))
+        log.info(f"Mosaic configuration exported to {path}")
+
+    def plot(
+        self,
+        figsize: Tuple[int, int] = (200, 200),
+        focus: Optional[List[str]] = None,
+        focus_slide: Optional[str] = None,
+        background: str = '#dfdfdf',
+        pool: Optional[Any] = None,
+    ) -> None:
+        """Initializes figures and places image tiles.
+
+        If in a Jupyter notebook, the mosaic will be displayed in the cell
+        output. If running via script or shell, the mosaic can then be
+        shown on screen using matplotlib ``plt.show()``:
+
+        .. code-block::
+
+            import slideflow as sf
+            import matplotlib.pyplot as plt
+
+            mosaic = sf.Mosaic(...)
+            mosaic.plot()
+            plt.show()
+
+        Args:
+            figsize (Tuple[int, int], optional): Figure size. Defaults to
+                (200, 200).
+            focus (list, optional): List of tfrecords (paths) to highlight
+                on the mosaic. Defaults to None.
+            focus_slide (str, optional): Highlight tiles from this slide.
+                Defaults to None.
+        """
+        if (focus is not None or focus_slide is not None) and self.tfrecords is None:
+            raise ValueError("Unable to plot with focus; slides/tfrecords not configured.")
+
+        log.debug("Initializing figure...")
+        self._initialize_figure(figsize=figsize, background=background)
+
+        # Reset alpha and display size
+        if focus_slide:
+            self.points['alpha'] = 1.
+        self.points['display_size'] = self.tile_size
+
+        if focus_slide:
+            for idx in self.grid_idx:
+                _points = self.points_at_grid_index(x=idx[0], y=idx[1])
+                if not _points.empty and focus_slide:
+                    n_matching = len(_points.loc[_points.slide == focus_slide])
+                    self.points.loc[_points.index, 'alpha'] = n_matching / len(_points)
+
+        # Then, pair grid tiles and points according to their distances
+        log.info('Placing image tiles...')
+        placed = 0
+        start = time.time()
+        to_map = []
+        should_close_pool = False
+        has_tfr = 'tfr_index' in self.points.columns
+        selected_points = self.selected_points()
+
+        for idx, point in selected_points.iterrows():
+            if has_tfr:
+                tfr = self._get_tfrecords_from_slide(point.slide)
+                tfr_idx = point.tfr_index
+                if tfr:
+                    image = (tfr, tfr_idx)
+                else:
+                    log.error(f"TFRecord {tfr} not found in slide_map")
+                    image = None
+            else:
+                image = self.images[idx]
+            to_map.append((idx, point.grid_x * self.tile_size, point.grid_y * self.tile_size, point.display_size, point.alpha, image))
+
+        if pool is None:
+            pool = DPool(sf.util.num_cpu())
+            should_close_pool = True
+        for i, (point_idx, image, extent, alpha) in track(enumerate(pool.imap(partial(process_tile_image, decode_kwargs=self.decode_kwargs), to_map)), total=len(selected_points)):
+            if point_idx is not None:
+                self._record_point(point_idx)
+                self._plot_tile_image(image, extent, alpha)
+                point = self.points.loc[point_idx]
+                self.grid_images[(point.grid_x, point.grid_y)] = image
+                placed += 1
+
+        if should_close_pool:
+            pool.close()
+            pool.join()
+        log.debug(f'Tile images placed: {placed} ({time.time()-start:.2f}s)')
+        if focus:
+            self.focus(focus)
+        self._finalize_figure()
+
+    def save(self, filename: str, **kwargs: Any) -> None:
+        """Saves the mosaic map figure to the given filename.
+
+        Args:
+            filename (str): Path at which to save the mosaic image.
+
+        Keyword args:
+            figsize (Tuple[int, int], optional): Figure size. Defaults to
+                (200, 200).
+ focus (list, optional): List of tfrecords (paths) to highlight on + the mosaic. + """ + with sf.util.matplotlib_backend('Agg'): + import matplotlib.pyplot as plt + + self.plot(**kwargs) + log.info('Exporting figure...') + try: + if not os.path.exists(os.path.dirname(filename)): + os.makedirs(os.path.dirname(filename)) + except FileNotFoundError: + pass + plt.savefig(filename, bbox_inches='tight') + log.info(f'Saved figure to [green]{filename}') + plt.close() + + def save_report(self, filename: str) -> None: + """Saves a report of which tiles (and their corresponding slide) + were displayed on the Mosaic map, in CSV format.""" + with open(filename, 'w') as f: + writer = csv.writer(f) + writer.writerow(['slide', 'index']) + if isinstance(self.mapped_tiles, dict): + for tfr in self.mapped_tiles: + for idx in self.mapped_tiles[tfr]: + writer.writerow([tfr, idx]) + else: + for idx in self.mapped_tiles: + writer.writerow([idx]) + log.info(f'Mosaic report saved to [green]{filename}') + + def view(self, slides: List[str] = None) -> None: + """Open Mosaic in Slideflow Studio. + + See :ref:`studio` for more information. + + Args: + slides (list(str), optional): Path to whole-slide images. Used for + displaying image tile context when hovering over a mosaic grid. + Defaults to None. + + """ + from slideflow.studio.widgets import MosaicWidget + from slideflow.studio import Studio + + studio = Studio(widgets=[MosaicWidget]) + mosaic = studio.get_widget('MosaicWidget') + mosaic.load( + self.slide_map, + tfrecords=self.tfrecords, + slides=slides, + normalizer=self.normalizer + ) + studio.run()
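As a rough illustration of the `focus_slide` handling in `Mosaic.plot()` above, the sketch below computes a per-grid-cell alpha as the fraction of points in that cell that belong to the focused slide. It is a standalone approximation using plain dictionaries rather than the DataFrame used by `Mosaic`; `focus_alpha` and the `cells` mapping are hypothetical names for illustration and are not part of Slideflow.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def focus_alpha(cells: Dict[Cell, List[str]], focus_slide: str) -> Dict[Cell, float]:
    """Return, for each non-empty grid cell, the fraction of its points
    that come from ``focus_slide`` (hypothetical helper, not Slideflow API)."""
    alphas: Dict[Cell, float] = {}
    for cell, slides in cells.items():
        if not slides:
            # Empty grid cells are skipped and keep their default alpha.
            continue
        n_matching = sum(1 for s in slides if s == focus_slide)
        alphas[cell] = n_matching / len(slides)
    return alphas


# Hypothetical grid: each cell lists the slide name of every point it holds.
cells = {(0, 0): ['a', 'a', 'b', 'b'], (0, 1): ['b'], (1, 1): []}
print(focus_alpha(cells, 'a'))
```

Under this scheme a cell dominated by the focused slide is drawn nearly opaque, while a cell with no matching points fades out entirely.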

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/norm/index.html b/docs/_modules/slideflow/norm/index.html new file mode 100644 index 000000000..53409835b --- /dev/null +++ b/docs/_modules/slideflow/norm/index.html @@ -0,0 +1,1229 @@ + + + + + + + + + + + + slideflow.norm — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.norm

+"""H&E stain normalization and augmentation tools."""
+
+from __future__ import absolute_import
+
+import os
+import sys
+import multiprocessing as mp
+from io import BytesIO
+from functools import partial
+from typing import TYPE_CHECKING, Any, Dict, Optional, Tuple, Union
+
+import cv2
+import numpy as np
+import slideflow as sf
+from PIL import Image
+from contextlib import contextmanager
+from rich.progress import Progress
+from slideflow import errors
+from slideflow.dataset import Dataset
+from slideflow.util import detuple, log, cleanup_progress, _as_list
+from slideflow.norm import (augment, macenko, reinhard, vahadane)
+
+if TYPE_CHECKING:
+    import tensorflow as tf
+    import torch
+
+
+
[docs]class StainNormalizer: + + vectorized = False + normalizers = { + 'macenko': macenko.MacenkoNormalizer, + 'macenko_fast': macenko.MacenkoFastNormalizer, + 'reinhard': reinhard.ReinhardNormalizer, + 'reinhard_fast': reinhard.ReinhardFastNormalizer, + 'reinhard_mask': reinhard.ReinhardMaskNormalizer, + 'reinhard_fast_mask': reinhard.ReinhardFastMaskNormalizer, + 'vahadane': vahadane.VahadaneSpamsNormalizer, + 'vahadane_sklearn': vahadane.VahadaneSklearnNormalizer, + 'vahadane_spams': vahadane.VahadaneSpamsNormalizer, + 'augment': augment.AugmentNormalizer + } # type: Dict[str, Any] + + def __init__(self, method: str, **kwargs) -> None: + """H&E Stain normalizer supporting various normalization methods. + + The stain normalizer supports numpy images, PNG or JPG strings, + Tensorflow tensors, and PyTorch tensors. The default ``.transform()`` + method will attempt to preserve the original image type while + minimizing conversions to and from Tensors. + + Alternatively, you can manually specify the image conversion type + by using the appropriate function. For example, to convert a Tensor + to a normalized numpy RGB image, use ``.tf_to_rgb()``. + + Args: + method (str): Normalization method. Options include 'macenko', + 'reinhard', 'reinhard_fast', 'reinhard_mask', + 'reinhard_fast_mask', 'vahadane', 'vahadane_spams', + 'vahadane_sklearn', and 'augment'. + + Keyword args: + stain_matrix_target (np.ndarray, optional): Set the stain matrix + target for the normalizer. May raise an error if the normalizer + does not have a stain_matrix_target fit attribute. + target_concentrations (np.ndarray, optional): Set the target + concentrations for the normalizer. May raise an error if the + normalizer does not have a target_concentrations fit attribute. + target_means (np.ndarray, optional): Set the target means for the + normalizer. May raise an error if the normalizer does not have + a target_means fit attribute. 
+ target_stds (np.ndarray, optional): Set the target standard + deviations for the normalizer. May raise an error if the + normalizer does not have a target_stds fit attribute. + + Raises: + ValueError: If the specified normalizer method is not available. + + Examples + Normalize a numpy image using the default fit. + + >>> import slideflow as sf + >>> macenko = sf.norm.StainNormalizer('macenko') + >>> macenko.transform(image) + + Fit the normalizer to a target image (numpy or path). + + >>> macenko.fit(target_image) + + Fit the normalizer using a preset configuration. + + >>> macenko.fit('v2') + + Fit the normalizer to all images in a Dataset. + + >>> dataset = sf.Dataset(...) + >>> macenko.fit(dataset) + + Normalize an image and convert from Tensor to numpy array (RGB). + + >>> macenko.tf_to_rgb(image) + + Normalize images during DataLoader pre-processing. + + >>> dataset = sf.Dataset(...) + >>> dataloader = dataset.torch(..., normalizer=macenko) + >>> dts = dataset.tensorflow(..., normalizer=macenko) + + """ + if method not in self.normalizers: + raise ValueError(f"Unrecognized normalizer method {method}") + + self.method = method + self.n = self.normalizers[method]() + + if kwargs: + self.n.fit(**kwargs) + + def __repr__(self): + base = "{}(\n".format(self.__class__.__name__) + base += " method = {!r},\n".format(self.method) + for fit_param, fit_val in self.get_fit().items(): + base += " {} = {!r},\n".format(fit_param, fit_val) + base += ")" + return base + + @property + def device(self) -> str: + return 'cpu' + + def _torch_transform( + self, + inp: "torch.Tensor", + *, + augment: bool = False + ) -> "torch.Tensor": + """Normalize a torch uint8 image (CWH). + + Normalization occurs via intermediate conversion to WHC. + + Args: + inp (torch.Tensor): Image, uint8. Images are normalized in + W x H x C space. Images provided as C x W x H will be + auto-converted and permuted back after normalization. + + Returns: + torch.Tensor: Image, uint8. 
+ + """ + import torch + from slideflow.io.torch import cwh_to_whc, whc_to_cwh, is_cwh + + if len(inp.shape) == 4: + return torch.stack([self._torch_transform(img, augment=augment) for img in inp]) + elif is_cwh(inp): + # Convert from CWH -> WHC (normalize) -> CWH + return whc_to_cwh( + torch.from_numpy( + self.rgb_to_rgb( + cwh_to_whc(inp).cpu().numpy(), + augment=augment + ) + ) + ) + else: + return torch.from_numpy( + self.rgb_to_rgb(inp.cpu().numpy(), augment=augment) + ) + + def _torch_augment(self, inp: "torch.Tensor") -> "torch.Tensor": + """Augment a torch uint8 image (CWH). + + Augmentation occurs via intermediate conversion to WHC. + + Args: + inp (torch.Tensor): Image, uint8. Images are augmented in + W x H x C space. Images provided as C x W x H will be + auto-converted and permuted back after augmentation. + + Returns: + torch.Tensor: Image, uint8. + + """ + import torch + from slideflow.io.torch import cwh_to_whc, whc_to_cwh, is_cwh + + if len(inp.shape) == 4: + return torch.stack([self._torch_augment(img) for img in inp]) + elif is_cwh(inp): + # Convert from CWH -> WHC (augment) -> CWH + return whc_to_cwh( + torch.from_numpy( + self.augment_rgb(cwh_to_whc(inp).cpu().numpy()) + ) + ) + else: + return torch.from_numpy(self.augment_rgb(inp.cpu().numpy())) + + def fit( + self, + arg1: Optional[Union[Dataset, np.ndarray, str]], + batch_size: int = 64, + num_threads: Union[str, int] = 'auto', + **kwargs, + ) -> "StainNormalizer": + """Fit the normalizer to a target image or dataset of images. + + Args: + arg1 (Dataset, np.ndarray, str): Target to fit. May be a str, + numpy image array (uint8), path to an image, or a Slideflow + Dataset. If this is a string, will fit to the corresponding + preset fit (either 'v1', 'v2', or 'v3'). + If a Dataset is provided, will average fit values across + all images in the dataset. + batch_size (int, optional): Batch size during fitting, if fitting + to dataset. Defaults to 64. 
+ num_threads (Union[str, int], optional): Number of threads to use + during fitting, if fitting to a dataset. Defaults to 'auto'. + """ + + # Fit to a dataset + if isinstance(arg1, Dataset): + # Set up thread pool + if num_threads == 'auto': + num_threads = sf.util.num_cpu(default=8) # type: ignore + log.debug(f"Setting up pool (size={num_threads}) for norm fitting") + log.debug(f"Using normalizer batch size of {batch_size}") + pool = mp.dummy.Pool(num_threads) # type: ignore + + dataset = arg1 + if sf.backend() == 'tensorflow': + dts = dataset.tensorflow( + None, + batch_size, + standardize=False, + infinite=False + ) + elif sf.backend() == 'torch': + dts = dataset.torch( + None, + batch_size, + standardize=False, + infinite=False, + num_workers=8 + ) + all_fit_vals = [] # type: ignore + pb = Progress(transient=True) + task = pb.add_task('Fitting normalizer...', total=dataset.num_tiles) + pb.start() + with cleanup_progress(pb): + for img_batch, slide in dts: + if sf.model.is_torch_tensor(img_batch): + img_batch = img_batch.permute(0, 2, 3, 1) # BCWH -> BWHC + + mapped = pool.imap(lambda x: self.n.fit(x.numpy()), img_batch) + for fit_vals in mapped: + if all_fit_vals == []: + all_fit_vals = [[] for _ in range(len(fit_vals))] + for v, val in enumerate(fit_vals): + all_fit_vals[v] += [np.squeeze(val)] + pb.advance(task, batch_size) + self.n.set_fit(*[np.array(v).mean(axis=0) for v in all_fit_vals]) + pool.close() + + # Fit to numpy image + elif isinstance(arg1, np.ndarray): + self.n.fit(arg1, **kwargs) + + # Fit to a preset + elif (isinstance(arg1, str) + and arg1 in sf.norm.utils.fit_presets[self.n.preset_tag]): + self.n.fit_preset(arg1, **kwargs) + + # Fit to a path to an image + elif isinstance(arg1, str): + self.src_img = cv2.cvtColor(cv2.imread(arg1), cv2.COLOR_BGR2RGB) + self.n.fit(self.src_img, **kwargs) + + elif arg1 is None and kwargs: + self.set_fit(**kwargs) + + else: + raise ValueError(f'Unrecognized args for fit: {arg1}') + + log.debug('Fit normalizer: 
{}'.format( + ', '.join([f"{fit_key} = {fit_val}" + for fit_key, fit_val in self.get_fit().items()]) + )) + return self + + def get_fit(self, as_list: bool = False): + """Get the current normalizer fit. + + Args: + as_list (bool). Convert the fit values (numpy arrays) to list + format. Defaults to False. + + Returns: + Dict[str, np.ndarray]: Dictionary mapping fit parameters (e.g. + 'target_concentrations') to their respective fit values. + """ + _fit = self.n.get_fit() + if as_list: + return {k: _as_list(v) for k, v in _fit.items()} + else: + return _fit + + def set_fit(self, **kwargs) -> None: + """Set the normalizer fit to the given values. + + Keyword args: + stain_matrix_target (np.ndarray, optional): Set the stain matrix + target for the normalizer. May raise an error if the normalizer + does not have a stain_matrix_target fit attribute. + target_concentrations (np.ndarray, optional): Set the target + concentrations for the normalizer. May raise an error if the + normalizer does not have a target_concentrations fit attribute. + target_means (np.ndarray, optional): Set the target means for the + normalizer. May raise an error if the normalizer does not have + a target_means fit attribute. + target_stds (np.ndarray, optional): Set the target standard + deviations for the normalizer. May raise an error if the + normalizer does not have a target_stds fit attribute. + """ + self.n.set_fit(**{k:v for k, v in kwargs.items() if v is not None}) + + def set_augment(self, preset: Optional[str] = None, **kwargs) -> None: + """Set the normalizer augmentation space. + + Args: + preset (str, optional): Augmentation preset. Defaults to None. + + Keyword args: + matrix_stdev (np.ndarray): Standard deviation + of the stain matrix target. Must have the shape (3, 2). + Used for Macenko normalizers. + Defaults to None (will not augment stain matrix). + concentrations_stdev (np.ndarray): Standard deviation + of the target concentrations. Must have the shape (2,). 
+ Used for Macenko normalizers. + Defaults to None (will not augment target concentrations). + means_stdev (np.ndarray): Standard deviation + of the target means. Must have the shape (3,). + Used for Reinhard normalizers. + Defaults to None (will not augment target means). + stds_stdev (np.ndarray): Standard deviation + of the target stds. Must have the shape (3,). + Used for Reinhard normalizers. + Defaults to None (will not augment target stds). + + """ + if preset is not None: + return self.n.augment_preset(preset) + if kwargs: + self.n.set_augment(**{k:v for k, v in kwargs.items() if v is not None}) + + def transform( + self, + image: Union[np.ndarray, "tf.Tensor", "torch.Tensor"], + *, + augment: bool = False + ) -> Union[np.ndarray, "tf.Tensor", "torch.Tensor"]: + """Normalize a target image, attempting to preserve the original type. + + Args: + image (np.ndarray, tf.Tensor, or torch.Tensor): Image as a uint8 + array. Numpy and Tensorflow images are normalized in W x H x C + space. PyTorch images provided as C x W x H will be + auto-converted and permuted back after normalization. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + Normalized image of the original type (uint8). 
+ """ + if isinstance(image, (str, bytes)): + raise ValueError("Unable to auto-transform bytes or str; please " + "use .png_to_png() or .jpeg_to_jpeg().") + if 'tensorflow' in sys.modules: + import tensorflow as tf + if isinstance(image, tf.Tensor): + return self.tf_to_tf(image, augment=augment) + if 'torch' in sys.modules: + import torch + if isinstance(image, torch.Tensor): + return self.torch_to_torch(image, augment=augment) + if isinstance(image, np.ndarray): + return self.rgb_to_rgb(image, augment=augment) + raise ValueError(f"Unrecognized image type {type(image)}; expected " + "np.ndarray, tf.Tensor, or torch.Tensor") + + def augment( + self, + image: Union[np.ndarray, "tf.Tensor", "torch.Tensor"] + ) -> Union[np.ndarray, "tf.Tensor", "torch.Tensor"]: + """Augment a target image, attempting to preserve the original type. + + Args: + image (np.ndarray, tf.Tensor, or torch.Tensor): Image as a uint8 + array. Numpy and Tensorflow images are normalized in W x H x C + space. PyTorch images provided as C x W x H will be + auto-converted and permuted back after normalization. + + Returns: + Augmented image of the original type (uint8). 
+ """ + if not hasattr(self.n, 'augment'): + raise errors.AugmentationNotSupportedError( + f"Normalizer {self.method} does not support augmentation.") + if isinstance(image, (str, bytes)): + raise ValueError("Unable to augment bytes or str; image " + "must first be converted to an array or Tensor.") + + if 'tensorflow' in sys.modules: + import tensorflow as tf + if isinstance(image, tf.Tensor): + if isinstance(image, dict): + image['tile_image'] = tf.py_function( + self.augment_rgb, + [image['tile_image']], + tf.uint8 + ) + elif len(image.shape) == 4: + image = tf.stack([self.augment_rgb(_i) for _i in image]) + else: + image = tf.py_function( + self.augment_rgb, + [image], + tf.uint8 + ) + return image + + if 'torch' in sys.modules: + import torch + if isinstance(image, torch.Tensor): + if isinstance(image, dict): + to_return = { + k: v for k, v in image.items() + if k != 'tile_image' + } + to_return['tile_image'] = self._torch_augment( + image['tile_image'] + ) + return to_return + else: + return self._torch_augment(image) + + if isinstance(image, np.ndarray): + return self.augment_rgb(image) + raise ValueError(f"Unrecognized image type {type(image)}; expected " + "np.ndarray, tf.Tensor, or torch.Tensor") + + def augment_rgb(self, image: np.ndarray) -> np.ndarray: + """Augment a numpy array (uint8), returning a numpy array (uint8). + + Args: + image (np.ndarray): Image (uint8). + + Returns: + np.ndarray: Augmented image, uint8, W x H x C. + """ + return self.n.augment(image) + + def jpeg_to_jpeg( + self, + jpeg_string: Union[str, bytes], + *, + quality: int = 100, + augment: bool = False + ) -> bytes: + """Normalize a JPEG image, returning a JPEG image. + + Args: + jpeg_string (str, bytes): JPEG image data. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + quality (int, optional): Quality level for creating the resulting + normalized JPEG image. Defaults to 100. + + Returns: + bytes: Normalized JPEG image. 
+ """ + cv_image = self.jpeg_to_rgb(jpeg_string, augment=augment) + with BytesIO() as output: + Image.fromarray(cv_image).save( + output, + format="JPEG", + quality=quality + ) + return output.getvalue() + + def jpeg_to_rgb( + self, + jpeg_string: Union[str, bytes], + *, + augment: bool = False + ) -> np.ndarray: + """Normalize a JPEG image, returning a numpy uint8 array. + + Args: + jpeg_string (str, bytes): JPEG image data. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + np.ndarray: Normalized image, uint8, W x H x C. + """ + cv_image = cv2.imdecode( + np.frombuffer(jpeg_string, dtype=np.uint8), + cv2.IMREAD_COLOR + ) + cv_image = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB) + return self.rgb_to_rgb(cv_image, augment=augment) + + def png_to_png( + self, + png_string: Union[str, bytes], + *, + augment: bool = False + ) -> bytes: + """Normalize a PNG image, returning a PNG image. + + Args: + png_string (str, bytes): PNG image data. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + bytes: Normalized PNG image. + """ + cv_image = self.png_to_rgb(png_string, augment=augment) + with BytesIO() as output: + Image.fromarray(cv_image).save(output, format="PNG") + return output.getvalue() + + def png_to_rgb( + self, + png_string: Union[str, bytes], + *, + augment: bool = False + ) -> np.ndarray: + """Normalize a PNG image, returning a numpy uint8 array. + + Args: + png_string (str, bytes): PNG image data. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + np.ndarray: Normalized image, uint8, W x H x C. + """ + return self.jpeg_to_rgb(png_string, augment=augment) # cv2.imdecode auto-detects the image format + + def rgb_to_rgb( + self, + image: np.ndarray, + *, + augment: bool = False + ) -> np.ndarray: + """Normalize a numpy array (uint8), returning a numpy array (uint8). 
+ + Args: + image (np.ndarray): Image (uint8). 
+ + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + np.ndarray: Normalized image, uint8, W x H x C. + """ + return self.n.transform(image, augment=augment) + + def tf_to_rgb( + self, + image: "tf.Tensor", + *, + augment: bool = False + ) -> np.ndarray: + """Normalize a tf.Tensor (uint8), returning a numpy array (uint8). + + Args: + image (tf.Tensor): Image (uint8). + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + np.ndarray: Normalized image, uint8, W x H x C. + """ + return self.rgb_to_rgb(np.array(image), augment=augment) + + def tf_to_tf( + self, + image: Union[Dict, "tf.Tensor"], + *args: Any, + augment: bool = False + ) -> Tuple[Union[Dict, "tf.Tensor"], ...]: + """Normalize a tf.Tensor (uint8), returning a tf.Tensor (uint8). + + Args: + image (tf.Tensor, Dict): Image (uint8) either as a raw Tensor, + or a Dictionary with the image under the key 'tile_image'. + args (Any, optional): Any additional arguments, which will be passed + and returned unmodified. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + A tuple containing the normalized tf.Tensor image (uint8, + W x H x C) and any additional arguments provided. + """ + import tensorflow as tf + + if isinstance(image, dict): + image['tile_image'] = tf.py_function( + partial(self.tf_to_rgb, augment=augment), + [image['tile_image']], + tf.uint8 + ) + elif len(image.shape) == 4: + image = tf.stack([self.tf_to_tf(_i, augment=augment) for _i in image]) + else: + image = tf.py_function( + partial(self.tf_to_rgb, augment=augment), + [image], + tf.uint8 + ) + return detuple(image, args) + + def torch_to_torch( + self, + image: Union[Dict, "torch.Tensor"], + *args, + augment: bool = False + ) -> Tuple[Union[Dict, "torch.Tensor"], ...]: + """Normalize a torch.Tensor (uint8), returning a torch.Tensor (uint8). 
+ + Args: + image (torch.Tensor, Dict): Image (uint8) either as a raw Tensor, + or a Dictionary with the image under the key 'tile_image'. + args (Any, optional): Any additional arguments, which will be passed + and returned unmodified. + + Keyword args: + augment (bool): Transform using stain augmentation. + Defaults to False. + + Returns: + A tuple containing + + torch.Tensor: Normalized image, uint8 (channel dimension matching the input image) + + args (Any, optional): Any additional arguments provided, unmodified. + """ + if isinstance(image, dict): + to_return = { + k: v for k, v in image.items() + if k != 'tile_image' + } + to_return['tile_image'] = self._torch_transform( + image['tile_image'], + augment=augment + ) + return detuple(to_return, args) + else: + return detuple(self._torch_transform(image, augment=augment), args) + + # --- Context management -------------------------------------------------- + + @contextmanager + def context( + self, + context: Union[str, "sf.WSI", np.ndarray, "tf.Tensor", "torch.Tensor"] + ): + """Set the whole-slide context for the stain normalizer. + + With contextual normalization, max concentrations are determined + from the context (whole-slide image) rather than the image being + normalized. This may improve stain normalization for sections of + a slide that are predominantly eosin (e.g. necrosis or low cellularity). + + When calculating max concentrations from the image context, + white pixels (255) will be masked. + + This function is a context manager used for temporarily setting the + image context. For example: + + .. code-block:: python + + with normalizer.context(slide): + normalizer.transform(target) + + If a slide (``sf.WSI``) is used for context, any existing QC filters + and regions of interest will be used to mask out background as white + pixels, and the masked thumbnail will be used for creating the + normalizer context. 
If no QC has been applied to the slide and the + slide does not have any Regions of Interest, then both Otsu's + thresholding and Gaussian blur filtering will be applied + to the thumbnail for masking. + + Args: + context (str, sf.WSI, np.ndarray): Context to use for normalization, + e.g. a whole-slide image thumbnail, optionally masked with masked + areas set to (255, 255, 255). + + """ + self.set_context(context) + try: + yield + finally: + self.clear_context() + + def set_context( + self, + context: Union[str, "sf.WSI", np.ndarray, "tf.Tensor", "torch.Tensor"] + ) -> bool: + """Set the whole-slide context for the stain normalizer. + + With contextual normalization, max concentrations are determined + from the context (whole-slide image) rather than the image being + normalized. This may improve stain normalization for sections of + a slide that are predominantly eosin (e.g. necrosis or low cellularity). + + When calculating max concentrations from the image context, + white pixels (255) will be masked. + + If a slide (``sf.WSI``) is used for context, any existing QC filters + and regions of interest will be used to mask out background as white + pixels, and the masked thumbnail will be used for creating the + normalizer context. If no QC has been applied to the slide and the + slide does not have any Regions of Interest, then both Otsu's + thresholding and Gaussian blur filtering will be applied + to the thumbnail for masking. + + Args: + context (str, sf.WSI, np.ndarray): Context to use for normalization, + e.g. a whole-slide image thumbnail, optionally masked with masked + areas set to (255, 255, 255). 
+ + """ + if hasattr(self.n, 'set_context'): + if isinstance(context, str): + image = np.asarray(sf.WSI(context, 500, 500).thumb(mpp=4)) + elif isinstance(context, sf.WSI): + image = context.masked_thumb(mpp=4, background='white') + else: + image = context # type: ignore + self.n.set_context(image) + return True + else: + return False + + def clear_context(self) -> None: + """Remove any previously set stain normalizer context.""" + if hasattr(self.n, 'clear_context'): + self.n.clear_context()
+ + +def autoselect( + method: str, + source: Optional[str] = None, + backend: Optional[str] = None, + **kwargs +) -> StainNormalizer: + """Select the best normalizer for a given method, and fit to a given source. + + If a normalizer method has a native implementation in the current backend + (Tensorflow or PyTorch), the native normalizer will be used. + If not, the default numpy implementation will be used. + + Currently, the PyTorch-native normalizers are NOT used by default, as they + are slower than the numpy implementations. Thus, with the PyTorch backend, + all normalizers will be the default numpy implementations. + + Args: + method (str): Normalization method. Options include 'macenko', + 'reinhard', 'reinhard_fast', 'reinhard_mask', 'reinhard_fast_mask', + 'vahadane', 'vahadane_spams', 'vahadane_sklearn', and 'augment'. + source (str, optional): Stain normalization preset or path to a source + image. Valid presets include 'v1', 'v2', and 'v3'. If None, will + use the default preset ('v3'). Defaults to None. + backend (str, optional): Backend to use for native normalizers. Options + include 'tensorflow', 'torch', and 'opencv'. If None, will use the + current backend, falling back to opencv/numpy if a native normalizer + is not available. Defaults to None. + + Returns: + StainNormalizer: Initialized StainNormalizer. 
+ """ + if backend is None: + backend = sf.backend() + if backend == 'tensorflow': + import slideflow.norm.tensorflow + BackendNormalizer = sf.norm.tensorflow.TensorflowStainNormalizer + elif backend == 'torch': + import slideflow.norm.torch + BackendNormalizer = sf.norm.torch.TorchStainNormalizer # type: ignore + elif backend == 'opencv': + BackendNormalizer = StainNormalizer + else: + raise errors.UnrecognizedBackendError + + if method in BackendNormalizer.normalizers: + normalizer = BackendNormalizer(method, **kwargs) + else: + normalizer = StainNormalizer(method, **kwargs) # type: ignore + + if source is not None and source != 'dataset': + normalizer.fit(source) + + return normalizer +
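The selection logic in `autoselect()` above (prefer a backend-native class when it supports the requested method, otherwise fall back to the default implementation) can be sketched without Slideflow installed. This is a minimal sketch under stated assumptions: `DefaultNormalizer`, `NativeNormalizer`, the `BACKENDS` registry, and `select()` are hypothetical stand-ins, not Slideflow classes or functions.

```python
from typing import Dict, Set, Type


class DefaultNormalizer:
    """Hypothetical stand-in for the default (numpy) normalizer class."""
    normalizers: Set[str] = {'macenko', 'reinhard', 'vahadane'}

    def __init__(self, method: str) -> None:
        if method not in self.normalizers:
            raise ValueError(f"Unrecognized normalizer method {method}")
        self.method = method


class NativeNormalizer(DefaultNormalizer):
    """Hypothetical backend-native class that supports fewer methods."""
    normalizers: Set[str] = {'macenko', 'reinhard'}


# Hypothetical registry mapping backend names to normalizer classes.
BACKENDS: Dict[str, Type[DefaultNormalizer]] = {
    'native': NativeNormalizer,
    'opencv': DefaultNormalizer,
}


def select(method: str, backend: str = 'native') -> DefaultNormalizer:
    """Use the backend class if it supports ``method``, else the default."""
    if backend not in BACKENDS:
        raise ValueError(f"Unrecognized backend {backend}")
    cls = BACKENDS[backend]
    if method in cls.normalizers:
        return cls(method)
    # The backend lacks a native implementation: fall back to the default.
    return DefaultNormalizer(method)


print(type(select('macenko')).__name__)
print(type(select('vahadane')).__name__)
```

Keeping the supported methods on each class, as the `normalizers` dictionaries do in the source, lets the fallback check stay a simple membership test.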
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/project/index.html b/docs/_modules/slideflow/project/index.html new file mode 100644 index 000000000..decd21aba --- /dev/null +++ b/docs/_modules/slideflow/project/index.html @@ -0,0 +1,4359 @@ + + + + + + + + + + + + slideflow.project — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.project

+"""Module for the ``Project`` class and its associated functions.
+
+The ``Project`` class supervises data organization and provides a high-level
+API for common functionality, such as tile extraction from whole
+slide images, model training and evaluation, feature calculation, and
+heatmap generation.
+"""
+import re
+import copy
+import csv
+import itertools
+import requests
+import shutil
+import json
+import multiprocessing
+import numpy as np
+import os
+import pickle
+import pandas as pd
+import tarfile
+import warnings
+from tqdm import tqdm
+from os.path import basename, exists, join, isdir, dirname
+from multiprocessing.managers import DictProxy
+from contextlib import contextmanager
+from statistics import mean
+from types import SimpleNamespace
+from typing import (TYPE_CHECKING, Any, Callable, Dict, List, Optional, Tuple,
+                    Union)
+
+import slideflow as sf
+from . import errors, project_utils
+from .util import log, path_to_name, path_to_ext
+from .dataset import Dataset
+from .model import ModelParams
+from .project_utils import (  # noqa: F401
+    auto_dataset, auto_dataset_allow_none, get_validation_settings,
+    get_first_nested_directory, get_matching_directory, BreastER, ThyroidBRS,
+    LungAdenoSquam
+)
+
+if TYPE_CHECKING:
+    from slideflow.model import DatasetFeatures, Trainer, BaseFeatureExtractor
+    from slideflow.slide import SlideReport
+    from slideflow import simclr, mil
+    from ConfigSpace import ConfigurationSpace, Configuration
+    from smac.facade.smac_bb_facade import SMAC4BB  # noqa: F401
+
+
+
[docs]class Project: + """Assists with project organization and execution of common tasks.""" + + def __init__( + self, root: str, + use_neptune: bool = False, + create: bool = False, + **kwargs + ) -> None: + """Load or create a project at a given directory. + + If a project does not exist at the given root directory, one can be + created if a project configuration was provided via keyword arguments. + + *Create a project:* + + .. code-block:: python + + import slideflow as sf + P = sf.Project('/project/path', name=..., ...) + + *Load an existing project:* + + .. code-block:: python + + P = sf.Project('/project/path') + + Args: + root (str): Path to project directory. + + Keyword Args: + name (str): Project name. Defaults to 'MyProject'. + annotations (str): Path to annotations CSV file. + Defaults to './annotations.csv' + dataset_config (str): Path to dataset configuration JSON file. + Defaults to './datasets.json'. + sources (list(str)): List of dataset sources to include in project. + Defaults to 'source1'. + models_dir (str): Path to directory in which to save models. + Defaults to './models'. + eval_dir (str): Path to directory in which to save evaluations. + Defaults to './eval'. + + Raises: + slideflow.errors.ProjectError: if project folder does not exist, + or the folder exists but kwargs are provided. + + """ + self.root = root + if sf.util.is_project(root) and kwargs: + raise errors.ProjectError(f"Project already exists at {root}") + elif sf.util.is_project(root): + self._load(root) + elif create: + log.info(f"Creating project at {root}...") + if not exists(root): + os.makedirs(root) + self._settings = project_utils._project_config(root, **kwargs) + self.save() + else: + raise errors.ProjectError( + f"Project not found at {root}. 
Create a project using " + "slideflow.Project(..., create=True), or with " + "slideflow.create_project(...)" + ) + + # Create directories, if not already made + if not exists(self.models_dir): + os.makedirs(self.models_dir) + if not exists(self.eval_dir): + os.makedirs(self.eval_dir) + + # Create blank annotations file if one does not exist + if not exists(self.annotations) and exists(self.dataset_config): + self.create_blank_annotations() + + # Neptune + self.use_neptune = use_neptune + + @classmethod + def from_prompt(cls, root: str, **kwargs: Any) -> "Project": + """Initialize a project using an interactive prompt. + + Creates a project folder and then prompts the user for + project settings, saving to "settings.json" in project directory. + + Args: + root (str): Path to project directory. + + """ + if not sf.util.is_project(root): + log.info(f'Setting up new project at "{root}"') + project_utils.interactive_project_setup(root) + obj = cls(root, **kwargs) + return obj + + def __repr__(self): # noqa D105 + if self.use_neptune: + tail = ", use_neptune={!r}".format(self.use_neptune) + else: + tail = '' + return "Project(root={!r}{})".format(self.root, tail) + + @property + def verbosity(self) -> int: + """Current logging verbosity level.""" + return sf.getLoggingLevel() + + @property + def annotations(self) -> str: + """Path to annotations file.""" + return self._read_relative_path(self._settings['annotations']) + + @annotations.setter + def annotations(self, val: str) -> None: + if not isinstance(val, str): + raise errors.ProjectError("'annotations' must be a path.") + self._settings['annotations'] = val + + @property + def dataset_config(self) -> str: + """Path to dataset configuration JSON file.""" + return self._read_relative_path(self._settings['dataset_config']) + + @dataset_config.setter + def dataset_config(self, val: str) -> None: + if not isinstance(val, str): + raise errors.ProjectError("'dataset_config' must be path to JSON.") + 
self._settings['dataset_config'] = val + + @property + def eval_dir(self) -> str: + """Path to evaluation directory.""" + if 'eval_dir' not in self._settings: + log.debug("Missing eval_dir in project settings, Assuming ./eval") + return self._read_relative_path('./eval') + else: + return self._read_relative_path(self._settings['eval_dir']) + + @eval_dir.setter + def eval_dir(self, val: str) -> None: + if not isinstance(val, str): + raise errors.ProjectError("'eval_dir' must be a path") + self._settings['eval_dir'] = val + + @property + def models_dir(self) -> str: + """Path to models directory.""" + return self._read_relative_path(self._settings['models_dir']) + + @models_dir.setter + def models_dir(self, val: str) -> None: + if not isinstance(val, str): + raise errors.ProjectError("'models_dir' must be a path") + self._settings['models_dir'] = val + + @property + def name(self) -> str: + """Descriptive project name.""" + return self._settings['name'] + + @name.setter + def name(self, val: str) -> None: + if not isinstance(val, str): + raise errors.ProjectError("'name' must be a str") + self._settings['name'] = val + + @property + def neptune_workspace(self) -> Optional[str]: + """Neptune workspace name.""" + if 'neptune_workspace' in self._settings: + return self._settings['neptune_workspace'] + elif 'NEPTUNE_WORKSPACE' in os.environ: + return os.environ['NEPTUNE_WORKSPACE'] + else: + return None + + @neptune_workspace.setter + def neptune_workspace(self, name: str) -> None: + """Neptune workspace name.""" + if not isinstance(name, str): + raise errors.ProjectError('Neptune workspace must be a string.') + self._settings['neptune_workspace'] = name + + @property + def neptune_api(self) -> Optional[str]: + """Neptune API token.""" + if 'neptune_api' in self._settings: + return self._settings['neptune_api'] + elif 'NEPTUNE_API_TOKEN' in os.environ: + return os.environ['NEPTUNE_API_TOKEN'] + else: + return None + + @neptune_api.setter + def neptune_api(self, 
api_token: str) -> None: + """Neptune API token.""" + if not isinstance(api_token, str): + raise errors.ProjectError('API token must be a string.') + self._settings['neptune_api'] = api_token + + @property + def sources(self) -> List[str]: + """List of dataset sources active in this project.""" + if 'sources' in self._settings: + return self._settings['sources'] + elif 'datasets' in self._settings: + log.debug("'sources' misnamed 'datasets' in project settings.") + return self._settings['datasets'] + else: + raise ValueError('Unable to find project dataset sources') + + @sources.setter + def sources(self, v: List[str]) -> None: + if not isinstance(v, list) or any([not isinstance(v, str) for v in v]): + raise errors.ProjectError("'sources' must be a list of str") + self._settings['sources'] = v + + def _load(self, path: str) -> None: + """Load a saved and pre-configured project from the specified path.""" + if sf.util.is_project(path): + self._settings = sf.util.load_json(join(path, 'settings.json')) + else: + raise errors.ProjectError('Unable to find settings.json.') + + @contextmanager + def _set_eval_dir(self, path: str): + _initial = self.eval_dir + self.eval_dir = path + try: + yield + finally: + self.eval_dir = _initial + + @contextmanager + def _set_models_dir(self, path: str): + _initial = self.models_dir + self.models_dir = path + try: + yield + finally: + self.models_dir = _initial + + def _read_relative_path(self, path: str) -> str: + """Convert relative path within project directory to global path.""" + return sf.util.relative_path(path, self.root) + + def _setup_labels( + self, + dataset: Dataset, + hp: ModelParams, + outcomes: List[str], + config: Dict, + splits: str, + eval_k_fold: Optional[int] = None + ) -> Tuple[Dataset, Dict, Union[Dict, List]]: + """Prepare dataset and labels.""" + # Assign labels into int + conf_labels = config['outcome_labels'] + if hp.model_type() == 'classification': + if len(outcomes) == 1 and outcomes[0] not in conf_labels: 
+ outcome_label_to_int = { + outcomes[0]: { + v: int(k) for k, v in conf_labels.items() + } + } + else: + outcome_label_to_int = { + o: { + v: int(k) for k, v in conf_labels[o].items() + } for o in conf_labels + } + else: + outcome_label_to_int = None + + # Get patient-level labels + use_float = (hp.model_type() in ['regression', 'survival']) + labels, unique = dataset.labels( + outcomes, + use_float=use_float, + assign=outcome_label_to_int + ) + # Prepare labels for validation splitting + if hp.model_type() == 'classification' and len(outcomes) > 1: + def process_label(v): + return '-'.join(map(str, v)) if isinstance(v, list) else v + split_labels = {k: process_label(v) for k, v in labels.items()} + else: + split_labels = labels + + # If using a specific k-fold, load validation plan + if eval_k_fold: + log.info(f"Using k-fold iteration {eval_k_fold}") + _, eval_dts = dataset.split( + hp.model_type(), + split_labels, + val_strategy=config['validation_strategy'], + splits=join(self.root, splits), + val_fraction=config['validation_fraction'], + val_k_fold=config['validation_k_fold'], + k_fold_iter=eval_k_fold + ) + return eval_dts, labels, unique + + # Otherwise use all TFRecords + else: + return dataset, labels, unique + + def _prepare_trainer( + self, + model: str, + dataset: Dataset, + outcomes: Optional[Union[str, List[str]]] = None, + checkpoint: Optional[str] = None, + eval_k_fold: Optional[int] = None, + splits: str = "splits.json", + max_tiles: int = 0, + mixed_precision: bool = True, + allow_tf32: bool = False, + input_header: Optional[Union[str, List[str]]] = None, + load_method: str = 'weights', + custom_objects: Optional[Dict[str, Any]] = None, + ) -> Tuple["Trainer", Dataset]: + """Prepare a :class:`slideflow.model.Trainer` for eval or prediction. + + Args: + model (str): Path to model to evaluate. + dataset (:class:`slideflow.Dataset`): Dataset + from which to generate activations. + outcomes (str): Str or list of str. 
Annotation column + header specifying the outcome label(s). + checkpoint (str, optional): Path to cp.ckpt file, if evaluating + saved checkpoint. Defaults to None. + eval_k_fold (int, optional): K-fold iteration number to evaluate. + Defaults to None. If None, evaluate all tfrecords. + splits (str, optional): Filename of JSON file in which to log + training/validation splits. Looks for filename in project root. + Defaults to "splits.json". + max_tiles (int, optional): Maximum number of tiles from each slide + to evaluate. Defaults to 0 (include all tiles). + mixed_precision (bool, optional): Enable mixed precision. + Defaults to True. + allow_tf32 (bool): Allow internal use of Tensorfloat-32 format. + Defaults to False. + input_header (str, optional): Annotation column header to use as + additional input. Defaults to None. + load_method (str): Either 'full' or 'weights'. Method to use + when loading a Tensorflow model. If 'full', loads the model + with ``tf.keras.models.load_model()``. If 'weights', will read + the ``params.json`` configuration file, build the model + architecture, and then load weights from the given model with + ``Model.load_weights()``. Loading with 'full' may improve + compatibility across Slideflow versions. Loading with 'weights' + may improve compatibility across hardware & environments. + custom_objects (dict, Optional): Dictionary mapping names + (strings) to custom classes or functions. Defaults to None. + + Returns: + A tuple containing + + :class:`slideflow.model.Trainer`: Trainer. + + :class:`slideflow.Dataset`: Evaluation dataset. 
+ + """ + if eval_k_fold is not None and outcomes is None: + raise ValueError('`eval_k_fold` invalid when predicting.') + + # Load hyperparameters from saved model + config = sf.util.get_model_config(model) + hp = ModelParams() + hp.load_dict(config['hp']) + model_name = f"eval-{basename(model)}" + + # If not provided, detect outcomes from model config + predicting = (outcomes is None) + if predicting: + outcomes = config['outcomes'] + + assert outcomes is not None + outcomes = sf.util.as_list(outcomes) + + # Filter out slides that are blank in the outcome label, + # or blank in any of the input_header categories + filter_blank = [o for o in outcomes] + if input_header is not None and not isinstance(input_header, list): + input_header = [input_header] + if input_header is not None: + filter_blank += input_header + + # Set up outcome labels + if not predicting: + dataset = dataset.filter(filter_blank=filter_blank) + eval_dts, labels, unique = self._setup_labels( + dataset, hp, outcomes, config, splits, eval_k_fold=eval_k_fold + ) + else: + eval_dts = dataset + if sf.backend() == 'torch': + labels = config['outcome_labels'] + else: + labels = {} + unique = list(config['outcome_labels'].values()) + + # Set max tiles + eval_dts = eval_dts.clip(max_tiles) + + # Prepare additional slide-level input + if input_header: + _res = project_utils._setup_input_labels(eval_dts, input_header) + inpt_labels, feature_sizes, slide_inp = _res + else: + inpt_labels = None + feature_sizes = None + slide_inp = {} + + n_feat = 0 if feature_sizes is None else sum(feature_sizes) + if feature_sizes and n_feat != sum(config['input_feature_sizes']): + n_model_feat = sum(config['input_feature_sizes']) + raise ValueError( + f'Patient feature matrix (size {n_feat}) ' + f'is different from model (size {n_model_feat}).' 
+ ) + + # Log model settings and hyperparameters + if hp.model_type() == 'classification': + outcome_labels = dict(zip(range(len(unique)), unique)) + else: + outcome_labels = None + + model_dir = sf.util.get_new_model_dir(self.eval_dir, model_name) + + # Set missing validation keys to NA + for v_end in ('strategy', 'fraction', 'k_fold'): + val_key = f'validation_{v_end}' + if val_key not in config: + config[val_key] = 'NA' + + eval_config = { + 'slideflow_version': sf.__version__, + 'project': self.name, + 'backend': sf.backend(), + 'git_commit': sf.__gitcommit__, + 'model_name': model_name, + 'model_path': model, + 'stage': 'evaluation', + 'img_format': config['img_format'], + 'tile_px': hp.tile_px, + 'tile_um': hp.tile_um, + 'model_type': hp.model_type(), + 'outcomes': outcomes, + 'input_features': input_header, + 'input_feature_sizes': feature_sizes, + 'input_feature_labels': inpt_labels, + 'outcome_labels': outcome_labels, + 'dataset_config': self.dataset_config, + 'sources': self.sources, + 'annotations': self.annotations, + 'validation_strategy': config['validation_strategy'], + 'validation_fraction': config['validation_fraction'], + 'validation_k_fold': config['validation_k_fold'], + 'k_fold_i': eval_k_fold, + 'filters': dataset.filters, + 'pretrain': None, + 'resume_training': None, + 'checkpoint': checkpoint, + 'hp': hp.to_dict(), + 'max_tiles': max_tiles, + 'min_tiles': dataset.min_tiles, + } + if 'norm_fit' in config: + eval_config.update({'norm_fit': config['norm_fit']}) + + # Build a model using the slide list as input + # and the annotations dictionary as output labels + trainer = sf.model.build_trainer( + hp, + outdir=model_dir, + labels=labels, + config=eval_config, + slide_input=slide_inp, + mixed_precision=mixed_precision, + allow_tf32=allow_tf32, + feature_names=input_header, + feature_sizes=feature_sizes, + outcome_names=outcomes, + use_neptune=self.use_neptune, + neptune_api=self.neptune_api, + neptune_workspace=self.neptune_workspace, + 
load_method=load_method, + custom_objects=custom_objects, + ) + + return trainer, eval_dts + + def _train_hp( + self, + *, + hp_name: str, + hp: ModelParams, + outcomes: List[str], + val_settings: SimpleNamespace, + ctx: multiprocessing.context.BaseContext, + dataset: Optional[sf.Dataset], + filters: Optional[Dict], + filter_blank: Optional[Union[str, List[str]]], + input_header: Optional[Union[str, List[str]]], + min_tiles: int, + max_tiles: int, + mixed_precision: bool, + allow_tf32: bool, + splits: str, + results_dict: Union[Dict, DictProxy], + training_kwargs: Dict, + balance_headers: Optional[Union[str, List[str]]], + process_isolate: bool = False, + **kwargs + ) -> None: + """Train one or more models using the specified hyperparameters. + + Keyword Args: + hp_name (str): Name of hyperparameter combination being run. + hp (:class:`slideflow.ModelParams`): Model parameters. + outcomes (str or list(str)): Annotation outcome headers. + val_settings (:class:`types.SimpleNamespace`): Validation settings. + ctx (multiprocessing.Context): Multiprocessing context for sharing + results from isolated training processes. + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + input_header (str or list(str)): Annotation column of additional + slide-level input. + min_tiles (int): Only include tfrecords with >= min_tiles tiles. + max_tiles (int): Cap maximum tiles per tfrecord. + mixed_precision (bool): Train with mixed precision. + allow_tf32 (bool): Allow internal use of TensorFloat-32 format. + Defaults to False. + splits (str): Location of splits file for logging/reading splits. + balance_headers (str, list(str)): Annotation column headers for + mini-batch balancing. 
+ results_dict (dict): Multiprocessing-friendly dict for sending + results from isolated training processes + training_kwargs (dict): Keyword arguments for Trainer.train(). + + """ + # --- Prepare dataset --------------------------------------------- + # Filter out slides that are blank in the outcome label, + # or blank in any of the input_header categories + if filter_blank is not None and not isinstance(filter_blank, list): + filter_blank = [filter_blank] + if filter_blank: + filter_blank += [o for o in outcomes] + else: + filter_blank = [o for o in outcomes] + if input_header is not None and not isinstance(input_header, list): + input_header = [input_header] + if input_header is not None: + filter_blank += input_header + if dataset is None: + dataset = self.dataset(hp.tile_px, hp.tile_um) + else: + _compatible = sf.util.is_tile_size_compatible( + dataset.tile_px, + dataset.tile_um, + hp.tile_px, + hp.tile_um + ) + if not _compatible: + raise errors.IncompatibleTileSizeError( + "Dataset tile size (px={}, um={}) does not match provided " + "hyperparameters (px={}, um={})".format( + dataset.tile_px, dataset.tile_um, + hp.tile_px, hp.tile_um + ) + ) + dataset = dataset.filter( + filters=filters, + filter_blank=filter_blank, + min_tiles=min_tiles + ) + # --- Load labels ------------------------------------------------- + use_float = (hp.model_type() in ['regression', 'survival']) + labels, unique = dataset.labels(outcomes, use_float=use_float) + if hp.model_type() == 'classification' and len(outcomes) == 1: + outcome_labels = dict(zip(range(len(unique)), unique)) + elif hp.model_type() == 'classification': + assert isinstance(unique, dict) + outcome_labels = { + k: dict(zip(range(len(ul)), ul)) # type: ignore + for k, ul in unique.items() + } + else: + outcome_labels = dict(zip(range(len(outcomes)), outcomes)) + if hp.model_type() != 'regression' and len(outcomes) > 1: + log.info('Using multi-outcome approach for classification') + + # If multiple classification 
outcomes are used, + # create a merged variable for k-fold splitting + if hp.model_type() == 'classification' and len(outcomes) > 1: + split_labels = { + k: '-'.join(map(str, v)) # type: ignore + for k, v in labels.items() + } + else: + split_labels = labels # type: ignore + + # --- Prepare k-fold validation configuration --------------------- + results_log_path = os.path.join(self.root, 'results_log.csv') + k_header = val_settings.k_fold_header + if val_settings.k is not None and not isinstance(val_settings.k, list): + val_settings.k = [val_settings.k] + if val_settings.strategy == 'k-fold-manual': + _, unique_k = dataset.labels(k_header, format='name') + valid_k = [kf for kf in unique_k] + k_fold = len(valid_k) + log.info(f"Manual folds: {', '.join([str(ks) for ks in valid_k])}") + if val_settings.k: + valid_k = [kf for kf in valid_k if kf in val_settings.k] + elif val_settings.strategy in ('k-fold', + 'k-fold-preserved-site', + 'bootstrap'): + k_fold = val_settings.k_fold + if val_settings.k is None: + valid_k = list(range(1, k_fold+1)) + else: + valid_k = [ + kf for kf in range(1, k_fold+1) if kf in val_settings.k + ] + else: + k_fold = None + valid_k = [None] # type: ignore + + # Create model labels + label_string = '-'.join(outcomes) + model_name = f'{label_string}-{hp_name}' + if k_fold is None: + model_iterations = [model_name] + else: + model_iterations = [f'{model_name}-kfold{k}' for k in valid_k] + + s_args = SimpleNamespace( + model_name=model_name, + outcomes=outcomes, + k_header=k_header, + valid_k=valid_k, + split_labels=split_labels, + splits=splits, + labels=labels, + min_tiles=min_tiles, + max_tiles=max_tiles, + outcome_labels=outcome_labels, + filters=filters, + training_kwargs=training_kwargs, + mixed_precision=mixed_precision, + allow_tf32=allow_tf32, + ctx=ctx, + results_dict=results_dict, + bal_headers=balance_headers, + input_header=input_header, + process_isolate=process_isolate, + **kwargs + ) + + # --- Train on a specific K-fold 
-------------------------------------- + for k in valid_k: + s_args.k = k + self._train_split(dataset, hp, val_settings, s_args) + + # --- Record results -------------------------------------------------- + if (not val_settings.source + and (val_settings.strategy is None + or val_settings.strategy == 'none')): + log.info('No validation performed.') + else: + for mi in model_iterations: + if mi not in results_dict or 'epochs' not in results_dict[mi]: + log.error(f'Training failed for model {model_name}') + else: + sf.util.update_results_log( + results_log_path, + mi, + results_dict[mi]['epochs'] + ) + log.info(f'Training results saved: [green]{results_log_path}') + + def _train_split( + self, + dataset: Dataset, + hp: ModelParams, + val_settings: SimpleNamespace, + s_args: SimpleNamespace, + ) -> None: + """Train a model for a given training/validation split. + + Args: + dataset (:class:`slideflow.Dataset`): Dataset to split into + training and validation. + hp (:class:`slideflow.ModelParams`): Model parameters. + val_settings (:class:`types.SimpleNamespace`): Validation settings. + s_args (:class:`types.SimpleNamespace`): Training settings. 
+ + """ + # Log current model name and k-fold iteration, if applicable + k_msg = '' + if s_args.k is not None: + k_msg = f' ({val_settings.strategy} #{s_args.k})' + if sf.getLoggingLevel() <= 20: + print() + log.info(f'Training model [bold]{s_args.model_name}[/]{k_msg}...') + log.info(f'Hyperparameters: {hp}') + if val_settings.dataset: + log.info('Val settings: <Dataset manually provided>') + else: + log.info( + f'Val settings: {json.dumps(vars(val_settings), indent=2)}' + ) + + # --- Set up validation data ------------------------------------------ + from_wsi = ('from_wsi' in s_args.training_kwargs + and s_args.training_kwargs['from_wsi']) + + # Use an external validation dataset if supplied + if val_settings.dataset: + train_dts = dataset + val_dts = val_settings.dataset + is_float = (hp.model_type() in ['regression', 'survival']) + val_labels, _ = val_dts.labels(s_args.outcomes, use_float=is_float) + s_args.labels.update(val_labels) + elif val_settings.source: + train_dts = dataset + val_dts = Dataset( + tile_px=hp.tile_px, + tile_um=hp.tile_um, + config=self.dataset_config, + sources=val_settings.source, + annotations=val_settings.annotations, + filters=val_settings.filters, + filter_blank=val_settings.filter_blank + ) + is_float = (hp.model_type() in ['regression', 'survival']) + val_labels, _ = val_dts.labels(s_args.outcomes, use_float=is_float) + s_args.labels.update(val_labels) + # Use manual k-fold assignments if indicated + elif val_settings.strategy == 'k-fold-manual': + t_filters = { + s_args.k_header: [j for j in s_args.valid_k if j != s_args.k] + } + train_dts = dataset.filter(t_filters) + val_dts = dataset.filter(filters={s_args.k_header: [s_args.k]}) + # No validation + elif val_settings.strategy == 'none': + train_dts = dataset + val_dts = None + # Otherwise, calculate k-fold splits + else: + if val_settings.strategy == 'k-fold-preserved-site': + site_labels = dataset.labels( + s_args.k_header, + format='name' + )[0] # type: Any + else: + 
site_labels = None + train_dts, val_dts = dataset.split( + hp.model_type(), + s_args.split_labels, + val_strategy=val_settings.strategy, + splits=join(self.root, s_args.splits), + val_fraction=val_settings.fraction, + val_k_fold=val_settings.k_fold, + k_fold_iter=s_args.k, + site_labels=site_labels, + from_wsi=from_wsi + ) + + # ---- Balance datasets -------------------------------------- + # Training + if s_args.bal_headers is None: + s_args.bal_headers = s_args.outcomes + if train_dts.prob_weights and hp.training_balance not in ('none', None): + log.warning( + "Training dataset already balanced; ignoring hyperparameter " + "training_balance={!r}".format(hp.training_balance) + ) + elif not from_wsi: + train_dts = train_dts.balance( + s_args.bal_headers, + hp.training_balance, + force=(hp.model_type() == 'classification') + ) + elif from_wsi and hp.training_balance not in ('none', None): + log.warning( + "Balancing / clipping is disabled when `from_wsi=True`" + ) + + # Validation + if val_dts and val_dts.prob_weights and hp.validation_balance not in ( + 'none', None + ): + log.warning( + "Validation dataset already balanced; ignoring hyperparameter " + "validation_balance={!r}".format(hp.validation_balance) + ) + elif val_dts and not from_wsi: + val_dts = val_dts.balance( + s_args.bal_headers, + hp.validation_balance, + force=(hp.model_type() == 'classification') + ) + elif val_dts and from_wsi and hp.validation_balance not in ( + 'none', None + ): + log.warning( + "Balancing / clipping is disabled when `from_wsi=True`" + ) + + # ---- Clip datasets ----------------------------------------- + # Training + if s_args.max_tiles and train_dts._clip: + log.warning( + "Training dataset already clipped; ignoring parameter " + "max_tiles={!r}".format(s_args.max_tiles) + ) + elif s_args.max_tiles and not from_wsi: + train_dts = train_dts.clip(s_args.max_tiles) + elif s_args.max_tiles and from_wsi: + log.warning( + "Clipping is disabled when `from_wsi=True`" + ) + + # 
Validation + if val_dts and s_args.max_tiles and val_dts._clip: + log.warning( + "Validation dataset already clipped; ignoring parameter " + "max_tiles={!r}".format(s_args.max_tiles) + ) + elif s_args.max_tiles and val_dts and not from_wsi: + val_dts = val_dts.clip(s_args.max_tiles) + elif s_args.max_tiles and val_dts and from_wsi: + log.warning( + "Clipping is disabled when `from_wsi=True`" + ) + + # ---- Determine tile counts --------------------------------------- + if from_wsi: + num_train = len(train_dts.slide_paths()) + num_val = 0 if not val_dts else len(val_dts.slide_paths()) + log.info( + f'Using {num_train} training slides, {num_val} validation' + ) + else: + num_train = len(train_dts.tfrecords()) + num_val = 0 if not val_dts else len(val_dts.tfrecords()) + log.info( + f'Using {num_train} training TFRecords, {num_val} validation' + ) + + # --- Prepare additional slide-level input ---------------------------- + if s_args.input_header: + _res = project_utils._setup_input_labels( + dataset, + s_args.input_header, + val_dts=val_dts + ) + inpt_labels, feature_sizes, slide_inp = _res + else: + inpt_labels = None + feature_sizes = None + slide_inp = None + + # --- Initialize model ------------------------------------------------ + # Using the project annotation file, assemble slides for training, + # as well as the slide annotations dictionary (output labels) + full_name = s_args.model_name + if s_args.k is not None: + full_name += f'-kfold{s_args.k}' + model_dir = sf.util.get_new_model_dir(self.models_dir, full_name) + + # Log model settings and hyperparameters + config = { + 'slideflow_version': sf.__version__, + 'project': self.name, + 'backend': sf.backend(), + 'git_commit': sf.__gitcommit__, + 'model_name': s_args.model_name, + 'full_model_name': full_name, + 'stage': 'training', + 'img_format': train_dts.img_format, + 'tile_px': hp.tile_px, + 'tile_um': hp.tile_um, + 'max_tiles': s_args.max_tiles, + 'min_tiles': s_args.min_tiles, + 'model_type': 
hp.model_type(), + 'outcomes': s_args.outcomes, + 'input_features': s_args.input_header, + 'input_feature_sizes': feature_sizes, + 'input_feature_labels': inpt_labels, + 'outcome_labels': s_args.outcome_labels, + 'dataset_config': self.dataset_config, + 'sources': self.sources, + 'annotations': self.annotations, + 'validation_strategy': val_settings.strategy, + 'validation_fraction': val_settings.fraction, + 'validation_k_fold': val_settings.k_fold, + 'k_fold_i': s_args.k, + 'filters': s_args.filters, + 'hp': hp.to_dict(), + 'training_kwargs': s_args.training_kwargs, + } + model_kwargs = { + 'hp': hp, + 'name': full_name, + 'feature_names': s_args.input_header, + 'feature_sizes': feature_sizes, + 'outcome_names': s_args.outcomes, + 'outdir': model_dir, + 'config': config, + 'slide_input': slide_inp, + 'labels': s_args.labels, + 'mixed_precision': s_args.mixed_precision, + 'allow_tf32': s_args.allow_tf32, + 'use_neptune': self.use_neptune, + 'neptune_api': self.neptune_api, + 'neptune_workspace': self.neptune_workspace, + 'load_method': s_args.load_method + } + if s_args.process_isolate: + process = s_args.ctx.Process(target=project_utils._train_worker, + args=((train_dts, val_dts), + model_kwargs, + s_args.training_kwargs, + s_args.results_dict, + self.verbosity)) + process.start() + log.debug(f'Spawning training process (PID: {process.pid})') + process.join() + else: + project_utils._train_worker( + (train_dts, val_dts), + model_kwargs, + s_args.training_kwargs, + s_args.results_dict, + self.verbosity + ) + + def add_source( + self, + name: str, + *, + slides: Optional[str] = None, + roi: Optional[str] = None, + tiles: Optional[str] = None, + tfrecords: Optional[str] = None, + path: Optional[str] = None + ) -> None: + r"""Add a dataset source to the dataset configuration file. + + Args: + name (str): Dataset source name. + + Keyword Args: + slides (str, optional): Path to directory containing slides. + Defaults to None. 
+ roi (str, optional): Path to directory containing CSV ROIs. + If not provided, uses ``./roi/{name}`` within the project. + tiles (str, optional): Path to directory for loose extracted tile + images (\*.jpg, \*.png). Defaults to None. + tfrecords (str, optional): Path to directory for storing TFRecords + of tiles. If not provided, uses ``./tfrecords/{name}`` within + the project. + path (str, optional): Path to dataset configuration file. + If not provided, uses project default. Defaults to None. + + """ + if not path: + path = self.dataset_config + project_utils.add_source( + name, + path=path, + slides=slides, + roi=(roi or join(self._read_relative_path('./roi'), name)), + tiles=tiles, + tfrecords=(tfrecords or join(self._read_relative_path('./tfrecords'), name)), + ) + if name not in self.sources: + self.sources += [name] + self.save() + + def associate_slide_names(self) -> None: + """Automatically associate patients with slides in the annotations.""" + dataset = self.dataset(tile_px=0, tile_um=0, verification=None) + dataset.update_annotations_with_slidenames(self.annotations) + + def cell_segmentation( + self, + diam_um: float, + dest: Optional[str] = None, + *, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + sources: Union[str, List[str]], + **kwargs + ) -> None: + """Perform cell segmentation on slides, saving segmentation masks. + + Cells are segmented with + `Cellpose <https://www.nature.com/articles/s41592-020-01018-x>`_ from + whole-slide images, and segmentation masks are saved in the ``masks/`` + subfolder within the project root directory. + + .. note:: + + Cell segmentation requires installation of the ``cellpose`` package + available via pip: + + .. code-block:: bash + + pip install cellpose + + Args: + diam_um (float): Cell segmentation diameter, in microns. + dest (str, optional): Destination in which to save cell + segmentation masks. If None, will save masks in + ``{project_root}/masks``. Defaults to None. 
+ + Keyword Args: + batch_size (int): Batch size for cell segmentation. 
Defaults to 8. + cp_thresh (float): Cell probability threshold. All pixels with + value above threshold kept for masks, decrease to find more and + larger masks. Defaults to 0. + diam_mean (int, optional): Cell diameter to detect, in pixels + (without image resizing). If None, uses Cellpose defaults + (17 for the 'nuclei' model, 30 for all others). + downscale (float): Factor by which to downscale generated masks + after calculation. Defaults to None (keep masks at original + size). + flow_threshold (float): Flow error threshold (all cells with errors + below threshold are kept). Defaults to 0.4. + gpus (int, list(int)): GPUs to use for cell segmentation. + Defaults to 0 (first GPU). + interp (bool): Interpolate during 2D dynamics. Defaults to True. + qc (str): Slide-level quality control method to use before + performing cell segmentation. Defaults to "Otsu". + model (str, :class:`cellpose.models.Cellpose`): Cellpose model to + use for cell segmentation. May be any valid cellpose model. + Defaults to 'cyto2'. + mpp (float): Microns-per-pixel at which cells should be segmented. + Defaults to 0.5. + num_workers (int, optional): Number of workers. + Defaults to 2 * num_gpus. + save_centroid (bool): Save mask centroids. Increases memory + utilization slightly. Defaults to True. + save_flow (bool): Save flow values for the whole-slide image. + Increases memory utilization. Defaults to False. + sources (List[str]): List of dataset sources to include from + configuration file. + tile (bool): Tiles image to decrease GPU/CPU memory usage. + Defaults to True. + verbose (bool): Verbose log output at the INFO level. + Defaults to True. + window_size (int): Window size at which to segment cells across + a whole-slide image. Defaults to 256. 
+ + Returns: + None + """ + if dest is None: + dest = join(self.root, 'masks') + if not exists(dest): + os.makedirs(dest) + dataset = self.dataset( + None, + None, + filters=filters, + filter_blank=filter_blank, + verification='slides', + sources=sources, + ) + dataset.cell_segmentation(diam_um, dest, **kwargs) + + def create_blank_annotations( + self, + filename: Optional[str] = None + ) -> None: + """Create an empty annotations file. + + Args: + filename (str): Annotations file destination. If not provided, + will use project default. + + """ + if filename is None: + filename = self.annotations + if exists(filename): + raise errors.AnnotationsError( + f"Error creating annotations {filename}; file already exists" + ) + if not exists(self.dataset_config): + raise errors.AnnotationsError( + f"Dataset config {self.dataset_config} missing." + ) + dataset = Dataset( + config=self.dataset_config, + sources=self.sources, + tile_px=None, + tile_um=None, + annotations=None + ) + all_paths = dataset.slide_paths(apply_filters=False) + slides = [path_to_name(s) for s in all_paths] + with open(filename, 'w') as csv_outfile: + csv_writer = csv.writer(csv_outfile, delimiter=',') + header = ['patient', 'dataset', 'category'] + csv_writer.writerow(header) + for slide in slides: + csv_writer.writerow([slide, '', '']) + log.info(f"Wrote annotations file to [green]{filename}") + + def create_hp_sweep( + self, + filename: str = 'sweep.json', + label: Optional[str] = None, + **kwargs: Any + ) -> None: + """Prepare a grid-search hyperparameter sweep, saving to a config file. + + To initiate a grid-search sweep using the created JSON file, pass + this file to the ``params`` argument of ``Project.train()``: + + >>> P.train('outcome', params='sweep.json', ...) + + Args: + filename (str, optional): Filename for hyperparameter sweep. + Overwrites existing files. Saves in project root directory. + Defaults to "sweep.json". + label (str, optional): Label to use when naming models in sweep. 
+ Defaults to None. + **kwargs: Parameters to include in the sweep. Parameters may either + be fixed or provided as lists. + + """ + non_epoch_kwargs = {k: v for k, v in kwargs.items() if k != 'epochs'} + pdict = copy.deepcopy(non_epoch_kwargs) + args = list(pdict.keys()) + for arg in args: + if not isinstance(pdict[arg], list): + pdict[arg] = [pdict[arg]] + argsv = list(pdict.values()) + sweep = list(itertools.product(*argsv)) + label = '' if not label else f'{label}-' + hp_list = [] + for i, params in enumerate(sweep): + full_params = dict(zip(args, list(params))) + if 'epochs' in kwargs: + full_params['epochs'] = kwargs['epochs'] + mp = ModelParams(**full_params) + hp_list += [{f'{label}HPSweep{i}': mp.to_dict()}] + sf.util.write_json(hp_list, os.path.join(self.root, filename)) + log.info(f'Wrote hp sweep (len {len(sweep)}) to [green]{filename}') + + @auto_dataset + def evaluate( + self, + model: str, + outcomes: Union[str, List[str]], + *, + dataset: Dataset, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + min_tiles: int = 0, + checkpoint: Optional[str] = None, + eval_k_fold: Optional[int] = None, + splits: str = "splits.json", + max_tiles: int = 0, + mixed_precision: bool = True, + allow_tf32: bool = False, + input_header: Optional[Union[str, List[str]]] = None, + load_method: str = 'weights', + custom_objects: Optional[Dict[str, Any]] = None, + **kwargs: Any + ) -> Dict: + """Evaluate a saved model on a given set of tfrecords. + + Args: + model (str): Path to model to evaluate. + outcomes (str): Str or list of str. Annotation column + header specifying the outcome label(s). + + Keyword Args: + dataset (:class:`slideflow.Dataset`, optional): Dataset + to evaluate. If not supplied, will evaluate all project + tfrecords at the tile_px/tile_um matching the supplied model, + optionally using provided filters and filter_blank. + filters (dict, optional): Dataset filters to use for + selecting slides. 
See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Minimum number of tiles a slide must + have to be included in evaluation. Defaults to 0. + checkpoint (str, optional): Path to cp.ckpt file, if evaluating a + saved checkpoint. Defaults to None. + eval_k_fold (int, optional): K-fold iteration number to evaluate. + Defaults to None. If None, will evaluate all tfrecords + irrespective of K-fold. + splits (str, optional): Filename of JSON file in which to log + train/val splits. Looks for filename in project root directory. + Defaults to "splits.json". + max_tiles (int, optional): Maximum number of tiles from each slide + to evaluate. Defaults to 0. If zero, will include all tiles. + mixed_precision (bool, optional): Enable mixed precision. + Defaults to True. + allow_tf32 (bool): Allow internal use of Tensorfloat-32 format. + Defaults to False. + input_header (str, optional): Annotation column header to use as + additional input. Defaults to None. + load_method (str): Either 'full' or 'weights'. Method to use + when loading a Tensorflow model. If 'full', loads the model + with ``tf.keras.models.load_model()``. If 'weights', will read + the ``params.json`` configuration file, build the model + architecture, and then load weights from the given model with + ``Model.load_weights()``. Loading with 'full' may improve + compatibility across Slideflow versions. Loading with 'weights' + may improve compatibility across hardware & environments. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. 
If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + save_predictions (bool or str, optional): Save tile, slide, and + patient-level predictions at each evaluation. May be 'csv', + 'feather', or 'parquet'. If False, will not save predictions. + Defaults to 'parquet'. + custom_objects (dict, Optional): Dictionary mapping names + (strings) to custom classes or functions. Defaults to None. + **kwargs: Additional keyword arguments to the `Trainer.evaluate()` + function. + + Returns: + Dict: Dictionary of keras training results, nested by epoch. + + """ + log.info(f'Evaluating model at [green]{model}') + trainer, eval_dts = self._prepare_trainer( + model=model, + dataset=dataset, + outcomes=outcomes, + checkpoint=checkpoint, + eval_k_fold=eval_k_fold, + splits=splits, + max_tiles=max_tiles, + input_header=input_header, + mixed_precision=mixed_precision, + allow_tf32=allow_tf32, + load_method=load_method, + custom_objects=custom_objects, + ) + + # Load the model + if isinstance(model, str): + trainer.load(model, training=True) + if checkpoint: + if trainer.feature_sizes: + n_features = sum(trainer.feature_sizes) + else: + n_features = 0 + trainer.model = trainer.hp.build_model( + labels=trainer.labels, + num_slide_features=n_features + ) + trainer.model.load_weights(checkpoint) + + # Evaluate + return trainer.evaluate(eval_dts, **kwargs) + + def evaluate_mil( + self, + model: str, + outcomes: Union[str, List[str]], + dataset: Dataset, + bags: Union[str, List[str]], + config: Optional["mil.TrainerConfig"] = None, + *, + outdir: Optional[str] = None, + **kwargs + ) -> pd.DataFrame: + r"""Evaluate a multi-instance learning model. 
+ + Saves results for the evaluation in the ``mil_eval`` project folder, + including predictions (parquet format), attention (Numpy format for + each slide), and attention heatmaps (if ``attention_heatmaps=True``). + + Logs classifier metrics (AUROC and AP) to the console. + + Args: + model (str): Path to MIL model. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + dataset (:class:`slideflow.Dataset`): Dataset. + bags (str): Either a path to directory with \*.pt files, or a list + of paths to individual \*.pt files. Each file should contain + exported feature vectors, with each file containing all tile + features for one patient. + config (:class:`slideflow.mil.TrainerConfig`): + Training configuration, as obtained by + :func:`slideflow.mil.mil_config()`. + + Keyword args: + exp_label (str): Experiment label, used for naming the subdirectory + in the ``{project root}/mil`` folder, where training history + and the model will be saved. + attention_heatmaps (bool): Calculate and save attention heatmaps. + Defaults to False. + interpolation (str, optional): Interpolation strategy for smoothing + attention heatmaps. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. + + Returns: + pd.DataFrame: Dataframe of predictions. 
+ """ + from .mil import eval_mil + + if outdir is None: + outdir = join(self.root, 'mil_eval') + + return eval_mil( + model, + dataset=dataset, + outcomes=outcomes, + bags=bags, + config=config, + outdir=outdir, + **kwargs + ) + + def extract_cells( + self, + tile_px: int, + tile_um: Union[int, str], + masks_path: Optional[str] = None, + *, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + **kwargs: Any + ) -> Dict[str, "SlideReport"]: + """Extract images of cells from whole-slide images. + + Image tiles are extracted from cells, with a tile at each cell + centroid. Requires that cells have already been segmented with + ``Project.cell_segmentation()``. This function otherwise is similar + to :meth:`slideflow.Project.extract_tiles`, with tiles saved in + TFRecords by default. + + Args: + tile_px (int): Size of tiles to extract at cell centroids (pixels). + tile_um (int or str): Size of tiles to extract, in microns (int) or + magnification (str, e.g. "20x"). + masks_path (str, optional): Location of saved masks. If None, will + look in project default (subfolder '/masks'). Defaults to None. + + Keyword Args: + apply_masks (bool): Apply cell segmentation masks to the extracted + tiles. Defaults to True. + **kwargs (Any): All other keyword arguments are passed to + :meth:`Project.extract_tiles()`. + + Returns: + Dictionary mapping slide paths to each slide's SlideReport + (:class:`slideflow.slide.report.SlideReport`) + """ + if masks_path is None: + masks_path = join(self.root, 'masks') + dataset = self.dataset( + tile_px, + tile_um, + filters=filters, + filter_blank=filter_blank, + verification='slides' + ) + return dataset.extract_cells(masks_path=masks_path, **kwargs) + + def extract_tiles( + self, + tile_px: int, + tile_um: Union[int, str], + *, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + **kwargs: Any + ) -> Dict[str, "SlideReport"]: + """Extract tiles from slides. 
+ + Preferred use is calling :meth:`slideflow.Dataset.extract_tiles`. + + Args: + tile_px (int): Size of tiles to extract, in pixels. + tile_um (int or str): Size of tiles to extract, in microns (int) or + magnification (str, e.g. "20x"). + + Keyword Args: + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + save_tiles (bool, optional): Save tile images in loose format. + Defaults to False. + save_tfrecords (bool): Save compressed image data from + extracted tiles into TFRecords in the corresponding TFRecord + directory. Defaults to True. + source (str, optional): Name of dataset source from which to select + slides for extraction. Defaults to None. If not provided, will + default to all sources in project. + stride_div (int): Stride divisor for tile extraction. + A stride_div of 1 will extract non-overlapping tiles. + A stride_div of 2 will extract overlapping tiles, with a stride + equal to 50% of the tile width. Defaults to 1. + enable_downsample (bool): Enable downsampling for slides. + This may result in corrupted image tiles if downsampled slide + layers are corrupted or incomplete. Defaults to True. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and skip the slide if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + roi_filter_method (str or float): Method of filtering tiles with + ROIs. Either 'center' or float (0-1).
If 'center', tiles are + filtered with ROIs based on the center of the tile. If float, + tiles are filtered based on the proportion of the tile inside + the ROI, and ``roi_filter_method`` is interpreted as a + threshold. If the proportion of a tile inside the ROI is + greater than this number, the tile is included. For example, + if ``roi_filter_method=0.7``, a tile that is 80% inside of an + ROI will be included, and a tile that is 50% inside of an ROI + will be excluded. Defaults to 'center'. + skip_extracted (bool): Skip slides that have already + been extracted. Defaults to True. + tma (bool): Reads slides as Tumor Micro-Arrays (TMAs). + Deprecated argument; all slides are now read as standard WSIs. + randomize_origin (bool): Randomize pixel starting + position during extraction. Defaults to False. + buffer (str, optional): Slides will be copied to this directory + before extraction. Defaults to None. Using an SSD or ramdisk + buffer vastly improves tile extraction speed. + q_size (int): Size of queue when using a buffer. + Defaults to 2. + qc (str, optional): 'otsu', 'blur', 'both', or None. Perform blur + detection quality control - discarding tiles with detected + out-of-focus regions or artifacts - and/or Otsu's method. + Increases tile extraction time. Defaults to None. + report (bool): Save a PDF report of tile extraction. + Defaults to True. + normalizer (str, optional): Normalization strategy. + Defaults to None. + normalizer_source (str, optional): Stain normalization preset or + path to a source image. Valid presets include 'v1', 'v2', and + 'v3'. If None, will use the default preset ('v3'). + Defaults to None. + whitespace_fraction (float, optional): Range 0-1. Discard tiles + with this fraction of whitespace. If 1, will not perform + whitespace filtering. Defaults to 1. + whitespace_threshold (int, optional): Range 0-255. Defaults to 230. + Threshold above which a pixel (RGB average) is whitespace. + grayspace_fraction (float, optional): Range 0-1.
Defaults to 0.6. + Discard tiles with this fraction of grayspace. + If 1, will not perform grayspace filtering. + grayspace_threshold (float, optional): Range 0-1. Defaults to 0.05. + Pixels in HSV format with saturation below this threshold are + considered grayspace. + img_format (str, optional): 'png' or 'jpg'. Defaults to 'jpg'. + Image format to use in tfrecords. PNG (lossless) for fidelity, + JPG (lossy) for efficiency. + shuffle (bool, optional): Shuffle tiles prior to storage in + tfrecords. Defaults to True. + num_threads (int, optional): Number of worker processes for each + tile extractor. When using the cuCIM slide reading backend, + defaults to the total number of available CPU cores, using the + 'fork' multiprocessing method. With Libvips, this defaults to + the total number of available CPU cores or 32, whichever is + lower, using 'spawn' multiprocessing. + qc_blur_radius (int, optional): Quality control blur radius for + out-of-focus area detection. Only used if qc=True. Defaults to 3. + qc_blur_threshold (float, optional): Quality control blur threshold + for detecting out-of-focus areas. Only used if qc=True. + Defaults to 0.1. + qc_filter_threshold (float, optional): Float between 0-1. Tiles + with more than this proportion of blur will be discarded. + Only used if qc=True. Defaults to 0.6. + qc_mpp (float, optional): Microns-per-pixel indicating image + magnification level at which quality control is performed. + Defaults to mpp=4 (effective magnification 2.5x). + dry_run (bool, optional): Determine tiles that would be extracted, + but do not export any images. Defaults to None. + max_tiles (int, optional): Only extract this many tiles per slide. + Defaults to None.
+ + Returns: + Dictionary mapping slide paths to each slide's SlideReport + (:class:`slideflow.slide.report.SlideReport`) + + """ + dataset = self.dataset( + tile_px, + tile_um, + filters=filters, + filter_blank=filter_blank, + verification='slides' + ) + return dataset.extract_tiles(**kwargs) + + def gan_train( + self, + dataset: Dataset, + *, + model: str = 'stylegan3', + outcomes: Optional[Union[str, List[str]]] = None, + exp_label: Optional[str] = None, + mirror: bool = True, + metrics: Optional[Union[str, List[str]]] = None, + dry_run: bool = False, + normalizer: Optional[str] = None, + normalizer_source: Optional[str] = None, + tile_labels: Optional[str] = None, + crop: Optional[int] = None, + resize: Optional[int] = None, + **kwargs + ) -> None: + """Train a GAN network. + + Examples + Train StyleGAN2 from a Slideflow dataset. + + >>> P = sf.Project('/project/path') + >>> dataset = P.dataset(tile_px=512, tile_um=400) + >>> P.gan_train(dataset=dataset, exp_label="MyExperiment", ...) + + Train StyleGAN2 as a class-conditional network. + + >>> P.gan_train(..., outcomes='class_label') + + Train using a pretrained network. + + >>> P.gan_train(..., resume='/path/to/network.pkl') + + Train with multiple GPUs. + + >>> P.gan_train(..., gpus=4) + + Args: + dataset (:class:`slideflow.Dataset`): Training dataset. + + Keyword Args: + allow_tf32 (bool): Allow internal use of TensorFloat-32. + Option only available for StyleGAN2. Defaults to True. + aug (str): Augmentation mode. Options include 'ada', + 'noaug', 'fixed'. Defaults to 'ada'. + augpipe (str): Augmentation pipeline. Options include + 'blit', 'geom', 'color', 'filter', 'noise', 'cutout', 'bg', + 'bgc', 'bgcfnc'. Only available for StyleGAN2. + Defaults to 'bgcfnc'. + batch (int, optional): Override batch size set by `cfg`. + cfg (str): StyleGAN2 base configuration. Options include + 'auto', 'stylegan2', 'paper256', 'paper512', 'paper1024', and + 'cifar'. Defaults to 'auto'.
+ dry_run (bool): Set up training but do not execute. + Defaults to False. + exp_label (str, optional): Experiment label. Defaults to None. + freezed (int): Freeze this many discriminator layers. + Defaults to 0. + fp32 (bool, optional): Disable mixed-precision training. Defaults + to False. + gamma (float, optional): Override R1 gamma from configuration + (set with `cfg`). + gpus (int): Number of GPUs to train on in parallel. Defaults + to 1. + kimg (int): Override training duration in kimg (thousand + images) set by `cfg`. Most configurations default to 25,000 + kimg (25 million images). + lazy_resume (bool): Allow lazy loading from saved pretrained + networks, for example to load a non-conditional network + when training a conditional network. Defaults to False. + mirror (bool): Randomly flip/rotate images during + training. Defaults to True. + metrics (str, list(str), optional): Metrics to calculate during + training. Options include 'fid50k', 'is50k', 'ppl_zfull', + 'ppl_wfull', 'ppl_zend', 'ppl2_wend', 'ls', and 'pr50k3'. + Defaults to None. + model (str): Architecture to train. Valid model architectures + include "stylegan2" and "stylegan3". Defaults to "stylegan3". + nhwc (bool): Use NHWC memory format with FP16. Defaults to False. + nobench (bool): Disable cuDNN benchmarking. Defaults to False. + outcomes (str, list(str), optional): Class conditioning outcome + labels for training a class-conditioned GAN. If not provided, + trains an unconditioned GAN. Defaults to None. + tile_labels (str, optional): Path to pandas dataframe with + tile-level labels. The dataframe should be indexed by tile name, + where the name of the tile follows the format: + [slide name]-[tile x coordinate]-[tile y coordinate], e.g.: + ``slide1-251-666``. The dataframe should have a single column + with the name 'label'. Labels can be categorical or continuous. + If categorical, the labels should be onehot encoded.
+ crop (int, optional): Randomly crop images to this target size + during training. This permits training a smaller network + (e.g. 256 x 256) on larger images (e.g. 299 x 299). + Defaults to None. + resize (int, optional): Resize images to this target size + during training. This permits training a smaller network + (e.g. 256 x 256) on larger images (e.g. 299 x 299). + If both ``crop`` and ``resize`` are provided, cropping + will be performed first. Defaults to None. + resume (str): Load previous network. Options include + 'noresume', 'ffhq256', 'ffhq512', 'ffhq1024', 'celebahq256', + 'lsundog256', <file>, or <url>. Defaults to 'noresume'. + snap (int): Snapshot interval for saving network and + example images. Defaults to 50 ticks. + + """ + # Validate the method and import the appropriate submodule + supported_models = ('stylegan2', 'stylegan3') + if model not in supported_models: + raise ValueError(f"Unknown method '{model}'. Valid methods " + f"include: {', '.join(supported_models)}") + try: + if model == 'stylegan2': + from slideflow.gan.stylegan2 import stylegan2 as network + elif model == 'stylegan3': + from slideflow.gan.stylegan3 import stylegan3 as network # type: ignore + except ImportError: + raise ImportError("StyleGAN functions require 'slideflow-noncommercial'. " + "Please install with 'pip install slideflow-noncommercial'") + if metrics is not None: + log.warn( + "StyleGAN2 metrics are not fully implemented for Slideflow."
+ ) + + # Setup directories + gan_root = join(self.root, 'gan') + if not exists(gan_root): + os.makedirs(gan_root) + if exp_label is None: + exp_label = 'gan_experiment' + gan_dir = sf.util.get_new_model_dir(gan_root, exp_label) + + # Write GAN configuration + config_loc = join(gan_dir, 'slideflow_config.json') + config = dict( + project_path=self.root, + tile_px=dataset.tile_px, + tile_um=dataset.tile_um, + model_type='classification', + outcome_label_headers=outcomes, + filters=dataset._filters, + filter_blank=dataset._filter_blank, + min_tiles=dataset._min_tiles, + tile_labels=tile_labels, + crop=crop, + resize=resize + ) + if normalizer: + config['normalizer_kwargs'] = dict( + normalizer=normalizer, + normalizer_source=normalizer_source + ) + sf.util.write_json(config, config_loc) + + # Train the GAN + network.train.train( + ctx=None, + outdir=gan_dir, + dry_run=dry_run, + slideflow=config_loc, + cond=(outcomes is not None or tile_labels is not None), + mirror=mirror, + metrics=metrics, + **kwargs) + + def gan_generate( + self, + network_pkl: str, + out: str, + seeds: List[int], + **kwargs + ) -> None: + """Generate images from a trained GAN network. + + Examples + Save images as ``.jpg`` for seeds 0-99. + + >>> network_pkl = '/path/to/trained/gan.pkl' + >>> P.gan_generate( + ... network_pkl, + ... out='/dir', + ... format='jpg', + ... seeds=range(100)) + + Save images in TFRecord format. + + >>> P.gan_generate(..., out='target.tfrecords') + + Save images of class '0' for a class-conditional GAN. + + >>> P.gan_generate(..., class_idx=0) + + Resize GAN images (trained at 512 px / 400 um) to match a target + tile size (299 px / 302 um). + + >>> P.gan_generate( + ... ..., + ... gan_px=512, + ... gan_um=400, + ... target_px=299, + ... target_um=302) + + Args: + network_pkl (str): Path to a trained StyleGAN2 network (``.pkl``) + out (str): Directory in which to save generated images. + seeds (list(int)): Seeds for which images will be generated.
+ + Keyword args: + format (str, optional): Image format, either 'jpg' or 'png'. + Defaults to 'png'. + truncation_psi (float, optional): Truncation PSI. Defaults to 1. + noise_mode (str, optional): Either 'const', 'random', or 'none'. + Defaults to 'const'. + class_idx (int, optional): Class index to generate, for class- + conditional networks. Defaults to None. + save_projection (bool, optional): Save weight projection for each + generated image as an `.npz` file in the out directory. + Defaults to False. + resize (bool, optional): Crop/resize images to a target + micron/pixel size. Defaults to False. + gan_um (int, optional): Size of GAN images in microns. Used for + cropping/resizing images to a target size. Defaults to None. + gan_px (int, optional): Size of GAN images in pixels. Used for + cropping/resizing images to a target size. Defaults to None. + target_um (int, optional): Crop/resize GAN images to this micron + size. Defaults to None. + target_px (int, optional): Crop/resize GAN images to this pixel + size. Defaults to None. + + """ + from slideflow.gan.stylegan2 import stylegan2 + + stylegan2.generate.generate_images( + network_pkl, + outdir=out, + seeds=seeds, + **kwargs + ) + + @auto_dataset_allow_none + def generate_features( + self, + model: str, + dataset: Optional[Dataset] = None, + *, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + min_tiles: int = 0, + max_tiles: int = 0, + outcomes: Optional[List[str]] = None, + **kwargs: Any + ) -> sf.DatasetFeatures: + """Calculate layer activations. + + See :ref:`Layer activations <dataset_features>` for more information. + + Args: + model (str): Path to model + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate activations. If not supplied, calculate + activations for all tfrecords compatible with the model, + optionally using provided filters and filter_blank. 
+ + Keyword Args: + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Only include slides with this minimum + number of tiles. Defaults to 0. + max_tiles (int, optional): Only include maximum of this many tiles + per slide. Defaults to 0 (all tiles). + outcomes (list, optional): Column header(s) in annotations file. + Used for category-level comparisons. Defaults to None. + layers (list(str)): Layers from which to generate activations. + Defaults to 'postconv'. + export (str): Path to CSV file. Save activations in CSV format. + Defaults to None. + cache (str): Path to PKL file. Cache activations at this location. + Defaults to None. + include_preds (bool): Generate and store logit predictions along + with layer activations. Defaults to True. + batch_size (int): Batch size to use when calculating activations. + Defaults to 32. + + Returns: + :class:`slideflow.DatasetFeatures` + + """ + if dataset is None: + raise ValueError( + 'Argument "dataset" is required when "model" is ' + 'an imagenet-pretrained model, or otherwise not a ' + 'saved Slideflow model.' 
+ ) + + # Prepare dataset and annotations + dataset = dataset.clip(max_tiles) + if outcomes is not None: + labels = dataset.labels(outcomes, format='name')[0] + else: + labels = None + df = sf.DatasetFeatures(model=model, + dataset=dataset, + annotations=labels, + **kwargs) + return df + + @auto_dataset_allow_none + def generate_feature_bags( + self, + model: Union[str, "BaseFeatureExtractor"], + dataset: Optional[Dataset] = None, + outdir: str = 'auto', + *, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + min_tiles: int = 16, + max_tiles: int = 0, + **kwargs: Any + ) -> str: + """Generate bags of tile-level features for slides for use with MIL models. + + By default, features are exported to the ``pt_files`` folder + within the project root directory. + + Args: + model (str): Path to model from which to generate activations. + May provide either this or "pt_files" + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate activations. If not supplied, calculate + activations for all tfrecords compatible with the model, + optionally using provided filters and filter_blank. + outdir (str, optional): Save exported activations in .pt format. + Defaults to 'auto' (project directory). + + Keyword Args: + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Only include slides with this minimum + number of tiles. Defaults to 16. + max_tiles (int, optional): Only include maximum of this many tiles + per slide. Defaults to 0 (all tiles). + layers (list): Which model layer(s) generate activations. + If ``model`` is a saved model, this defaults to 'postconv'. + Not used if ``model`` is pretrained feature extractor. + Defaults to None. 
+ force_regenerate (bool): Forcibly regenerate activations + for all slides even if .pt file exists. Defaults to False. + min_tiles (int, optional): Minimum tiles per slide. Skip slides + not meeting this threshold. Defaults to 16. + batch_size (int): Batch size during feature calculation. + Defaults to 32. + slide_batch_size (int): Interleave feature calculation across + this many slides. Higher values may improve performance + but require more memory. Defaults to 16. + num_gpus (int): Number of GPUs to use for feature extraction. + Defaults to 0. + **kwargs: Additional keyword arguments are passed to + :class:`slideflow.DatasetFeatures`. + + Returns: + Path to directory containing exported .pt files + + """ + # Check if the model exists and has a valid parameters file + if isinstance(model, str) and exists(model) and dataset is None: + log.debug(f"Auto-building dataset from provided model {model}") + config = sf.util.get_model_config(model) + dataset = self.dataset( + tile_px=config['tile_px'], + tile_um=config['tile_um'], + min_tiles=min_tiles + ) + elif dataset is None: + raise ValueError( + 'Argument "dataset" is required when "model" is ' + 'an imagenet-pretrained model, or otherwise not a ' + 'saved Slideflow model.' + ) + + # Ensure min_tiles and max_tiles is applied to the dataset. + # max_tiles has already been applied via @auto_dataset decorator. 
+ dataset = dataset.filter(min_tiles=min_tiles) + + # Prepare output directory + if outdir.lower() == 'auto': + # Check if the model is an architecture name + # (for using an Imagenet pretrained model) + if isinstance(model, str) and sf.model.is_extractor(model): + outdir = join(self.root, 'pt_files', model) + # Check if the model is a trained model + elif isinstance(model, str) and exists(model): + config = sf.util.get_model_config(model) + if 'k_fold_i' in config: + _end = f"_kfold{config['k_fold_i']}" + else: + _end = '' + outdir = join( + self.root, 'pt_files', config['model_name'] + _end + ) + # Otherwise, it's a pretrained feature extractor + # and the subdirectory can be named by its tag. + else: + from slideflow.model.base import BaseFeatureExtractor + if isinstance(model, BaseFeatureExtractor): + outdir = join(self.root, 'pt_files', model.tag) + + # Generate feature bags. + dataset.generate_feature_bags(model, outdir, **kwargs) + + return outdir + + @auto_dataset + def generate_heatmaps( + self, + model: str, + *, + dataset: Dataset, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + min_tiles: int = 0, + outdir: Optional[str] = None, + resolution: str = 'low', + batch_size: int = 32, + roi_method: str = 'auto', + num_threads: Optional[int] = None, + img_format: str = 'auto', + skip_completed: bool = False, + verbose: bool = True, + **kwargs: Any + ) -> None: + """Create predictive heatmap overlays on a set of slides. + + By default, heatmaps are saved in the ``heatmaps/`` folder + in the project root directory. + + Args: + model (str): Path to Tensorflow model. + + Keyword Args: + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate predictions. If not supplied, will + generate predictions for all project tfrecords at the + tile_px/tile_um matching the model, optionally using provided + filters and filter_blank. + filters (dict, optional): Dataset filters to use for + selecting slides. 
See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Minimum tiles per slide. Skip slides + not meeting this threshold. Defaults to 0. + outdir (path, optional): Directory in which to save heatmap images. + resolution (str, optional): Heatmap resolution. Defaults to 'low'. + "low" uses a stride equal to tile width. + "medium" uses a stride equal to 1/2 tile width. + "high" uses a stride equal to 1/4 tile width. + batch_size (int, optional): Batch size during heatmap calculation. + Defaults to 32. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + num_threads (int, optional): Number of worker threads for each + tile extractor. Defaults to the total number of available + CPU threads. + img_format (str, optional): Image format (png, jpg) to use when + extracting tiles from slide. Must match the image format + the model was trained on. If 'auto', will use the format + logged in the model params.json. + skip_completed (bool, optional): Skip heatmaps for slides that + already have heatmaps in target directory. + show_roi (bool): Show ROI on heatmaps. + interpolation (str): Interpolation strategy for predictions. + Defaults to None. + Includes all matplotlib imshow interpolation options. + logit_cmap: Function or a dict used to create heatmap colormap.
+ If None (default), separate heatmaps are generated for each + category, with color representing category prediction. + Each image tile will generate a list of preds of length O, + where O is the number of label categories. If logit_cmap is a + function, then the logit predictions will be passed to it, + and the function is expected to return [R, G, B] values. + If the logit_cmap is a dictionary, it should map 'r', 'g', and + 'b' to label indices; the prediction for these label categories + will be mapped to corresponding colors. Thus, the corresponding + color will only reflect predictions of up to three labels. + Example (this would map predictions for label 0 to red, 3 to + green, etc): {'r': 0, 'g': 3, 'b': 1 } + verbose (bool): Show verbose output. Defaults to True. + vmin (float): Minimum value to display on heatmap. Defaults to 0. + vcenter (float): Center value for color display on heatmap. + Defaults to 0.5. + vmax (float): Maximum value to display on heatmap. Defaults to 1. + + """ + # Prepare arguments for subprocess + args = SimpleNamespace(**locals()) + del args.self + + # Prepare dataset + config = sf.util.get_model_config(model) + args.rois = dataset.rois() + + # Set resolution / stride + resolutions = {'low': 1, 'medium': 2, 'high': 4} + try: + stride_div = resolutions[resolution] + except KeyError: + raise ValueError(f"Invalid resolution '{resolution}'.") + args.stride_div = stride_div + args.verbosity = self.verbosity # Set logging level in subprocess + args.img_format = img_format + + # Attempt to auto-detect supplied model name + model_name = os.path.basename(model) + if 'model_name' in config: + model_name = config['model_name'] + + # Make output directory + outdir = outdir if outdir else join(self.root, 'heatmaps', model_name) + if not exists(outdir): + os.makedirs(outdir) + args.outdir = outdir + + # Verbose output + if verbose: + n_poss_slides = len(dataset.slides()) + n_slides = len(dataset.slide_paths()) + log.info("Generating heatmaps for {} 
slides.".format(n_slides)) + log.info("Model: [green]{}".format(model)) + log.info("Tile px: {}".format(config['tile_px'])) + log.info("Tile um: {}".format(config['tile_um'])) + + # Any function loading a slide must be kept in an isolated process, + # as loading >1 slide in a single process causes instability. + # I suspect this is a libvips or openslide issue but I haven't been + # able to identify the root cause. Isolating processes when multiple + # slides are to be processed sequentially is a functional workaround. + for slide in dataset.slide_paths(): + name = path_to_name(slide) + if (skip_completed and exists(join(outdir, f'{name}-custom.png'))): + log.info(f'Skipping completed heatmap for slide {name}') + continue + + ctx = multiprocessing.get_context('spawn') + process = ctx.Process(target=project_utils._heatmap_worker, + args=(slide, args, kwargs)) + process.start() + process.join() + + def generate_mosaic( + self, + df: "DatasetFeatures", + dataset: Optional[Dataset] = None, + *, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + outcomes: Optional[Union[str, List[str]]] = None, + map_slide: Optional[str] = None, + show_prediction: Optional[Union[int, str]] = None, + predict_on_axes: Optional[List[int]] = None, + max_tiles: int = 0, + umap_cache: Optional[str] = None, + use_float: bool = False, + low_memory: bool = False, + use_norm: bool = True, + umap_kwargs: Dict = {}, + **kwargs: Any + ) -> sf.Mosaic: + """Generate a mosaic map. + + See :ref:`Mosaic maps <mosaic_map>` for more information. + + Args: + df (:class:`slideflow.DatasetFeatures`): Dataset. + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate mosaic. If not supplied, will generate + mosaic for all tfrecords at the tile_px/tile_um matching + the supplied model, optionally using filters/filter_blank. + + Keyword Args: + filters (dict, optional): Dataset filters to use for + selecting slides. 
See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + outcomes (list, optional): Column name in annotations file from + which to read category labels. + map_slide (str, optional): None (default), 'centroid' or 'average'. + If provided, will map slides using slide-level calculations, + either mapping centroid tiles if 'centroid', or calculating + node averages across tiles in a slide and mapping slide-level + node averages, if 'average'. + show_prediction (int or str, optional): May be either int or str, + corresponding to label category. Predictions for this category + will be displayed on the exported UMAP plot. + max_tiles (int, optional): Limits tiles taken from each slide. + Defaults to 0. + umap_cache (str, optional): Path to PKL file in which to save/cache + UMAP coordinates. Defaults to None. + use_float (bool, optional): Interpret labels as continuous instead + of categorical. Defaults to False. + umap_kwargs (dict, optional): Dictionary of keyword arguments to + pass to the UMAP function. + low_memory (bool, optional): Limit memory during UMAP calculations. + Defaults to False. + use_norm (bool, optional): Display image tiles using the normalizer + used during model training (if applicable). Detected from + a model's metadata file (params.json). Defaults to True. + figsize (Tuple[int, int], optional): Figure size. Defaults to + (200, 200). + num_tiles_x (int): Specifies the size of the mosaic map grid. + expanded (bool): Deprecated argument. + + Returns: + :class:`slideflow.Mosaic`: Mosaic object. 
+ + """ + # Set up paths + stats_root = join(self.root, 'stats') + mosaic_root = join(self.root, 'mosaic') + if not exists(stats_root): + os.makedirs(stats_root) + if not exists(mosaic_root): + os.makedirs(mosaic_root) + + # Prepare dataset & model + if isinstance(df.model, str): + config = sf.util.get_model_config(df.model) + else: + raise ValueError( + "Unable to auto-create Mosaic from DatasetFeatures created " + "from a loaded Tensorflow/PyTorch model. Please use a " + "DatasetFeatures object created from a saved Slideflow model, " + "or manually create a mosaic with `sf.Mosaic`.") + if dataset is None: + tile_px, tile_um = config['hp']['tile_px'], config['hp']['tile_um'] + dataset = self.dataset(tile_px=tile_px, tile_um=tile_um) + else: + dataset._assert_size_matches_hp(config['hp']) + tile_px = dataset.tile_px + + # Filter and clip dataset + dataset = dataset.filter(filters=filters, filter_blank=filter_blank) + dataset = dataset.clip(max_tiles) + + # Get TFrecords, and prepare a list for focus, if requested + tfr = dataset.tfrecords() + n_slides = len([t for t in tfr if path_to_name(t) in df.slides]) + log.info(f'Generating mosaic from {n_slides} slides') + + # If a header category is supplied and we are not showing predictions, + # then assign slide labels from annotations + model_type = config['model_type'] + if model_type == 'regression': + use_float = True + if outcomes and (show_prediction is None): + labels, _ = dataset.labels(outcomes, + use_float=use_float, + format='name') + else: + labels = {} # type: ignore + + # If showing predictions, try to automatically load prediction labels + if (show_prediction is not None) and (not use_float): + outcome_labels = config['outcome_labels'] + model_type = model_type if model_type else config['model_type'] + log.info(f'Loaded pred labels found at [green]{df.model}') + + # Create mosaic map from UMAP of layer activations + umap = sf.SlideMap.from_features( + df, + map_slide=map_slide, + low_memory=low_memory, + 
**umap_kwargs + ) + if umap_cache: + umap.save_coordinates(umap_cache) + # If displaying centroid AND predictions, show slide-level predictions + # rather than tile-level predictions + if (map_slide == 'centroid') and show_prediction is not None: + log.info('Showing slide-level predictions at point of centroid') + + # If model type has not been assigned, assume classification model + model_type = model_type if model_type else 'classification' + + # Get predictions + if model_type == 'classification': + s_pred = df.softmax_predict() + s_perc = df.softmax_percent() + else: + s_pred = s_perc = df.softmax_mean() # type: ignore + + # If show_prediction is provided (either a number or string), + # then display ONLY the prediction for the provided category + if type(show_prediction) == int: + log.info(f'Showing preds for {show_prediction} as colormap') + labels = { + k: v[show_prediction] for k, v in s_perc.items() + } + show_prediction = None + elif type(show_prediction) == str: + log.info(f'Showing preds for {show_prediction} as colormap') + reversed_labels = {v: k for k, v in outcome_labels.items()} + if show_prediction not in reversed_labels: + raise ValueError(f"Unknown category '{show_prediction}'") + labels = { + k: v[int(reversed_labels[show_prediction])] + for k, v in s_perc.items() + } + show_prediction = None + elif use_float: + # Displaying linear predictions needs to be implemented here + raise NotImplementedError( + "Showing slide preds not supported for regression models." 
+ ) + # Otherwise, show_prediction is assumed to be just "True", + # in which case show categorical predictions + else: + try: + labels = { + k: outcome_labels[v] for k, v in s_pred.items() + } + except KeyError: + # Try interpreting prediction label keys as strings + labels = { + k: outcome_labels[str(v)] for k, v in s_pred.items() + } + + if labels: + umap.label_by_slide(labels) + if show_prediction and (map_slide != 'centroid'): + umap.label('predictions', translate=outcome_labels) + umap.filter(dataset.slides()) + + mosaic = sf.Mosaic( + umap, + tfrecords=dataset.tfrecords(), + normalizer=(df.normalizer if use_norm else None), + **kwargs + ) + return mosaic + + def generate_mosaic_from_annotations( + self, + header_x: str, + header_y: str, + *, + dataset: Dataset, + model: Optional[str] = None, + outcomes: Optional[Union[str, List[str]]] = None, + max_tiles: int = 100, + use_optimal_tile: bool = False, + cache: Optional[str] = None, + batch_size: int = 32, + **kwargs: Any + ) -> sf.Mosaic: + """Generate a mosaic map with manually supplied x/y coordinates. + + Slides are mapped with slide-level annotations, with x-axis determined + from ``header_x``, y-axis from ``header_y``. If + ``use_optimal_tile=False`` and no model is provided, the first image + tile in each TFRecord will be displayed. If optimal_tile is True, layer + activations for all tiles in each slide are calculated using the + provided model, and the tile nearest to centroid is used. + + Args: + header_x (str): Annotations file header with X-axis coords. + header_y (str): Annotations file header with Y-axis coords. + + Keyword Args: + dataset (:class:`slideflow.Dataset`): Dataset object. + model (str, optional): Path to model to use when + generating layer activations. + Defaults to None. + If not provided, mosaic will not be calculated or saved. + If provided, saved in project mosaic directory. + outcomes (list(str)): Column name(s) in annotations file from which + to read category labels. 
+ max_tiles (int, optional): Limits the number of tiles taken from + each slide. Defaults to 100. + use_optimal_tile (bool, optional): Use model to calculate layer + activations for all tiles in each slide, and choose the tile + nearest centroid for each slide for display. + cache (str, optional): Path to PKL file to cache node + activations. Defaults to None. + batch_size (int, optional): Batch size for model. Defaults to 32. + figsize (Tuple[int, int], optional): Figure size. Defaults to + (200, 200). + num_tiles_x (int): Specifies the size of the mosaic map grid. + expanded (bool): Deprecated argument. + + Returns: + slideflow.Mosaic + + """ + # Setup paths + stats_root = join(self.root, 'stats') + mosaic_root = join(self.root, 'mosaic') + if not exists(stats_root): + os.makedirs(stats_root) + if not exists(mosaic_root): + os.makedirs(mosaic_root) + + # Filter dataset to exclude slides blank in the x and y header columns + dataset = dataset.filter(filter_blank=[header_x, header_y]) + dataset = dataset.clip(max_tiles) + + # We are assembling a list of slides from the TFRecords path list, + # because we only want to use slides that have a corresponding TFRecord + # (some slides did not have a large enough ROI for tile extraction + # & some slides may be in the annotations but are missing a slide) + slides = [path_to_name(tfr) for tfr in dataset.tfrecords()] + labels, _ = dataset.labels([header_x, header_y], use_float=True) + umap_x = np.array([labels[slide][0] # type: ignore + for slide in slides]) + umap_y = np.array([labels[slide][1] # type: ignore + for slide in slides]) + + if use_optimal_tile and model is None: + raise ValueError("Optimal tile calculation requires a model.") + elif use_optimal_tile: + # Calculate most representative tile in each TFRecord for display + assert model is not None + df = sf.DatasetFeatures(model=model, + dataset=dataset, + batch_size=batch_size, + cache=cache) + opt_ind, _ = sf.stats.calculate_centroid(df.activations) + + # Restrict 
mosaic to only slides that had enough tiles to + # calculate an optimal index from centroid + success_slides = list(opt_ind.keys()) + sf.util.multi_warn( + slides, + lambda x: x not in success_slides, + 'Unable to calculate optimal tile for {}, skipping' + ) + umap_x = np.array([ + labels[slide][0] # type: ignore + for slide in success_slides + ]) + umap_y = np.array([ + labels[slide][1] # type: ignore + for slide in success_slides + ]) + umap_slides = np.array(success_slides) + umap_tfr_idx = np.array([ + opt_ind[slide] for slide in success_slides + ]) + else: + # Take the first tile from each slide/TFRecord + umap_slides = np.array(slides) + umap_tfr_idx = np.zeros(len(slides)) + + umap = sf.SlideMap.from_xy( + x=umap_x, + y=umap_y, + slides=umap_slides, + tfr_index=umap_tfr_idx, + ) + if outcomes is not None: + slide_to_category, _ = dataset.labels(outcomes, format='name') + umap.label_by_slide(slide_to_category) + + mosaic = sf.Mosaic( + umap, + tfrecords=dataset.tfrecords(), + tile_select='centroid' if use_optimal_tile else 'first', + **kwargs + ) + return mosaic + + def generate_tfrecord_heatmap( + self, + tfrecord: str, + tile_px: int, + tile_um: Union[int, str], + tile_dict: Dict[int, float], + filename: Optional[str] = None + ) -> None: + """Create a tfrecord-based WSI heatmap. + + Uses a dictionary of tile values for heatmap display, saving to project + root directory. + + Args: + tfrecord (str): Path to tfrecord + tile_dict (dict): Dictionary mapping tfrecord indices to a + tile-level value for display in heatmap format + tile_px (int): Tile width in pixels + tile_um (int or str): Tile width in microns (int) or magnification + (str, e.g. "20x"). + filename (str, optional): Destination path to save heatmap. + Defaults to saving as ``{slide_name}.png`` in the project + root directory. 
+ + Returns: + None + + """ + dataset = self.dataset(tile_px=tile_px, tile_um=tile_um) + if filename is None: + filename = join(self.root, sf.util.path_to_name(tfrecord) + '.png') + dataset.tfrecord_heatmap(tfrecord, tile_dict, filename) + + def inspect_tfrecords(self): + """Inspect TFRecords in the project dataset configuration.""" + from rich import print as rprint + + config = sf.util.load_json(self.dataset_config) + rprint("[b]Dataset sources:[/]") + for source in self.sources: + rprint(". {}".format(source)) + if source not in config: + rprint(" {}: Source not found in dataset" + " configuration".format(source)) + continue + if 'tfrecords' not in config[source]: + rprint(" {}: TFRecords directory not set".format(source)) + continue + tfr_path = config[source]['tfrecords'] + if not exists(tfr_path): + rprint(" {}: TFRecords directory not found".format(source)) + continue + subdirs = [f for f in os.listdir(tfr_path) + if isdir(join(tfr_path, f))] + for subdir in subdirs: + # Check if this is a valid subdir with a tile size label + # (e.g. "256px_10um" or "256px_20x") + if re.match(r'\d+px_\d+(um|x)$', subdir): + px_str, um_str = subdir.split('_') + _tile_px = px_str.split('px')[0] + _tile_um = um_str.split('um')[0] if 'um' in um_str else um_str.split('x')[0] + tfr_files = [f for f in os.listdir(join(tfr_path, subdir)) + if f.endswith('.tfrecords')] + rprint(" tile_px={}, tile_um={}: {} TFRecords".format( + _tile_px, _tile_um, len(tfr_files) + )) + + def dataset( + self, + tile_px: Optional[int] = None, + tile_um: Optional[Union[int, str]] = None, + *, + verification: Optional[str] = 'both', + **kwargs: Any + ) -> Dataset: + """Return a :class:`slideflow.Dataset` object using project settings. + + Args: + tile_px (int): Tile size in pixels + tile_um (int or str): Tile size in microns (int) or magnification + (str, e.g. "20x"). + + Keyword Args: + filters (dict, optional): Dataset filters to use for + selecting slides. 
See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Min tiles a slide must have. + Defaults to 0. + config (str, optional): Path to dataset configuration JSON file. + Defaults to project default. + sources (str, list(str), optional): Dataset sources to use from + configuration. Defaults to project default. + verification (str, optional): 'tfrecords', 'slides', or 'both'. + If 'slides', verify all annotations are mapped to slides. + If 'tfrecords', check that TFRecords exist and update manifest. + Defaults to 'both'. + + """ + if 'config' not in kwargs: + kwargs['config'] = self.dataset_config + if 'sources' not in kwargs: + kwargs['sources'] = self.sources + try: + if self.annotations and exists(self.annotations): + annotations = self.annotations + else: + annotations = None + dataset = Dataset( + tile_px=tile_px, + tile_um=tile_um, + annotations=annotations, + **kwargs + ) + except FileNotFoundError: + raise errors.DatasetError('No datasets configured.') + if verification in ('both', 'slides'): + log.debug("Verifying slide annotations...") + dataset.verify_annotations_slides() + if verification in ('both', 'tfrecords'): + log.debug("Verifying tfrecords...") + dataset.update_manifest() + return dataset + + @auto_dataset + def predict( + self, + model: str, + *, + dataset: Dataset, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + min_tiles: int = 0, + checkpoint: Optional[str] = None, + eval_k_fold: Optional[int] = None, + splits: str = "splits.json", + max_tiles: int = 0, + batch_size: int = 32, + format: str = 'csv', + input_header: Optional[Union[str, List[str]]] = None, + mixed_precision: bool = True, + allow_tf32: bool = False, + load_method: str = 'weights', + custom_objects: Optional[Dict[str, Any]] = None, + **kwargs: Any + 
) -> Dict[str, pd.DataFrame]: + """Generate model predictions on a set of tfrecords. + + Args: + model (str): Path to model to evaluate. + + Keyword Args: + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate predictions. If not supplied, will + generate predictions for all project tfrecords at the + tile_px/tile_um matching the model, optionally using provided + filters and filter_blank. + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Min tiles a slide must have + to be included. Defaults to 0. + checkpoint (str, optional): Path to cp.ckpt file, if evaluating a + saved checkpoint. Defaults to None. + eval_k_fold (int, optional): K-fold iteration number to evaluate. + If None, will evaluate all tfrecords irrespective of K-fold. + Defaults to None. + splits (str, optional): Filename of JSON file in which to log + training/validation splits. Looks for filename in project root + directory. Defaults to "splits.json". + max_tiles (int, optional): Maximum number of tiles from each slide + to evaluate. If zero, will include all tiles. Defaults to 0. + batch_size (int, optional): Batch size to use during prediction. + Defaults to 32. + format (str, optional): Format in which to save predictions. + Either 'csv', 'feather', or 'parquet'. Defaults to 'csv'. + input_header (str, optional): Annotation column header to use as + additional input. Defaults to None. + mixed_precision (bool, optional): Enable mixed precision. + Defaults to True. + allow_tf32 (bool): Allow internal use of Tensorfloat-32 format. + Defaults to False. + load_method (str): Either 'full' or 'weights'. Method to use + when loading a Tensorflow model. 
If 'full', loads the model + with ``tf.keras.models.load_model()``. If 'weights', will read + the ``params.json`` configuration file, build the model + architecture, and then load weights from the given model with + ``Model.load_weights()``. Loading with 'full' may improve + compatibility across Slideflow versions. Loading with 'weights' + may improve compatibility across hardware & environments. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + custom_objects (dict, Optional): Dictionary mapping names + (strings) to custom classes or functions. Defaults to None. + + Returns: + Dictionary of predictions dataframes, with the keys 'tile', + 'slide', and 'patient'. 
+ + """ + # Perform evaluation + log.info('Predicting model results') + trainer, eval_dts = self._prepare_trainer( + model=model, + dataset=dataset, + checkpoint=checkpoint, + eval_k_fold=eval_k_fold, + splits=splits, + max_tiles=max_tiles, + input_header=input_header, + mixed_precision=mixed_precision, + allow_tf32=allow_tf32, + load_method=load_method, + custom_objects=custom_objects, + ) + + # Load the model + if isinstance(model, str): + trainer.load(model, training=False) + if checkpoint: + if trainer.feature_sizes: + n_features = sum(trainer.feature_sizes) + else: + n_features = 0 + trainer.model = trainer.hp.build_model( + labels=trainer.labels, + num_slide_features=n_features + ) + trainer.model.load_weights(checkpoint) + + # Predict + results = trainer.predict( + dataset=eval_dts, + batch_size=batch_size, + format=format, + **kwargs + ) + return results + + def predict_ensemble( + self, + model: str, + k: Optional[int] = None, + epoch: Optional[int] = None, + **kwargs + ) -> None: + """Evaluate an ensemble of models on a given set of tfrecords. + + Args: + model (str): Path to ensemble model to evaluate. + + Keyword Args: + k (int, optional): The k-fold number to be considered + to run the prediction. By default it sets to the first k-fold + present in the ensemble folder. + epoch (int, optional): The epoch number to be considered + to run the prediction. By default it sets to the first epoch + present in the selected k-fold folder. 
+ **kwargs (Any): All keyword arguments accepted by + :meth:`slideflow.Project.predict()` + + """ + if not exists(model): + raise OSError(f"Path {model} not found") + + config = sf.util.get_ensemble_model_config(model) + outcomes = f"{'-'.join(config['outcomes'])}" + model_name = f"eval-ensemble-{outcomes}" + main_eval_dir = sf.util.get_new_model_dir(self.eval_dir, model_name) + + member_paths = sorted([ + join(model, x) for x in os.listdir(model) + if isdir(join(model, x)) + ]) + # Generate predictions from each ensemble member, + # and merge predictions into a single dataframe. + for member_id, member_path in enumerate(member_paths): + if k: + _k_path = get_matching_directory(member_path, f'kfold{k}') + else: + _k_path = get_first_nested_directory(member_path) + if epoch: + prediction_path = get_matching_directory( + _k_path, f'epoch{epoch}' + ) + else: + prediction_path = get_first_nested_directory(_k_path) + + # Update the current evaluation directory. + member_eval_dir = sf.util.get_new_model_dir( + main_eval_dir, + f"ensemble_{member_id+1}" + ) + with self._set_eval_dir(member_eval_dir): + self.predict(prediction_path, **kwargs) + # If this is the first ensemble member, copy the slide manifest + # and params.json file into the ensemble prediction folder. + if member_id == 0: + _, path = sf.util.get_valid_model_dir(self.eval_dir) + shutil.copyfile( + join(self.eval_dir, path[0], "slide_manifest.csv"), + join(main_eval_dir, "slide_manifest.csv") + ) + params = sf.util.load_json( + join(self.eval_dir, path[0], "params.json") + ) + params['ensemble_epochs'] = params['hp']['epochs'] + del params['hp'] + sf.util.write_json( + params, + join(main_eval_dir, "ensemble_params.json") + ) + + # Create (or add to) the ensemble dataframe. 
+ for level in ('slide', 'tile'): + project_utils.add_to_ensemble_dataframe( + ensemble_path=main_eval_dir, + kfold_path=join(self.eval_dir, path[0]), + level=level, + member_id=member_id + ) + # Create new ensemble columns and rename fixed columns. + for level in ('tile', 'slide'): + project_utils.update_ensemble_dataframe_headers( + ensemble_path=main_eval_dir, + level=level, + ) + + @auto_dataset + def predict_wsi( + self, + model: str, + outdir: str, + *, + dataset: Dataset, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + min_tiles: int = 0, + stride_div: int = 1, + enable_downsample: bool = True, + roi_method: str = 'auto', + source: Optional[str] = None, + img_format: str = 'auto', + randomize_origin: bool = False, + **kwargs: Any + ) -> None: + """Generate a map of predictions across a whole-slide image. + + Args: + model (str): Path to model from which to generate predictions. + outdir (str): Directory for saving WSI predictions in .pkl format. + + Keyword Args: + dataset (:class:`slideflow.Dataset`, optional): Dataset + from which to generate activations. If not supplied, will + calculate activations for all tfrecords at the tile_px/tile_um + matching the supplied model. + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + min_tiles (int, optional): Min tiles a slide must have + to be included. Defaults to 0. + stride_div (int, optional): Stride divisor for extracting tiles. + A stride of 1 will extract non-overlapping tiles. + A stride_div of 2 will extract overlapping tiles, with a stride + equal to 50% of the tile width. Defaults to 1. + enable_downsample (bool, optional): Enable downsampling for slides. 
+ This may result in corrupted image tiles if downsampled slide + layers are corrupted or incomplete. Defaults to True. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + source (list, optional): Name(s) of dataset sources from which to + get slides. If None, will use all. + img_format (str, optional): Image format (png, jpg) to use when + extracting tiles from slide. Must match the image format + the model was trained on. If 'auto', will use the format + logged in the model params.json. + randomize_origin (bool, optional): Randomize pixel starting + position during extraction. Defaults to False. + whitespace_fraction (float, optional): Range 0-1. Defaults to 1. + Discard tiles with this fraction of whitespace. + If 1, will not perform whitespace filtering. + whitespace_threshold (int, optional): Range 0-255. Defaults to 230. + Threshold above which a pixel (RGB average) is whitespace. + grayspace_fraction (float, optional): Range 0-1. Defaults to 0.6. + Discard tiles with this fraction of grayspace. + If 1, will not perform grayspace filtering. + grayspace_threshold (float, optional): Range 0-1. Defaults to 0.05. + Pixels in HSV format with saturation below this are grayspace. 
+ + """ + log.info('Generating WSI prediction / activation maps...') + if not exists(outdir): + os.makedirs(outdir) + + if source: + sources = sf.util.as_list(source) + else: + sources = self.sources + if dataset.tile_px is None or dataset.tile_um is None: + raise errors.DatasetError( + "Dataset must have non-zero tile_px and tile_um" + ) + # Prepare dataset & model + if img_format == 'auto': + config = sf.util.get_model_config(model) + img_format = config['img_format'] + + # Log extraction parameters + sf.slide.log_extraction_params(**kwargs) + + for source in sources: + log.info(f'Working on dataset source [bold]{source}') + if dataset._roi_set(source): + roi_dir = dataset.sources[source]['roi'] + else: + roi_dir = None + + # Prepare list of slides for extraction + slide_list = dataset.slide_paths(source=source) + log.info(f'Generating predictions for {len(slide_list)} slides') + + # Verify slides and estimate total number of tiles + log.info('Verifying slides...') + total_tiles = 0 + from rich.progress import track + for slide_path in track(slide_list, transient=True): + try: + slide = sf.WSI(slide_path, + dataset.tile_px, + dataset.tile_um, + stride_div, + roi_dir=roi_dir, + roi_method=roi_method) + except errors.SlideError as e: + log.error(e) + else: + n_est = slide.estimated_num_tiles + log.debug(f"Estimated tiles for {slide.name}: {n_est}") + total_tiles += n_est + finally: + del slide + log.info(f'Total estimated tiles: {total_tiles}') + + # Predict for each WSI + for slide_path in slide_list: + log.info(f'Working on slide {path_to_name(slide_path)}') + try: + wsi = sf.WSI(slide_path, + dataset.tile_px, + dataset.tile_um, + stride_div, + enable_downsample=enable_downsample, + roi_dir=roi_dir, + roi_method=roi_method, + origin='random' if randomize_origin else (0,0)) + except errors.SlideLoadError as e: + log.error(e) + continue + except errors.MissingROIError as e: + log.error(e) + continue + try: + interface = sf.model.Features(model, include_preds=False) 
+ wsi_grid = interface(wsi, img_format=img_format) + + with open(join(outdir, wsi.name+'.pkl'), 'wb') as file: + pickle.dump(wsi_grid, file) + + except errors.TileCorruptionError: + log.error(f'[green]{path_to_name(slide_path)}[/] is ' + 'corrupt; skipping slide') + continue + + def save(self) -> None: + """Save current project configuration as ``settings.json``.""" + sf.util.write_json(self._settings, join(self.root, 'settings.json')) + + def _get_smac_runner( + self, + outcomes: Union[str, List[str]], + params: sf.ModelParams, + metric: Union[str, Callable], + n_replicates: int, + train_kwargs: Any + ) -> Callable: + """Build a SMAC3 optimization runner. + + Args: + outcomes (str, List[str]): Outcome label annotation header(s). + params (sf.ModelParams): Model parameters for training. + metric (str or Callable): Metric to monitor for optimization. + May be callable function or a str. If a callable function, must + accept the epoch results dict and return a float value. If + a str, must be a valid metric, such as 'tile_auc', + 'patient_auc', 'r_squared', etc. + train_kwargs (dict): Dict of keyword arguments used for the + Project.train() function call. + + Raises: + errors.SMACError: If training does not return the given metric. + + Returns: + Callable: tae_runner for SMAC optimization. + + """ + + def smac_runner(config): + """SMAC tae_runner function.""" + # Load hyperparameters from SMAC configuration, handling "None". + c = dict(config) + if 'normalizer' in c and c['normalizer'].lower() == 'none': + c['normalizer'] = None + if ('normalizer_source' in c + and c['normalizer_source'].lower() == 'none'): + c['normalizer_source'] = None + + all_results = [] + for _ in range(n_replicates): + # Train model(s). 
+ pretty = json.dumps(c, indent=2) + log.info(f"Training model with config={pretty}") + params.load_dict(c) + _prior_logging_level = sf.getLoggingLevel() + sf.setLoggingLevel(40) + results = self.train( + outcomes=outcomes, + params=params, + **train_kwargs + ) + sf.setLoggingLevel(_prior_logging_level) + + # Interpret results. + model_name = list(results.keys())[0] + last_epoch = sorted(list(results[model_name]['epochs'].keys()), key=lambda x: int(x.replace("epoch", "")))[-1] + if len(results[model_name]['epochs']) > 1: + log.warning(f"Ambiguous epoch for SMAC. Using '{last_epoch}'.") + epoch_results = results[model_name]['epochs'][last_epoch] + + # Determine metric for optimization. + if callable(metric): + result = metric(epoch_results) + elif metric not in epoch_results: + raise errors.SMACError(f"Metric '{metric}' not returned from " + "training, unable to optimize.") + else: + if outcomes not in epoch_results[metric]: + raise errors.SMACError( + f"Unable to interpret metric {metric} (epoch results: " + f"{epoch_results})") + result = 1 - mean(epoch_results[metric][outcomes]) + all_results.append(result) + + # Average results across iterations + log.info("[green]Result ({})[/]: {:.4f}".format( + 'custom' if callable(metric) else f'1-{metric}', + result + )) + return mean(all_results) + + return smac_runner + + def smac_search( + self, + outcomes: Union[str, List[str]], + params: ModelParams, + smac_configspace: "ConfigurationSpace", + exp_label: str = "SMAC", + smac_limit: int = 10, + smac_metric: str = 'tile_auc', + smac_replicates: int = 1, + save_checkpoints: bool = False, + save_model: bool = False, + save_predictions: Union[bool, str] = False, + **train_kwargs: Any + ) -> Tuple["Configuration", pd.DataFrame]: + """Train a model using SMAC3 Bayesian hyperparameter optimization. + + See :ref:`Bayesian optimization <bayesian_optimization>` + for more information. + + .. 
note:: + + The hyperparameter optimization is performed with + `SMAC3 <https://automl.github.io/SMAC3/master/>`_ and requires the + ``smac`` package available from pip. + + Args: + outcomes (str, List[str]): Outcome label annotation header(s). + params (ModelParams): Model parameters for training. + smac_configspace (ConfigurationSpace): ConfigurationSpace to + determine the SMAC optimization. + smac_limit (int): Max number of models to train during + optimization. Defaults to 10. + smac_metric (str, optional): Metric to monitor for optimization. + May either be a callable function or a str. If a callable + function, must accept the epoch results dict and return a + float value. If a str, must be a valid metric, such as + 'tile_auc', 'patient_auc', 'r_squared', etc. + Defaults to 'tile_auc'. + save_checkpoints (bool): Save model checkpoints. Defaults to False. + save_model (bool): Save each trained model. Defaults to False. + save_predictions (bool or str, optional): Save tile, slide, and + patient-level predictions at each evaluation. May be 'csv', + 'feather', or 'parquet'. If False, will not save predictions. + Defaults to False. + + Returns: + Tuple: + + Configuration: Optimal hyperparameter configuration returned + by SMAC4BB.optimize(). + + pd.DataFrame: History of hyperparameters resulting metrics. + + """ + from smac.facade.smac_bb_facade import SMAC4BB # noqa: F811 + from smac.scenario.scenario import Scenario + + # Perform SMAC search in a single model folder. + smac_path = sf.util.get_new_model_dir(self.models_dir, exp_label) + _initial_models_dir = self.models_dir + self.models_dir = smac_path + + # Create SMAC scenario. 
+ scenario = Scenario( + {'run_obj': 'quality', # Optimize quality (alternatively: runtime) + 'runcount-limit': smac_limit, # Max # of function evaluations + 'cs': smac_configspace}, + {'output_dir': self.models_dir}) + train_kwargs['save_checkpoints'] = save_checkpoints + train_kwargs['save_model'] = save_model + train_kwargs['save_predictions'] = save_predictions + smac = SMAC4BB( + scenario=scenario, + tae_runner=self._get_smac_runner( + outcomes=outcomes, + params=params, + metric=smac_metric, + train_kwargs=train_kwargs, + n_replicates=smac_replicates, + ) + ) + + # Log. + log.info("Performing Bayesian hyperparameter optimization with SMAC") + log.info( + "=== SMAC config ==========================================\n" + "[bold]Options:[/]\n" + f"Metric: {smac_metric}\n" + f"Limit: {smac_limit}\n" + f"Model replicates: {smac_replicates}\n" + "[bold]Base parameters:[/]\n" + f"{params}\n\n" + "[bold]Configuration space:[/]\n" + f"{smac_configspace}\n" + "==========================================================" + ) + + # Optimize. + best_config = smac.optimize() + log.info(f"Best configuration after SMAC optimization: {best_config}") + + # Process history and write to dataframe. 
+ configs = smac.runhistory.get_all_configs() + history = pd.DataFrame([c.get_dictionary() for c in configs]) + history['metric'] = [smac.runhistory.get_cost(c) for c in configs] + history.to_csv(join(self.models_dir, 'run_history.csv'), index=False) + self.models_dir = _initial_models_dir + return best_config, history + + def train( + self, + outcomes: Union[str, List[str]], + params: Union[str, + ModelParams, + List[ModelParams], + Dict[str, ModelParams]], + *, + dataset: Optional[sf.Dataset] = None, + exp_label: Optional[str] = None, + filters: Optional[Dict] = None, + filter_blank: Optional[Union[str, List[str]]] = None, + input_header: Optional[Union[str, List[str]]] = None, + min_tiles: int = 0, + max_tiles: int = 0, + splits: str = "splits.json", + mixed_precision: bool = True, + allow_tf32: bool = False, + load_method: str = 'weights', + balance_headers: Optional[Union[str, List[str]]] = None, + process_isolate: bool = False, + **training_kwargs: Any + ) -> Dict: + """Train model(s). + + Models are trained using a given set of parameters, outcomes, + and (optionally) slide-level inputs. + + See :ref:`Training <training>` for more information. + + Examples + Method 1 (hyperparameter sweep from a configuration file): + + >>> P.train('outcome', params='sweep.json', ...) + + Method 2 (manually specified hyperparameters): + + >>> hp = sf.ModelParams(...) + >>> P.train('outcome', params=hp, ...) + + Method 3 (list of hyperparameters): + + >>> hp = [sf.ModelParams(...), sf.ModelParams(...)] + >>> P.train('outcome', params=hp, ...) + + Method 4 (dict of hyperparameters): + + >>> hp = {'HP0': sf.ModelParams(...), ...} + >>> P.train('outcome', params=hp, ...) + + Args: + outcomes (str or list(str)): Outcome label annotation header(s). + params (:class:`slideflow.ModelParams`, list, dict, or str): + Model parameters for training. May provide one ``ModelParams``, + a list, or dict mapping model names to params. 
If multiple + params are provided, will train models for each. If JSON file + is provided, will interpret as a hyperparameter sweep. See + examples below for use. + + Keyword Args: + exp_label (str, optional): Experiment label to add model names. + filters (dict, optional): Dataset filters to use for + selecting slides. See :meth:`slideflow.Dataset.filter` for + more information. Defaults to None. + filter_blank (list(str) or str, optional): Skip slides that have + blank values in these patient annotation columns. + Defaults to None. + input_header (list, optional): List of annotation column headers to + use as additional slide-level model input. Defaults to None. + min_tiles (int): Minimum number of tiles a slide must have to + include in training. Defaults to 0. + max_tiles (int): Only use up to this many tiles from each slide for + training. Defaults to 0 (include all tiles). + splits (str, optional): Filename of JSON file in which to log + train/val splits. Looks for filename in project root directory. + Defaults to "splits.json". + mixed_precision (bool, optional): Enable mixed precision. + Defaults to True. + allow_tf32 (bool): Allow internal use of Tensorfloat-32 format. + Defaults to False. + load_method (str): Either 'full' or 'weights'. Method to use + when loading a Tensorflow model. If 'full', loads the model + with ``tf.keras.models.load_model()``. If 'weights', will read + the ``params.json`` configuration file, build the model + architecture, and then load weights from the given model with + ``Model.load_weights()``. Loading with 'full' may improve + compatibility across Slideflow versions. Loading with 'weights' + may improve compatibility across hardware & environments. + balance_headers (str or list(str)): Annotation header(s) specifying + labels on which to perform mini-batch balancing. If performing + category-level balancing and this is set to None, will default + to balancing on outcomes. Defaults to None. 
+ val_strategy (str): Validation dataset selection strategy. Options + include bootstrap, k-fold, k-fold-manual, + k-fold-preserved-site, fixed, and none. Defaults to 'k-fold'. + val_k_fold (int): Total number of K if using K-fold validation. + Defaults to 3. + val_k (int): Iteration of K-fold to train, starting at 1. Defaults + to None (training all k-folds). + val_k_fold_header (str): Annotations file header column for + manually specifying k-fold or for preserved-site cross + validation. Only used if validation strategy is 'k-fold-manual' + or 'k-fold-preserved-site'. Defaults to None for k-fold-manual + and 'site' for k-fold-preserved-site. + val_fraction (float): Fraction of dataset to use for validation + testing, if strategy is 'fixed'. + val_source (str): Dataset source to use for validation. Defaults to + None (same as training). + val_annotations (str): Path to annotations file for validation + dataset. Defaults to None (same as training). + val_filters (dict): Filters to use for validation dataset. + See :meth:`slideflow.Dataset.filter` for more information. + Defaults to None (same as training). + checkpoint (str, optional): Path to cp.ckpt from which to load + weights. Defaults to None. + pretrain (str, optional): Either 'imagenet' or path to Tensorflow + model from which to load weights. Defaults to 'imagenet'. + multi_gpu (bool): Train using multiple GPUs when available. + Defaults to False. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. 
For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + resume_training (str, optional): Path to model to continue training. + Only valid in Tensorflow backend. Defaults to None. + starting_epoch (int): Start training at the specified epoch. + Defaults to 0. + steps_per_epoch_override (int): If provided, will manually set the + number of steps in an epoch. Default epoch length is the number + of total tiles. + save_predictions (bool or str, optional): Save tile, slide, and + patient-level predictions at each evaluation. May be 'csv', + 'feather', or 'parquet'. If False, will not save predictions. + Defaults to 'parquet'. + save_model (bool, optional): Save models when evaluating at + specified epochs. Defaults to True. + validate_on_batch (int): Perform validation every N batches. + Defaults to 0 (only at epoch end). + validation_batch_size (int): Validation dataset batch size. + Defaults to 32. + use_tensorboard (bool): Add tensorboard callback for realtime + training monitoring. Defaults to True. + validation_steps (int): Number of steps of validation to perform + each time doing a mid-epoch validation check. Defaults to 200. 
+ + Returns: + Dict with model names mapped to train_acc, val_loss, and val_acc + + """ + # Prepare outcomes + if not isinstance(outcomes, list): + outcomes = [outcomes] + if len(outcomes) > 1: + log.info(f'Training with {len(outcomes)} outcomes') + log.info(f'Outcomes: {", ".join(outcomes)}') + + # Prepare hyperparameters + if isinstance(params, str): + if exists(params): + hp_dict = sf.model.read_hp_sweep(params) + elif exists(join(self.root, params)): + hp_dict = sf.model.read_hp_sweep(join(self.root, params)) + else: + raise errors.ModelParamsError(f"Unable to find file {params}") + elif isinstance(params, ModelParams): + hp_dict = {'HP0': params} + elif isinstance(params, list): + if not all([isinstance(hp, ModelParams) for hp in params]): + raise errors.ModelParamsError( + 'If params is a list, items must be sf.ModelParams' + ) + hp_dict = {f'HP{i}': hp for i, hp in enumerate(params)} + elif isinstance(params, dict): + if not all([isinstance(hp, str) for hp in params.keys()]): + raise errors.ModelParamsError( + 'If params is a dict, keys must be of type str' + ) + all_hp = params.values() + if not all([isinstance(hp, ModelParams) for hp in all_hp]): + raise errors.ModelParamsError( + 'If params is a dict, values must be sf.ModelParams' + ) + hp_dict = params + else: + raise ValueError(f"Unable to interpret params value {params}") + + # Get default validation settings from kwargs + val_kwargs = { + k[4:]: v for k, v in training_kwargs.items() if k[:4] == 'val_' + } + training_kwargs = { + k: v for k, v in training_kwargs.items() if k[:4] != 'val_' + } + val_settings = get_validation_settings(**val_kwargs) + _invalid = ( + 'k-fold-manual', + 'k-fold-preserved-site', + 'k-fold', + 'bootstrap' + ) + if (val_settings.strategy in _invalid) and val_settings.source: + _m = f'{val_settings.strategy} invalid with val_source != None' + raise ValueError(_m) + + # Next, prepare the multiprocessing manager (needed to free VRAM after + # training and keep track of results) 
+ manager = multiprocessing.Manager() + results_dict = manager.dict() + ctx = multiprocessing.get_context('spawn') + + # === Train with a set of hyperparameters ============================= + for hp_name, hp in hp_dict.items(): + if exp_label: + hp_name = f'{exp_label}-{hp_name}' + self._train_hp( + hp_name=hp_name, + hp=hp, + outcomes=outcomes, + val_settings=val_settings, + ctx=ctx, + dataset=dataset, + filters=filters, + filter_blank=filter_blank, + input_header=input_header, + min_tiles=min_tiles, + max_tiles=max_tiles, + mixed_precision=mixed_precision, + allow_tf32=allow_tf32, + splits=splits, + balance_headers=balance_headers, + training_kwargs=training_kwargs, + results_dict=results_dict, + load_method=load_method, + process_isolate=process_isolate + ) + # Print summary of all models + log.info('Training complete; validation accuracies:') + for model in results_dict: + if 'epochs' not in results_dict[model]: + continue + ep_res = results_dict[model]['epochs'] + epochs = [e for e in ep_res if 'epoch' in e] + try: + last = max([int(e.split('epoch')[-1]) for e in epochs]) + final_train_metrics = ep_res[f'epoch{last}']['train_metrics'] + except ValueError: + pass + else: + log.info(f'[green]{model}[/] training metrics:') + for m in final_train_metrics: + log.info(f'{m}: {final_train_metrics[m]}') + if 'val_metrics' in ep_res[f'epoch{last}']: + final_val_metrics = ep_res[f'epoch{last}']['val_metrics'] + log.info(f'[green]{model}[/] validation metrics:') + for m in final_val_metrics: + log.info(f'{m}: {final_val_metrics[m]}') + return dict(results_dict) + + def train_ensemble( + self, + outcomes: Union[str, List[str]], + params: Union[ModelParams, + List[ModelParams], + Dict[str, ModelParams]], + n_ensembles: Optional[int] = None, + **kwargs + ) -> List[Dict]: + """Train an ensemble of model(s). + + Trains models using a given set of parameters and outcomes by calling + the train function ``n_ensembles`` of times. 
+ + Args: + outcomes (str or list(str)): Outcome label annotation header(s). + params (:class:`slideflow.ModelParams`, list or dict): + Model parameters for training. May provide one ``ModelParams``, + a list, or dict mapping model names to params. If multiple + params are provided, will train a hyper-deep ensemble with one + member per set of params; otherwise, will train a deep ensemble. + + Keyword Args: + n_ensembles (int, optional): Total models needed in the ensemble. + Required if ``params`` is a single ``ModelParams``; otherwise + set to the number of params provided. Defaults to None. + **kwargs: All keyword arguments accepted by + :meth:`slideflow.Project.train` + + Returns: + List of dictionaries of length ``n_ensembles``, containing training + results for each member of the ensemble. + + """ + # Prepare output directory for saving ensemble members + if isinstance(outcomes, list): + ensemble_name = f"{'-'.join(outcomes)}-ensemble" + else: + ensemble_name = f"{outcomes}-ensemble" + ensemble_path = sf.util.get_new_model_dir( + self.models_dir, ensemble_name + ) + ensemble_results = [] + + # Process model params arguments + if isinstance(params, ModelParams): + hyper_deep = False + if n_ensembles is None: + raise TypeError( + "Keyword argument 'n_ensembles' is required if 'params' is" + " not a list of ModelParams." 
+ ) + elif isinstance(params, list): + hyper_deep = True + if not all([isinstance(hp, ModelParams) for hp in params]): + raise errors.ModelParamsError( + 'If params is a list, items must be sf.ModelParams' + ) + hp_list = params + n_ensembles = len(hp_list) + elif isinstance(params, dict): + hyper_deep = True + if not all([isinstance(hp, str) for hp in params.keys()]): + raise errors.ModelParamsError( + 'If params is a dict, keys must be of type str' + ) + all_hp = params.values() + if not all([isinstance(hp, ModelParams) for hp in all_hp]): + raise errors.ModelParamsError( + 'If params is a dict, values must be sf.ModelParams' + ) + hp_list = [hp for hp in params.values()] + n_ensembles = len(hp_list) + print("The hyperparameter name to ensemble member mapping is:") + for e, n in enumerate(params.keys()): + print(f" - {n} : ensemble_{e+1}") + else: + raise ValueError(f"Unable to interpret params value {params}") + + # Check for same epoch value + if hyper_deep: + for hp in hp_list: + if hp.epochs != hp_list[0].epochs: + raise errors.ModelParamsNotFoundError( + "All hyperparameters must have the same epoch value" + ) + + # Parse validation settings + val_kwargs = {k[4:]: v for k, v in kwargs.items() if k[:4] == 'val_'} + val_settings = get_validation_settings(**val_kwargs) + print(f"Val settings: {json.dumps(vars(val_settings), indent=2)}") + + if not hyper_deep: + print(f"\nHyperparameters: {params}") + for i in range(n_ensembles): + print(f"Training Ensemble {i+1} of {n_ensembles}") + # Create the ensemble member folder, which will hold each + # k-fold model for the given ensemble member. 
+ with sf.util.logging_level(30): + member_path = sf.util.get_new_model_dir( + ensemble_path, + f"ensemble_{i+1}") + if hyper_deep: + print(f"\nHyperparameters: {hp_list[i]}") + + with self._set_models_dir(member_path): + if hyper_deep: + hp = hp_list[i] + result = self.train(outcomes, hp, **kwargs) + ensemble_results.append(result) + else: + result = self.train(outcomes, params, **kwargs) + ensemble_results.append(result) + + # Copy the slide manifest and params.json file + # into the parent ensemble folder. + _, member_models = sf.util.get_valid_model_dir(member_path) + if len(member_models): + try: + shutil.copyfile( + join(member_path, member_models[0], "slide_manifest.csv"), + join(ensemble_path, "slide_manifest.csv")) + + params_data = sf.util.load_json( + join(member_path, member_models[0], "params.json") + ) + params_data['ensemble_epochs'] = params_data['hp']['epochs'] + del params_data['hp'] + params_data['hyper_deep_ensemble'] = hyper_deep + sf.util.write_json( + params_data, join(ensemble_path, "ensemble_params.json") + ) + + except OSError: + log.error( + "Unable to find ensemble slide manifest and params.json" + ) + else: + log.error("Unable to find ensemble slide manifest and params.json") + + # Merge predictions from each ensemble. + if "save_predictions" in kwargs: + if not kwargs['save_predictions']: + return ensemble_results + project_utils.ensemble_train_predictions(ensemble_path) + return ensemble_results + + def train_simclr( + self, + simclr_args: "simclr.SimCLR_Args", + train_dataset: Dataset, + val_dataset: Optional[Dataset] = None, + *, + exp_label: Optional[str] = None, + outcomes: Optional[Union[str, List[str]]] = None, + dataset_kwargs: Optional[Dict[str, Any]] = None, + normalizer: Optional[Union[str, "sf.norm.StainNormalizer"]] = None, + normalizer_source: Optional[str] = None, + **kwargs + ) -> None: + """Train SimCLR model. + + Models are saved in ``simclr`` folder in the project root directory. 
+ + See :ref:`simclr_ssl` for more information. + + Args: + simclr_args (slideflow.simclr.SimCLR_Args, optional): SimCLR + arguments, as provided by :func:`slideflow.simclr.get_args()`. + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + Defaults to None. + + Keyword Args: + exp_label (str, optional): Experiment label to add model names. + outcomes (str, optional): Annotation column which specifies the + outcome, for optionally training a supervised head. + Defaults to None. + dataset_kwargs: All other keyword arguments for + :meth:`slideflow.Dataset.tensorflow` + **kwargs: All other keyword arguments for + :meth:`slideflow.simclr.run_simclr()` + + """ + from slideflow import simclr + + # Set up SimCLR experiment data directory + if exp_label is None: + exp_label = 'simclr' + if not exists(join(self.root, 'simclr')): + os.makedirs(join(self.root, 'simclr')) + outdir = sf.util.create_new_model_dir( + join(self.root, 'simclr'), exp_label + ) + + # Get base SimCLR args/settings if not provided + if not simclr_args: + simclr_args = simclr.get_args() + assert isinstance(simclr_args, simclr.SimCLR_Args) + + # Create dataset builder, which SimCLR will use to create + # the input pipeline for training + builder = simclr.DatasetBuilder( + train_dts=train_dataset, + val_dts=val_dataset, + labels=outcomes, + dataset_kwargs=dataset_kwargs, + normalizer=normalizer, + normalizer_source=normalizer_source + ) + simclr.run_simclr(simclr_args, builder, model_dir=outdir, **kwargs) + + def train_mil( + self, + config: "mil.TrainerConfig", + train_dataset: Dataset, + val_dataset: Dataset, + outcomes: Union[str, List[str]], + bags: Union[str, List[str]], + *, + exp_label: Optional[str] = None, + outdir: Optional[str] = None, + **kwargs + ): + r"""Train a multi-instance learning model. 
+ + Args: + config (:class:`slideflow.mil.TrainerConfig`): + Training configuration, as obtained by + :func:`slideflow.mil.mil_config()`. + train_dataset (:class:`slideflow.Dataset`): Training dataset. + val_dataset (:class:`slideflow.Dataset`): Validation dataset. + outcomes (str): Outcome column (annotation header) from which to + derive category labels. + bags (str): Either a path to directory with \*.pt files, or a list + of paths to individual \*.pt files. Each file should contain + exported feature vectors, with each file containing all tile + features for one patient. + + Keyword args: + exp_label (str): Experiment label, used for naming the subdirectory + in the ``{project root}/mil`` folder, where training history + and the model will be saved. + attention_heatmaps (bool): Calculate and save attention heatmaps + on the validation dataset. Defaults to False. + interpolation (str, optional): Interpolation strategy for smoothing + attention heatmaps. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. + + """ + from .mil import train_mil + + if outdir is None: + outdir = join(self.root, 'mil') + + return train_mil( + config, + train_dataset, + val_dataset, + outcomes, + bags, + outdir=outdir, + exp_label=exp_label, + **kwargs + )
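Two small idioms carry much of the bookkeeping in ``Project.train()`` and the SMAC runner above: keyword arguments prefixed with ``val_`` are split off and stripped of the prefix before being handed to the validation settings, and the final epoch is chosen by the integer suffix of keys such as ``epoch10``. The following is a minimal standalone sketch of those two steps, not Slideflow code; the helper names ``split_prefixed_kwargs`` and ``last_epoch_key`` are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Tuple


def split_prefixed_kwargs(
    kwargs: Dict[str, Any], prefix: str = "val_"
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (prefixed, remaining).

    Hypothetical helper: keys starting with ``prefix`` are returned with
    the prefix stripped; all other keys are returned unchanged.
    """
    prefixed = {
        k[len(prefix):]: v for k, v in kwargs.items() if k.startswith(prefix)
    }
    remaining = {k: v for k, v in kwargs.items() if not k.startswith(prefix)}
    return prefixed, remaining


def last_epoch_key(epochs: Dict[str, Any]) -> str:
    """Return the key with the highest epoch number (hypothetical helper).

    Comparing the integer suffix avoids the lexicographic trap in which
    'epoch9' would sort after 'epoch10'.
    """
    return max(epochs, key=lambda name: int(name.replace("epoch", "")))


# Example: 'val_*' options are routed to validation, the rest to training.
val_kwargs, training_kwargs = split_prefixed_kwargs(
    {"val_strategy": "k-fold", "val_k_fold": 3, "multi_gpu": False}
)
final_epoch = last_epoch_key({"epoch2": {}, "epoch10": {}, "epoch9": {}})
```

Keeping the prefix convention in one place means a new ``val_``-prefixed option reaches the validation settings without changing the ``train()`` signature.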
+ +# ----------------------------------------------------------------------------- + + +def load(root: str, **kwargs) -> "Project": + """Load a project at the given root directory. + + Args: + root (str): Path to project. + + Returns: + slideflow.Project + + """ + return Project(root, **kwargs) + + +def create( + root: str, + cfg: Optional[Union[str, Dict]] = None, + *, + download: bool = False, + md5: bool = False, + **kwargs +) -> "Project": + """Create a project at the existing folder from a given configuration. + + Supports both manual project creation via keyword arguments, and setting + up a project through a specified configuration. The configuration may be + a dictionary or a path to a JSON file containing a dictionary. It must + have the key 'annotations', which includes a path to an annotations file, + and may optionally have the following arguments: + + - **name**: Name for the project and dataset. + - **rois**: Path to .tar.gz file containing compressed ROIs. + - **slides**: Path in which slides will be stored. + - **tiles**: Path in which extracted tiles will be stored. + - **tfrecords**: Path in which TFRecords will be stored. + + .. code-block:: python + + import slideflow as sf + + P = sf.create_project( + root='path', + annotations='file.csv', + slides='path', + tfrecords='path' + ) + + Annotations files are copied into the created project folder. + + Alternatively, you can create a project using a prespecified configuration, + of which there are three available: + + - ``sf.project.LungAdenoSquam()`` + - ``sf.project.ThyroidBRS()`` + - ``sf.project.BreastER()`` + + When creating a project from a configuration, setting ``download=True`` + will download the annotations file and slides from The Cancer Genome Atlas + (TCGA). + + .. code-block:: python + + import slideflow as sf + + project = sf.create_project( + root='path', + cfg=sf.project.LungAdenoSquam(), + download=True + ) + + Args: + root (str): Path at which the Project will be set up. 
+ cfg (dict, str, optional): Path to configuration file (JSON), or a + dictionary, containing the key "annotations", and optionally with + the keys "name", "rois", "slides", "tiles", or "tfrecords". + Defaults to None. + + Keyword Args: + download (bool): Download any missing slides from the Genomic Data + Commons (GDC) automatically, using slide names stored in the + annotations file. + md5 (bool): Perform MD5 hash verification for all slides using + the GDC (TCGA) MD5 manifest, which will be downloaded. + name (str): Set the project name. This has higher priority than any + supplied configuration, which will be ignored. + slides (str): Set the destination folder for slides. This has higher + priority than any supplied configuration, which will be ignored. + tiles (str): Set the destination folder for tiles. This has higher + priority than any supplied configuration, which will be ignored. + tfrecords (str): Set the destination for TFRecords. This has higher + priority than any supplied configuration, which will be ignored. + roi_dest (str): Set the destination folder for ROIs. + dataset_config (str): Path to dataset configuration JSON file for the + project. Defaults to './datasets.json'. + sources (list(str)): List of dataset sources to include in project. + Defaults to 'MyProject'. + models_dir (str): Path to directory in which to save models. + Defaults to './models'. + eval_dir (str): Path to directory in which to save evaluations. + Defaults to './eval'. 
+ + Returns: + slideflow.Project + """ + cfg_names = ( + 'annotations', 'name', 'slides', 'tiles', 'tfrecords', 'roi_dest' + ) + proj_kwargs = {k: v for k, v in kwargs.items() if k not in cfg_names} + kwargs = {k: v for k, v in kwargs.items() if k in cfg_names} + + # Initial verification + if sf.util.is_project(root): + raise OSError(f"A project already exists at {root}") + if isinstance(cfg, dict): + cfg = sf.util.EasyDict(cfg) + if isinstance(cfg, str): + cfg_path = cfg + cfg = sf.util.EasyDict(sf.util.load_json(cfg)) + + # Resolve relative paths in configuration file + if 'annotations' in cfg and exists(join(dirname(cfg_path), + cfg.annotations)): + cfg.annotations = join(dirname(cfg_path), cfg.annotations) + if 'rois' in cfg and exists(join(dirname(cfg_path), cfg.rois)): + cfg.rois = join(dirname(cfg_path), cfg.rois) + elif cfg is None: + cfg = sf.util.EasyDict(kwargs) + elif issubclass(type(cfg), project_utils._ProjectConfig): + cfg = sf.util.EasyDict(cfg.to_dict()) + + if 'name' not in cfg: + cfg.name = "MyProject" + if 'slides' not in cfg: + cfg.slides = join(root, 'slides') + if 'tiles' not in cfg: + cfg.tiles = join(root, 'tiles') + if 'tfrecords' not in cfg: + cfg.tfrecords = join(root, 'tfrecords') + cfg.roi_dest = join(cfg.slides, 'rois') + + # Overwrite any project configuration with user-specified keyword arguments + cfg.update(kwargs) + + # Set up project at the given directory. + log.info(f"Setting up project at {root}") + if 'annotations' in cfg: + if root.startswith('.'): + proj_kwargs['annotations'] = join('.', basename(cfg.annotations)) + else: + proj_kwargs['annotations'] = join(root, basename(cfg.annotations)) + + P = sf.Project(root, **proj_kwargs, create=True) + + # Download annotations, if a URL. 
+ if 'annotations' in cfg and cfg.annotations.startswith('http'): + log.info(f"Downloading {cfg.annotations}") + r = requests.get(cfg.annotations) + open(proj_kwargs['annotations'], 'wb').write(r.content) + if cfg.annotations_md5 != sf.util.md5(proj_kwargs['annotations']): + raise errors.ChecksumError( + "Remote annotations URL failed MD5 checksum." + ) + elif 'annotations' in cfg and not cfg.annotations.startswith('.'): + try: + shutil.copy(cfg.annotations, root) + except shutil.SameFileError: + pass + + # Set up the dataset source. + + source_already_exists = False + if 'sources' in proj_kwargs and exists(P.dataset_config): + _dataset_config = sf.util.load_json(P.dataset_config) + if isinstance(proj_kwargs['sources'], str): + source_already_exists = proj_kwargs['sources'] in _dataset_config + else: + source_already_exists = all( + [s in _dataset_config for s in proj_kwargs['sources']] + ) + + if (('sources' not in proj_kwargs or proj_kwargs['sources'] is not None) + and not source_already_exists): + + # Create a new dataset source if it does not already exist. + P.add_source( + cfg.name, + slides=cfg.slides, + roi=cfg.roi_dest, + tiles=cfg.tiles, + tfrecords=cfg.tfrecords) + + # Set up ROIs, if provided. + if 'rois' in cfg and not exists(cfg.roi_dest): + os.makedirs(cfg.roi_dest) + if 'rois' in cfg and exists(cfg.rois) and os.path.isdir(cfg.rois): + # Search the folder for CSV files + # and copy to the project ROI directory. + to_copy = [r for r in os.listdir(cfg.rois) + if path_to_ext(r) == 'csv'] + log.info("Copying {} ROIs from {} to {}.".format( + len(to_copy), + cfg.rois, + cfg.roi_dest + )) + for roi in to_copy: + shutil.copy(join(cfg.rois, roi), cfg.roi_dest) + elif 'rois' in cfg and exists(cfg.rois) and os.path.isfile(cfg.rois): + # Assume ROIs is a tarfile - extract at destination. 
+ log.info(f"Extracting ROIs from tarfile at {cfg.rois}.") + roi_file = tarfile.open(cfg.rois) + roi_file.extractall(cfg.roi_dest) + + # Create blank annotations file, if not provided. + if not exists(P.annotations): + P.create_blank_annotations() + + # Download slides from GDC (TCGA), if specified. + if download: + df = sf.util.get_gdc_manifest() + slide_manifest = dict(zip(df.filename.values, df.id.values)) + if not exists(cfg.slides): + os.makedirs(cfg.slides) + to_download = [s for s in P.dataset().slides() + if not exists(join(cfg.slides, f'{s}.svs'))] + for i, slide in enumerate(to_download): + sf.util.download_from_tcga( + slide_manifest[slide+".svs"], + dest=cfg.slides, + message=f"Downloading {i+1} of {len(to_download)}...") + + # Perform MD5 hash verification of slides using the GDC manifest. + if md5: + df = sf.util.get_gdc_manifest() + md5_manifest = dict(zip(df.filename.values, df.md5.values)) + + slides_with_md5 = [s for s in os.listdir(cfg.slides) + if s in md5_manifest] + failed_md5 = [] + for slide in tqdm(slides_with_md5): + if sf.util.md5(join(cfg.slides, slide)) != md5_manifest[slide]: + log.info(f"Slide {slide} failed MD5 verification") + failed_md5 += [slide] + if not failed_md5: + log.info( + f"All {len(slides_with_md5)} slides passed MD5 verification." + ) + else: + log.warning( + f"Warning: {len(failed_md5)} slides failed MD5 verification: " + f"{', '.join(failed_md5)}" + ) + + return P +
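The configuration handling in ``create()`` follows a three-step precedence: values from ``cfg`` are taken first, missing keys fall back to defaults under ``root``, and user-supplied keyword arguments override both. The sketch below illustrates only that ordering with plain dictionaries. ``resolve_project_config`` is a hypothetical name, and the sketch assumes ``roi_dest`` is derived from the slides path before overrides are applied, which is how the flattened listing above appears to read.

```python
from os.path import join
from typing import Any, Dict, Optional


def resolve_project_config(
    root: str, cfg: Optional[Dict[str, Any]] = None, **overrides: Any
) -> Dict[str, Any]:
    """Resolve project paths (hypothetical helper, not part of Slideflow).

    Precedence, lowest to highest: defaults under ``root``, values from
    ``cfg``, then keyword ``overrides``.
    """
    resolved = dict(cfg or {})
    # Fill in defaults only for keys the configuration did not supply.
    resolved.setdefault("name", "MyProject")
    resolved.setdefault("slides", join(root, "slides"))
    resolved.setdefault("tiles", join(root, "tiles"))
    resolved.setdefault("tfrecords", join(root, "tfrecords"))
    # Assumption: the ROI destination is derived from the slides path.
    resolved["roi_dest"] = join(resolved["slides"], "rois")
    # User-specified keyword arguments take priority over everything else.
    resolved.update(overrides)
    return resolved


example = resolve_project_config(
    "proj", {"slides": "/data/slides"}, tiles="/scratch/tiles"
)
```

One consequence of this ordering is that overriding ``slides`` by keyword does not move ``roi_dest``, so a caller who wants both relocated would pass ``roi_dest`` explicitly as well.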
+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
\ No newline at end of file
diff --git a/docs/_modules/slideflow/simclr/simclr/tf2/data/index.html b/docs/_modules/slideflow/simclr/simclr/tf2/data/index.html
new file mode 100644
index 000000000..6d52eba42
--- /dev/null
+++ b/docs/_modules/slideflow/simclr/simclr/tf2/data/index.html
@@ -0,0 +1,704 @@
+slideflow.simclr.simclr.tf2.data — slideflow 3.0.0 documentation
Source code for slideflow.simclr.simclr.tf2.data

+# coding=utf-8
+# Copyright 2020 The SimCLR Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Data pipeline."""
+
+import functools
+import slideflow as sf
+from slideflow import log as logging
+
+from . import data_util
+import tensorflow.compat.v2 as tf
+import tensorflow_datasets as tfds
+
+
+
+class DatasetBuilder:
+
+  def __init__(self, train_dts=None, val_dts=None, test_dts=None, *, labels=None,
+               val_kwargs=None, steps_per_epoch_override=None, normalizer=None,
+               normalizer_source=None, dataset_kwargs=None):
+    """Build a training/validation dataset pipeline for SimCLR.
+
+    Args:
+      train_dts (sf.Dataset, optional): Training dataset.
+      val_dts (sf.Dataset, optional): Optional validation dataset.
+      test_dts (sf.Dataset, optional): Optional held-out test set.
+
+    Keyword args:
+      labels (str or dict): Labels for training the supervised head.
+        Can be a name of an outcome (str) or a dict mapping slide names
+        to labels.
+      val_kwargs (dict, optional): Optional keyword arguments for
+        generating a validation dataset from ``train_dts`` via
+        ``train_dts.split()``. Incompatible with ``val_dts``.
+      steps_per_epoch_override (int, optional): Override the number
+        of steps per epoch.
+      dataset_kwargs (dict, optional): Keyword arguments passed to the
+        :meth:`slideflow.Dataset.tensorflow` method when creating
+        the input pipeline.
+
+    """
+    if train_dts is None and val_dts is None and test_dts is None:
+      raise ValueError("Must supply either train_dts, val_dts, or test_dts.")
+    if val_kwargs is not None and val_dts is not None:
+      raise ValueError("Cannot supply val_kwargs if val_dts is not None")
+    if val_kwargs is not None and train_dts is None:
+      raise ValueError("Cannot supply val_kwargs if train_dts is None")
+
+    if isinstance(labels, dict):
+      self.labels = labels
+    elif isinstance(labels, str):
+      self.labels = {}
+      if train_dts is not None:
+        self.labels.update(train_dts.labels(labels)[0])
+      if val_dts is not None:
+        self.labels.update(val_dts.labels(labels)[0])
+      if test_dts is not None:
+        self.labels.update(test_dts.labels(labels)[0])
+    elif labels is not None:
+      raise ValueError(
+        f"Unrecognized type {type(labels)} for argument labels: "
+        "expected dict or str"
+      )
+    else:
+      self.labels = None
+    if val_kwargs is not None:
+      if self.labels is None:
+        raise ValueError(
+          "Unable to automatically generate training/validation "
+          "splits using keyword arguments (val_kwargs) "
+          "if labels are not provided."
+        )
+      self.train_dts, self.val_dts = train_dts.split(
+        labels=self.labels,
+        **val_kwargs
+      )
+    else:
+      self.train_dts = train_dts
+      self.val_dts = val_dts
+    self.test_dts = test_dts
+    if steps_per_epoch_override:
+      train_tiles = steps_per_epoch_override
+    elif self.train_dts:
+      train_tiles = self.train_dts.num_tiles
+    else:
+      train_tiles = 0
+
+    if isinstance(normalizer, str):
+      self.normalizer = sf.norm.autoselect(normalizer,
+                                           source=normalizer_source,
+                                           backend='tensorflow')
+    else:
+      self.normalizer = normalizer
+    self.num_classes = 0 if self.labels is None else len(set(list(self.labels.values())))
+    self.dataset_kwargs = dict() if dataset_kwargs is None else dataset_kwargs
+    self.info = data_util.EasyDict(
+      features=data_util.EasyDict(
+        label=data_util.EasyDict(num_classes=self.num_classes)
+      ),
+      splits=data_util.EasyDict(
+        train=data_util.EasyDict(num_examples=train_tiles),
+        validation=data_util.EasyDict(num_examples=(0 if not self.val_dts else self.val_dts.num_tiles)),
+        test=data_util.EasyDict(num_examples=(0 if not self.test_dts else self.test_dts.num_tiles))
+      ))
+
+  def as_dataset(self, split, read_config, shuffle_files, as_supervised, **kwargs):
+    logging.info(f"Dataset split requested: {split}")
+    if split == 'train':
+      dts = self.train_dts
+    elif split == 'validation':
+      dts = self.val_dts
+    elif split == 'test':
+      dts = self.test_dts
+    else:
+      raise ValueError(f"Unrecognized split {split}, expected 'train', "
+                       "'validation', or 'test'.")
+    if dts is None:
+      raise ValueError(f'Builder not configured for phase "{split}".')
+
+    return dts.tensorflow(
+      labels=self.labels,
+      num_shards=read_config.input_context.num_input_pipelines,
+      shard_idx=read_config.input_context.input_pipeline_id,
+      standardize=False,
+      infinite=(split == 'train'),
+      **self.dataset_kwargs,
+      **kwargs
+    )
+
+  def build_dataset(self, *args, **kwargs):
+    """Builds a distributed dataset.
+
+    Args:
+      batch_size (int): Global batch size across devices.
+      is_training (bool): If this is for training.
+      simclr_args (SimCLR_Args): SimCLR arguments.
+      strategy (tf.distribute.Strategy, optional): Distribution strategy.
+      cache_dataset (bool): Cache dataset.
+
+    Returns:
+      Distributed Tensorflow dataset, with SimCLR preprocessing applied.
+    """
+    return build_distributed_dataset(self, *args, **kwargs)
+
+
+def build_input_fn(builder, global_batch_size, is_training,
+                   simclr_args, cache_dataset=False):
+  """Build input function.
+
+  Args:
+    builder: Either DatasetBuilder, or a TFDS builder for specified dataset.
+    global_batch_size: Global batch size.
+    is_training: Whether to build in training mode.
+    simclr_args: SimCLR arguments, as provided by :func:`slideflow.simclr.get_args`.
+    cache_dataset: Whether to cache the dataset.
+
+  Returns:
+    A function that accepts an input context and returns a batched dataset of
+    preprocessed images and labels, suitable for
+    ``strategy.distribute_datasets_from_function``.
+  """
+
+  def _input_fn(input_context):
+    """Inner input function."""
+    batch_size = input_context.get_per_replica_batch_size(global_batch_size)
+    logging.info('Global batch size: %d', global_batch_size)
+    logging.info('Per-replica batch size: %d', batch_size)
+    preprocess_fn_pretrain = get_preprocess_fn(
+        is_training,
+        is_pretrain=True,
+        image_size=simclr_args.image_size,
+        color_jitter_strength=simclr_args.color_jitter_strength,
+        normalizer=(builder.normalizer if is_training else None),
+        normalizer_augment=simclr_args.stain_augment)
+    preprocess_fn_finetune = get_preprocess_fn(
+        is_training,
+        is_pretrain=False,
+        image_size=simclr_args.image_size,
+        color_jitter_strength=simclr_args.color_jitter_strength,
+        normalizer=(builder.normalizer if is_training else None),
+        normalizer_augment=simclr_args.stain_augment)
+    num_classes = builder.info.features['label'].num_classes
+
+    def map_fn(image, label, *args):
+      """Produces multiple transformations of the same batch."""
+      if is_training and simclr_args.train_mode == 'pretrain':
+        xs = []
+        for _ in range(2):  # Two transformations
+          xs.append(preprocess_fn_pretrain(image))
+        image = tf.concat(xs, -1)
+      else:
+        image = preprocess_fn_finetune(image)
+      if num_classes:
+        label = tf.one_hot(label, num_classes)
+      return detuple(image, label, args)
+
+    logging.info('num_input_pipelines: %d', input_context.num_input_pipelines)
+
+    # Perform stain normalization within sf.Dataset.tensorflow()
+    # if this is for inference.
+    if builder.normalizer and not is_training:
+      dts_kw = dict(normalizer=builder.normalizer)
+    else:
+      dts_kw = {}
+    dataset = builder.as_dataset(
+        split=simclr_args.train_split if is_training else simclr_args.eval_split,
+        shuffle_files=is_training,
+        as_supervised=True,
+        # Passing the input_context to TFDS makes TFDS read different parts
+        # of the dataset on different workers. We also adjust the interleave
+        # parameters to achieve better performance.
+        read_config=tfds.ReadConfig(
+            interleave_cycle_length=32,
+            interleave_block_length=1,
+            input_context=input_context),
+        **dts_kw)
+    if cache_dataset:
+      dataset = dataset.cache()
+    if is_training:
+      options = tf.data.Options()
+      options.experimental_deterministic = False
+      options.experimental_slack = True
+      dataset = dataset.with_options(options)
+      buffer_multiplier = 50 if simclr_args.image_size <= 32 else 10
+      dataset = dataset.shuffle(batch_size * buffer_multiplier)
+      dataset = dataset.repeat(-1)
+    dataset = dataset.map(
+        map_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
+    dataset = dataset.batch(batch_size, drop_remainder=is_training)
+    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
+    return dataset
+
+  return _input_fn
+
+
+def build_distributed_dataset(builder, batch_size, is_training, simclr_args,
+                              strategy=None, cache_dataset=False):
+  if strategy is None:
+    strategy = tf.distribute.get_strategy()
+  input_fn = build_input_fn(
+    builder, batch_size, is_training, simclr_args, cache_dataset=cache_dataset
+  )
+  return strategy.distribute_datasets_from_function(input_fn)
+
+
+def get_preprocess_fn(is_training, is_pretrain, image_size,
+                      color_jitter_strength=1.0, normalizer=None,
+                      normalizer_augment=True, center_crop=True):
+  """Get function that accepts an image and returns a preprocessed image."""
+  # Disable test cropping for small images (e.g. CIFAR)
+  if not center_crop or image_size <= 32:
+    test_crop = False
+  else:
+    test_crop = True
+  return functools.partial(
+      data_util.preprocess_image,
+      height=image_size,
+      width=image_size,
+      color_jitter_strength=color_jitter_strength,
+      is_training=is_training,
+      color_distort=is_pretrain,
+      test_crop=test_crop,
+      normalizer=normalizer,
+      normalizer_augment=normalizer_augment)
+
+# -----------------------------------------------------------------------------
+
+def detuple(image, label, args):
+  """Detuple optional arguments for return.
+
+  Adds support for returning args via wildcard in Python 3.7. The following:
+
+  .. code-block:: python
+
+    return image, label, *args
+
+  can be made cross-compatible with Python 3.7 and higher by using:
+
+  .. code-block:: python
+
+    return detuple(image, label, args)
+
+  """
+  if len(args):
+    return tuple([image, label] + list(args))
+  else:
+    return image, label
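The behavior of `detuple` in the listing above is easy to check on its own. The sketch below copies the helper verbatim and wraps it in a stand-in `map_fn`; the string and integer arguments are placeholders for tensors, and the stand-in skips all preprocessing.

```python
def detuple(image, label, args):
    """Return (image, label) plus any extra positional values, flattened."""
    if len(args):
        return tuple([image, label] + list(args))
    else:
        return image, label


def map_fn(image, label, *args):
    """Stand-in for the dataset map function: pass extras through unchanged."""
    return detuple(image, label, args)


# With no extras, the result is a plain 2-tuple.
pair = map_fn("img", 1)            # ('img', 1)
# Extra per-example values (e.g. a slide name) are appended in order.
triple = map_fn("img", 1, "slide")  # ('img', 1, 'slide')
```

The return arity therefore follows the arity of the dataset elements, so a dataset yielding extra per-example fields keeps them through the map step.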
\ No newline at end of file
diff --git a/docs/_modules/slideflow/simclr/simclr/tf2/index.html b/docs/_modules/slideflow/simclr/simclr/tf2/index.html
new file mode 100644
index 000000000..11f383b3e
--- /dev/null
+++ b/docs/_modules/slideflow/simclr/simclr/tf2/index.html
@@ -0,0 +1,937 @@
+slideflow.simclr.simclr.tf2 — slideflow 3.0.0 documentation
Source code for slideflow.simclr.simclr.tf2

+# coding=utf-8
+# Copyright 2020 The SimCLR Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""The main training pipeline."""
+
+import json
+import math
+import os
+
+from tqdm import tqdm
+from slideflow import log as logging
+from . import data as data_lib
+from . import metrics
+from . import model as model_lib
+from . import objective as obj_lib
+from . import utils as utils_lib
+
+import tensorflow.compat.v2 as tf
+import tensorflow_datasets as tfds
+
+# -----------------------------------------------------------------------------
+
+def build_saved_model(
+    model,
+    include_projection_head=True,
+    include_supervised_head=True
+):
+  """Returns a tf.Module for saving to SavedModel."""
+
+  class SimCLRModel(tf.Module):
+    """Saved model for exporting to hub."""
+
+    def __init__(self, model):
+      self.model = model
+      # This can't be called `trainable_variables` because `tf.Module` has
+      # a getter with the same name.
+      self.trainable_variables_list = model.trainable_variables
+
+    @tf.function
+    def __call__(self, inputs, trainable):
+      self.model(inputs, training=trainable)
+      return utils_lib.get_salient_tensors_dict(
+        include_projection_head, include_supervised_head
+      )
+
+  module = SimCLRModel(model)
+  input_spec = tf.TensorSpec(shape=[None, None, None, 3], dtype=tf.float32)
+  module.__call__.get_concrete_function(input_spec, trainable=True)
+  module.__call__.get_concrete_function(input_spec, trainable=False)
+  return module
+
+
+def save(model, destination, simclr_args, global_step=None, named_by_step=False):
+  """Export as SavedModel for finetuning and inference."""
+  is_supervised = ((simclr_args.train_mode == 'finetune'
+                    or simclr_args.lineareval_while_pretraining)
+                   and simclr_args.num_classes > 0)
+  saved_model = build_saved_model(model, include_supervised_head=is_supervised)
+  if named_by_step:
+    checkpoint_export_dir = destination + f'_step{global_step}'
+  else:
+    checkpoint_export_dir = destination
+  if tf.io.gfile.exists(checkpoint_export_dir):
+    tf.io.gfile.rmtree(checkpoint_export_dir)
+  tf.saved_model.save(saved_model, checkpoint_export_dir)
+  with open(os.path.join(checkpoint_export_dir, 'args.json'), "w") as data_file:
+    json.dump(simclr_args.to_dict(), data_file, indent=1)
+
+
+
+def load(path, as_pretrained: bool = False):
+  """Load a SavedModel or checkpoint for inference.
+
+  Args:
+    path (str): Path to saved model.
+    as_pretrained (bool): If True, set ``train_mode`` to ``'pretrain'``
+      before building the model. Defaults to False.
+
+  Returns:
+    Tensorflow SimCLR model.
+  """
+  args = utils_lib.load_model_args(path)
+  if as_pretrained:
+    args.train_mode = 'pretrain'
+  model = model_lib.SimCLR(**args.model_kwargs)
+  step = tf.Variable(0, dtype=tf.int64)
+  checkpoint = tf.train.Checkpoint(model=model, global_step=step)
+  if path.endswith('.ckpt'):
+    path = path.split('.ckpt')[0]
+  checkpoint.restore(path).expect_partial()
+  return model
+
+
+def try_restore_from_checkpoint(
+    model,
+    global_step,
+    optimizer,
+    model_dir,
+    checkpoint_path,
+    keep_checkpoint_max=5,
+    zero_init_logits_layer=False,
+  ):
+  """Restores the latest ckpt if it exists, otherwise check checkpoint_path"""
+  checkpoint = tf.train.Checkpoint(
+      model=model, global_step=global_step, optimizer=optimizer)
+  checkpoint_manager = tf.train.CheckpointManager(
+      checkpoint,
+      directory=model_dir,
+      max_to_keep=keep_checkpoint_max)
+  latest_ckpt = checkpoint_manager.latest_checkpoint
+  if latest_ckpt:
+    # Restore model weights, global step, optimizer states
+    logging.info('Restoring from latest checkpoint: %s', latest_ckpt)
+    checkpoint_manager.checkpoint.restore(latest_ckpt).expect_partial()
+  elif checkpoint_path:
+    # Restore model weights only, but not global step and optimizer states
+    logging.info('Restoring from given checkpoint: %s', checkpoint_path)
+    checkpoint_manager2 = tf.train.CheckpointManager(
+        tf.train.Checkpoint(model=model),
+        directory=model_dir,
+        max_to_keep=keep_checkpoint_max)
+    checkpoint_manager2.checkpoint.restore(checkpoint_path).expect_partial()
+    if zero_init_logits_layer:
+      model = checkpoint_manager2.checkpoint.model
+      output_layer_parameters = model.supervised_head.trainable_weights
+      logging.info('Initializing output layer parameters %s to zero',
+                   [x.op.name for x in output_layer_parameters])
+      for x in output_layer_parameters:
+        x.assign(tf.zeros_like(x))
+
+  return checkpoint_manager
+
+
+def checkpoint_to_saved_model(ckpt, args, dest, global_step=0):
+  model = model_lib.SimCLR(**args.model_kwargs)
+  checkpoint = tf.train.Checkpoint(
+    model=model,
+    global_step=tf.Variable(0, dtype=tf.int64)
+  )
+  checkpoint.restore(ckpt).expect_partial()
+  save(model, dest, args, global_step=global_step)
+
+# -----------------------------------------------------------------------------
+
+def perform_evaluation(
+    model,
+    builder,
+    eval_steps,
+    ckpt,
+    strategy,
+    model_dir,
+    cache_dataset,
+    args,
+):
+  """Perform evaluation."""
+  if args.train_mode == 'pretrain' and not args.lineareval_while_pretraining:
+    logging.info('Skipping eval during pretraining without linear eval.')
+    return
+  elif not builder.num_classes:
+    logging.info('Skipping eval during pretraining; no labels supplied.')
+    # Without labels there is no supervised head to evaluate.
+    return
+  # Build input pipeline.
+  ds = data_lib.build_distributed_dataset(
+    builder, args.eval_batch_size, False, args, strategy,
+    cache_dataset=cache_dataset
+  )
+  summary_writer = tf.summary.create_file_writer(model_dir)
+
+  # Build metrics.
+  with strategy.scope():
+    regularization_loss = tf.keras.metrics.Mean('eval/regularization_loss')
+    label_top_1_accuracy = tf.keras.metrics.Accuracy(
+        'eval/label_top_1_accuracy')
+    label_top_5_accuracy = tf.keras.metrics.TopKCategoricalAccuracy(
+        5, 'eval/label_top_5_accuracy')
+    all_metrics = [
+        regularization_loss, label_top_1_accuracy, label_top_5_accuracy
+    ]
+
+    # Restore checkpoint.
+    logging.info('Restoring from %s', ckpt)
+    checkpoint = tf.train.Checkpoint(
+        model=model, global_step=tf.Variable(0, dtype=tf.int64))
+    checkpoint.restore(ckpt).expect_partial()
+    global_step = checkpoint.global_step
+    logging.info('Performing eval at step %d', global_step.numpy())
+
+  def single_step(features, labels):
+    _, supervised_head_outputs = model(features, training=False)
+    assert supervised_head_outputs is not None
+    outputs = supervised_head_outputs
+    l = labels['labels']
+    metrics.update_finetune_metrics_eval(label_top_1_accuracy,
+                                         label_top_5_accuracy, outputs, l)
+    reg_loss = model_lib.add_weight_decay(
+      model, args.optimizer, args.weight_decay, adjust_per_optimizer=True
+    )
+    regularization_loss.update_state(reg_loss)
+
+  with strategy.scope():
+
+    @tf.function
+    def run_single_step(iterator):
+      images, labels = next(iterator)
+      features, labels = images, {'labels': labels}
+      strategy.run(single_step, (features, labels))
+
+    iterator = iter(ds)
+    for i in range(eval_steps):
+      run_single_step(iterator)
+      logging.info('Completed eval for %d / %d steps', i + 1, eval_steps)
+    logging.info('Finished eval for %s', ckpt)
+
+  # Write summaries
+  cur_step = global_step.numpy()
+  logging.info('Writing summaries for step %d', cur_step)
+  with summary_writer.as_default():
+    metrics.log_and_write_metrics_to_summary(all_metrics, cur_step)
+    summary_writer.flush()
+
+  # Record results as JSON.
+  result_json_path = os.path.join(model_dir, 'result.json')
+  result = {metric.name: metric.result().numpy() for metric in all_metrics}
+  result['global_step'] = global_step.numpy()
+  logging.info(result)
+  with tf.io.gfile.GFile(result_json_path, 'w') as f:
+    json.dump({k: float(v) for k, v in result.items()}, f)
+  result_json_path = os.path.join(
+      model_dir, 'result_%d.json'%result['global_step'])
+  with tf.io.gfile.GFile(result_json_path, 'w') as f:
+    json.dump({k: float(v) for k, v in result.items()}, f)
+  flag_json_path = os.path.join(model_dir, 'args.json')
+  with tf.io.gfile.GFile(flag_json_path, 'w') as f:
+    serializable_flags = {}
+
+    for key, val in vars(args).items():
+      # Some flag value types e.g. datetime.timedelta are not json serializable,
+      # filter those out.
+      if utils_lib.json_serializable(val):
+        serializable_flags[key] = val
+    json.dump(serializable_flags, f, indent=1)
+
+  # Export as SavedModel for finetuning and inference.
+  save(
+    model,
+    os.path.join(model_dir, 'saved_model'),
+    simclr_args=args,
+    global_step=result['global_step'],
+    named_by_step=True
+  )
+  return result
+
+
+def run_simclr(
+    args,
+    builder=None,
+    model_dir=None,
+    cache_dataset=False,
+    checkpoint_path=None,
+    use_tpu=False,
+    tpu_name=None,
+    tpu_zone=None,
+    gcp_project=None,
+):
+  """Train a SimCLR model.
+
+  Args:
+    args (SimpleNamespace): SimCLR arguments, as provided by
+      :func:`slideflow.simclr.get_args`.
+    builder (DatasetBuilder, optional): Builder for preparing SimCLR input
+      pipelines. If None, will build using TensorFlow Datasets and
+      `args.dataset`.
+    model_dir (str): Model directory for training.
+    cache_dataset (bool): Whether to cache the entire dataset in memory. If
+      the dataset is ImageNet, this is a very bad idea, but for smaller datasets
+      it can improve performance.
+    checkpoint_path (str): Checkpoint to load for fine-tuning if
+      a fine-tuning checkpoint does not already exist in model_dir.
+    use_tpu (bool): Whether to run on TPU.
+    tpu_name (str): The Cloud TPU to use for training. This should be either the
+      name used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470
+      url.
+    tpu_zone (str): GCE zone where the Cloud TPU is located. If not
+      specified, we will attempt to automatically detect the GCE zone from
+      metadata.
+    gcp_project (str): Project name for the Cloud TPU-enabled project. If not
+      specified, we will attempt to automatically detect the GCE project from
+      metadata.
+
+  """
+  logging.debug("Building SimCLR dataset")
+  if builder is None:
+    builder = tfds.builder(args.dataset, data_dir=args.data_dir)
+    builder.download_and_prepare()
+  num_train_examples = builder.info.splits[args.train_split].num_examples
+  num_eval_examples = builder.info.splits[args.eval_split].num_examples
+  args.num_classes = builder.info.features['label'].num_classes
+
+  train_steps = model_lib.get_train_steps(num_train_examples, args.train_steps,
+                                          args.train_epochs, args.train_batch_size)
+  eval_steps = args.eval_steps or int(
+      math.ceil(num_eval_examples / args.eval_batch_size))
+  epoch_steps = int(round(num_train_examples / args.train_batch_size))
+
+  logging.info(f"SimCLR Args: {json.dumps(args.to_dict(), indent=1)}")
+  logging.info('# train examples: %d', num_train_examples)
+  logging.info('# train_steps: %d', train_steps)
+  logging.info('# eval examples: %d', num_eval_examples)
+  logging.info('# eval steps: %d', eval_steps)
+
+  checkpoint_steps = (
+      args.checkpoint_steps or (args.checkpoint_epochs * epoch_steps))
+
+  topology = None
+  if use_tpu:
+    logging.debug("Configuring TPUs")
+    if tpu_name:
+      cluster = tf.distribute.cluster_resolver.TPUClusterResolver(
+          tpu_name, zone=tpu_zone, project=gcp_project)
+    else:
+      cluster = tf.distribute.cluster_resolver.TPUClusterResolver(args.master)
+    tf.config.experimental_connect_to_cluster(cluster)
+    topology = tf.tpu.experimental.initialize_tpu_system(cluster)
+    logging.info('Topology:')
+    logging.info('num_tasks: %d', topology.num_tasks)
+    logging.info('num_tpus_per_task: %d', topology.num_tpus_per_task)
+    strategy = tf.distribute.TPUStrategy(cluster)
+
+  else:
+    # For (multiple) GPUs.
+    logging.debug("Configuring distributed dataset with MirroredStrategy")
+    strategy = tf.distribute.MirroredStrategy()
+    logging.info('Running using MirroredStrategy on %d replicas',
+                 strategy.num_replicas_in_sync)
+
+  with strategy.scope():
+    model = model_lib.SimCLR(**args.model_kwargs)
+
+  if args.mode == 'eval':
+    logging.debug("Performing evaluation")
+    for ckpt in tf.train.checkpoints_iterator(
+        model_dir, min_interval_secs=15):
+      result = perform_evaluation(
+        model, builder, eval_steps, ckpt, strategy,
+        model_dir, cache_dataset, args
+      )
+      if result['global_step'] >= train_steps:
+        logging.info('Eval complete. Exiting...')
+        return
+  else:
+    logging.debug("Setting up file writer for logs")
+    summary_writer = tf.summary.create_file_writer(model_dir)
+    if not os.path.exists(model_dir):
+      os.makedirs(model_dir)
+    with open(os.path.join(model_dir, 'args.json'), "w") as data_file:
+      json.dump(args.to_dict(), data_file, indent=1)
+    with strategy.scope():
+      # Build input pipeline.
+      logging.debug("Setting up distributed dataset")
+      ds = data_lib.build_distributed_dataset(builder, args.train_batch_size,
+                                              True, args, strategy)
+
+      # Build LR schedule and optimizer.
+      learning_rate = model_lib.WarmUpAndCosineDecay(
+        learning_rate=args.learning_rate,
+        num_examples=num_train_examples,
+        warmup_epochs=args.warmup_epochs,
+        train_batch_size=args.train_batch_size,
+        learning_rate_scaling=args.learning_rate_scaling,
+        train_steps=args.train_steps,
+        train_epochs=args.train_epochs
+      )
+      optimizer = model_lib.build_optimizer(
+        learning_rate=learning_rate,
+        optimizer=args.optimizer,
+        momentum=args.momentum,
+        weight_decay=args.weight_decay
+      )
+
+      # Build metrics.
+      all_metrics = []  # For summaries.
+      weight_decay_metric = tf.keras.metrics.Mean('train/weight_decay')
+      total_loss_metric = tf.keras.metrics.Mean('train/total_loss')
+      all_metrics.extend([weight_decay_metric, total_loss_metric])
+      if args.train_mode == 'pretrain':
+        contrast_loss_metric = tf.keras.metrics.Mean('train/contrast_loss')
+        contrast_acc_metric = tf.keras.metrics.Mean('train/contrast_acc')
+        contrast_entropy_metric = tf.keras.metrics.Mean(
+            'train/contrast_entropy')
+        all_metrics.extend([
+            contrast_loss_metric, contrast_acc_metric, contrast_entropy_metric
+        ])
+      if args.train_mode == 'finetune' or args.lineareval_while_pretraining:
+        supervised_loss_metric = tf.keras.metrics.Mean('train/supervised_loss')
+        supervised_acc_metric = tf.keras.metrics.Mean('train/supervised_acc')
+        all_metrics.extend([supervised_loss_metric, supervised_acc_metric])
+
+      # Restore checkpoint if available.
+      logging.debug("Attempting to restore from checkpoint")
+      checkpoint_manager = try_restore_from_checkpoint(
+          model, optimizer.iterations, optimizer, model_dir, checkpoint_path,
+          keep_checkpoint_max=args.keep_checkpoint_max,
+          zero_init_logits_layer=args.zero_init_logits_layer)
+
+    steps_per_loop = min(checkpoint_steps, train_steps)
+
+    def single_step(features, labels):
+      with tf.GradientTape() as tape:
+        # Log summaries on the last step of the training loop to match
+        # logging frequency of other scalar summaries.
+        #
+        # Notes:
+        # 1. Summary ops on TPUs get outside compiled so they do not affect
+        #    performance.
+        # 2. Summaries are recorded only on replica 0. So effectively this
+        #    summary would be written once per host when should_record == True.
+        # 3. optimizer.iterations is incremented in the call to apply_gradients.
+        #    So we use `iterations + 1` here so that the step number matches
+        #    those of scalar summaries.
+        # 4. We intentionally run the summary op before the actual model
+        #    training so that it can run in parallel.
+        should_record = tf.equal((optimizer.iterations + 1) % steps_per_loop, 0)
+        with tf.summary.record_if(should_record):
+          # Only log augmented images for the first tower.
+          tf.summary.image(
+              'image', features[:, :, :, :3], step=optimizer.iterations + 1)
+        projection_head_outputs, supervised_head_outputs = model(
+            features, training=True)
+        loss = None
+        if projection_head_outputs is not None:
+          outputs = projection_head_outputs
+          con_loss, logits_con, labels_con = obj_lib.add_contrastive_loss(
+              outputs,
+              hidden_norm=args.hidden_norm,
+              temperature=args.temperature,
+              strategy=strategy)
+          if loss is None:
+            loss = con_loss
+          else:
+            loss += con_loss
+          metrics.update_pretrain_metrics_train(contrast_loss_metric,
+                                                contrast_acc_metric,
+                                                contrast_entropy_metric,
+                                                con_loss, logits_con,
+                                                labels_con)
+        if supervised_head_outputs is not None:
+          outputs = supervised_head_outputs
+          l = labels['labels']
+          if (args.train_mode == 'pretrain'
+              and args.lineareval_while_pretraining
+              and args.num_classes):
+            l = tf.concat([l, l], 0)
+          sup_loss = obj_lib.add_supervised_loss(labels=l, logits=outputs)
+          if loss is None:
+            loss = sup_loss
+          else:
+            loss += sup_loss
+          metrics.update_finetune_metrics_train(supervised_loss_metric,
+                                                supervised_acc_metric, sup_loss,
+                                                l, outputs)
+        weight_decay = model_lib.add_weight_decay(
+          model, args.optimizer, args.weight_decay, adjust_per_optimizer=True
+        )
+        weight_decay_metric.update_state(weight_decay)
+        loss += weight_decay
+        total_loss_metric.update_state(loss)
+        # The default behavior of `apply_gradients` is to sum gradients from all
+        # replicas so we divide the loss by the number of replicas so that the
+        # mean gradient is applied.
+        loss = loss / strategy.num_replicas_in_sync
+        grads = tape.gradient(loss, model.trainable_variables)
+        optimizer.apply_gradients(zip(grads, model.trainable_variables))
+
+    with strategy.scope():
+
+      @tf.function
+      def train_single_step(iterator):
+        # Drop the "while" prefix created by tf.while_loop which otherwise
+        # gets prefixed to every variable name. This does not affect training
+        # but does affect the checkpoint conversion script.
+        # TODO(b/161712658): Remove this.
+        with tf.name_scope(''):
+          images, labels = next(iterator)
+          features, labels = images, {'labels': labels}
+          strategy.run(single_step, (features, labels))
+
+      def train_multiple_steps(iterator):
+        for _ in tqdm(range(steps_per_loop)):
+          train_single_step(iterator)
+
+      global_step = optimizer.iterations
+      cur_step = global_step.numpy()
+      iterator = iter(ds)
+      logging.debug("Beginning training")
+      while cur_step < train_steps:
+        # Calls to tf.summary.xyz lookup the summary writer resource which is
+        # set by the summary writer's context manager.
+        with summary_writer.as_default():
+          train_multiple_steps(iterator)
+          cur_step = global_step.numpy()
+          checkpoint_manager.save(cur_step)
+          logging.info('Completed: %d / %d steps', cur_step, train_steps)
+          metrics.log_and_write_metrics_to_summary(all_metrics, cur_step)
+          tf.summary.scalar(
+              'learning_rate',
+              learning_rate(tf.cast(global_step, dtype=tf.float32)),
+              global_step)
+          summary_writer.flush()
+        for metric in all_metrics:
+          metric.reset_states()
+      logging.info('Training complete...')
+
+    if args.mode == 'train_then_eval':
+      perform_evaluation(model, builder, eval_steps,
+                         checkpoint_manager.latest_checkpoint, strategy,
+                         model_dir, cache_dataset, args)
+    else:
+      # Export as SavedModel for finetuning and inference.
+      save(
+        model,
+        os.path.join(model_dir, 'saved_model'),
+        args,
+        global_step=global_step)
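The step bookkeeping at the top of `run_simclr` can be worked through with plain arithmetic. The sketch below mirrors the expressions in the listing; `plan_steps` and `loop_boundaries` are hypothetical names used only for illustration, and `train_steps` is taken as an input because `model_lib.get_train_steps` is not shown in this section.

```python
import math


def plan_steps(num_train_examples, num_eval_examples, train_batch_size,
               eval_batch_size, train_steps, checkpoint_epochs=1,
               checkpoint_steps=0, eval_steps=0):
    """Mirror the step arithmetic from the listing (illustrative only)."""
    eval_steps = eval_steps or int(
        math.ceil(num_eval_examples / eval_batch_size))
    epoch_steps = int(round(num_train_examples / train_batch_size))
    checkpoint_steps = checkpoint_steps or (checkpoint_epochs * epoch_steps)
    steps_per_loop = min(checkpoint_steps, train_steps)
    return {
        "eval_steps": eval_steps,
        "epoch_steps": epoch_steps,
        "checkpoint_steps": checkpoint_steps,
        "steps_per_loop": steps_per_loop,
    }


def loop_boundaries(train_steps, steps_per_loop, start=0):
    """Steps at which the `while cur_step < train_steps` loop checkpoints."""
    boundaries = []
    cur_step = start
    while cur_step < train_steps:
        cur_step += steps_per_loop  # one call to train_multiple_steps
        boundaries.append(cur_step)
    return boundaries


plan = plan_steps(
    num_train_examples=10_000, num_eval_examples=1_000,
    train_batch_size=256, eval_batch_size=256, train_steps=100)
ends = loop_boundaries(100, plan["steps_per_loop"])
```

With these inputs the plan is 4 eval steps and 39 steps per epoch, checkpoint, and loop. Because the loop advances in whole blocks of `steps_per_loop`, the checkpoints land at steps 39, 78, and 117, so training can run slightly past `train_steps` when it is not a multiple of the loop size.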
\ No newline at end of file
diff --git a/docs/_modules/slideflow/simclr/simclr/tf2/model/index.html b/docs/_modules/slideflow/simclr/simclr/tf2/model/index.html
new file mode 100644
index 000000000..fe8f81b93
--- /dev/null
+++ b/docs/_modules/slideflow/simclr/simclr/tf2/model/index.html
@@ -0,0 +1,751 @@
+slideflow.simclr.simclr.tf2.model — slideflow 3.0.0 documentation

Source code for slideflow.simclr.simclr.tf2.model

+# coding=utf-8
+# Copyright 2020 The SimCLR Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Model specification for SimCLR."""
+
+import math
+
+import tensorflow.compat.v2 as tf
+from . import data_util
+from . import lars_optimizer
+from . import resnet
+
+
+def build_optimizer(learning_rate, optimizer, momentum, weight_decay):
+  """Returns the optimizer."""
+  if optimizer == 'momentum':
+    return tf.keras.optimizers.SGD(learning_rate, momentum, nesterov=True)
+  elif optimizer == 'adam':
+    return tf.keras.optimizers.Adam(learning_rate)
+  elif optimizer == 'lars':
+    return lars_optimizer.LARSOptimizer(
+        learning_rate,
+        momentum=momentum,
+        weight_decay=weight_decay,
+        exclude_from_weight_decay=[
+            'batch_normalization', 'bias', 'head_supervised'
+        ])
+  else:
+    raise ValueError('Unknown optimizer {}'.format(optimizer))
+
+
+def add_weight_decay(model, optimizer, weight_decay, adjust_per_optimizer=True):
+  """Compute weight decay."""
+  if adjust_per_optimizer and 'lars' in optimizer:
+    # Weight decay is taken care of by the optimizer for these cases.
+    # Except for supervised head, which will be added here.
+    l2_losses = [
+        tf.nn.l2_loss(v)
+        for v in model.trainable_variables
+        if 'head_supervised' in v.name and 'bias' not in v.name
+    ]
+    if l2_losses:
+      return weight_decay * tf.add_n(l2_losses)
+    else:
+      return 0
+
+  # TODO(srbs): Think of a way to avoid name-based filtering here.
+  l2_losses = [
+      tf.nn.l2_loss(v)
+      for v in model.trainable_weights
+      if 'batch_normalization' not in v.name
+  ]
+  loss = weight_decay * tf.add_n(l2_losses)
+  return loss
+
+
+def get_train_steps(num_examples, train_steps, train_epochs, train_batch_size):
+  """Determine the number of training steps."""
+  return train_steps or (
+      num_examples * train_epochs // train_batch_size + 1)
+
+
+class WarmUpAndCosineDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
+  """Applies a warmup schedule on a given learning rate decay schedule."""
+
+  def __init__(
+    self,
+    learning_rate,
+    num_examples,
+    *,
+    warmup_epochs=10,
+    train_batch_size=512,
+    learning_rate_scaling='linear',
+    train_steps=0,
+    train_epochs=100,
+    name=None
+  ):
+    super(WarmUpAndCosineDecay, self).__init__()
+    self.base_learning_rate = learning_rate
+    self.num_examples = num_examples
+    self._name = name
+    self.warmup_epochs = warmup_epochs
+    self.train_batch_size = train_batch_size
+    self.learning_rate_scaling = learning_rate_scaling
+    self.train_steps = train_steps
+    self.train_epochs = train_epochs
+
+  def __call__(self, step):
+    with tf.name_scope(self._name or 'WarmUpAndCosineDecay'):
+      warmup_steps = int(
+          round(self.warmup_epochs * self.num_examples //
+                self.train_batch_size))
+      if self.learning_rate_scaling == 'linear':
+        scaled_lr = self.base_learning_rate * self.train_batch_size / 256.
+      elif self.learning_rate_scaling == 'sqrt':
+        scaled_lr = self.base_learning_rate * math.sqrt(self.train_batch_size)
+      else:
+        raise ValueError('Unknown learning rate scaling {}'.format(
+            self.learning_rate_scaling))
+      learning_rate = (
+          step / float(warmup_steps) * scaled_lr if warmup_steps else scaled_lr)
+
+      # Cosine decay learning rate schedule
+      total_steps = get_train_steps(self.num_examples, self.train_steps,
+        self.train_epochs, self.train_batch_size)
+      # TODO(srbs): Cache this object.
+      cosine_decay = tf.keras.experimental.CosineDecay(
+          scaled_lr, total_steps - warmup_steps)
+      learning_rate = tf.where(step < warmup_steps, learning_rate,
+                               cosine_decay(step - warmup_steps))
+
+      return learning_rate
+
+  def get_config(self):
+    return {
+        'base_learning_rate': self.base_learning_rate,
+        'num_examples': self.num_examples,
+    }
+
+
+class LinearLayer(tf.keras.layers.Layer):
+
+  def __init__(
+    self,
+    num_classes,
+    use_bias=True,
+    use_bn=False,
+    name='linear_layer',
+    **kwargs
+  ):
+    # Note: use_bias is ignored for the dense layer when use_bn=True.
+    # However, it is still used for batch norm.
+    super(LinearLayer, self).__init__(**kwargs)
+    self.num_classes = num_classes
+    self.use_bias = use_bias
+    self.use_bn = use_bn
+    self._name = name
+    if self.use_bn:
+      self.bn_relu = resnet.BatchNormRelu(relu=False, center=use_bias)
+
+  def build(self, input_shape):
+    # TODO(srbs): Add a new SquareDense layer.
+    if callable(self.num_classes):
+      num_classes = self.num_classes(input_shape)
+    else:
+      num_classes = self.num_classes
+    self.dense = tf.keras.layers.Dense(
+        num_classes,
+        kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
+        use_bias=self.use_bias and not self.use_bn)
+    super(LinearLayer, self).build(input_shape)
+
+  def call(self, inputs, training):
+    assert inputs.shape.ndims == 2, inputs.shape
+    inputs = self.dense(inputs)
+    if self.use_bn:
+      inputs = self.bn_relu(inputs, training=training)
+    return inputs
+
+
+class ProjectionHead(tf.keras.layers.Layer):
+
+  def __init__(
+    self,
+    proj_out_dim,
+    proj_head_mode='nonlinear',
+    num_proj_layers=3,
+    ft_proj_selector=0,
+    **kwargs
+  ):
+    self.linear_layers = []
+    if proj_head_mode == 'none':
+      pass  # directly use the output hiddens as hiddens
+    elif proj_head_mode == 'linear':
+      self.linear_layers = [
+          LinearLayer(
+              num_classes=proj_out_dim, use_bias=False, use_bn=True, name='l_0')
+      ]
+    elif proj_head_mode == 'nonlinear':
+      for j in range(num_proj_layers):
+        if j != num_proj_layers - 1:
+          # for the middle layers, use bias and relu for the output.
+          self.linear_layers.append(
+              LinearLayer(
+                  num_classes=lambda input_shape: int(input_shape[-1]),
+                  use_bias=True,
+                  use_bn=True,
+                  name='nl_%d' % j))
+        else:
+          # for the final layer, neither bias nor relu is used.
+          self.linear_layers.append(
+              LinearLayer(
+                  num_classes=proj_out_dim,
+                  use_bias=False,
+                  use_bn=True,
+                  name='nl_%d' % j))
+    else:
+      raise ValueError('Unknown head projection mode {}'.format(
+          proj_head_mode))
+    super(ProjectionHead, self).__init__(**kwargs)
+
+    self.proj_head_mode = proj_head_mode
+    self.num_proj_layers = num_proj_layers
+    self.ft_proj_selector = ft_proj_selector
+
+  def call(self, inputs, training):
+    if self.proj_head_mode == 'none':
+      return inputs  # directly use the output hiddens as hiddens
+    hiddens_list = [tf.identity(inputs, 'proj_head_input')]
+    if self.proj_head_mode == 'linear':
+      assert len(self.linear_layers) == 1, len(self.linear_layers)
+      # list.append() returns None, so append first and fall through to the
+      # shared return below.
+      hiddens_list.append(
+          self.linear_layers[0](hiddens_list[-1], training))
+    elif self.proj_head_mode == 'nonlinear':
+      for j in range(self.num_proj_layers):
+        hiddens = self.linear_layers[j](hiddens_list[-1], training)
+        if j != self.num_proj_layers - 1:
+          # for the middle layers, use bias and relu for the output.
+          hiddens = tf.nn.relu(hiddens)
+        hiddens_list.append(hiddens)
+    else:
+      raise ValueError('Unknown head projection mode {}'.format(
+          self.proj_head_mode))
+    # The first element is the output of the projection head.
+    # The second element is the input of the finetune head.
+    proj_head_output = tf.identity(hiddens_list[-1], 'proj_head_output')
+    return proj_head_output, hiddens_list[self.ft_proj_selector]
+
+
+class SupervisedHead(tf.keras.layers.Layer):
+
+  def __init__(self, num_classes, name='head_supervised', **kwargs):
+    super(SupervisedHead, self).__init__(name=name, **kwargs)
+    self.linear_layer = LinearLayer(num_classes)
+
+  def call(self, inputs, training):
+    inputs = self.linear_layer(inputs, training)
+    inputs = tf.identity(inputs, name='logits_sup')
+    return inputs
+
+
+class SimCLR(tf.keras.models.Model):
+  """Resnet model with projection or supervised layer."""
+
+  def __init__(
+    self,
+    num_classes,
+    resnet_depth=50,
+    width_multiplier=1,
+    sk_ratio=0.,
+    se_ratio=0.,
+    image_size=224,
+    batch_norm_decay=0.9,
+    train_mode='pretrain',
+    lineareval_while_pretraining=True,
+    fine_tune_after_block=-1,
+    use_blur=True,
+    proj_out_dim=128,
+    proj_head_mode='nonlinear',
+    num_proj_layers=3,
+    ft_proj_selector=0,
+    **kwargs
+  ):
+    super(SimCLR, self).__init__(**kwargs)
+    self.resnet_model = resnet.resnet(
+        train_mode=train_mode,
+        width_multiplier=width_multiplier,
+        resnet_depth=resnet_depth,
+        cifar_stem=image_size <= 32,
+        sk_ratio=sk_ratio,
+        se_ratio=se_ratio,
+        batch_norm_decay=batch_norm_decay,
+        fine_tune_after_block=fine_tune_after_block
+    )
+    self._projection_head = ProjectionHead(
+        proj_out_dim,
+        proj_head_mode=proj_head_mode,
+        num_proj_layers=num_proj_layers,
+        ft_proj_selector=ft_proj_selector
+    )
+    if ((train_mode == 'finetune' or lineareval_while_pretraining) and num_classes):
+      self.supervised_head = SupervisedHead(num_classes)
+    self.train_mode = train_mode
+    self.fine_tune_after_block = fine_tune_after_block
+    self.use_blur = use_blur
+    self.image_size = image_size
+    self.lineareval_while_pretraining = lineareval_while_pretraining
+    self.num_classes = num_classes
+
+  def __call__(self, inputs, training):
+    features = inputs
+    if training and self.train_mode == 'pretrain':
+      if self.fine_tune_after_block > -1:
+        raise ValueError('Does not support layer freezing during pretraining, '
+                         'should set fine_tune_after_block<=-1 for safety.')
+    if inputs.shape[3] is None:
+      raise ValueError('The input channels dimension must be statically known '
+                       f'(got input shape {inputs.shape})')
+    num_transforms = inputs.shape[3] // 3
+    num_transforms = tf.repeat(3, num_transforms)
+    # Split channels, and optionally apply extra batched augmentation.
+    features_list = tf.split(
+        features, num_or_size_splits=num_transforms, axis=-1)
+    if self.use_blur and training and self.train_mode == 'pretrain':
+      features_list = data_util.batch_random_blur(features_list,
+                                                  self.image_size,
+                                                  self.image_size)
+    features = tf.concat(features_list, 0)  # (num_transforms * bsz, h, w, c)
+
+    # Base network forward pass.
+    hiddens = self.resnet_model(features, training=training)
+
+    # Add heads.
+    projection_head_outputs, supervised_head_inputs = self._projection_head(
+        hiddens, training)
+
+    if self.train_mode == 'finetune':
+      supervised_head_outputs = self.supervised_head(supervised_head_inputs,
+                                                     training)
+      return None, supervised_head_outputs
+    elif (self.train_mode == 'pretrain'
+          and self.lineareval_while_pretraining
+          and self.num_classes):
+      # When performing pretraining and linear evaluation together we do not
+      # want information from linear eval flowing back into pretraining network
+      # so we put a stop_gradient.
+      supervised_head_outputs = self.supervised_head(
+          tf.stop_gradient(supervised_head_inputs), training)
+      return projection_head_outputs, supervised_head_outputs
+    else:
+      return projection_head_outputs, None
+
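The learning-rate schedule defined by `WarmUpAndCosineDecay` in the listing above can be checked by hand with a framework-free sketch. This is an illustration only, not part of the module: `warmup_cosine_lr` is a hypothetical name, the cosine phase is assumed to decay to zero with no floor, and the guard against a zero-length decay phase is an addition that the listing does not have.

```python
import math


def warmup_cosine_lr(step, base_lr, num_examples, *, warmup_epochs=10,
                     train_batch_size=512, learning_rate_scaling='linear',
                     train_steps=0, train_epochs=100):
    """Hypothetical pure-Python mirror of the warmup + cosine schedule."""
    # Scale the base learning rate by batch size, as in the listing.
    if learning_rate_scaling == 'linear':
        scaled_lr = base_lr * train_batch_size / 256.0
    elif learning_rate_scaling == 'sqrt':
        scaled_lr = base_lr * math.sqrt(train_batch_size)
    else:
        raise ValueError(
            'Unknown learning rate scaling {}'.format(learning_rate_scaling))

    warmup_steps = warmup_epochs * num_examples // train_batch_size
    # Same rule as get_train_steps(): explicit steps win over epochs.
    total_steps = train_steps or (
        num_examples * train_epochs // train_batch_size + 1)

    # Linear warmup from 0 up to scaled_lr.
    if warmup_steps and step < warmup_steps:
        return step / float(warmup_steps) * scaled_lr

    # Cosine decay from scaled_lr to 0 over the remaining steps.
    # Assumption: clamp to at least one step to avoid dividing by zero.
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min(step - warmup_steps, decay_steps) / decay_steps
    return scaled_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 5120 examples and the default batch size of 512, warmup lasts 100 steps and training 1001 steps, so the rate rises linearly to its scaled peak at step 100 and returns to zero at the final step.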
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/simclr/simclr/tf2/utils/index.html b/docs/_modules/slideflow/simclr/simclr/tf2/utils/index.html new file mode 100644 index 000000000..e40097430 --- /dev/null +++ b/docs/_modules/slideflow/simclr/simclr/tf2/utils/index.html @@ -0,0 +1,653 @@ + + + + + + + + + + + + slideflow.simclr.simclr.tf2.utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.simclr.simclr.tf2.utils

+"""Utility functions."""
+
+import json
+import tensorflow as tf
+from os.path import dirname, join, exists
+from slideflow import log
+
+# -----------------------------------------------------------------------------
+
+class SimCLR_Args:
+    def __init__(
+        self,
+        learning_rate=0.075,
+        learning_rate_scaling='sqrt',
+        warmup_epochs=10,
+        weight_decay=1e-4,
+        batch_norm_decay=0.9,
+        train_batch_size=512,
+        train_split='train',
+        train_epochs=100,
+        train_steps=0,
+        eval_steps=0,
+        eval_batch_size=256,
+        checkpoint_epochs=1,
+        checkpoint_steps=0,
+        eval_split='validation',
+        dataset='imagenet2012',
+        mode='train',
+        train_mode='pretrain',
+        lineareval_while_pretraining=True,
+        zero_init_logits_layer=False,
+        fine_tune_after_block=-1,
+        master=None,
+        data_dir=None,
+        optimizer='lars',
+        momentum=0.9,
+        keep_checkpoint_max=5,
+        temperature=0.1,
+        hidden_norm=True,
+        proj_head_mode='nonlinear',
+        proj_out_dim=128,
+        num_proj_layers=3,
+        ft_proj_selector=0,
+        global_bn=True,
+        width_multiplier=1,
+        resnet_depth=50,
+        sk_ratio=0.,
+        se_ratio=0.,
+        image_size=224,
+        color_jitter_strength=1.0,
+        use_blur=True,
+        num_classes=None,
+        stain_augment=True,
+    ) -> None:
+        """SimCLR arguments.
+
+        A class containing all SimCLR arguments, set to their defaults
+        unless overridden at initialization.
+
+        Keyword Args:
+            learning_rate (float): Initial learning rate per batch size of 256.
+            learning_rate_scaling (str): How to scale the learning rate as a
+                function of batch size. 'linear' or 'sqrt'.
+            warmup_epochs (int): Number of epochs of warmup.
+            weight_decay (float): Amount of weight decay to use.
+            batch_norm_decay (float): Batch norm decay parameter.
+            train_batch_size (int): Batch size for training.
+            train_split (str): Split for training.
+            train_epochs (int): Number of epochs to train for.
+            train_steps (int): Number of steps to train for. If provided,
+                overrides train_epochs.
+            eval_steps (int): Number of steps to eval for. If not provided,
+                evals over entire dataset.
+            eval_batch_size (int): Batch size for eval.
+            checkpoint_epochs (int): Number of epochs between
+                checkpoints/summaries.
+            checkpoint_steps (int): Number of steps between
+                checkpoints/summaries. If provided, overrides
+                checkpoint_epochs.
+            eval_split (str): Split for evaluation.
+            dataset (str): Name of a dataset.
+            mode (str): Whether to perform training or evaluation. 'train',
+                'eval', or 'train_then_eval'.
+            train_mode (str): The train mode controls different objectives and
+                trainable components. 'pretrain' or 'finetune'.
+            lineareval_while_pretraining (bool): Whether to finetune supervised
+                head while pretraining.
+            zero_init_logits_layer (bool): If True, zero initialize layers after
+                avg_pool for supervised learning.
+            fine_tune_after_block (int): Block after which layers will be
+                fine-tuned. -1 means fine-tuning everything. 0 means
+                fine-tuning after stem block. 4 means fine-tuning just the
+                linear head.
+            master (str): Address/name of the TensorFlow master to use.
+                By default, use an in-process master.
+            data_dir (str): Directory where dataset is stored.
+            optimizer (str): Optimizer to use. 'momentum', 'adam', 'lars'.
+            momentum (float): Momentum parameter.
+            keep_checkpoint_max (int): Maximum number of checkpoints to keep.
+            temperature (float): Temperature parameter for contrastive loss.
+            hidden_norm (bool): Whether to L2-normalize the hidden vectors
+                before computing the contrastive loss.
+            proj_head_mode (str): How the head projection is done. 'none',
+                'linear', 'nonlinear'.
+            proj_out_dim (int): Dimensionality of the head projection output.
+            num_proj_layers (int): Number of non-linear head layers.
+            ft_proj_selector (int): Which layer of the projection head to use
+                during fine-tuning. 0 means no projection head, and -1 means
+                the final layer.
+            global_bn (bool): Whether to aggregate BN statistics across
+                distributed cores.
+            width_multiplier (int): Multiplier to change width of network.
+            resnet_depth (int): Depth of ResNet.
+            sk_ratio (float): If it is bigger than 0, it will enable SK.
+                Recommendation: 0.0625.
+            se_ratio (float): If it is bigger than 0, it will enable SE.
+            image_size (int): Input image size.
+            color_jitter_strength (float): The strength of color jittering.
+            use_blur (bool): Whether or not to use Gaussian blur for
+                augmentation during pretraining.
+            num_classes (int): Number of classes for the supervised head.
+        """
+        for argname, argval in dict(locals()).items():
+            setattr(self, argname, argval)
+
+    def to_dict(self):
+        return {k:v for k,v in vars(self).items()
+                if k not in ('model_kwargs', 'self')}
+
+    def __repr__(self):
+        return '{}(\n{}\n)'.format(
+            self.__class__.__name__,
+            ',\n'.join(' {}={!r}'.format(k, v) for k, v in self.to_dict().items())
+        )
+
+    @property
+    def model_kwargs(self):
+        return {
+            k: getattr(self, k)
+            for k in ('num_classes', 'resnet_depth', 'width_multiplier',
+                      'sk_ratio', 'se_ratio', 'image_size', 'batch_norm_decay',
+                      'train_mode', 'use_blur', 'proj_out_dim', 'proj_head_mode',
+                      'lineareval_while_pretraining', 'fine_tune_after_block',
+                      'num_proj_layers', 'ft_proj_selector')
+        }
+
+# -----------------------------------------------------------------------------
+
+def get_args(**kwargs):
+    """Configure a ``SimCLR_Args`` object for training SimCLR.
+
+    Keyword args:
+        **kwargs: Please see the :class:`slideflow.simclr.SimCLR_Args` documentation
+            for information on available parameters.
+
+    Returns:
+        slideflow.simclr.SimCLR_Args
+
+    """
+    return SimCLR_Args(**kwargs)
+
+
+def load_model_args(model_path, ignore_missing=False):
+    """Load args.json associated with a given SimCLR model or checkpoint.
+
+    Args:
+        model_path (str): Path to SimCLR model or checkpoint.
+        ignore_missing (bool): If True, return None instead of raising an
+            error when args.json cannot be found. Defaults to False.
+
+    Returns:
+        SimCLR_Args built from the contents of the args.json file. If the
+        file is not found and `ignore_missing` is True, returns None.
+
+    Raises:
+        OSError: If args.json cannot be found and `ignore_missing` is False.
+    """
+    for flag_path in (join(model_path, 'args.json'),
+                      join(dirname(model_path), 'args.json')):
+        if exists(flag_path):
+            with open(flag_path, 'r') as f:
+                return SimCLR_Args(**json.load(f))
+    if ignore_missing:
+        return None
+    else:
+        raise OSError(f"Unable to find args.json for SimCLR model {model_path}")
+
+# -----------------------------------------------------------------------------
+
+def json_serializable(val):
+    try:
+        json.dumps(val)
+        return True
+    except TypeError:
+        return False
+
+
+def get_salient_tensors_dict(include_projection_head, include_supervised_head):
+    """Returns a dictionary of tensors."""
+    graph = tf.compat.v1.get_default_graph()
+    result = {}
+    for i in range(1, 5):
+        result['block_group%d' % i] = graph.get_tensor_by_name(
+            'resnet/block_group%d/block_group%d:0' % (i, i))
+    result['initial_conv'] = graph.get_tensor_by_name(
+        'resnet/initial_conv/Identity:0')
+    result['initial_max_pool'] = graph.get_tensor_by_name(
+        'resnet/initial_max_pool/Identity:0')
+    result['final_avg_pool'] = graph.get_tensor_by_name('resnet/final_avg_pool:0')
+    if include_supervised_head:
+        result['logits_sup'] = graph.get_tensor_by_name(
+            'head_supervised/logits_sup:0')
+    if include_projection_head:
+        result['proj_head_input'] = graph.get_tensor_by_name(
+            'projection_head/proj_head_input:0')
+        result['proj_head_output'] = graph.get_tensor_by_name(
+            'projection_head/proj_head_output:0')
+    return result
+
+def _restore_latest_or_from_pretrain(checkpoint_manager, args, checkpoint_path):
+    """Restores the latest checkpoint if training has already started.
+
+    Or restores from checkpoint_path if in finetune mode.
+
+    Args:
+        checkpoint_manager: tf.train.CheckpointManager.
+    """
+    latest_ckpt = checkpoint_manager.latest_checkpoint
+    if latest_ckpt:
+        # The model is not built yet so some variables may not be available in
+        # the object graph. Those are lazily initialized. To suppress the warning
+        # in that case we specify `expect_partial`.
+        log.info('Restoring from %s', latest_ckpt)
+        checkpoint_manager.checkpoint.restore(latest_ckpt).expect_partial()
+    elif args.train_mode == 'finetune':
+        # Restore from pretrain checkpoint.
+        assert checkpoint_path, 'Missing pretrain checkpoint.'
+        log.info('Restoring from %s', checkpoint_path)
+        checkpoint_manager.checkpoint.restore(checkpoint_path).expect_partial()
+        # TODO(iamtingchen): Can we instead use a zeros initializer for the
+        # supervised head?
+        if args.zero_init_logits_layer:
+            model = checkpoint_manager.checkpoint.model
+            output_layer_parameters = model.supervised_head.trainable_weights
+            log.info('Initializing output layer parameters %s to zero',
+                     [x.op.name for x in output_layer_parameters])
+            for x in output_layer_parameters:
+                x.assign(tf.zeros_like(x))
+ +
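The lookup order used by `load_model_args` (first inside the given path, then in its parent directory) can be tried without TensorFlow or Slideflow installed. The sketch below is a simplified stand-in under stated assumptions: `read_args_json` is a hypothetical helper that returns the raw dictionary rather than a `SimCLR_Args` object, and the checkpoint name `ckpt-100` is made up for the demonstration.

```python
import json
import tempfile
from os.path import dirname, exists, join


def read_args_json(model_path, ignore_missing=False):
    """Hypothetical stand-in: return the parsed args.json as a dict."""
    # Look inside the path first (saved-model directory), then next to it
    # (checkpoint prefix stored inside the model directory).
    for candidate in (join(model_path, 'args.json'),
                      join(dirname(model_path), 'args.json')):
        if exists(candidate):
            with open(candidate, 'r') as f:
                return json.load(f)
    if ignore_missing:
        return None
    raise OSError(f"Unable to find args.json for SimCLR model {model_path}")


# Demonstration with a temporary model directory.
with tempfile.TemporaryDirectory() as model_dir:
    with open(join(model_dir, 'args.json'), 'w') as f:
        json.dump({'image_size': 96, 'train_mode': 'pretrain'}, f)
    from_dir = read_args_json(model_dir)                     # found directly
    from_ckpt = read_args_json(join(model_dir, 'ckpt-100'))  # found in parent
    missing = read_args_json(join(model_dir, 'a', 'b'), ignore_missing=True)
```

Both the directory and the checkpoint prefix resolve to the same file, and a path with no args.json nearby yields None only when `ignore_missing=True`.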
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/slide/qc/gaussian/index.html b/docs/_modules/slideflow/slide/qc/gaussian/index.html new file mode 100644 index 000000000..1b253440f --- /dev/null +++ b/docs/_modules/slideflow/slide/qc/gaussian/index.html @@ -0,0 +1,538 @@ + + + + + + + + + + + + slideflow.slide.qc.gaussian — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.slide.qc.gaussian

+"""Gaussian filter QC algorithm."""
+
+import numpy as np
+import slideflow as sf
+import skimage
+from slideflow import errors
+from typing import Union, Optional
+
+# -----------------------------------------------------------------------------
+
+class Gaussian:
+
+    def __init__(
+        self,
+        mpp: Optional[float] = None,
+        sigma: int = 3,
+        threshold: float = 0.02
+    ) -> None:
+        """Prepare Gaussian filtering algorithm for filtering a slide.
+
+        This method is used to remove out-of-focus areas and pen marks.
+
+        This QC method works by obtaining a thumbnail of a slide, and converting
+        the image to grayscale. The absolute Laplacian of the grayscale image
+        is smoothed with a Gaussian filter of the given sigma (default=3)
+        using scikit-image. Areas where the result is at or below the given
+        threshold (default=0.02) are filtered out.
+
+        Examples
+            Apply Gaussian filtering to a slide.
+
+                .. code-block:: python
+
+                    import slideflow as sf
+                    from slideflow.slide import qc
+
+                    wsi = sf.WSI(...)
+                    gaussian = qc.Gaussian()
+                    wsi.qc(gaussian)
+
+        Args:
+            mpp (float): Microns-per-pixel at which to perform filtering.
+                Defaults to 4 times the tile extraction MPP (e.g. for a
+                tile_px/tile_um combination at 10X effective magnification,
+                where tile_px=tile_um, the default mpp would be 4, or
+                effective magnification 2.5x).
+            sigma (int): Sigma (radius) for Gaussian filter. Defaults to 3.
+            threshold (float): Gaussian threshold. Defaults to 0.02.
+        """
+        self.mpp = mpp
+        self.sigma = sigma
+        self.threshold = threshold
+
+    def __repr__(self):
+        return "Gaussian(mpp={!r}, sigma={!r}, threshold={!r})".format(
+            self.mpp, self.sigma, self.threshold
+        )
+
+    def _thumb_from_slide(
+        self,
+        wsi: "sf.WSI"
+    ) -> np.ndarray:
+        """Get a thumbnail from the given slide.
+
+        Args:
+            wsi (sf.WSI): Whole-slide image.
+
+        Returns:
+            np.ndarray: RGB thumbnail of the whole-slide image.
+        """
+        if self.mpp is None:
+            _mpp = (wsi.tile_um/wsi.tile_px)*4
+            sf.log.info(f"Performing Gaussian blur filter at mpp={_mpp:.3f}")
+        else:
+            _mpp = self.mpp
+        thumb = wsi.thumb(mpp=_mpp)
+        if thumb is None:
+            raise errors.QCError(
+                f"Thumbnail error for slide {wsi.shortname}, QC failed"
+            )
+        thumb = np.array(thumb)
+        if thumb.shape[-1] == 4:
+            thumb = thumb[:, :, :3]
+        return thumb
+
+    def __call__(
+        self,
+        wsi: Union["sf.WSI", np.ndarray],
+        mask: Optional[np.ndarray] = None,
+    ) -> np.ndarray:
+        """Perform Gaussian filtering on the given slide or image.
+
+        Args:
+            wsi (sf.WSI, np.ndarray): Either a Slideflow WSI or a numpy array,
+                with shape (h, w, c) and type np.uint8.
+            mask (np.ndarray): Existing QC mask used when calculating the
+                blur burden. If None and a WSI is provided, the slide's
+                current QC mask is used. Defaults to None.
+
+        Returns:
+            np.ndarray: QC boolean mask, where True = filtered out.
+        """
+        if isinstance(wsi, sf.WSI):
+            thumb = self._thumb_from_slide(wsi)
+        else:
+            thumb = wsi
+
+        gray = skimage.color.rgb2gray(thumb)
+        img_laplace = np.abs(skimage.filters.laplace(gray))
+        gaussian = skimage.filters.gaussian(img_laplace, sigma=self.sigma)
+        blur_mask = gaussian <= self.threshold
+
+        # Assign blur burden value
+        existing_qc_mask = mask
+        if mask is None and isinstance(wsi, sf.WSI):
+            existing_qc_mask = wsi.qc_mask
+        if existing_qc_mask is not None and isinstance(wsi, sf.WSI):
+            wsi.blur_burden = blur_burden(blur_mask, existing_qc_mask)
+            sf.log.debug(f"Blur burden: {wsi.blur_burden}")
+
+        return blur_mask
+
+# -----------------------------------------------------------------------------
+
+def blur_burden(blur_mask, existing_mask):
+    blur_mask = skimage.transform.resize(blur_mask, existing_mask.shape)
+    blur_mask = blur_mask.astype(bool)
+    blur = np.count_nonzero(
+        np.logical_and(
+            blur_mask,
+            np.logical_xor(blur_mask, existing_mask)
+        )
+    )
+    return blur / (blur_mask.shape[0] * blur_mask.shape[1])
+ +
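The quantity computed by `blur_burden` in the listing above is the fraction of pixels flagged as blurry that were not already removed by an earlier QC mask, since `blur AND (blur XOR existing)` reduces to `blur AND NOT existing`. The sketch below illustrates that identity with NumPy only. It is not the library function: `newly_blurred_fraction` is a hypothetical name, and it assumes both masks already share one shape, so the scikit-image resize step is left out.

```python
import numpy as np


def newly_blurred_fraction(blur_mask, existing_mask):
    """Hypothetical same-shape variant of the blur burden calculation."""
    blur_mask = np.asarray(blur_mask, dtype=bool)
    existing_mask = np.asarray(existing_mask, dtype=bool)
    if blur_mask.shape != existing_mask.shape:
        raise ValueError("Masks must share a shape in this simplified sketch.")
    # Pixels marked blurry that the existing mask had not already filtered.
    newly_blurred = np.logical_and(blur_mask, np.logical_not(existing_mask))
    return np.count_nonzero(newly_blurred) / blur_mask.size


# Two of four pixels are blurry; one of them was already masked out.
blur = np.array([[True, True], [False, False]])
existing = np.array([[True, False], [False, True]])
burden = newly_blurred_fraction(blur, existing)

# The XOR form used in the listing gives the same pixel set.
xor_form = np.logical_and(blur, np.logical_xor(blur, existing))
```

Here only the top-right pixel is both blurry and previously unmasked, so the burden is one pixel out of four.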
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/slide/qc/otsu/index.html b/docs/_modules/slideflow/slide/qc/otsu/index.html new file mode 100644 index 000000000..7018c12e5 --- /dev/null +++ b/docs/_modules/slideflow/slide/qc/otsu/index.html @@ -0,0 +1,604 @@ + + + + + + + + + + + + slideflow.slide.qc.otsu — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.slide.qc.otsu

+"""Otsu's thresholding QC algorithm."""
+
+import cv2
+import numpy as np
+import slideflow as sf
+import rasterio
+import shapely.affinity as sa
+from slideflow import errors
+from typing import Union, Optional
+
+
+def _apply_mask(image, mask):
+    resized_mask = cv2.resize(
+        (~mask).astype(np.uint8),
+        (image.shape[1], image.shape[0]),
+        interpolation=cv2.INTER_NEAREST
+    )
+    return cv2.bitwise_or(image, image, mask=resized_mask)
+
+
+def _get_level_for_otsu(wsi: "sf.WSI", min_size: int = 500) -> int:
+    """Find the smallest downsample level of a minimum size."""
+    smallest_dim = np.array([min(L['dimensions']) for L in wsi.levels])
+    level_ids = np.array([L['level'] for L in wsi.levels])
+    sorted_idx = np.argsort(smallest_dim)
+    try:
+        best_idx = np.where(smallest_dim[sorted_idx] > min_size)[0][0]
+    except IndexError:
+        # If the slide is smaller than the target minimum dimension,
+        # use the full slide image
+        # best_idx indexes the sorted arrays, so -1 selects the largest level.
+        best_idx = -1
+    return level_ids[sorted_idx][best_idx]
+
+# -----------------------------------------------------------------------------
+
+
[docs]class Otsu: + + def __init__(self, slide_level: Optional[int] = None): + """Prepare Otsu's thresholding algorithm for filtering a slide. + + This method is used to detect areas of tissue and remove background. + + This QC method works by obtaining a thumbnail of a slide, and converting + the image into the HSV color space. The HSV image undergoes a median blur + using OpenCV with a kernel size of 7, and the image is thresholded + using ``cv2.THRESH_OTSU``. This results in a binary mask, which + is then applied to the slide for filtering. + + Original paper: https://ieeexplore.ieee.org/document/4310076 + + .. warning:: + + Otsu's thresholding may give unexpected results with slides + that have many large pen marks, erroneously identifying pen marks + as tissue and removing the actual tissue as background. + This behavior can be circumvented by applying a Gaussian filter + before Otsu's thresholding. + + .. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + wsi = sf.WSI(...) + gaussian = qc.GaussianV2() + otsu = qc.Otsu() + wsi.qc([gaussian, otsu]) + + Examples + Apply Otsu's thresholding to a slide. + + .. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + wsi = sf.WSI(...) + otsu = qc.Otsu() + wsi.qc(otsu) + + Args: + level (int): Slide pyramid level at which to perform filtering. + Defaults to second-lowest available level. + """ + self.level = slide_level + + def __repr__(self): + return "Otsu(slide_level={!r})".format( + self.level + ) + + def _thumb_from_slide( + self, + wsi: "sf.WSI", + ) -> np.ndarray: + """Get a thumbnail from the given slide. + + Args: + wsi (sf.WSI): Whole-slide image. + + Returns: + np.ndarray: RGB thumbnail of the whole-slide image. 
+ """ + if self.level is None: + # Otsu's thresholding can be done on the smallest downsample level, + # with the smallest dimension being at least 500 pixels + level = _get_level_for_otsu(wsi, min_size=500) + else: + level = self.level + + try: + if wsi.slide.has_levels: + sf.log.debug("Applying Otsu's thresholding at level={}".format(level)) + thumb = wsi.slide.read_level(level=level, to_numpy=True) + else: + sf.log.debug("Applying Otsu's thresholding at level=None") + thumb = wsi.slide.read_level(to_numpy=True) + except Exception as e: + raise errors.QCError( + f"Thumbnail error for slide {wsi.shortname}, QC failed: {e}" + ) + if thumb.shape[-1] == 4: + thumb = thumb[:, :, :3] + + # Only apply Otsu thresholding within ROI, if present + # If ROI is the ROI_issues, invert it + if wsi.has_rois(): + ofact = 1 / wsi.slide.level_downsamples[level] + roi_mask = np.zeros((thumb.shape[0], thumb.shape[1])) + + # Scale ROIs to thumbnail size + scaled_polys = wsi._scale_polys( + [roi.poly for roi in wsi.get_rois(ignore_artifact=True)], + xfact=ofact, + yfact=ofact, + ) + scaled_issues_polys = wsi._scale_polys( + [roi.invert(*wsi.dimensions).poly for roi in wsi.get_artifacts()], + xfact=ofact, + yfact=ofact, + ) + # Rasterize scaled ROIs + if len(scaled_polys) > 0: + roi_mask = rasterio.features.rasterize( + scaled_polys, + out_shape=thumb.shape[:2] + ) + if len(scaled_issues_polys) > 0: + roi_mask_issues = rasterio.features.rasterize( + scaled_issues_polys, + out_shape=thumb.shape[:2] + ) + # If there are artifacts, remove them from the ROI mask + if len(scaled_polys) > 0: + roi_mask = np.minimum(roi_mask_issues, roi_mask) + else: + roi_mask = roi_mask_issues + + if wsi.roi_method == 'outside': + roi_mask = ~roi_mask + thumb = cv2.bitwise_or( + thumb, + thumb, + mask=roi_mask.astype(np.uint8) + ) + # Only apply Otsu thresholding within areas not already removed + # with other QC methods. 
+        if wsi.has_non_roi_qc():
+            thumb = _apply_mask(thumb, wsi.get_qc_mask(roi=False))
+        return thumb
+
+    def __call__(
+        self,
+        wsi: Union["sf.WSI", np.ndarray],
+        mask: Optional[np.ndarray] = None,
+    ) -> np.ndarray:
+        """Perform Otsu's thresholding on the given slide or image.
+
+        Args:
+            wsi (sf.WSI, np.ndarray): Either a Slideflow WSI or a numpy array,
+                with shape (h, w, c) and type np.uint8.
+            mask (np.ndarray): Restrict Otsu's threshold to the area of the
+                image indicated by this boolean mask. Defaults to None.
+
+        Returns:
+            np.ndarray: QC boolean mask, where True = filtered out.
+        """
+        if isinstance(wsi, sf.WSI):
+            thumb = self._thumb_from_slide(wsi)
+        else:
+            thumb = wsi
+        if mask is not None:
+            thumb = _apply_mask(thumb, mask)
+        hsv_img = cv2.cvtColor(thumb, cv2.COLOR_RGB2HSV)
+        # Blur and threshold the saturation channel only
+        img_med = cv2.medianBlur(hsv_img[:, :, 1], 7)
+        flags = cv2.THRESH_OTSU+cv2.THRESH_BINARY_INV
+        _, otsu_mask = cv2.threshold(img_med, 0, 255, flags)
+        return otsu_mask.astype(bool)
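The threshold selection that `cv2.THRESH_OTSU` performs can be illustrated without OpenCV. The following is a minimal, dependency-free sketch of Otsu's between-class variance criterion on a flat list of 8-bit values; the function name `otsu_threshold` and the toy data are assumptions made for illustration, and the exact threshold OpenCV returns may differ in tie-breaking.

```python
from typing import Sequence


def otsu_threshold(values: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Values <= t form the lower class; values > t form the upper class.
    (Hypothetical helper, not part of Slideflow or OpenCV.)
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of values in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty; nothing left to split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy "saturation channel": low-saturation background, high-saturation tissue.
saturation = [10] * 50 + [200] * 50
t = otsu_threshold(saturation)
# Mirrors THRESH_BINARY_INV: True = at or below threshold = filtered out.
filtered_out = [v <= t for v in saturation]
```

With the inverted binary flag, low-saturation pixels end up `True` in the mask, which matches the "True = filtered out" convention in the docstring above.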
+
+ +
+ +
+
+ + + + +
+ + + +
+

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + +
+ +
+
+ +
+
+
+ +
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/slide/qc/saver/index.html b/docs/_modules/slideflow/slide/qc/saver/index.html new file mode 100644 index 000000000..af500a9a1 --- /dev/null +++ b/docs/_modules/slideflow/slide/qc/saver/index.html @@ -0,0 +1,507 @@ + + + + + + + + + + + + slideflow.slide.qc.saver — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + + + + +
+
+
+ + + + + + + + + + + +
+
+
+ + + + + + + + + + + + + + + + +
+ + + + +
+
+ +
+
+ +
+
+ + + +
+ +
+
+ +

Source code for slideflow.slide.qc.saver

+"""Functions for saving/loading QC masks."""
+
+import numpy as np
+import slideflow as sf
+from os.path import dirname, join, exists
+from typing import Optional
+
+
[docs]class Save:
+
+    def __init__(self, dest: Optional[str] = None) -> None:
+        """QC function which saves the mask to a numpy file.
+
+        When this QC method is applied to a slide, the current QC masks
+        (e.g., as applied by the Otsu or Gaussian filtering methods) are saved
+        to a numpy file. These saved masks can be loaded in the future
+        using :class:`slideflow.slide.qc.Load`. Saving/loading masks saves time
+        by avoiding repeated regeneration of the same masks.
+
+        By default, masks are saved in the same folder as whole-slide images.
+
+        .. code-block:: python
+
+            from slideflow.slide import qc
+
+            # Define a QC approach that auto-saves masks
+            qc_methods = [
+                qc.Otsu(),
+                qc.Save()
+            ]
+            P.extract_tiles(qc=qc_methods)
+
+            ...
+            # Auto-load previously saved masks
+            qc_methods = [
+                qc.Load()
+            ]
+            P.extract_tiles(qc=qc_methods)
+
+        Args:
+            dest (str, optional): Path in which to save the qc mask.
+                If None, will save in the same directory as the slide.
+                Defaults to None.
+        """
+        self.dest = dest
+
+    def __repr__(self):
+        return "Save(dest={!r})".format(
+            self.dest
+        )
+
+    def __call__(self, wsi: "sf.WSI") -> None:
+        """Save a QC mask for a given slide as a numpy file.
+
+        Args:
+            wsi (sf.WSI): Whole-slide image.
+
+        Returns:
+            None
+        """
+        dest = self.dest if self.dest is not None else dirname(wsi.path)
+        mask = wsi.get_qc_mask(roi=False)
+        # Compare against None explicitly; the truth value of a
+        # multi-element numpy array is ambiguous and raises ValueError.
+        if mask is not None:
+            np.savez(join(dest, wsi.name+'_qc.npz'), mask=mask)
+        return None
+ + +
[docs]class Load:
+
+    def __init__(self, source: Optional[str] = None) -> None:
+        """QC function which loads a saved numpy mask.
+
+        Loads and applies a QC mask which was saved by
+        :class:`slideflow.slide.qc.Save`.
+
+        Args:
+            source (str, optional): Path to search for qc mask.
+                If None, will search in the same directory as the slide.
+                Defaults to None.
+        """
+        self.source = source
+
+    def __repr__(self):
+        return "Load(source={!r})".format(
+            self.source
+        )
+
+    def __call__(self, wsi: "sf.WSI") -> Optional[np.ndarray]:
+        """Load a QC mask for a given slide from a numpy file.
+
+        Args:
+            wsi (sf.WSI): Whole-slide image.
+
+        Returns:
+            Optional[np.ndarray]: Returns the QC mask if a {slide}_qc.npz file
+            was found, otherwise returns None.
+        """
+        source = self.source if self.source is not None else dirname(wsi.path)
+        if exists(join(source, wsi.name+'_qc.npz')):
+            return np.load(join(source, wsi.name+'_qc.npz'))['mask']
+        else:
+            return None
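Both `Save` and `Load` resolve the mask file the same way: an explicit folder if one was given, otherwise the slide's own directory, with `_qc.npz` appended to the slide name. A small sketch of that convention follows; `qc_mask_path` is a hypothetical helper, and it assumes `wsi.name` is the slide filename without its extension.

```python
from os.path import dirname, join
from typing import Optional


def qc_mask_path(slide_path: str, slide_name: str,
                 folder: Optional[str] = None) -> str:
    """Return where a QC mask for the given slide would be stored.

    Hypothetical helper mirroring the lookup in Save/Load above;
    `slide_name` stands in for `wsi.name` (assumed: name without extension).
    """
    if folder is None:
        # Default: same directory as the whole-slide image
        folder = dirname(slide_path)
    return join(folder, slide_name + '_qc.npz')


default_path = qc_mask_path('/data/slides/case1.svs', 'case1')
custom_path = qc_mask_path('/data/slides/case1.svs', 'case1', folder='/tmp/masks')
```

Because the two classes compute the path identically, a mask written by `Save(dest=X)` is only found by `Load(source=X)` with the same folder, or by the defaults when neither is set.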
+
+ +
+ +
+
+ + + + +
+ + + +
+

+ + +
+ +
+
+ +
+
+
+ +
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/slide/qc/strided_dl/index.html b/docs/_modules/slideflow/slide/qc/strided_dl/index.html new file mode 100644 index 000000000..ec11f3850 --- /dev/null +++ b/docs/_modules/slideflow/slide/qc/strided_dl/index.html @@ -0,0 +1,683 @@ + + + + + + + + + + + + slideflow.slide.qc.strided_dl — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + + + + +
+
+
+ + + + + + + + + + + +
+
+
+ + + + + + + + + + + + + + + + +
+ + + + +
+
+ +
+
+ +
+
+ + + +
+ +
+
+ +

Source code for slideflow.slide.qc.strided_dl

+import numpy as np
+
+from tqdm import tqdm
+from contextlib import contextmanager
+from typing import Callable, Union, Optional, TYPE_CHECKING
+from .strided_qc import _StridedQC, _StridedQC_V2
+
+if TYPE_CHECKING:
+    import slideflow as sf
+
+
+
[docs]class StridedDL(_StridedQC): + + def __init__( + self, + model: Callable, + pred_idx: int, + tile_px: int, + tile_um: Union[str, int], + *, + buffer: int = 8, + verbose: bool = False, + pred_threshold: float = 0.5, + **wsi_kwargs + ): + """QC function which uses a deep learning model to generate a QC mask. + + When this QC method is applied to a slide, the given deep learning model + generates predictions across the whole-slide image (using the class index + specified by ``pred_idx``). Areas with a prediction above + ``pred_threshold`` are masked, to be discarded. + + Examples + Create a DeepFocus module that filters out-of-focus tiles. + + .. code-block:: python + + import slideflow as sf + from slideflow.slide.qc import strided_dl + from deepfocus import deepfocus_v3 + + deepfocus = strided_dl.StridedDL( + model=deepfocus_v3(), + pred_idx=1, + tile_px=64, + tile_um='40x' + ) + wsi = sf.WSI(...) + wsi.qc(deepfocus) + + + Do the same, but using class inheritance. + + .. code-block:: python + + import slideflow as sf + from slideflow.slide.qc import strided_dl + from deepfocus import deepfocus_v3 + + class DeepFocus(strided_dl.StridedDL): + + def __init__(self): + model = deepfocus_v3() + checkpoint = '/path/to/checkpoint-ver5' + load_checkpoint(model, checkpoint) + super().__init__( + model=model, + pred_idx=1, + tile_px=64, + tile_um='40x' + ) + + wsi = sf.WSI(...) + deepfocus = DeepFocus() + wsi.qc(deepfocus) + + + Args: + model (callable): Deep learning model. + pred_idx (int): Index of the model output to interpret as the + final prediction. + tile_px (int): Tile size. + tile_um (str or float): Tile size, in microns (int) or + magnification (str). + + Keyword arguments: + verbose (bool): Show a progress bar during calculation. + buffer (int): Number of tiles (width and height) to extract and + process simultaneously. Extracted tile size (width/height) + will be ``tile_px * buffer``. Defaults to 8. 
+            grayspace_fraction (float): Grayspace fraction when extracting
+                tiles from slides. Defaults to 1 (disables).
+            pred_threshold (float): Predictions above this value are masked.
+                Defaults to 0.5.
+            kwargs (Any): All remaining keyword arguments are passed to
+                :meth:`slideflow.WSI.build_generator()`.
+
+        """
+        super().__init__(
+            tile_px=tile_px,
+            tile_um=tile_um,
+            buffer=buffer,
+            verbose=verbose,
+            lazy_iter=True,
+            deterministic=False,
+            **wsi_kwargs
+        )
+        self.model = model
+        self.pred_idx = pred_idx
+        self.pred_threshold = pred_threshold
+
+    def build_mask(self, x, y) -> np.ndarray:
+        """Build the base, empty QC mask."""
+        return np.ones((x, y), dtype=np.float32)
+
+    def apply(self, image: np.ndarray) -> np.ndarray:
+        """Predict focus value of an image tile using DeepFocus model."""
+        y_pred = self.model(image, training=False)[:, self.pred_idx].numpy()
+        return y_pred.reshape(self.buffer, self.buffer)
+
+    def collate_mask(self, mask: np.ndarray):
+        """Convert the mask from predictions to bool using a threshold."""
+        if self.pred_threshold is not None:
+            return mask > self.pred_threshold
+        else:
+            return mask
+
+    def preprocess(self, image: np.ndarray):
+        """Apply preprocessing to an image."""
+        return np.clip(image.astype(np.float32) / 255, 0, 1)
+
+    @contextmanager
+    def _set_threshold(self, threshold: Optional[Union[bool, float]]):
+        """Temporarily set or disable the prediction threshold."""
+        _orig_threshold = self.pred_threshold
+        if isinstance(threshold, float):
+            # Set the threshold to a given threshold
+            self.pred_threshold = threshold
+        elif threshold is False:
+            # Disable thresholding (return raw values)
+            self.pred_threshold = None
+
+        try:
+            yield
+        finally:
+            # Return the threshold to its original value,
+            # even if the wrapped call raised an exception.
+            self.pred_threshold = _orig_threshold
+
+    def __call__(
+        self,
+        wsi: "sf.WSI",
+        threshold: Optional[Union[bool, float]] = None
+    ) -> Optional[np.ndarray]:
+
+        with self._set_threshold(threshold):
+            return super().__call__(wsi)
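The temporary-override pattern used for the prediction threshold can be shown in isolation. This is a minimal sketch, not Slideflow code: the class `Thresholded` and the method `override_threshold` are hypothetical names, and the `try`/`finally` is there so the original value is restored even when the wrapped block raises.

```python
from contextlib import contextmanager
from typing import Iterator, Optional, Union


class Thresholded:
    """Hypothetical stand-in for an object with a `pred_threshold` attribute."""

    def __init__(self, pred_threshold: Optional[float] = 0.5) -> None:
        self.pred_threshold = pred_threshold

    @contextmanager
    def override_threshold(
        self, threshold: Optional[Union[bool, float]]
    ) -> Iterator[None]:
        original = self.pred_threshold
        if isinstance(threshold, float):
            # Use the supplied threshold for the duration of the block
            self.pred_threshold = threshold
        elif threshold is False:
            # Disable thresholding (raw values would be returned)
            self.pred_threshold = None
        try:
            yield
        finally:
            # Always restore, even if the body raised
            self.pred_threshold = original


obj = Thresholded(0.5)
with obj.override_threshold(0.9):
    inside = obj.pred_threshold
after = obj.pred_threshold
```

Passing `None` (or `True`) leaves the threshold untouched, which is why `None` works as the "no override" default in a `__call__` signature like the one above.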
+
+
+# -----------------------------------------------------------------------------
+
+def _taper_mask(ly=224, lx=224, sig=7.5):
+    bsize = max(224, max(ly, lx))
+    xm = np.arange(bsize)
+    xm = np.abs(xm - xm.mean())
+    mask = 1/(1 + np.exp((xm - (bsize/2-20)) / sig))
+    mask = mask * mask[:, np.newaxis]
+    mask = mask[bsize//2-ly//2 : bsize//2+ly//2+ly%2,
+                bsize//2-lx//2 : bsize//2+lx//2+lx%2]
+    return mask
+
+# -----------------------------------------------------------------------------
+
+class StridedDL_V2(_StridedQC_V2):
+
+    """Implementation of a strided deep learning QC algorithm.
+
+    The _StridedQC_V2 base class collates tiled QC masks into a single mask by
+    cropping out the overlap regions. This approach is suitable for algorithms
+    that generate artifacts at the edges of tiles, but is not adequate for
+    stitching together deep learning predictions.
+
+    This class is a subclass of _StridedQC_V2, and is designed to stitch
+    together output from a deep learning QC model for tiles using a tapered mask.
+    """
+
+    def __init__(
+        self,
+        *args,
+        out_classes: int = 0,
+        **kwargs
+    ):
+        """Create a new StridedDL_V2 object.
+
+        Args:
+            *args (Any): Arguments to pass to the parent class.
+            out_classes (int): Number of output classes from the deep learning model.
+                If provided, the shape of the QC mask will be (out_classes, h, w).
+                If 0 or not provided, the shape will be (h, w).
+            **kwargs (Any): Keyword arguments to pass to the parent class.
+ """ + super().__init__(*args, **kwargs) + self.out_classes = out_classes + + def _calc_mask(self, item): + """Calculate a QC mask from a given tile.""" + grid_i = item['grid'][0] + grid_j = item['grid'][1] + image = item['image'] + + mask = self.apply(image) + return mask, (grid_i, grid_j) + + def build_masks(self, wsi: "sf.WSI"): + """Return empty arrays for storing QC mask and the average (taper) mask.""" + dim = (wsi.dimensions[1], wsi.dimensions[0]) + px_ratio = wsi.tile_px / wsi.full_extract_px + target_dim = tuple((np.array(dim) * px_ratio).astype(int)) + if self.out_classes: + qc_dim = (self.out_classes, target_dim[0], target_dim[1]) + else: + qc_dim = target_dim + qc_mask = np.zeros(qc_dim, np.float32) + avg_mask = np.zeros(target_dim, np.float32) + return qc_mask, avg_mask + + def get_tile_bounds(self, wsi: "sf.WSI", i: int, j: int): + """Return the bounds of a tile.""" + fy, fx = wsi.grid_to_coord(i, j, anchor="topleft") + px_ratio = wsi.tile_px / wsi.full_extract_px + x0 = int(fx * px_ratio) + y0 = int(fy * px_ratio) + x1 = x0 + wsi.tile_px + y1 = y0 + wsi.tile_px + return x0, x1, y0, y1 + + def __call__( + self, + wsi: "sf.WSI", + ) -> Optional[np.ndarray]: + """Apply QC filtering to a slide.""" + + qc_wsi, mpp = self.get_slide_and_mpp(wsi) + qc_mask, avg_mask = self.build_masks(qc_wsi) + dts = self.build_tile_generator(qc_wsi) + + # Get the base taper mask + taper_mask = _taper_mask(ly=self.tile_px, lx=self.tile_px, sig=7.5) + + # Progress bar tracking + if self.verbose: + pb = tqdm(dts, desc="Generating...", total=qc_wsi.estimated_num_tiles) + else: + pb = dts + + # Apply QC filter to each tile + if self.filter_pool is not None: + map_fn = self.filter_pool.imap_unordered + else: + map_fn = map + for (tile_mask, (i, j)) in map_fn(self._calc_mask, pb): + x0, x1, y0, y1 = self.get_tile_bounds(qc_wsi, i, j) + if self.out_classes: + x1 = min(x1, qc_mask.shape[1]) + y1 = min(y1, qc_mask.shape[2]) + qc_mask[:, x0:x1, y0:y1] += tile_mask[:, 0: x1-x0, 0: 
y1-y0] * taper_mask[0: x1-x0, 0: y1-y0] + else: + x1 = min(x1, qc_mask.shape[0]) + y1 = min(y1, qc_mask.shape[1]) + qc_mask[x0:x1, y0:y1] += tile_mask[0: x1-x0, 0: y1-y0] * taper_mask[0: x1-x0, 0: y1-y0] + avg_mask[x0:x1, y0:y1] += taper_mask[0: x1-x0, 0: y1-y0] + + # Normalize the mask + qc_mask = qc_mask / avg_mask + + # Close pools + if not self.persistent_threads: + self.close_pools() + + return qc_mask +
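The stitching in `StridedDL_V2.__call__` is a weighted average: each tile's prediction is multiplied by a taper, accumulated, and then divided by the accumulated taper weights. A one-dimensional, dependency-free sketch of the same idea is shown below. The names `taper_1d` and `blend_tiles`, the `edge` parameter (standing in for the constant 20 in `_taper_mask`), and the zero-weight guard are assumptions for illustration and are not part of the source.

```python
import math
from typing import List, Sequence, Tuple


def taper_1d(length: int, edge: float = 2.0, sig: float = 1.0) -> List[float]:
    """Sigmoid taper: close to 1 in the center, falling off toward the edges."""
    center = (length - 1) / 2
    return [
        1 / (1 + math.exp((abs(i - center) - (length / 2 - edge)) / sig))
        for i in range(length)
    ]


def blend_tiles(
    tiles: Sequence[Tuple[int, Sequence[float]]], total: int
) -> List[float]:
    """Blend overlapping 1-D tiles, given as (start, values), into one signal."""
    acc = [0.0] * total      # taper-weighted sum of predictions
    weight = [0.0] * total   # sum of taper weights (the "average mask")
    for start, values in tiles:
        taper = taper_1d(len(values))
        for k, v in enumerate(values):
            pos = start + k
            if pos >= total:
                break        # clip tiles that extend past the output
            acc[pos] += v * taper[k]
            weight[pos] += taper[k]
    # Normalize; positions no tile covered are left at 0 (added guard)
    return [a / w if w > 0 else 0.0 for a, w in zip(acc, weight)]


# Two overlapping tiles with different constant predictions
blended = blend_tiles([(0, [0.0] * 8), (4, [1.0] * 8)], total=12)
```

In the overlap the result moves smoothly from the first tile's value to the second's, instead of jumping at a hard crop boundary, which is the stated reason the class uses a taper rather than cropping.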
+ +
+ +
+
+ + + + +
+ + + +
+

+ + +
+ +
+
+ +
+
+
+ +
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/slide/utils/index.html b/docs/_modules/slideflow/slide/utils/index.html new file mode 100644 index 000000000..927c5f6cf --- /dev/null +++ b/docs/_modules/slideflow/slide/utils/index.html @@ -0,0 +1,1365 @@ + + + + + + + + + + + + slideflow.slide.utils — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + + + + +
+
+
+ + + + + + + + + + + +
+
+
+ + + + + + + + + + + + + + + + +
+ + + + +
+
+ +
+
+ +
+
+ + + +
+ +
+
+ +

Source code for slideflow.slide.utils

+"""Utility functions and constants for slide reading."""
+
+import slideflow as sf
+import cv2
+import csv
+import io
+import numpy as np
+import shapely.validation as sv
+import shapely.geometry as sg
+import shapely.affinity as sa
+import xml.etree.ElementTree as ET
+
+from shapely.ops import unary_union, polygonize
+from PIL import Image, ImageDraw
+from slideflow import errors, log
+from types import SimpleNamespace
+from typing import Union, List, Tuple, Optional, Dict
+
+# Constants
+DEFAULT_JPG_MPP = 1
+OPS_LEVEL_COUNT = 'openslide.level-count'
+OPS_MPP_X = 'openslide.mpp-x'
+OPS_VENDOR = 'openslide.vendor'
+OPS_BOUNDS_HEIGHT = 'openslide.bounds-height'
+OPS_BOUNDS_WIDTH = 'openslide.bounds-width'
+OPS_BOUNDS_X = 'openslide.bounds-x'
+OPS_BOUNDS_Y = 'openslide.bounds-y'
+TIF_EXIF_KEY_MPP = 65326
+OPS_WIDTH = 'width'
+OPS_HEIGHT = 'height'
+DEFAULT_WHITESPACE_THRESHOLD = 230
+DEFAULT_WHITESPACE_FRACTION = 1.0
+DEFAULT_GRAYSPACE_THRESHOLD = 0.05
+DEFAULT_GRAYSPACE_FRACTION = 0.6
+FORCE_CALCULATE_WHITESPACE = -1
+FORCE_CALCULATE_GRAYSPACE = -1
+ROTATE_90_CLOCKWISE = 1
+ROTATE_180_CLOCKWISE = 2
+ROTATE_270_CLOCKWISE = 3
+FLIP_HORIZONTAL = 4
+FLIP_VERTICAL = 5
+
+
+def OPS_LEVEL_HEIGHT(level: int) -> str:
+    return f'openslide.level[{level}].height'
+
+
+def OPS_LEVEL_WIDTH(level: int) -> str:
+    return f'openslide.level[{level}].width'
+
+
+def OPS_LEVEL_DOWNSAMPLE(level: int) -> str:
+    return f'openslide.level[{level}].downsample'
+
+# -----------------------------------------------------------------------------
+# Classes
+
+class ROI:
+    """Object container for an ROI polygon annotation."""
+
+    def __init__(
+        self,
+        name: str,
+        coordinates: Union[np.ndarray, List[Tuple[int, int]]],
+        *,
+        label: Optional[str] = None,
+        holes: Optional[Dict[int, "ROI"]] = None
+    ) -> None:
+        self.name = name
+        self._label = label if label else None
+        self.holes = holes if holes else {}
+        self._poly = None
+        self._triangles = None
+        self.coordinates = np.array(coordinates)
+        self.validate()
+
+    def __repr__(self):
+        return f"<ROI (coords={len(self.coordinates)} label={self.label})>"
+
+    @property
+    def description(self) -> str:
+        """Return a description of the ROI."""
+        if not self.holes:
+            return self.name
+        else:
+            return self.name + ' (holes: {})'.format(', '.join(
+                [h.name for h in self.holes.values()]
+            ))
+
+    @property
+    def label(self) -> Optional[str]:
+        """Return the label of the ROI."""
+        return self._label
+
+    @label.setter
+    def label(self, label: str) -> None:
+        """Set the label of the ROI."""
+        self._label = label
+        for h in self.holes.values():
+            h.label = label
+
+    # --- Polygons ------------------------------------------------------------
+
+    @property
+    def poly(self) -> sg.Polygon:
+        """Return the shapely polygon object."""
+        if self._poly is None:
+            self.update_polygon()
+        return self._poly
+
+    @property
+    def triangles(self) -> np.ndarray:
+        """Return the triangulated mesh."""
+        if self._triangles is None:
+            self._triangles = self.create_triangles()
+        return self._triangles
+
+    def make_polygon(self) -> sg.Polygon:
+        """Create a shapely polygon from the coordinates.
+
+        Raises:
+            ValueError: If the coordinates do not form a valid polygon.
+
+        Returns:
+            sg.Polygon: Shapely polygon object.
+
+        """
+        poly = sv.make_valid(sg.Polygon(self.coordinates))
+        to_delete = []
+        for h, hole in self.holes.items():
+            if not poly.contains(hole.poly):
+                # Hole is not contained within the polygon,
+                # so remove it from the list of holes.
+                to_delete.append(h)
+            poly = poly.difference(hole.poly)
+        for h in to_delete:
+            del self.holes[h]
+        return poly
+
+    def update_polygon(self) -> None:
+        """Update the shapely polygon object."""
+        self._poly = self.make_polygon()
+        self._triangles = None
+
+    def scaled_poly(self, scale: float) -> sg.Polygon:
+        """Create a scaled polygon."""
+        poly = sv.make_valid(sg.Polygon(self.scaled_coords(scale)))
+        for h in self.holes.values():
+            poly = poly.difference(h.scaled_poly(scale))
+        return poly
+
+    def poly_coords(self) -> np.ndarray:
+        """Return the coordinates of the polygon."""
+        if self.poly.geom_type in ('MultiPolygon', 'GeometryCollection'):
+            valid_polys = [p for p in self.poly.geoms if p.geom_type == 'Polygon']
+            if not len(valid_polys):
+                return np.array([])
+            else:
+                coords = np.concatenate([
+                    np.stack(p.exterior.coords.xy, axis=-1)
+                    for p in valid_polys
+                ])
+        elif self.poly.geom_type == 'Polygon':
+            coords = np.stack(self.poly.exterior.coords.xy, axis=-1)
+        else:
+            # This should have been caught by the validate function
+            raise errors.InvalidROIError(f"Unrecognized ROI polygon geometry: {self.poly.geom_type}")
+        # Remove duplicate points
+        coords = np.concatenate([
+            # Take the first coordinate
+            np.expand_dims(coords[0], 0),
+            # Only take subsequent coordinates if they are not repeating
+            coords[1:][~np.all(coords[:-1] == coords[1:], axis=-1)]
+        ], axis=0)
+        return coords
+
+    def simplify(self, tolerance: float = 5) -> None:
+        """Simplify the polygon."""
+        if self.poly.geom_type in ('MultiPolygon', 'GeometryCollection'):
+            poly_s = sg.MultiPolygon([p.simplify(tolerance) for p in self.poly.geoms if p.geom_type == 'Polygon'])
+            if not len(poly_s.geoms):
+                # Polygon is empty after simplification, so keep the
+                # original coordinates unchanged.
+                log.warning(f"ROI {self.name} is empty after simplification.")
+            else:
+                self.coordinates = np.concatenate([np.stack(p.exterior.coords.xy, axis=-1) for p in poly_s.geoms])
+        elif self.poly.geom_type == 'Polygon':
+            poly_s = self.poly.simplify(tolerance=tolerance)
+            self.coordinates = np.stack(poly_s.exterior.coords.xy, axis=-1)
+        else:
+            # This should have been caught by the validate function
+            raise errors.InvalidROIError(f"Unrecognized ROI polygon geometry: {self.poly.geom_type}")
+        for hole in self.holes.values():
+            hole.simplify(tolerance)
+        self.update_polygon()
+
+    def invert(self, width: int, height: int) -> "ROI":
+        """Invert the ROI within the bounds of a whole-slide image.
+
+        Args:
+            width (int): Width of the whole-slide image.
+            height (int): Height of the whole-slide image.
+
+        Returns:
+            ROI: Inverted ROI of shape (width, height) with this ROI as a hole.
+
+        """
+        # Ensure polygon is generated
+        self.update_polygon()
+        # Calculate polygon bounding box (whole-slide)
+        roi_wsi_coords = np.array([[0., 0.], [0., height], [width, height], [width, 0.]])
+        # Create the inverted ROI
+        inverted_ROI = ROI(name=self.name, coordinates=roi_wsi_coords)
+        # Add the hole to the ROI
+        inverted_ROI.add_hole(self)
+        return inverted_ROI
+
+    def create_triangles(self) -> Optional[np.ndarray]:
+        """Create a triangulated mesh from the polygon."""
+
+        def as_open_array(array):
+            if (array[0] == array[-1]).all():
+                return array[:-1]
+            else:
+                return array
+
+        # First, ensure the polygon is valid
+        if not self.polygon_is_valid():
+            sf.log.error(
+                "Unable to create triangles; ROI polygon is invalid."
+            )
+            return None
+        if self.poly.geom_type != 'Polygon' or any([h.poly.geom_type != 'Polygon' for h in self.holes.values()]):
+            sf.log.error(
+                "Unable to create triangles; ROI is not a simple polygon."
+            )
+            return None
+
+        # Vertices of the hole boundaries
+        hole_vertices = {
+            h: as_open_array(hole.poly_coords())
+            for h, hole in self.holes.items()
+        }
+
+        # Filter out holes that are too small
+        valid_holes = [hole for h, hole in self.holes.items() if len(hole_vertices[h]) > 3]
+        hole_vertices = [v for h, v in hole_vertices.items() if len(v) > 3]
+
+        # Verify all holes are contained within the polygon
+        for hole in valid_holes:
+            poly = sv.make_valid(sg.Polygon(self.poly_coords()))
+            if not poly.contains(hole.poly):
+                # Hole is not contained within the polygon
+                return None
+
+        # Vertices of representative points within each hole
+        hole_points = [
+            hole.poly.representative_point().coords[0]
+            for hole in valid_holes
+        ]
+
+        if not len(hole_vertices):
+            hole_vertices = None
+            hole_points = None
+
+        # Build triangles.
+        triangle_vertices = sf.util.create_triangles(
+            as_open_array(self.poly_coords()),
+            hole_vertices=hole_vertices,
+            hole_points=hole_points
+        )
+        return triangle_vertices
+
+    # --- Holes ---------------------------------------------------------------
+
+    def add_hole(self, roi: "ROI") -> None:
+        """Add a hole to the ROI."""
+        hole_name = self.get_next_hole_name()
+        self.holes[hole_name] = roi
+        self.update_polygon()
+
+    def remove_hole(self, roi: Union["ROI", str]) -> None:
+        """Remove a hole from the ROI."""
+        if isinstance(roi, str):
+            roi = self.get_hole(roi)
+        # Collect matching keys first, then delete each one;
+        # a list cannot itself be used as a dictionary key.
+        hole_keys = [h for h, r in self.holes.items() if r == roi]
+        for h in hole_keys:
+            del self.holes[h]
+        self.update_polygon()
+
+    def get_hole(self, name: str) -> "ROI":
+        """Get a hole by name."""
+        for h in self.holes.values():
+            if h.name == name:
+                return h
+        raise ValueError(f"No hole found with name {name}")
+
+    def get_next_hole_name(self) -> int:
+        """Get the next available hole name."""
+        return len(self.holes)
+
+    # --- Other functions -----------------------------------------------------
+
+    def validate(self) -> None:
+        """Validate the exterior coordinates form a valid polygon."""
+        try:
+            poly_coords = self.poly_coords()
+        except ValueError as e:
+            raise errors.InvalidROIError(f"Invalid ROI ({self.name}): {e}")
+        if len(poly_coords) < 4:
+            raise errors.InvalidROIError(f"Invalid ROI ({self.name}): ROI must contain at least 4 coordinates.")
+        if self.poly.geom_type not in ('Polygon', 'MultiPolygon', 'GeometryCollection'):
+            raise errors.InvalidROIError(
+                f"Invalid ROI ({self.name}): ROI must either be a Polygon, or "
+                "MultiPolygon/GeometryCollection containing at least one polygon."
+            )
+        if self.poly.geom_type in ('MultiPolygon', 'GeometryCollection'):
+            if not len([p for p in self.poly.geoms if p.geom_type == 'Polygon']):
+                raise errors.InvalidROIError(
+                    f"Invalid ROI ({self.name}): ROI must contain at least one valid polygon."
+                )
+
+    def polygon_is_valid(self) -> bool:
+        """Check if the polygon is valid."""
+        poly = sg.Polygon(self.coordinates)
+        if not len(list(polygonize(unary_union(poly)))):
+            # Polygon is self-intersecting
+            return False
+        if not poly.is_valid:
+            # Polygon is invalid
+            return False
+        return all([h.polygon_is_valid() for h in self.holes.values()])
+
+    def scaled_coords(self, scale: float) -> np.ndarray:
+        return np.multiply(self.coordinates, 1/scale)
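The duplicate-point removal at the end of `poly_coords` keeps the first coordinate and then every coordinate that differs from its immediate predecessor. A pure-Python sketch of the same rule follows; `drop_consecutive_duplicates` is a hypothetical name, and plain tuples stand in for the numpy rows used in the source.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def drop_consecutive_duplicates(coords: List[Point]) -> List[Point]:
    """Remove points that repeat the point immediately before them."""
    if not coords:
        return []
    out = [coords[0]]                      # always keep the first point
    for prev, cur in zip(coords, coords[1:]):
        if cur != prev:                    # keep only if it differs from its predecessor
            out.append(cur)
    return out


ring = [(0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (0, 0)]
cleaned = drop_consecutive_duplicates(ring)
```

Only adjacent repeats are dropped, so the closing point of a ring, which repeats the first point but is not adjacent to it, is preserved.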
+
+
+class QCMask:
+
+    def __init__(
+        self,
+        mask: np.ndarray,
+        filter_threshold: float = 0.6,
+        is_roi: bool = False
+    ) -> None:
+
+        if not 0 <= filter_threshold <= 1:
+            raise ValueError('filter_threshold must be between 0 and 1')
+        if not isinstance(mask, np.ndarray):
+            raise ValueError('mask must be a numpy array')
+        if not len(mask.shape) == 2:
+            raise ValueError('mask must be a 2D array')
+        if not mask.dtype == bool:
+            raise ValueError('mask must be a boolean array')
+
+        self.mask = mask
+        self.is_roi = is_roi
+        self.filter_threshold = filter_threshold
+
+    def __repr__(self):
+        return f"<QCMask (shape={self.shape}), filter_threshold={self.filter_threshold}, is_roi={self.is_roi}>"
+
+    @property
+    def shape(self):
+        return self.mask.shape
+
+
+class Alignment:
+
+    def __init__(
+        self,
+        origin: Tuple[int, int],
+        scale: float,
+        coord: Optional[np.ndarray] = None
+    ) -> None:
+        self.origin = origin
+        self.scale = scale
+        self.coord = coord
+        self.centroid = None  # type: Tuple[float, float]
+        self.normal = None    # type: Tuple[float, float]
+
+    def __repr__(self):
+        return f"<Alignment (origin={self.origin}, coord={self.coord}, centroid={self.centroid}, normal={self.normal})>"
+
+    @classmethod
+    def from_fit(cls, origin, scale, centroid, normal):
+        obj = cls(origin, scale, None)
+        obj.centroid = centroid
+        obj.normal = normal
+        return obj
+
+    @classmethod
+    def from_translation(cls, origin, scale):
+        return cls(origin, scale, None)
+
+    @classmethod
+    def from_coord(cls, origin, coord):
+        return cls(origin, None, coord)
+
+    def save(self, path):
+        save_dict = dict(
+            origin=np.array(self.origin),
+            scale=np.array(self.scale)
+        )
+        if self.coord is not None:
+            save_dict['coord'] = self.coord
+        if self.centroid is not None:
+            save_dict['centroid'] = np.array(self.centroid)
+        if self.normal is not None:
+            save_dict['normal'] = np.array(self.normal)
+        np.savez(path, **save_dict)
+
+    @classmethod
+    def load(cls, path):
+        load_dict = np.load(path, allow_pickle=True)
+        origin = tuple(load_dict['origin'])
+        scale = load_dict['scale']
+        coord = load_dict['coord'] if 'coord' in load_dict else None
+        centroid = load_dict['centroid'] if 'centroid' in load_dict else None
+        normal = load_dict['normal'] if 'normal' in load_dict else None
+        obj = cls(origin, scale, coord)
+        obj.centroid = centroid
+        obj.normal = normal
+        return obj
+
+
+# -----------------------------------------------------------------------------
+# Functions
+
+def numpy2jpg(img: np.ndarray) -> bytes:
+    if img.shape[-1] == 4:
+        img = img[:, :, 0:3]
+    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
+    return cv2.imencode(".jpg", img)[1].tobytes()   # Default quality = 95%
+
+
+def numpy2png(img: np.ndarray) -> bytes:
+    if img.shape[-1] == 4:
+        img = img[:, :, 0:3]
+    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
+    return cv2.imencode(".png", img)[1].tobytes()
+
+
+
[docs]def predict( + slide: str, + model: str, + *, + stride_div: int = 1, + **kwargs +) -> Tuple[np.ndarray, Optional[np.ndarray]]: + """Generate a whole-slide prediction from a saved model. + + Args: + slide (str): Path to slide. + model (str): Path to saved model trained in Slideflow. + + Keyword args: + stride_div (int, optional): Divisor for stride when convoluting + across slide. Defaults to 1. + roi_dir (str, optional): Directory in which slide ROI is contained. + Defaults to None. + rois (list, optional): List of paths to slide ROIs. Alternative to + providing roi_dir. Defaults to None. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. + batch_size (int, optional): Batch size for calculating predictions. + Defaults to 32. + num_threads (int, optional): Number of tile worker threads. Cannot + supply both ``num_threads`` (uses thread pool) and + ``num_processes`` (uses multiprocessing pool). Defaults to + CPU core count. + num_processes (int, optional): Number of child processes to spawn + for multiprocessing pool. Defaults to None (does not use + multiprocessing). + enable_downsample (bool, optional): Enable the use of downsampled + slide image layers. Defaults to True. + img_format (str, optional): Image format (png, jpg) to use when + extracting tiles from slide. Must match the image format + the model was trained on. If 'auto', will use the format + logged in the model params.json. Defaults to 'auto'. + generator_kwargs (dict, optional): Keyword arguments passed to + the :meth:`slideflow.WSI.build_generator()`. 
+ device (torch.device, optional): PyTorch device. Defaults to + initializing a new CUDA device. + + Returns: + np.ndarray: Predictions for each outcome, with shape = (num_classes, ) + + np.ndarray, optional: Uncertainty for each outcome, if the model was + trained with uncertainty, with shape = (num_classes,) + + """ + from slideflow import Heatmap + log.info("Calculating whole-slide prediction...") + heatmap = Heatmap(slide, model, generate=True, stride_div=stride_div, **kwargs) + assert heatmap.predictions is not None + preds = heatmap.predictions.reshape(-1, heatmap.predictions.shape[-1]) + preds = np.nanmean(preds, axis=0).filled() + if heatmap.uncertainty is not None: + unc = heatmap.uncertainty.reshape(-1, heatmap.uncertainty.shape[-1]) + unc = np.nanmean(unc, axis=0).filled() + return preds, unc + else: + return preds
+ + +def log_extraction_params(**kwargs) -> None: + """Log tile extraction parameters.""" + + if 'whitespace_fraction' not in kwargs: + ws_f = DEFAULT_WHITESPACE_FRACTION + else: + ws_f = kwargs['whitespace_fraction'] + if 'whitespace_threshold' not in kwargs: + ws_t = DEFAULT_WHITESPACE_THRESHOLD + else: + ws_t = kwargs['whitespace_threshold'] + if 'grayspace_fraction' not in kwargs: + gs_f = DEFAULT_GRAYSPACE_FRACTION + else: + gs_f = kwargs['grayspace_fraction'] + if 'grayspace_threshold' not in kwargs: + gs_t = DEFAULT_GRAYSPACE_THRESHOLD + else: + gs_t = kwargs['grayspace_threshold'] + + if 'normalizer' in kwargs: + log.info(f'Extracting tiles using [magenta]{kwargs["normalizer"]}[/] ' + 'normalization') + if ws_f < 1: + log.info('Filtering tiles by whitespace fraction') + excl = f'(exclude if >={ws_f*100:.0f}% whitespace)' + log.debug(f'Whitespace defined as RGB avg > {ws_t} {excl}') + if gs_f < 1: + log.info('Filtering tiles by grayspace fraction') + excl = f'(exclude if >={gs_f*100:.0f}% grayspace)' + log.debug(f'Grayspace defined as HSV avg < {gs_t} {excl}') + + +def draw_roi( + img: Union[np.ndarray, str], + coords: List[List[int]], + color: str = 'red', + linewidth: int = 5 +) -> np.ndarray: + """Draw ROIs on image. + + Args: + img (Union[np.ndarray, str]): Image. + coords (List[List[int]]): ROI coordinates. + + Returns: + np.ndarray: Image as numpy array. 
+ """ + annPolys = [sg.Polygon(b) for b in coords] + if isinstance(img, np.ndarray): + annotated_img = Image.fromarray(img) + elif isinstance(img, str): + annotated_img = Image.open(io.BytesIO(img)) # type: ignore + else: + raise ValueError("Expected img to be a numpy array or bytes, got: {}".format( + type(img) + )) + draw = ImageDraw.Draw(annotated_img) + for poly in annPolys: + if poly.geom_type in ('MultiPolygon', 'GeometryCollection'): + for p in poly.geoms: + if p.is_empty or p.geom_type != 'Polygon': + continue + x, y = p.exterior.coords.xy + zipped = list(zip(x.tolist(), y.tolist())) + draw.line(zipped, joint='curve', fill=color, width=linewidth) + else: + x, y = poly.exterior.coords.xy + zipped = list(zip(x.tolist(), y.tolist())) + draw.line(zipped, joint='curve', fill=color, width=linewidth) + return np.asarray(annotated_img) + + +def roi_coords_from_image( + c: List[int], + args: SimpleNamespace +) -> Tuple[List[int], List[np.ndarray], List[List[int]]]: + # Scale ROI according to downsample level + extract_scale = (args.extract_px / args.full_extract_px) + + # Scale ROI according to image resizing + resize_scale = (args.tile_px / args.extract_px) + + def proc_coords(_coords): + # Offset coordinates to extraction window + _coords = np.add(_coords, np.array([-1 * c[0], -1 * c[1]])) + # Rescale according to downsampling and resizing + _coords = np.multiply(_coords, (extract_scale * resize_scale)) + return _coords + + # Filter out ROIs not in this tile + coords = [] + ll = np.array([0, 0]) + ur = np.array([args.tile_px, args.tile_px]) + for roi in args.rois: + coord = proc_coords(roi.coordinates) + idx = np.all(np.logical_and(ll <= coord, coord <= ur), axis=1) + coords_in_tile = coord[idx] + if len(coords_in_tile) > 3: + coords += [coords_in_tile] + for hole in roi.holes.values(): + hole_coord = proc_coords(hole.coordinates) + hole_idx = np.all(np.logical_and(ll <= hole_coord, hole_coord <= ur), axis=1) + hole_coords_in_tile = hole_coord[hole_idx] + if 
len(hole_coords_in_tile) > 3: + coords += [hole_coords_in_tile] + + # Convert outer ROI to bounding box that fits within tile + boxes = [] + yolo_anns = [] + for coord in coords: + max_vals = np.max(coord, axis=0) + min_vals = np.min(coord, axis=0) + max_x = min(max_vals[0], args.tile_px) + max_y = min(max_vals[1], args.tile_px) + min_x = max(min_vals[0], 0) + min_y = max(0, min_vals[1]) + width = (max_x - min_x) / args.tile_px + height = (max_y - min_y) / args.tile_px + x_center = ((max_x + min_x) / 2) / args.tile_px + y_center = ((max_y + min_y) / 2) / args.tile_px + yolo_anns += [[x_center, y_center, width, height]] + boxes += [np.array([ + [min_x, min_y], + [min_x, max_y], + [max_x, max_y], + [max_x, min_y] + ])] + return coords, boxes, yolo_anns + + +def xml_to_csv(path: str) -> str: + """Create a QuPath format CSV ROI file from an ImageScope-format XML. + + ImageScope-formatted XMLs are expected to have "Region" and "Vertex" + attributes. The "Region" attribute should have an "ID" sub-attribute. + + Args: + path (str): ImageScope XML ROI file path + + Returns: + str: Path to new CSV file. + + Raises: + slideflow.errors.ROIError: If the XML could not be converted. + """ + tree = ET.parse(path) + root = tree.getroot() + new_csv_file = path[:-4] + '.csv' + required_attributes = ['.//Region', './/Vertex'] + if not all(root.findall(a) for a in required_attributes): + raise errors.ROIError( + f"No ROIs found in the XML file {path}. Check that the XML " + "file attributes are named correctly named in ImageScope " + "format with 'Region' and 'Vertex' tags." + ) + with open(new_csv_file, 'w', newline='') as csvfile: + csvwriter = csv.writer(csvfile) + csvwriter.writerow(['ROI_name', 'X_base', 'Y_base']) + for region in root.findall('.//Region'): + id_tag = region.get('Id') + if not id_tag: + raise errors.ROIError( + "No ID attribute found for Region. Check xml file and " + "ensure it adheres to ImageScope format." 
+                )
+            roi_name = 'ROI_' + str(id_tag)
+            vertices = region.findall('.//Vertex')
+            if not vertices:
+                raise errors.ROIError(
+                    "No Vertex found in ROI. Check xml file and ensure it "
+                    "adheres to ImageScope format."
+                )
+            csvwriter.writerows([
+                [roi_name, vertex.get('X'), vertex.get('Y')]
+                for vertex in vertices
+            ])
+    return new_csv_file
+
+
+def get_scaled_and_intersecting_polys(
+    polys: "sg.Polygon",
+    tile: "sg.Polygon",
+    scale: float,
+    origin: Tuple[int, int]
+):
+    """Get scaled and intersecting polygons for a given tile.
+
+    Args:
+        polys (sg.Polygon): Shapely polygon containing union of all ROIs, in
+            base dimensions.
+        tile (sg.Polygon): Shapely polygon representing the tile, in base
+            dimensions.
+        scale (float): Scale factor.
+        origin (Tuple[int, int]): (x, y) offset, in base dimensions, that is
+            subtracted from the intersecting polygons so that their
+            coordinates are relative to the tile.
+
+    Returns:
+        sg.Polygon: ROI polygons intersecting the tile, scaled to the tile
+        size and with the origin with reference to the tile.
+
+    """
+    topleft, topright = origin
+    A = polys.intersection(tile)
+
+    # Translate polygons so the intersection origin is at (0, 0)
+    B = sa.translate(A, -topleft, -topright)
+
+    # Scale to the target tile size
+    C = sa.scale(B, xfact=scale, yfact=scale, origin=(0, 0))
+    return C
+
+
+def _align_to_matrix(im1: np.ndarray, im2: np.ndarray, warp_matrix: np.ndarray) -> np.ndarray:
+    """Align an image to a warp matrix."""
+    # Use the warpAffine function to apply the transformation
+    return cv2.warpAffine(im1, warp_matrix, (im2.shape[1], im2.shape[0]), flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
+
+
+def _find_translation_matrix(
+    im1: np.ndarray,
+    im2: np.ndarray,
+    *,
+    denoise: bool = True,
+    h: float = 30,
+    block_size: int = 7,
+    search_window: int = 21,
+    n_iterations: int = 10000,
+    termination_eps: float = 1e-10,
+    warp_matrix: Optional[np.ndarray] = None
+) -> np.ndarray:
+    """
+    Find the warp matrix that aligns two images using translation only.
+
+    :param im1: The image to be aligned.
+    :param im2: The reference image.
+    :return: 2x3 warp matrix that aligns im1 to im2.
+    """
+    # Convert images to grayscale
+    im1_gray = cv2.cvtColor(im1, cv2.COLOR_BGR2GRAY)
+    im2_gray = cv2.cvtColor(im2, cv2.COLOR_BGR2GRAY)
+
+    # De-noising
+    if denoise:
+        im1_gray = cv2.fastNlMeansDenoising(im1_gray, None, h, block_size, search_window)
+        im2_gray = cv2.fastNlMeansDenoising(im2_gray, None, h, block_size, search_window)
+
+    # Transform images to normalize contrast
+    im1_gray = cv2.equalizeHist(im1_gray)
+    im2_gray = cv2.equalizeHist(im2_gray)
+
+    # Define the motion model
+    warp_mode = cv2.MOTION_TRANSLATION
+
+    # Define 2x3 matrix to store the transformation
+    if warp_matrix is None:
+        warp_matrix = np.eye(2, 3, dtype=np.float32)
+
+    # Set the number of iterations and termination criteria
+    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, n_iterations, termination_eps)
+
+    # Use findTransformECC to compute the transformation
+    _, warp_matrix = cv2.findTransformECC(im2_gray, im1_gray, warp_matrix, warp_mode, criteria)
+
+    return warp_matrix  # type: ignore
+
+
+def align_image(im1: np.ndarray, im2: np.ndarray) -> np.ndarray:
+    """
+    Align two images using translation only.
+
+    :param im1: The image to be aligned.
+    :param im2: The reference image.
+    :return: Aligned image of im1.
+    """
+    warp_matrix = _find_translation_matrix(im1, im2)
+    return _align_to_matrix(im1, im2, warp_matrix)
+
+
+def align_by_translation(
+    im1: np.ndarray,
+    im2: np.ndarray,
+    round: bool = False,
+    calculate_mse: bool = False,
+    **kwargs
+) -> Union[Union[Tuple[float, float], Tuple[int, int]],
+           Tuple[Union[Tuple[float, float], Tuple[int, int]], float]]:
+    """
+    Find the (x, y) translation that aligns im1 to im2.
+
+    Args:
+        im1 (np.ndarray): Target for alignment.
+        im2 (np.ndarray): Image to align.
+        round (bool): Round to the nearest int. Defaults to False.
+        calculate_mse (bool): Return the mean squared error (MSE) of alignment.
+            Defaults to False.
+
+    """
+    try:
+        warp_matrix = _find_translation_matrix(im1, im2, **kwargs)
+    except cv2.error:
+        raise errors.AlignmentError(
+            "Could not align images. Check that the images are the same "
+            "size, that they are not rotated or flipped, and that they have "
+            "overlapping regions."
+        )
+    alignment = -warp_matrix[0, 2], -warp_matrix[1, 2]
+    if round:
+        alignment = (int(np.round(alignment[0])), int(np.round(alignment[1])))
+
+    if calculate_mse:
+        aligned_im1 = _align_to_matrix(im1, im2, warp_matrix)
+        mse = compute_alignment_mse(aligned_im1, im2)
+        return alignment, mse
+    else:
+        return alignment
+
+
+def compute_alignment_mse(
+    imageA: np.ndarray,
+    imageB: np.ndarray,
+    flatten: bool = True
+) -> float:
+    """
+    Compute the Mean Squared Error between two images in their overlapping region,
+    excluding areas that are black (0, 0, 0) in either image.
+
+    :param imageA: First image.
+    :param imageB: Second image.
+    :return: Mean Squared Error (MSE) between the images in the valid overlapping region.
+    """
+    # Remove the alpha channel from both images
+    if flatten:
+        imageA = imageA[:, :, 0:3]
+        imageB = imageB[:, :, 0:3]
+
+    assert imageA.shape == imageB.shape, "Image sizes must match."
+
+    # Create a combined mask where neither of the images is black
+    combined_mask = np.logical_not(np.logical_or(imageA == 0, imageB == 0))
+
+    # Compute MSE only for valid regions
+    diff = (imageA.astype("float") - imageB.astype("float")) ** 2
+    err = np.sum(diff[combined_mask]) / np.sum(combined_mask)
+
+    return err
+
+
+def best_fit_plane(points):
+    # Ensure the input is a numpy array
+    points = np.array(points)
+
+    # 1. Center the data
+    centroid = points.mean(axis=0)
+    centered_points = points - centroid
+
+    # 2. Compute the covariance matrix
+    cov_matrix = np.cov(centered_points, rowvar=False)
+
+    # 3. Compute the eigenvalues and eigenvectors
+    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
+
+    # 4. Get the eigenvector corresponding to the smallest eigenvalue
+    normal_vector = eigenvectors[:, np.argmin(eigenvalues)]
+
+    # The equation of the plane is `normal_vector . (x - centroid) = 0`
+    return centroid, normal_vector
+
+
+def z_on_plane(x, y, centroid, normal):
+    cx, cy, cz = centroid
+    nx, ny, nz = normal
+
+    if nz == 0:
+        raise ValueError("Normal vector's Z component is zero. Can't compute Z value for the given X, Y.")
+
+    z = cz + (nx * (cx - x) + ny * (cy - y)) / nz
+    return z
+
+
+def calc_alignment(c, us, them, n=None):
+    idx, (x, y, xi, yi) = c
+    our_tile = us[xi, yi]
+    try:
+        their_tile = them[xi, yi]
+    except IndexError:
+        return None, c
+    if our_tile is None or their_tile is None:
+        return None, c
+    if n is not None:
+        our_tile = n.transform(our_tile[:, :, 0:3])
+        their_tile = n.transform(their_tile[:, :, 0:3])
+    try:
+        rough_alignment = sf.slide.utils._find_translation_matrix(their_tile, our_tile, h=50, search_window=53)
+    except cv2.error:
+        rough_alignment = None
+        log.debug("Initial rough alignment failed at x={}, y={} (grid {}, {})".format(
+            x, y, xi, yi
+        ))
+    else:
+        log.debug("Initial rough alignment complete at x={}, y={} (grid {}, {}): {}".format(
+            x, y, xi, yi, (int(np.round(-rough_alignment[0, 2])), int(np.round(-rough_alignment[1, 2])))
+        ))
+    try:
+        return align_by_translation(their_tile, our_tile, round=True, warp_matrix=rough_alignment), c
+    except errors.AlignmentError as e:
+        return 'error', c
+
+# -----------------------------------------------------------------------------
+# Internals
+
+def _update_kw_with_defaults(kwargs) -> Dict:
+    """Updates a set of keyword arguments with default extraction values
+    for whitespace/grayspace filtering.
+    """
+    if kwargs['whitespace_fraction'] is None:
+        kwargs['whitespace_fraction'] = DEFAULT_WHITESPACE_FRACTION
+    if kwargs['whitespace_threshold'] is None:
+        kwargs['whitespace_threshold'] = DEFAULT_WHITESPACE_THRESHOLD
+    if kwargs['grayspace_fraction'] is None:
+        kwargs['grayspace_fraction'] = DEFAULT_GRAYSPACE_FRACTION
+    if kwargs['grayspace_threshold'] is None:
+        kwargs['grayspace_threshold'] = DEFAULT_GRAYSPACE_THRESHOLD
+    if kwargs['img_format'] is None:
+        kwargs['img_format'] = 'jpg'
+    return kwargs
+
+
+def _polyArea(x: List[float], y: List[float]) -> float:
+    return 0.5*np.abs(np.dot(x, np.roll(y, 1))-np.dot(y, np.roll(x, 1)))
+
+
+def _convert_img_to_format(image: np.ndarray, img_format: str) -> bytes:
+    if img_format.lower() == 'png':
+        return cv2.imencode(
+            '.png',
+            cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+        )[1].tobytes()
+    elif img_format.lower() in ('jpg', 'jpeg'):
+        # ndarray.tostring() is a deprecated alias of tobytes()
+        return cv2.imencode(
+            '.jpg',
+            cv2.cvtColor(image, cv2.COLOR_RGB2BGR),
+            [int(cv2.IMWRITE_JPEG_QUALITY), 100]
+        )[1].tobytes()
+    else:
+        raise ValueError(f"Unknown image format {img_format}")
+
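The one-line `_polyArea` helper above is the shoelace formula for the area of a simple polygon. A standalone sketch is shown below to make the arithmetic easier to check; `polygon_area` is a hypothetical name used only for this illustration, and the vertices are assumed to be listed in order around the polygon.

```python
import numpy as np


def polygon_area(x, y):
    """Shoelace formula: area of a simple polygon from ordered vertices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sum of cross products between each vertex and its predecessor.
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))


square_area = polygon_area([0, 4, 4, 0], [0, 0, 4, 4])   # 4 x 4 square -> 16.0
triangle_area = polygon_area([0, 4, 0], [0, 0, 3])        # right triangle -> 6.0
```

Because of the absolute value, the result does not depend on whether the vertices are given clockwise or counter-clockwise.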
+ © Copyright 2023, James M Dolezal. + +

\ No newline at end of file
diff --git a/docs/_modules/slideflow/slide/wsi/index.html b/docs/_modules/slideflow/slide/wsi/index.html
new file mode 100644
index 000000000..28a9ba915
--- /dev/null
+++ b/docs/_modules/slideflow/slide/wsi/index.html
@@ -0,0 +1,3834 @@
+ slideflow.slide.wsi — slideflow 3.0.0 documentation
Source code for slideflow.slide.wsi

+'''This module includes tools to section whole-slide images into tiles
+using a sliding window. The resulting tiles can be exported as raw PNG or
+JPG images, or stored in the binary TFRecord format, with or without augmentation.'''
+
+from __future__ import absolute_import, division, print_function
+
+
+import time
+import os
+import csv
+import json
+import multiprocessing as mp
+import random
+import warnings
+import cv2
+import numpy as np
+import pandas as pd
+import rasterio.features
+import shapely.affinity as sa
+import skimage
+import skimage.filters
+from shapely import __version__ as shapely_version
+from shapely.errors import ShapelyDeprecationWarning
+from packaging import version
+from PIL import Image, ImageDraw
+from rich.progress import Progress
+from skimage import img_as_ubyte
+from slideflow import errors
+from functools import partial
+from os.path import exists, join, abspath
+from types import SimpleNamespace
+from typing import Any, Callable, Dict, List, Optional, Tuple, Union, Sequence
+
+import slideflow as sf
+import slideflow.slide.qc
+from slideflow.util import log, path_to_name  # noqa F401
+from .report import SlideReport
+from .utils import *
+from .backends import tile_worker, backend_formats, wsi_reader
+
+
+warnings.simplefilter('ignore', Image.DecompressionBombWarning)
+warnings.simplefilter("ignore", ShapelyDeprecationWarning)
+Image.MAX_IMAGE_PIXELS = 100000000000
+
+# -----------------------------------------------------------------------
+
+
[docs]class WSI: + '''Loads a slide and its annotated region of interest (ROI).''' + + def __init__( + self, + path: str, + tile_px: int, + tile_um: Union[int, str], + stride_div: int = 1, + *, + enable_downsample: bool = True, + roi_dir: Optional[str] = None, + rois: Optional[List[str]] = None, + roi_method: str = 'auto', + roi_filter_method: Union[str, float] = 'center', + origin: Union[str, Tuple[int, int]] = (0, 0), + pb: Optional[Progress] = None, + verbose: bool = True, + use_edge_tiles: bool = False, + mpp: Optional[float] = None, + simplify_roi_tolerance: Optional[float] = None, + artifact_labels: Optional[List[str]] = None, + **reader_kwargs: Any + ) -> None: + """Loads slide and ROI(s). + + Args: + path (str): Path to slide. + tile_px (int): Size of tiles to extract, in pixels. + tile_um (int or str): Size of tiles to extract, in microns (int) or + magnification (str, e.g. "20x"). + stride_div (int, optional): Stride divisor for tile extraction + (1 = no tile overlap; 2 = 50% overlap, etc). Defaults to 1. + enable_downsample (bool, optional): Allow use of downsampled + intermediate layers in the slide image pyramid, which greatly + improves tile extraction speed. May result in artifacts for + slides with incompletely generated intermediates pyramids. + Defaults to True. + roi_dir (str, optional): Directory in which to search for ROI CSV + files. Defaults to None. + rois (list(str)): Alternatively, a list of ROI paths can be + explicitly provided. Defaults to None. + roi_method (str): Either 'inside', 'outside', 'auto', or 'ignore'. + Determines how ROIs are used to extract tiles. + If 'inside' or 'outside', will extract tiles in/out of an ROI, + and raise errors.MissingROIError if an ROI is not available. + If 'auto', will extract tiles inside an ROI if available, + and across the whole-slide if no ROI is found. + If 'ignore', will extract tiles across the whole-slide + regardless of whether an ROI is available. + Defaults to 'auto'. 
+            roi_filter_method (str or float): Method of filtering tiles with
+                ROIs. Either 'center' or float (0-1). If 'center', tiles are
+                filtered with ROIs based on the center of the tile. If float,
+                tiles are filtered based on the proportion of the tile inside
+                the ROI, and ``roi_filter_method`` is interpreted as a
+                threshold. If the proportion of a tile inside the ROI is
+                greater than this number, the tile is included. For example,
+                if ``roi_filter_method=0.7``, a tile that is 80% inside of an
+                ROI will be included, and a tile that is 50% inside of an ROI
+                will be excluded. Defaults to 'center'.
+            origin (str or tuple(int, int)): Offset the starting grid (x, y).
+                Either a tuple of ints or 'random'. Defaults to (0, 0).
+            pb (:class:`Progress`, optional): Multiprocessing-capable
+                Progress instance; will update progress bar during
+                tile extraction if provided.
+            verbose (bool, optional): Controls verbosity of output. If False,
+                suppresses warnings about slide skipping when ROIs are missing.
+                Defaults to True.
+            mpp (float, optional): Override the microns-per-pixel value for
+                the slide. Defaults to None (auto-detects).
+            ignore_missing_mpp (bool, optional): If a slide does not have
+                microns-per-pixel (MPP) information stored in EXIF data
+                (key 65326), set the MPP to a default value
+                (``sf.slide.DEFAULT_JPG_MPP``). If False and MPP data is
+                missing, raises ``sf.errors.SlideMissingMPPError``.
+            use_bounds (bool): If True, use the slide bounds to determine
+                the slide dimensions. This will crop out unscanned white space.
+                If a tuple of int, interprets the bounds as ``(top_left_x,
+                top_left_y, width, height)``. If False, use the full slide
+                dimensions. **Only available when using Libvips**
+                (``SF_SLIDE_BACKEND=libvips``). Defaults to False.
+            transforms (list(int), optional): List of transforms to apply to
+                the slide before establishing coordinate grid. 
Options include + any combination of ``ROTATE_90_CLOCKWISE``, + ``ROTATE_180_CLOCKWISE``, ``ROTATE_270_CLOCKWISE``, + ``FLIP_HORIZONTAL``, and ``FLIP_VERTICAL``. **Only available + when using Libvips** (``SF_SLIDE_BACKEND=libvips``). + Defaults to None. + artifact_labels (list(str), optional): List of ROI issue labels + to treat as artifacts. Whenever this is not None, all the ROIs with + referred label will be inverted with ROI.invert(). + Defaults to an empty list. + + """ + # Initialize calculated variables + self.pb = pb + self.name = path_to_name(path) + self.shortname = sf.util._shortname(self.name) + self.tile_px = tile_px + self.enable_downsample = enable_downsample + self.thumb_image = None # type: Optional[Image.Image] + self.stride_div = stride_div + self.path = path + self.filetype = sf.util.path_to_ext(path) + self.blur_burden = None # type: Optional[float] + self.roi_method = None # type: Optional[str] + self.extracted_x_size = 0 # type: int + self.extracted_y_size = 0 # type: int + self.estimated_num_tiles = 0 # type: int + self.rois = [] # type: List[ROI] # List of individual ROI annotations + self.roi_method = roi_method + self.roi_grid = None # type: Optional[np.ndarray] + self.roi_filter_method = roi_filter_method + self.qc_masks = [] # type: List[QCMask] + self.alignment = None # type: Optional[Alignment] + self.verbose = verbose + self.segmentation = None + self.use_edge_tiles = use_edge_tiles + self.__slide = None + self._mpp_override = mpp + self._reader_kwargs = reader_kwargs + self.grid: np.ndarray + self.artifact_labels = artifact_labels # type: Optional[List[str]] + if self.artifact_labels is None: + self.artifact_labels = [] + + if isinstance(origin, str) and origin != 'random': + raise ValueError( + "Unrecognized value for argument 'origin': {} ." + "Expected either 'random' or a tuple of ints.".format(origin) + ) + if isinstance(origin, tuple) and len(origin) != 2: + raise ValueError( + "If 'origin' is a tuple, it must be of length 2." 
+ ) + self.origin = origin + + if (not isinstance(roi_filter_method, (int, float)) + and roi_filter_method != 'center'): + raise ValueError( + "Unrecognized value for argument 'roi_filter_method': {} ." + "Expected either float or 'center'.".format(roi_filter_method) + ) + if (isinstance(roi_filter_method, (int, float)) + and (roi_filter_method < 0 or roi_filter_method > 1)): + raise ValueError( + "If 'roi_filter_method' is a float, it must be between 0-1." + ) + + if rois is not None and not isinstance(rois, (list, tuple)): + rois = [rois] + + # Initiate supported slide reader + if not os.path.exists(path): + raise errors.SlideNotFoundError(f"Could not find slide {path}.") + if self.filetype.lower() not in sf.util.SUPPORTED_FORMATS: + raise errors.SlideLoadError( + f"{self.name}: unsupported filetype '{self.filetype}'" + ) + if self.filetype.lower() not in backend_formats(): + raise errors.IncompatibleBackendError( + f"{self.name}: filetype '{self.filetype}' is not supported " + f"by the current backend, {sf.slide_backend()}" + ) + + # Collect basic slide information + if not self.slide.has_mpp: + raise errors.SlideMissingMPPError( + f"Slide {self.path} missing MPP ({OPS_MPP_X})" + ) + try: + self.mpp = float(self.slide.mpp) + except Exception as e: + raise errors.SlideMissingMPPError( + f"Unable to parse MPP for slide {self.path} ({OPS_MPP_X}). 
" + f"Error raised: {e}" + ) + + # Configure downsample information + self._configure_downsample(tile_um) + + # Look in ROI directory if available + if roi_dir and exists(join(roi_dir, self.name + ".csv")): + self.load_csv_roi( + join(roi_dir, self.name + ".csv"), + process=False, + simplify_tolerance=simplify_roi_tolerance + ) + elif rois and self.name in [path_to_name(r) for r in rois]: + matching_rois = [] + for rp in rois: + rn = path_to_name(rp) + if rn == self.name: + matching_rois += [rp] + matching = matching_rois[0] + if len(matching_rois) > 1: + log.warning( + f"Multiple ROIs found for {self.name}; using {matching}" + ) + self.load_csv_roi( + matching, + process=False, + simplify_tolerance=simplify_roi_tolerance + ) + + # Handle missing ROIs + if (not len(self.rois) + and roi_method != 'ignore' + and not (rois or roi_dir)): + # No ROIs found because the user did not provide rois or roi_dir, + # but the roi_method is not set to 'ignore', + # indicating that this may be user error. + warn_msg = f"No ROIs provided for {self.name}" + if verbose and not (rois is None and roi_dir is None): + log.warning(warn_msg) + else: + log.debug(warn_msg) + if not len(self.rois) and roi_method in ('inside', 'outside'): + raise errors.MissingROIError( + f"Slide [green]{self.name}[/] missing ROI." + ) + elif not len(self.rois): + info_msg = f"No ROI for {self.name}, using whole slide." 
+            if verbose and roi_method == 'auto':
+                log.info(info_msg)
+            else:
+                log.debug(info_msg)
+        elif len(self.rois) and roi_method == 'auto':
+            log.debug(f"Slide {self.name}: extracting tiles from inside ROI.")
+            self.roi_method = 'inside'
+
+        # Build coordinate grid
+        self.process_rois()
+
+        # Summarize slide information
+        self._log_slide_summary()
+
+    def __repr__(self) -> str:
+        base = "WSI(\n"
+        base += " path = {!r},\n".format(self.path)
+        base += " tile_px = {!r},\n".format(self.tile_px)
+        base += " tile_um = {!r},\n".format(self.tile_um)
+        base += " stride_div = {!r},\n".format(self.stride_div)
+        base += " enable_downsample = {!r},\n".format(self.enable_downsample)
+        base += " roi_method = {!r},\n".format(self.roi_method)
+        base += ")"
+        return base
+
+    def __getitem__(self, index) -> Optional[np.ndarray]:
+        """Returns a tile at the given index.
+
+        Args:
+            index (tuple): (x, y) grid coordinates of tile to extract.
+
+        Returns:
+            Optional[numpy.ndarray]: Image tile, or None if tile is filtered.
+
+        """
+        # Verify indices are valid
+        if (not isinstance(index, (tuple, list, np.ndarray))
+           or not len(index) == 2):
+            raise IndexError("Must supply exactly two indices: (x, y)")
+        if not (index[0] < self.shape[0]):
+            raise IndexError(
+                "index {} is out of bounds for axis 0 with size {}".format(
+                    index[0],
+                    self.shape[0]
+                )
+            )
+        if not (index[1] < self.shape[1]):
+            raise IndexError(
+                "index {} is out of bounds for axis 1 with size {}".format(
+                    index[1],
+                    self.shape[1]
+                )
+            )
+
+        # Find the corresponding coordinate given the provided indices.
+        coord_idx, = np.where((
+            (self.coord[:, 2] == index[0])
+            & (self.coord[:, 3] == index[1])
+        ))
+        if not len(coord_idx):
+            return None
+        assert len(coord_idx) == 1
+        x, y, grid_x, grid_y = self.coord[coord_idx[0]]
+
+        # Check if indices correspond to a tile that is filtered out,
+        # either by ROI or QC. If so, return None.
+ if not self.grid[grid_x, grid_y]: + return None + + # Extract the numpy image at this grid location. + image_dict = tile_worker( + (x, y, grid_x, grid_y), + SimpleNamespace( + full_extract_px=self.full_extract_px, + mpp_override=self._mpp_override, + reader_kwargs=self._reader_kwargs, + grid=self.grid, + downsample_level=self.downsample_level, + path=self.path, + extract_px=self.extract_px, + tile_px=self.tile_px, + full_stride=self.full_stride, + normalizer=None, + whitespace_fraction=1, + whitespace_threshold=1, + grayspace_fraction=1, + grayspace_threshold=1, + img_format='numpy', + yolo=False, + draw_roi=False, + dry_run=False, + has_segmentation=False, + ) + ) + return image_dict['image'] + + def __getstate__(self): + state = self.__dict__.copy() + # Remove the unpicklable entries. + if '__slide' in state: + state['__slide'] = None + if '_WSI__slide' in state: + state['_WSI__slide'] = None + if 'pb' in state: + state['pb'] = None + return state + + def __setstate__(self, state): + self.__dict__.update(state) + + def _rasterize_rois_to_grid( + self, + rois: List["ROI"], + x_offset: float = 0, + y_offset: float = 0, + xfact: float = 1., + yfact: float = 1., + *, + grid_scale: int = 1, + invert: bool = False + ) -> np.ndarray: + """Rasterize ROIs to the size of the tile extraction grid. + + Args: + rois (List[ROI]): ROIs to rasterize. + x_offset (float): Offset to align the ROI polygons with the image tile grid. + y_offset (float): Offset to align the ROI polygons with the image tile grid. + xfact (float): Scaling factor along x dimension. + yfact (float): Scaling factor along y dimension. + + Keyword Args: + grid_scale (int): Scaling factor for the grid. Defaults to 1. + invert (bool): Whether to invert the ROI. Defaults to False. + + Returns: + Optional[np.ndarray]: Rasterized ROIs. + + """ + def _get_poly(_roi): + if invert: + return _roi.invert(*self.dimensions).poly + else: + return _roi.poly + + # Convert ROIs to polygons. 
+        polys = list(map(_get_poly, rois))
+
+        # Translate and scale.
+        if x_offset or y_offset:
+            polys = [sa.translate(poly, x_offset, y_offset) for poly in polys]
+        if xfact != 1 or yfact != 1:
+            polys = self._scale_polys(polys, xfact * grid_scale, yfact * grid_scale)
+
+        # Rasterize polygons to the size of the tile extraction grid.
+        return self._rasterize_polys(
+            polys,
+            grid_scale=grid_scale,
+            intersection=('min' if invert else 'max')
+        )
+
+    def _rasterize_polys(
+        self,
+        polys: List["sg.Polygon"],
+        *,
+        grid_scale: float = 1,
+        intersection: str = 'max'
+    ) -> np.ndarray:
+        """Rasterize polygons to the size of the tile extraction grid.
+
+        Args:
+            polys (List[sg.Polygon]): Polygons to rasterize.
+
+        Keyword args:
+            grid_scale (float): Scaling factor for the grid.
+                Defaults to 1.
+            intersection (str): Method for combining multiple polygons.
+                Either 'max' or 'min'. 'max' yields the union of the polygons,
+                'min' yields the intersection. Defaults to 'max'.
+
+        Returns:
+            np.ndarray: Rasterized polygons.
+        """
+        # Rasterize polygons for ROIs individually, to keep track of
+        # which ROI each tile belongs to, then merge.
+        roi_grid = np.stack([
+            rasterio.features.rasterize(
+                [poly],
+                out_shape=(self.grid.shape[1] * grid_scale,
+                           self.grid.shape[0] * grid_scale),
+                all_touched=False).astype(bool).astype(int) * (i + 1)
+            for i, poly in enumerate(polys)
+        ], axis=0)
+        if intersection == 'max':
+            return roi_grid.max(axis=0).T
+        elif intersection == 'min':
+            return roi_grid.min(axis=0).T
+        else:
+            raise ValueError(
+                f"Unrecognized value for 'intersection': {intersection}"
+            )
+
+    def _scale_polys(
+        self,
+        polys: List["sg.Polygon"],
+        xfact: float,
+        yfact: float,
+    ):
+        """Scale polygons.
+
+        Args:
+            polys (List[sg.Polygon]): Polygons to scale.
+            xfact (float): Scaling factor along x dimension.
+            yfact (float): Scaling factor along y dimension.
+
+        Returns:
+            List[sg.Polygon]: Scaled polygons.
+ + """ + return [ + sa.scale(poly, xfact=xfact, yfact=yfact, origin=(0, 0)) + for poly in polys + ] + + def _build_coord(self) -> None: + """Set up coordinate grid for image tiles. + + The coordinate grid, stored in ``self.coord``, is a list of lists, + where each sublist contains the following information: + + - 0: **x**: x-coordinate of the top-left corner of the tile. + - 1: **y**: y-coordinate of the top-left corner of the tile. + - 2: **grid_x**: x-coordinate of the tile in self.grid. + - 3: **grid_y**: y-coordinate of the tile in self.grid. + + """ + + # First, remove any existing ROI QC Masks, as these will be recalculated + # when the coordinate grid is rebuilt. + self.remove_roi_qc() + + # Calculate window sizes, strides, and coordinates for windows + self.extracted_x_size = self.dimensions[0] - self.full_extract_px + self.extracted_y_size = self.dimensions[1] - self.full_extract_px + + # Randomize origin, if desired + if self.origin == 'random': + start_x = random.randint(0, self.full_stride-1) + start_y = random.randint(0, self.full_stride-1) + else: + assert isinstance(self.origin, tuple) + start_x, start_y = self.origin + log.debug("Slide origin: ({}, {})".format(start_x, start_y)) + + # Coordinates must be in level 0 (full) format + # for the read_region function. + # Coordinates correspond to top-left corner of the tile. + self.coord = [] # type: Union[List, np.ndarray] + edge_buffer = 0 if self.use_edge_tiles else self.full_extract_px + y_range = np.arange( + start_y, + (self.dimensions[1]+1) - edge_buffer, + self.full_stride + ) + x_range = np.arange( + start_x, + (self.dimensions[0]+1) - edge_buffer, + self.full_stride + ) + + self.grid = np.ones((len(x_range), len(y_range)), dtype=bool) + + # For any indexes in y_range or x_range corresponding to a negative value, + # set the corresponding index in self.grid to False. + # This may occur after slide alignment. 
+ self.grid[np.argwhere(x_range < 0), :] = False + self.grid[:, np.argwhere(y_range < 0)] = False + + # ROI filtering + roi_by_center = (self.roi_filter_method == 'center') + if self.has_rois(): + + # Full extraction size and stride + full_extract = self.tile_um / self.mpp + stride = full_extract / self.stride_div + + # Coverage size of the extracted image tiles + xtrim = int(stride * (self.grid.shape[0])) # type: ignore + ytrim = int(stride * (self.grid.shape[1])) # type: ignore + + # Degree to which the ROIs will need to be scaled + # to match the extracted image tile grid + xfact = self.grid.shape[0] / xtrim + yfact = self.grid.shape[1] / ytrim + + # Offset to align the ROI polygons with the image tile grid + x_offset = - (full_extract/2 - stride/2) + y_offset = - (full_extract/2 - stride/2) + + # Separate ROIs by whether they are artifact or not + rois = self.get_rois(ignore_artifact=True) + artifacts = self.get_artifacts() + + # Prepare ROI rasterization arguments + rasterize_kw = dict( + x_offset=x_offset, + y_offset=y_offset, + xfact=xfact, + yfact=yfact, + grid_scale=(1 if roi_by_center else 50), + ) + + # Rasterize ROIs to the grid + if len(rois): + self.roi_grid = self._rasterize_rois_to_grid(rois, **rasterize_kw) + else: + self.roi_grid = None + + # If there are artifact ROIs, rasterize these to the grid + # and subtract them from the main ROI grid. + if len(artifacts): + roi_grid_issues = self._rasterize_rois_to_grid(artifacts, invert=True, **rasterize_kw) + if self.roi_grid is None: + self.roi_grid = roi_grid_issues + else: + self.roi_grid = np.minimum(roi_grid_issues, self.roi_grid) + + # Create a merged boolean mask. + self.roi_mask = self.roi_grid.T.astype(bool) # type: ignore + else: + self.roi_mask = None + + for yi, y in enumerate(y_range): + for xi, x in enumerate(x_range): + y = int(y) + x = int(x) + + # Skip the tile if the coordinate has a negative value. + # This may happen after slide alignment. 
+ if x < 0 or y < 0: + continue + + self.coord.append([x, y, xi, yi]) + + # ROI filtering + if self.has_rois() and roi_by_center: + point_in_roi = self.roi_mask[yi, xi] + # If the extraction method is 'inside', + # skip the tile if it's not in an ROI + if (((self.roi_method in ('inside', 'auto')) and not point_in_roi) + or ((self.roi_method == 'outside') and point_in_roi)): + self.grid[xi, yi] = 0 + + # If roi_filter_method is a float, then perform tile selection + # based on what proportion of the tile is in an ROI, + # rather than choosing a tile by centroid (roi_filter_method='center') + if self.roi_method != 'ignore' and self.has_rois() and not roi_by_center: + self.apply_qc_mask( + (~self.roi_mask if self.roi_method == 'inside' else self.roi_mask), + filter_threshold=(1-self.roi_filter_method), # type: ignore + is_roi=True + ) + + self.coord = np.array(self.coord) + # Handle the case where there is only one tile + if self.coord.ndim == 1 and self.coord.shape[0] > 0: + self.coord = self.coord[np.newaxis, :] + self.estimated_num_tiles = int(self.grid.sum()) + log.debug(f"Set up coordinate grid, shape={self.grid.shape}") + + def _configure_downsample( + self, + tile_um: Union[str, int], + enable_downsample: bool = True + ) -> None: + """Configure downsample level for tile extraction. + + Args: + tile_um (int or str): Size of tiles to extract, in microns (int) or + magnification (str, e.g. "20x"). + enable_downsample (bool, optional): Allow use of downsampled + intermediate layers in the slide image pyramid, which greatly + improves tile extraction speed. May result in artifacts for + slides with incompletely generated intermediate pyramids. + Defaults to True. 
+ + """ + # Calculate downsample by magnification + if isinstance(tile_um, str): + sf.util.assert_is_mag(tile_um) + _mag_lvl = 10 / (np.array(self.slide.level_downsamples) * self.mpp) + mag_levels = _mag_lvl.tolist() + closest_mag = min( + mag_levels, + key=lambda x: abs(x - sf.util.to_mag(tile_um)) # type: ignore + ) + if abs(closest_mag - sf.util.to_mag(tile_um)) > 2: + raise errors.SlideLoadError( + f"{self.name}: Could not find magnification level " + f"matching {tile_um} (closest: {closest_mag:.1f})" + ) + ds_level = mag_levels.index(closest_mag) + if not enable_downsample and ds_level != 0: + raise ValueError(f"Unable to use magnification {tile_um} with " + "enable_downsample=False") + self.downsample_factor = self.slide.level_downsamples[ds_level] + self.extract_px = self.tile_px + self.full_extract_px = int(self.downsample_factor * self.tile_px) + self.tile_um = int(self.downsample_factor * self.mpp * self.tile_px) + log.debug(f"Using magnification {closest_mag:.1f}x (level=" + f"{ds_level}, tile_um={self.tile_um})") + + # Calculate downsample level by tile micron size + else: + assert isinstance(tile_um, int) + self.tile_um = tile_um + self.full_extract_px = int(tile_um / self.mpp) + ds = self.full_extract_px / self.tile_px + if enable_downsample: + ds_level = self.slide.best_level_for_downsample(ds) + else: + ds_level = 0 + self.downsample_factor = self.slide.level_downsamples[ds_level] + self.extract_px = self.full_extract_px // self.downsample_factor + + # Calculate filter dimensions (low magnification for filtering out + # white background and performing edge detection) + self.filter_dimensions = self.slide.level_dimensions[-1] + self.filter_magnification = (self.filter_dimensions[0] + / self.dimensions[0]) + self.filter_px = int(self.full_extract_px * self.filter_magnification) + + # Calculate shape and stride + self.downsample_level = ds_level + self.downsample_dimensions = self.slide.level_dimensions[ds_level] + self.stride = 
int(np.round(self.extract_px / self.stride_div)) + self.full_stride = int(np.round(self.full_extract_px / self.stride_div)) + + def _log_slide_summary(self) -> None: + """Log slide information (MPP, ROIs, grid shape, number of tiles).""" + mpp_roi_msg = f'{self.mpp} um/px | {len(self.rois)} ROI(s)' + size_msg = f'Size: {self.dimensions[0]} x {self.dimensions[1]}' + log.debug(f"{self.shortname}: Slide info: {mpp_roi_msg} | {size_msg}") + grid_msg = f"{self.shortname}: Grid shape: {self.grid.shape} " + grid_msg += f"| Tiles to extract: {self.estimated_num_tiles}" + log.debug(grid_msg) + + def _log_tile_extraction(self) -> None: + """Log tile extraction parameters.""" + lead_msg = f'Extracting {self.tile_um}um tiles' + if self.extract_px != self.tile_px: + resize_msg = f'(resizing {self.extract_px}px -> {self.tile_px}px)' + else: + resize_msg = f'({self.extract_px}px, not resizing)' + stride_msg = f'stride: {int(self.stride)}px' + log.debug(f"{self.shortname}: {lead_msg} {resize_msg}; {stride_msg}") + if self.tile_px > self.extract_px: + ups_msg = 'Tiles will be up-scaled with bilinear interpolation' + ups_amnt = f'({self.extract_px}px -> {self.tile_px}px)' + warn = "[red]'!WARN!'[/]" + log.warning(f"{self.shortname}: {warn} {ups_msg} {ups_amnt}") + + @property + def dimensions(self) -> Tuple[int, int]: + """Dimensions of highest-magnification level (width, height)""" + return self.slide.dimensions + + @property + def levels(self) -> Dict: + """List of dict, with metadata for each level. + + Each dict has the keys 'dimensions', 'downsample', 'height', and 'width'. + + - **'dimensions'**: (height, width) of the level. + - **'downsample'**: Downsample level, where higher numbers indicate + lower magnification and the highest magnification is 1. + - **'height'**: Height of the level. + - **'width'**: Width of the level. 
+ + """ + return self.slide.levels + + @property + def level_dimensions(self) -> List[List[int]]: + """List of list, with dimensions for each slide level.""" + return self.slide.level_dimensions + + @property + def level_downsamples(self) -> List[float]: + """Downsample of each level (starts at 1, increases with lower mag).""" + return self.slide.level_downsamples + + @property + def level_mpp(self) -> List[float]: + """Microns-per-pixel (MPP) for each level.""" + return [d * self.mpp for d in self.level_downsamples] + + @property + def properties(self) -> Dict: + """Dictionary of metadata loaded from the slide.""" + return self.slide.properties + + @property + def vendor(self) -> Optional[str]: + """Slide scanner vendor, if available.""" + if OPS_VENDOR in self.slide.properties: + return self.slide.properties[OPS_VENDOR] + else: + return None + + @property + def shape(self): + """Returns the shape of the tile grid.""" + return self.grid.shape + + @property + def slide(self) -> Any: + """Backend-specific slide object.""" + if self.__slide is not None: + return self.__slide + try: + self.__slide = wsi_reader( + self.path, + self._mpp_override, + **self._reader_kwargs) + return self.__slide # type: ignore + except errors.SlideMissingMPPError: + raise + except Exception as e: + raise errors.SlideLoadError( + f"Error loading slide {self.shortname}: {e}" + ) + + @property + def qc_mask(self) -> Optional[np.ndarray]: + """Returns union of all QC masks.""" + return self.get_qc_mask() + + # --- Alignment -------------------------------------------------------- + + def align_to( + self, + slide: "WSI", + apply: bool = True, + *, + finetune_depth: Optional[Sequence[float]] = None, + normalizer: Optional[str] = 'reinhard_mask', + allow_errors: bool = False + ) -> Tuple[Tuple[int, int], float]: + """Align this slide to another slide. 
+ + Alignment is performed by first aligning thumbnails at low magnification + (mpp = 8), then progressively fine-tuning alignment at increasing + magnification (mpp = 1, 0.5, 0.25), focused on a dense tissue region. + The densest tissue region is identified using the QC mask, if available, + otherwise via Otsu thresholding. + + Args: + slide (:class:`slideflow.WSI`): Slide to align to. + apply (bool): Whether to apply the alignment to the slide. + Defaults to True. + + Keyword Args: + finetune_depth (Optional[Sequence[float]]): Microns-per-pixel (mpp) + values at which to fine-tune alignment, from lowest to highest + magnification. Defaults to [1, 0.5, 0.25]. + normalizer (str, optional): Stain normalization method to use. + Defaults to 'reinhard_mask'. + allow_errors (bool): Whether to allow and ignore alignment errors + when finetuning at higher magnification. Defaults to False. + + Returns: + Tuple of (x, y) offset and MSE of initial alignment. + + Raises: + TypeError: If ``slide`` is not a :class:`slideflow.WSI` object. + + AlignmentError: If initial, thumbnail-based alignment fails, or + if finetuning alignment fails at any magnification and + ``allow_errors`` is False. + + """ + from scipy import ndimage + + if not isinstance(slide, WSI): + raise TypeError("Can only align to another slide.") + + if finetune_depth is None: + finetune_depth = [1, 0.5, 0.25] + + # Steps: + # 1. Identify tissue region as target for alignment. + # 2. Rough align with low-mag thumbnails (mpp = 8). + # 3. Fine-tune alignment at a dense tissue region (mpp = 1, 0.5, 0.25). + + # --- 1. Identify tissue regions as targets for alignment. ------------ + + # Use QC mask (.qc_mask) if available, otherwise calculate one. + # Target should be the centroid of unmasked tissue regions, but + # there may be multiple distinct tissue regions. + + # First, grab the QC mask, or make one if it is not available. 
+ if self.qc_mask is not None: + mask = self.qc_mask + else: + log.debug("Applying Otsu thresholding to identify tissue regions.") + mask = sf.slide.qc.Otsu()(self) + + # Next, fill holes and remove small peaks through gaussian blur, + # thresholding, and morphological closing. + log.debug("Filling holes and removing small peaks in tissue mask.") + mask = skimage.morphology.binary_closing( + skimage.filters.gaussian(mask, sigma=5) > 0.5, + skimage.morphology.disk(5) + ) + + # For each pixel in the mask, calculate the nearest distance to an + # unmasked pixel. This will assist us with finding the densest areas + # of tissue. + log.debug("Calculating distance transform of tissue mask.") + distances = ndimage.distance_transform_edt(~mask) + + # Find the coordinates of the pixel with the highest average distance. + # This is the center of the densest tissue region. + log.debug("Identifying target for alignment.") + target = np.unravel_index(np.argmax(distances), distances.shape) + + # Convert from mask coordinates to slide coordinates. + target = ( + int(target[1] * (self.dimensions[0] / mask.shape[1])), + int(target[0] * (self.dimensions[1] / mask.shape[0])) + ) + target_them = ( + int(np.round(target[0] * (self.mpp / slide.mpp))), + int(np.round(target[1] * (self.mpp / slide.mpp))) + ) + log.debug("Low-mag alignment complete.") + log.debug("Target for alignment (us): {}".format(target)) + log.debug("Target for alignment (them, pre-alignment): {}".format(target_them)) + + # --- 2. Align low-mag thumbnails. ------------------------------------ + + # Calculate thumbnails for alignment. 
+ log.debug("Calculating low-mag thumbnails for alignment.") + our_thumb = np.array(self.thumb(mpp=8)) + their_thumb = np.array(slide.thumb(mpp=8)) + + # Stain normalization + if normalizer is not None: + log.debug("Aligning with stain normalization: {}".format(normalizer)) + if isinstance(normalizer, str): + norm = sf.norm.autoselect(normalizer, backend='opencv') + elif isinstance(normalizer, sf.norm.StainNormalizer): + norm = normalizer + else: + raise ValueError("normalizer must be a str or instance of StainNormalizer") + our_thumb = norm.transform(our_thumb[:, :, 0:3]) + their_thumb = norm.transform(their_thumb[:, :, 0:3]) + + # Align thumbnails and adjust for scale. + try: + log.debug("Aligning low-mag thumbnails (mpp=8)...") + alignment_raw, mse = align_by_translation( + their_thumb, our_thumb, round=True, calculate_mse=True + ) + except errors.AlignmentError: + raise errors.AlignmentError("Alignment failed at thumbnail (mpp=8)") + alignment = (int(np.round(alignment_raw[0] * (8 / self.mpp))), + int(np.round(alignment_raw[1] * (8 / self.mpp)))) + alignment_them = (-int(np.round(alignment_raw[0] * (8 / slide.mpp))), + -int(np.round(alignment_raw[1] * (8 / slide.mpp)))) + + log.debug("Low-mag alignment (us): {}".format(alignment)) + log.debug("Low-mag alignment (them): {}".format(alignment_them)) + + # --- 3. Fine-tune alignment at tissue regions. ----------------------- + + # Get the coordinates of the tissue region in both slides. 
+ for finetune_mpp in finetune_depth: + if (finetune_mpp < self.mpp) or (finetune_mpp < slide.mpp): + log.debug("Skipping finetune at mpp={}".format(finetune_mpp)) + continue + # Us + our_window_size = ( + int(np.round(512 * (finetune_mpp/self.mpp))), + int(np.round(512 * (finetune_mpp/self.mpp))) + ) + our_top_left = ( + int(np.round(target[0] - (our_window_size[0]/2))), + int(np.round(target[1] - (our_window_size[1]/2))) + ) + log.debug("Extracting mpp={} alignment window (ours) at window_size={}, top_left={}".format( + finetune_mpp, our_window_size, our_top_left) + ) + our_region = self.slide.read_from_pyramid( + top_left=our_top_left, + window_size=our_window_size, + target_size=(512, 512), + convert='numpy', + flatten=True, + pad_missing=True + ) + # Them + their_window_size = ( + int(np.round(512 * (finetune_mpp/slide.mpp))), + int(np.round(512 * (finetune_mpp/slide.mpp))) + ) + their_top_left = ( + int(np.round(target_them[0] - (their_window_size[0]/2))) + alignment_them[0], + int(np.round(target_them[1] - (their_window_size[1]/2))) + alignment_them[1] + ) + log.debug("Extracting mpp={} alignment window (theirs) at window_size={}, top_left={}".format( + finetune_mpp, their_window_size, their_top_left) + ) + their_region = slide.slide.read_from_pyramid( + top_left=their_top_left, + window_size=their_window_size, + target_size=(512, 512), + convert='numpy', + flatten=True, + pad_missing=True + ) + + if normalizer is not None: + our_region = norm.transform(our_region[:, :, 0:3]) + their_region = norm.transform(their_region[:, :, 0:3]) + + try: + rough_alignment = sf.slide.utils._find_translation_matrix(their_region, our_region, h=50, search_window=53) + except cv2.error: + rough_alignment = None + log.debug("Initial rough alignment failed at mpp={}".format(finetune_mpp)) + else: + log.debug("Initial rough alignment complete at mpp={}".format(finetune_mpp)) + + # Finetune alignment on this region. 
+ try: + alignment_fine = align_by_translation(their_region, our_region, round=True, warp_matrix=rough_alignment) + except errors.AlignmentError: + msg = "Alignment failed at finetuning (mpp={})".format(finetune_mpp) + if allow_errors: + log.error(msg) + else: + raise errors.AlignmentError(msg) + else: + alignment = ( + alignment[0] + int(np.round(alignment_fine[0] * (finetune_mpp/self.mpp))), + alignment[1] + int(np.round(alignment_fine[1] * (finetune_mpp/self.mpp))) + ) + alignment_them = ( + alignment_them[0] - int(np.round(alignment_fine[0] * (finetune_mpp/slide.mpp))), + alignment_them[1] - int(np.round(alignment_fine[1] * (finetune_mpp/slide.mpp))) + ) + log.debug("Finetune alignment complete at mpp={}.".format(finetune_mpp)) + log.debug("Finetuned alignment (us) at mpp={}: {}".format(finetune_mpp, alignment)) + log.debug("Finetuned alignment (them) at mpp={}: {}".format(finetune_mpp, alignment_them)) + + # If not applying alignment, return the base alignment and MSE. + if not apply: + log.info("Slide aligned with MSE {:.2f}".format(mse)) + return alignment, mse # type: ignore + + # Apply alignment. + self.origin = alignment + self.alignment = Alignment.from_translation( + origin=self.slide.coord_to_raw(*alignment), + scale=(slide.mpp / self.mpp), + ) + log.info("Slide aligned with MSE {:.2f}. Origin set to {}".format( + mse, self.origin + )) + + # Rebuild coordinates and reapply QC, if present. + self._build_coord() + if self.has_non_roi_qc(): + self.apply_qc_mask() + + return alignment, mse # type: ignore + + def align_tiles_to( + self, + slide: "WSI", + normalizer: Optional[str] = 'reinhard_mask', + *, + allow_errors: bool = True, + mask_on_fail: bool = True, + align_by: str = 'fit', + ignore_outliers = True, + num_workers: Optional[int] = None, + **kwargs + ) -> np.ndarray: + """Align tiles to another slide. + + Differs from :meth:`slideflow.WSI.align_to` in that it aligns each + tile individually, rather than the slide as a whole. 
This is useful + when aligning slides with distortion, whose alignment may drift across + the slide. + + Args: + slide (:class:`slideflow.WSI`): Slide to align to. + normalizer (str, optional): Stain normalization method to use. + Defaults to 'reinhard_mask'. + + Keyword Args: + allow_errors (bool): Whether to allow and ignore alignment errors, + both during the initial slide-level alignment and when + aligning individual tiles. Defaults to True. + mask_on_fail (bool): Whether to mask tiles that fail alignment. + Defaults to True. + align_by (str): Either 'tile' or 'fit'. If 'tile', tiles are + aligned individually. If 'fit', tiles are aligned by fitting + a plane to the alignment of all tiles. Defaults to 'fit'. + ignore_outliers (bool): Whether to ignore outliers when fitting + a plane to tile alignment. Defaults to True. + num_workers (int, optional): Number of worker processes used to + align tiles. If None, uses the number of available CPUs. + Defaults to None. + **kwargs: Keyword arguments passed to :meth:`slideflow.WSI.align_to`. + + Raises: + ValueError: If ``align_by`` is not 'tile' or 'fit'. + + Returns: + np.ndarray: Masked array of per-tile alignment offsets, with + shape = (number of tile coordinates, 2). + + """ + if align_by not in ('tile', 'fit'): + raise ValueError("align_by must be 'tile' or 'fit'") + + # Stain normalizer. + if normalizer is not None: + if isinstance(normalizer, str): + normalizer = sf.norm.autoselect(normalizer, backend='opencv') + elif not isinstance(normalizer, sf.norm.StainNormalizer): + raise ValueError("normalizer must be a str or instance of StainNormalizer") + + # Perform coarse alignment. + self.align_to( + slide, apply=True, normalizer=normalizer, allow_errors=allow_errors, **kwargs + ) + + # Finetune alignment at each tile location. 
+ from tqdm import tqdm + + ctx = mp.get_context('spawn') if sf.slide_backend() == 'libvips' else mp.get_context('fork') + pool = ctx.Pool(num_workers or sf.util.num_cpu()) + + alignment_coords = np.zeros((self.coord.shape[0], 2)) + half_extract_px = int(np.round(self.full_extract_px/2)) + idx_to_remove = [] + for tile_alignment, c in tqdm(pool.imap_unordered( + partial(calc_alignment, + us=self, + them=slide, + n=normalizer), + enumerate(self.coord)), + desc="Aligning tiles...", + total=len(self.coord)): + idx, (x, y, xi, yi) = c + if tile_alignment == 'error': + msg = "Tile alignment failed at x={}, y={} (grid {}, {})".format( + x, y, xi, yi + ) + if allow_errors: + log.debug(msg) + tile_alignment = None + else: + raise errors.AlignmentError(msg) + if tile_alignment is None and mask_on_fail and align_by == 'tile': + self.grid[xi, yi] = False + idx_to_remove += [idx] + elif tile_alignment is None: + idx_to_remove += [idx] + if tile_alignment is not None: + pixel_ratio = (self.full_extract_px / self.tile_px) + x_adjust = int(np.round(tile_alignment[0] * pixel_ratio)) + y_adjust = int(np.round(tile_alignment[1] * pixel_ratio)) + x_base, y_base = self.slide.coord_to_raw( + x + half_extract_px, + y + half_extract_px + ) + x_base_adjusted, y_base_adjusted = self.slide.coord_to_raw( + x + half_extract_px + x_adjust, + y + half_extract_px + y_adjust + ) + x_base_adjustment = x_base_adjusted - x_base + y_base_adjustment = y_base_adjusted - y_base + alignment_coords[idx] = np.array([x_base_adjustment, y_base_adjustment]) + log.debug("Tile alignment complete at x={}, y={} (grid {}, {}): adjust by {}, {}".format( + x, y, xi, yi, x_adjust, y_adjust + )) + + pool.close() + + coord_mask = np.any(self.get_masked_coord().mask, 1) + coord_mask[np.array(idx_to_remove).astype(int)] = True + mask = np.repeat(coord_mask[:, None], 2, axis=1) + all_alignment_coords = np.ma.masked_array(alignment_coords, mask=mask) # type: ignore + coord_raw = self.slide.coord_to_raw( + 
self.coord[~coord_mask][:, 0] + half_extract_px, + self.coord[~coord_mask][:, 1] + half_extract_px + ) + log.debug("Removing {} indices with failed alignment. Max coord size: {}".format(len(idx_to_remove), len(self.coord))) + + if align_by == 'fit': + log.debug("Fitting to {} coordinates.".format((~coord_mask).sum())) + x_adjustment_coordinates = np.column_stack(( + coord_raw[0], + coord_raw[1], + all_alignment_coords[~coord_mask][:, 0], + )) + y_adjustment_coordinates = np.column_stack(( + coord_raw[0], + coord_raw[1], + all_alignment_coords[~coord_mask][:, 1], + )) + + def build_aligned_coords(x_centroid, x_normal, y_centroid, y_normal): + coord_on_plane = np.zeros((len(self.coord), 2), dtype=int) + coord_on_plane = np.ma.masked_array(coord_on_plane, mask=mask) + for idx, (x, y, xi, yi) in enumerate(self.coord): + # Convert coordinates to raw base layer coordinates + bx, by = self.slide.coord_to_raw( + x + half_extract_px, + y + half_extract_px + ) + # Align to raw base layer coordinates + coord_on_plane[idx] = ( + int(np.round(z_on_plane(bx, by, x_centroid, x_normal))), + int(np.round(z_on_plane(bx, by, y_centroid, y_normal))) + ) + return coord_on_plane + + x_centroid, x_normal = best_fit_plane(x_adjustment_coordinates) + y_centroid, y_normal = best_fit_plane(y_adjustment_coordinates) + fit_alignment = build_aligned_coords(x_centroid, x_normal, y_centroid, y_normal) + + if ignore_outliers: + # Calculate outlier threshold (90th percentile) + diff = np.abs(all_alignment_coords - fit_alignment) + diff = np.max(diff, axis=-1) + threshold = np.percentile(diff[~diff.mask].data, 90) + all_alignment_coords.mask[diff > threshold] = True + coord_mask[diff > threshold] = True + fit_alignment.mask = all_alignment_coords.mask + log.debug("Re-fitting to {} coordinates, ignoring outliers.".format((~coord_mask).sum())) + + coord_raw = self.slide.coord_to_raw( + self.coord[~coord_mask][:, 0] + half_extract_px, + self.coord[~coord_mask][:, 1] + half_extract_px + ) + + # 
Recalculate fit without outliers + x_adjustment_coordinates = np.column_stack(( + coord_raw[0], + coord_raw[1], + all_alignment_coords[~coord_mask][:, 0], + )) + y_adjustment_coordinates = np.column_stack(( + coord_raw[0], + coord_raw[1], + all_alignment_coords[~coord_mask][:, 1], + )) + + x_centroid, x_normal = best_fit_plane(x_adjustment_coordinates) + y_centroid, y_normal = best_fit_plane(y_adjustment_coordinates) + + all_alignment_coords = build_aligned_coords(x_centroid, x_normal, y_centroid, y_normal) + else: + all_alignment_coords = fit_alignment + + self.alignment = Alignment.from_fit( + origin=self.slide.coord_to_raw(*self.origin), + scale=(slide.mpp / self.mpp), + centroid=(x_centroid, y_centroid), + normal=(x_normal, y_normal) + ) + + for idx, (x, y, xi, yi) in enumerate(self.coord): + if np.ma.is_masked(all_alignment_coords[idx][0]): + continue + + bx, by = self.slide.coord_to_raw( + x + half_extract_px, + y + half_extract_px + ) + x, y = self.slide.raw_to_coord( + bx + all_alignment_coords[idx][0], + by + all_alignment_coords[idx][1] + ) + self.coord[idx, 0] = x - half_extract_px + self.coord[idx, 1] = y - half_extract_px + + # Delete tiles that failed to align. + if idx_to_remove and align_by == 'tile': + log.warning("Removing {} tiles that failed to align.".format(len(idx_to_remove))) + self.coord = np.delete(self.coord, idx_to_remove, axis=0) + + if align_by != 'fit': + self.alignment = Alignment.from_coord( + origin=self.slide.coord_to_raw(*self.origin), + scale=(slide.mpp / self.mpp), + coord=self.coord + ) + + log.info("Slide alignment complete and finetuned at each unmasked tile location.") + + return all_alignment_coords + + def apply_alignment(self, alignment: Alignment) -> None: + """Apply alignment to the slide. + + Args: + alignment (slideflow.slide.Alignment): Alignment object. 
+ + """ + self.alignment = alignment + self.origin = self.slide.raw_to_coord(*alignment.origin) + if alignment.coord is not None: + self.coord = alignment.coord + elif alignment.centroid is None: + self._build_coord() + if self.qc_mask is not None: + self.apply_qc_mask() + else: + self._build_coord() + if self.qc_mask is not None: + self.apply_qc_mask() + if alignment.centroid is not None: + x_centroid, y_centroid = alignment.centroid + x_normal, y_normal = alignment.normal + half_extract_px = int(np.round(self.full_extract_px/2)) + for idx, (x, y, xi, yi) in enumerate(self.coord): + x = (xi * int(np.round(self.full_stride/alignment.scale))) * alignment.scale + y = (yi * int(np.round(self.full_stride/alignment.scale))) * alignment.scale + x += self.origin[0] + y += self.origin[1] + bx, by = self.slide.coord_to_raw( + x + half_extract_px, + y + half_extract_px + ) + adjust_x = int(np.round(z_on_plane(bx, by, x_centroid, x_normal))) + adjust_y = int(np.round(z_on_plane(bx, by, y_centroid, y_normal))) + x, y = self.slide.raw_to_coord(bx + adjust_x, by + adjust_y) + self.coord[idx, 0] = x - half_extract_px + self.coord[idx, 1] = y - half_extract_px + + def load_alignment(self, path: str) -> None: + """Load alignment from a file. + + Args: + path (str): Path to alignment file. + + """ + self.apply_alignment(Alignment.load(path)) + + # --- All other functions ----------------------------------------------- + + def apply_qc_mask( + self, + mask: Optional[Union[np.ndarray, QCMask]] = None, + filter_threshold: Optional[float] = None, + *, + is_roi: bool = False + ) -> "Image": + """Apply custom slide-level QC by filtering grid coordinates. + + The mask should have a shape (height, width) proportional to the + slide's dimensions. + + If the mask is numerical, the mask is thresholded at filter_threshold, + with values above the threshold indicating a region to discard. 
+ + If the mask is a boolean array, True indicates a region to + discard and False indicates a region to keep. + + If the mask is a QCMask, the filter_threshold is ignored. + + Args: + mask (np.ndarray or :class:`slideflow.slide.QCMask`, optional): + Boolean QC mask array or ``QCMask`` object. If None, will + re-apply the current masks. Defaults to None. + filter_threshold (float): Percent of a tile detected as + background that will trigger a tile to be discarded. + Only used if ``mask`` is an np.ndarray. + Defaults to 0.6. + + Keyword Args: + is_roi (bool): Whether the mask is an ROI mask. Only used if ``mask`` + is an ``np.ndarray``. Defaults to False. + + Returns: + Image: Image of applied QC mask. + """ + # If no mask is provided and none has been previously applied, + # raise an error. + if mask is None and not len(self.qc_masks): + raise errors.QCError("No QC mask available") + + # If no mask provided, re-apply the current masks. + if mask is None: + for qc_mask in self.qc_masks: + self.apply_qc_mask(qc_mask) + return Image.fromarray(img_as_ubyte(self.qc_mask)) + + # Verify that the mask is a np.ndarray or QCMask. + if not isinstance(mask, (np.ndarray, QCMask)): + raise TypeError("mask must be a np.ndarray or QCMask") + + # Set the filter threshold if not provided. + # If mask is a QCMask, use its filter_threshold. + # Otherwise, default to 0.6. + if not isinstance(mask, QCMask) and filter_threshold is None: + filter_threshold = 0.6 + elif filter_threshold is not None and isinstance(mask, QCMask): + raise ValueError( + "filter_threshold cannot be provided if mask is a QCMask" + ) + elif filter_threshold is None: + filter_threshold = mask.filter_threshold # type: ignore + + # If the provided mask is an np.ndarray, convert it to a QCMask. + if not isinstance(mask, QCMask): + mask = QCMask(mask, filter_threshold=filter_threshold, is_roi=is_roi) # type: ignore + self.qc_masks.append(mask) + + # Apply the mask to the grid. 
+ downsample = self.dimensions[0] / mask.shape[1] + qc_ratio = 1 / downsample + qc_width = int(np.round(self.full_extract_px * qc_ratio)) + for x, y, xi, yi in self.coord: # type: ignore + # x and y are top-left coordinates for the tile. + qc_x = int(np.round(x * qc_ratio)) + qc_y = int(np.round(y * qc_ratio)) + submask = mask.mask[qc_y:(qc_y+qc_width), qc_x:(qc_x+qc_width)] + if (submask.size > 0) and (np.mean(submask) > filter_threshold): + self.grid[xi, yi] = 0 + + # Update the estimated number of tiles. + self.estimated_num_tiles = int(self.grid.sum()) + + # Return an image of the applied mask. + return Image.fromarray(img_as_ubyte(self.qc_mask)) + + def apply_segmentation(self, segmentation: "sf.cellseg.Segmentation") -> None: + """Apply cell segmentation to the slide. + + This sets the coordinates to the centroids of the segmentation. + + Args: + segmentation (slideflow.cellseg.Segmentation): Segmentation object + to apply. + + """ + # Filter out masks outside of ROIs, if present. 
+ if self.has_rois(): + log.debug(f"Applying {len(self.rois)} ROIs to segmentation.") + rois = self.get_rois(ignore_artifact=True) + segmentation.apply_rois(1, [r.poly for r in rois]) + + if segmentation.slide is None: + segmentation.slide = self + self.segmentation = segmentation + centroids = segmentation.centroids(wsi_dim=True) + self.seg_coord = np.concatenate( + (centroids, np.expand_dims(np.arange(centroids.shape[0]), axis=-1)), + axis=-1) + nonzero = self.seg_coord[:, 0] > 0 + self.seg_coord[:, 0:2][nonzero] -= int(self.full_extract_px/2) + self.estimated_num_tiles = centroids.shape[0] + + def area(self) -> float: + """Calculate area (mm^2) of slide that passes QC masking.""" + dim_x, dim_y = self.dimensions[0], self.dimensions[1] + total_area_in_sq_microns = (dim_x * self.mpp) * (dim_y * self.mpp) + if self.qc_mask is not None: + s = self.qc_mask.shape + p = 1 - (self.qc_mask.sum() / (s[0] * s[1])) + area_in_sq_microns = p * total_area_in_sq_microns + else: + area_in_sq_microns = total_area_in_sq_microns + area_in_sq_mm = area_in_sq_microns * 1e-6 + return area_in_sq_mm + + def build_generator( + self, + *, + shuffle: bool = True, + whitespace_fraction: float = None, + whitespace_threshold: float = None, + grayspace_fraction: float = None, + grayspace_threshold: float = None, + normalizer: Optional[Union[str, "slideflow.norm.StainNormalizer"]] = None, + normalizer_source: str = None, + context_normalize: bool = False, + num_threads: Optional[int] = None, + num_processes: Optional[int] = None, + show_progress: bool = False, + img_format: str = 'numpy', + full_core: bool = False, + yolo: bool = False, + draw_roi: bool = False, + pool: Optional["mp.pool.Pool"] = None, + dry_run: bool = False, + lazy_iter: bool = False, + shard: Optional[Tuple[int, int]] = None, + max_tiles: Optional[int] = None, + from_centroids: bool = False, + apply_masks: bool = True, + deterministic: bool = True + ) -> Optional[Callable]: + """Builds a tile generator to extract tiles from 
this slide. + + Keyword args: + shuffle (bool): Shuffle images during extraction. + whitespace_fraction (float, optional): Range 0-1. Defaults to 1. + Discard tiles with this fraction of whitespace. If 1, will not + perform whitespace filtering. + whitespace_threshold (int, optional): Range 0-255. Defaults to 230. + Threshold above which a pixel (RGB average) is whitespace. + grayspace_fraction (float, optional): Range 0-1. Defaults to 0.6. + Discard tiles with this fraction of grayspace. If 1, will not + perform grayspace filtering. + grayspace_threshold (float, optional): Range 0-1. Defaults to 0.05. + Pixels in HSV format with saturation below this threshold are + considered grayspace. + normalizer (str, optional): Normalization strategy to use on image + tiles. Defaults to None. + normalizer_source (str, optional): Stain normalization preset or + path to a source image. Valid presets include 'v1', 'v2', and + 'v3'. If None, will use the default preset ('v3'). + Defaults to None. + context_normalize (bool): If normalizing, use context from + the rest of the slide when calculating stain matrix + concentrations. Defaults to False (normalize each image tile + as a separate image). + num_threads (int): If specified, will extract tiles with a + ThreadPool using the specified number of threads. Cannot + supply both `num_threads` and `num_processes`. Libvips is + particularly slow with ThreadPools. Defaults to None in the + Libvips backend, and the number of CPU cores when using cuCIM. + num_processes (int): If specified, will extract tiles with a + multiprocessing pool using the specified number of processes. + Cannot supply both `num_threads` and `num_processes`. + With the libvips backend, this defaults to half the number of + CPU cores, and with cuCIM, this defaults to None. + show_progress (bool, optional): Show a progress bar. + img_format (str, optional): Image format. Either 'numpy', 'jpg', + or 'png'. Defaults to 'numpy'.
+ yolo (bool, optional): Include yolo-formatted tile-level ROI + annotations in the return dictionary, under the key 'yolo'. + Defaults to False. + draw_roi (bool, optional): Draws ROIs onto extracted tiles. + Defaults to False. + dry_run (bool, optional): Determine tiles that would be extracted, + but do not export any images. Defaults to False. + max_tiles (int, optional): Only extract this many tiles per slide. + Defaults to None. + from_centroids (bool): Extract tiles from cell segmentation + centroids, rather than in a grid-wise pattern. Requires that + cell segmentation has already been applied with + `WSI.apply_segmentation()`. Defaults to False. + apply_masks (bool): Apply cell segmentation masks to tiles. Ignored + unless `from_centroids` is True. + Defaults to True. + deterministic (bool): Return tile images in reproducible, + deterministic order. May slightly increase iteration time. + Defaults to True. + shard (tuple(int, int), optional): If provided, will only extract + tiles from the shard with index `shard[0]` out of `shard[1]` + shards. Defaults to None. + + Returns: + A callable that returns a generator, or None if the slide has no + tiles to extract. The generator yields a dictionary with the keys: + + - ``"image"``: image data. + - ``"yolo"``: yolo-formatted annotations, (x_center, y_center, width, height), optional. + - ``"grid"``: (x, y) grid coordinates of the tile. + - ``"loc"``: (x, y) coordinates of tile center, in base (level=0) dimension.
+ + """ + if (isinstance(num_threads, int) + and isinstance(num_processes, int) + and num_threads > 1 + and num_processes > 1): + raise ValueError("num_threads and num_processes cannot both be " + "non-zero.") + if (shard is not None + and (not isinstance(shard, (tuple, list)) + or len(shard) != 2 + or any(not isinstance(s, int) for s in shard))): + raise ValueError("If shard is provided, it must be a tuple of " + "two int (shard_idx, shard_count)") + + if from_centroids and self.segmentation is None: + raise ValueError( + "Cannot build generator from segmentation centroids; " + "segmentation not yet applied. Use WSI.apply_segmentation()." + ) + + self._log_tile_extraction() + if self.estimated_num_tiles == 0: + log.warning(f"No tiles extracted for slide [green]{self.name}") + return None + + # Set whitespace / grayspace fraction to defaults if not provided + if whitespace_fraction is None: + whitespace_fraction = DEFAULT_WHITESPACE_FRACTION + if whitespace_threshold is None: + whitespace_threshold = DEFAULT_WHITESPACE_THRESHOLD + if grayspace_fraction is None: + grayspace_fraction = DEFAULT_GRAYSPACE_FRACTION + if grayspace_threshold is None: + grayspace_threshold = DEFAULT_GRAYSPACE_THRESHOLD + + # Get information about highest level downsample, as we will filter + # on that layer if downsampling is enabled + if self.enable_downsample: + downsamples = np.array(self.slide.level_downsamples) + filter_lev = np.max(np.argwhere(downsamples < self.extract_px)) + filter_downsample_factor = self.slide.level_downsamples[filter_lev] + lev_ds = self.slide.level_downsamples[self.downsample_level] + filter_downsample_ratio = filter_downsample_factor // lev_ds + else: + filter_lev = self.downsample_level + filter_downsample_ratio = 1 + + # Prepare stain normalization + if normalizer and not isinstance(normalizer, sf.norm.StainNormalizer): + if sf.slide_backend() == 'cucim': + normalizer = sf.norm.autoselect( # type: ignore + method=normalizer, + source=normalizer_source + ) + 
else: + # Libvips with spawn multiprocessing + # is not compatible with Tensorflow-native stain normalization + # due to GPU memory issues + normalizer = sf.norm.StainNormalizer(normalizer) # type: ignore + if normalizer_source is not None: + normalizer.fit(normalizer_source) # type: ignore + + if normalizer and context_normalize: + assert isinstance(normalizer, sf.norm.StainNormalizer) + log.debug("Preparing whole-slide context for normalizer") + normalizer.set_context(self) + + w_args = SimpleNamespace(**{ + 'full_extract_px': self.full_extract_px, + 'mpp_override': self._mpp_override, + 'reader_kwargs': self._reader_kwargs, + 'grid': self.grid, + 'downsample_level': self.downsample_level, + 'filter_downsample_level': filter_lev, + 'filter_downsample_ratio': filter_downsample_ratio, + 'path': self.path, + 'extract_px': self.extract_px, + 'tile_px': self.tile_px, + 'full_stride': self.full_stride, + 'normalizer': normalizer, + 'whitespace_fraction': whitespace_fraction, + 'whitespace_threshold': whitespace_threshold, + 'grayspace_fraction': grayspace_fraction, + 'grayspace_threshold': grayspace_threshold, + 'img_format': img_format, + 'yolo': yolo, + 'draw_roi': draw_roi, + 'dry_run': dry_run, + 'has_segmentation': from_centroids + }) + + def generator(): + nonlocal pool, num_threads, num_processes + should_close = False + n_extracted = 0 + + # Skip tiles filtered out with QC or ROI + if not from_centroids: + non_roi_coord = self.coord[ + self.grid[tuple(self.coord[:, 2:4].T)].astype(bool) + ] + # Shuffle coordinates to randomize extraction order + if shuffle: + np.random.shuffle(non_roi_coord) + num_possible_tiles = len(non_roi_coord) + else: + from slideflow.cellseg import seg_utils + + log.info("Building generator from segmentation centroids.") + nonzero = self.seg_coord[:, 0] > 0 + num_possible_tiles = nonzero.sum() + if apply_masks: + sparse = seg_utils.sparse_mask(self.segmentation.masks) + + def _sparse_generator(): + + def proc(c): + mask = None if not 
apply_masks else self.get_tile_mask(c[2], sparse) + return c, mask + + if shuffle: + for idx in np.random.permutation(self.seg_coord.shape[0]): + if nonzero[idx]: + yield proc(self.seg_coord[idx]) + else: + for c in self.seg_coord[nonzero]: + yield proc(c) + + non_roi_coord = _sparse_generator() + + if shard is not None: + shard_idx, shard_count = shard + sharded_coords = np.array_split(non_roi_coord, shard_count) + non_roi_coord = sharded_coords[shard_idx] + + # Set up worker pool + if pool is None: + if num_threads is None and num_processes is None: + # Libvips is extremely slow with ThreadPools. + # In the cuCIM backend, ThreadPools are used by default + # to reduce memory utilization. + # In the Libvips backend, a multiprocessing pool is default + # to significantly improve performance. + n_cores = sf.util.num_cpu(default=8) + if sf.slide_backend() == 'libvips': + num_processes = max(int(n_cores/2), 1) + else: + num_threads = n_cores + if num_threads is not None and num_threads > 1: + log.debug(f"Building generator ThreadPool({num_threads})") + pool = mp.dummy.Pool(processes=num_threads) + should_close = True + elif num_processes is not None and num_processes > 1: + ptype = 'spawn' if sf.slide_backend() == 'libvips' else 'fork' + log.debug(f"Building generator with Pool({num_processes}), " + f"type={ptype}") + ctx = mp.get_context(ptype) + pool = ctx.Pool( + processes=num_processes, + initializer=sf.util.set_ignore_sigint, + ) + should_close = True + else: + log.debug(f"Building generator without multithreading") + def _generator(): + for c in non_roi_coord: + yield tile_worker(c, args=w_args) + i_mapped = _generator() + else: + log.debug("Building generator with a shared pool") + if show_progress: + pbar = Progress(transient=sf.getLoggingLevel() > 20) + task = pbar.add_task('Extracting...', total=self.estimated_num_tiles) + pbar.start() + else: + pbar = None + + if pool is not None: + map_fn = pool.imap if deterministic else pool.imap_unordered + if lazy_iter: 
+ if max_tiles: + batch_size = min(pool._processes, max_tiles) + else: + batch_size = pool._processes + batched_coord = sf.util.batch(non_roi_coord, batch_size) + def _generator(): + for batch in batched_coord: + yield from map_fn( + partial(tile_worker, args=w_args), + batch + ) + i_mapped = _generator() + + else: + csize = max(min(int(self.estimated_num_tiles/pool._processes), 64), 1) + log.debug(f"Using imap chunksize={csize}") + i_mapped = map_fn( + partial(tile_worker, args=w_args), + non_roi_coord, + chunksize=csize + ) + + with sf.util.cleanup_progress(pbar): + for e, result in enumerate(i_mapped): + if show_progress: + pbar.advance(task, 1) + elif self.pb is not None: + self.pb.advance(0) + if result is None: + continue + else: + yield result + n_extracted += 1 + if max_tiles and n_extracted >= max_tiles: + break + + if should_close: + pool.close() + + # Reset stain normalizer context + if normalizer and context_normalize: + assert isinstance(normalizer, sf.norm.StainNormalizer) + normalizer.clear_context() + + name_msg = f'[green]{self.shortname}[/]' + num_msg = f'({n_extracted} tiles of {num_possible_tiles} possible)' + log_fn = log.info if self.verbose else log.debug + log_fn(f"Finished tile extraction for {name_msg} {num_msg}") + + return generator + + def coord_to_grid( + self, + x: int, + y: int, + *, + anchor: str = 'center' + ) -> Tuple[int, int]: + """Find the grid index of a tile by its base-level coordinates. + + Args: + x (int): x-coordinate of the tile, in base (level=0) dimension. + y (int): y-coordinate of the tile, in base (level=0) dimension. + + Keyword args: + anchor (str): Anchor point for the coordinates. Either 'topleft' + or 'center'. Defaults to 'center'. + + Returns: + Tuple[int, int]: Grid index of the tile. + + Raises: + ValueError: If anchor is not 'topleft' or 'center'. + IndexError: If tile is not found at the given coordinates. 
+ + """ + if anchor not in ('topleft', 'center'): + raise ValueError("anchor must be 'topleft' or 'center'") + if anchor == 'center': + x -= int(self.full_extract_px/2) + y -= int(self.full_extract_px/2) + coord_idx, = np.where(( + (self.coord[:, 0] == x) + & (self.coord[:, 1] == y) + )) + if not len(coord_idx): + raise IndexError(f"Tile at coord=({x}, {y}) not found") + assert len(coord_idx) == 1 + x, y, grid_x, grid_y = self.coord[coord_idx[0]] + return grid_x, grid_y + + def dim_to_mpp(self, dimensions: Tuple[float, float]) -> float: + """Get the microns-per-pixel of the slide when resized to the given (width, height).""" + return (self.dimensions[0] * self.mpp) / dimensions[0] + + def export_rois(self, dest: Optional[str] = None) -> str: + """Export loaded ROIs to a given destination, in CSV format. + + ROIs are exported with the columns 'roi_name', 'label', 'x_base', + and 'y_base'. + Coordinates are in base dimension (level 0) of the slide. + + Args: + dest (str): Path to the destination CSV file. If not provided, will + export ROIs to '<slide name>.csv' in the current folder. + Defaults to None. + + Returns: + str: Absolute path to the exported CSV file. + + """ + names, labels, x, y = [], [], [], [] + + def append_roi(roi): + nonlocal names, labels, x, y + c = np.array(roi.coordinates) + assert len(c.shape) == 2 + names += [roi.name] * c.shape[0] + labels += [roi.label] * c.shape[0] + x += list(c[:, 0]) + y += list(c[:, 1]) + + for roi in self.rois: + append_roi(roi) + for hole in roi.holes.values(): + append_roi(hole) + + df = pd.DataFrame({ + 'roi_name': names, + 'label': labels, + 'x_base': x, + 'y_base': y + }) + if dest is None: + dest = f'{self.name}.csv' + df.to_csv(dest, index=False) + log.info(f"{len(self.rois)} ROIs exported to {abspath(dest)}") + return abspath(dest) + + def get_qc_mask(self, roi: bool = True) -> Optional[np.ndarray]: + """Return the combined QC mask for the slide. + + Args: + roi (bool): Whether to include ROI masks. Defaults to True.
+ + """ + _all_masks = [m for m in self.qc_masks if (roi or (not m.is_roi))] + if not _all_masks: + return None + elif len(_all_masks) == 1: + return _all_masks[0].mask + else: + _, smallest = min((m.shape[0], idx) + for (idx, m) in enumerate(_all_masks)) + shape = _all_masks[smallest].shape + mask = skimage.transform.resize(_all_masks[0].mask, shape).astype(bool) + for _next in _all_masks[1:]: + _next_m = skimage.transform.resize(_next.mask, shape).astype(bool) + mask = np.logical_or(mask, _next_m) + return mask + + def get_masked_coord(self) -> np.ma.core.MaskedArray: + """Get a masked array of the coordinate grid, masked by QC. + + The returned masked array is of shape (n, 4), where n is the number of tiles. + The columns are (x, y, grid_x, grid_y), where x and y are the + top-left coordinates of the tile, and grid_x and grid_y are the + grid indices of the tile. + + """ + true_grid_indices = np.flatnonzero(self.grid) + linear_indices_of_coord = np.ravel_multi_index( + self.coord[:, 2:4].T, + dims=self.grid.shape + ) + unmasked_coord_indices = np.in1d( + linear_indices_of_coord, + true_grid_indices + ) + return np.ma.masked_array( + self.coord, + mask=~np.repeat(unmasked_coord_indices[:, None], 4, axis=1) + ) + + def get_rois(self, ignore_artifact: bool = False) -> List[ROI]: + """Get a list of ROIs. + + Args: + ignore_artifact (bool): Ignore artifact ROIs. Defaults to False. + + Returns: + List[ROI]: List of ROI objects. + + """ + if ignore_artifact: + return [roi for roi in self.rois if roi.label not in self.artifact_labels] + return self.rois + + def get_artifacts(self) -> List[ROI]: + """Get a list of artifact ROIs. + + Returns: + List[ROI]: List of artifact ROI objects. + + """ + return [roi for roi in self.rois if roi.label in self.artifact_labels] + + def get_roi_by_name(self, name: str) -> Optional[ROI]: + """Get an ROI by its name. + + Args: + name (str): Name of the ROI. + + Returns: + ROI: ROI object. 
+ + """ + for roi in self.rois: + if roi.name == name: + return roi + return None + + def get_tile_coord(self, anchor='topleft') -> np.ndarray: + """Get a coordinate grid of all tiles, restricted to those that pass QC + and any ROI filtering. + + The returned array is of shape (n, 4), where n is the number of tiles. + The columns are (x, y, grid_x, grid_y), where x and y are the + top-left coordinates of the tile, and grid_x and grid_y are the + grid indices of the tile. + + """ + if anchor not in ('center', 'topleft'): + raise ValueError("Expected `anchor` to be 'center' or 'topleft'") + c = self.coord[ + self.grid[tuple(self.coord[:, 2:4].T)].astype(bool) + ].copy() + if anchor == 'center': + c[:, 0] += int(self.full_extract_px/2) + c[:, 1] += int(self.full_extract_px/2) + return c + + def get_tile_dataframe(self) -> pd.DataFrame: + """Build a dataframe of tiles and associated ROI labels. + + Returns: + Pandas dataframe of all tiles, with the following columns: + - ``loc_x``: X-coordinate of tile center + - ``loc_y``: Y-coordinate of tile center + - ``grid_x``: X grid index of the tile + - ``grid_y``: Y grid index of the tile + - ``roi_name``: Name of the ROI if tile is in an ROI, else None + - ``roi_desc``: Description of the ROI if tile is in ROI, else None + - ``label``: ROI label, if present. 
+ + """ + roi_names = [] + roi_desc = [] + labels = [] + index = [] + loc = [] + grid = [] + for x, y, xi, yi in self.coord: + if not self.grid[xi, yi]: + continue + _, roi = self.get_tile_roi(grid=(xi, yi)) + + # Convert from top-left to center coordinates + x += int(self.full_extract_px/2) + y += int(self.full_extract_px/2) + + loc.append([x, y]) + grid.append([xi, yi]) + roi_names.append(None if not roi else roi.name) + roi_desc.append(None if not roi else roi.description) + labels.append(None if not roi else roi.label) + index.append(f'{self.name}-{x}-{y}') + loc = np.array(loc) + grid = np.array(grid) + df = pd.DataFrame({ + 'loc_x': loc[:, 0], + 'loc_y': loc[:, 1], + 'grid_x': grid[:, 0], + 'grid_y': grid[:, 1], + 'roi_name': roi_names, + 'roi_desc': roi_desc, + 'label': labels + }, index=index) + return df + + def get_tile_roi_mask( + self, + *, + grid: Optional[Tuple[int, int]] = None, + loc: Optional[Tuple[int, int]] = None, + mode: str = 'binary', + roi_labels: Optional[List[str]] = None + ) -> np.ndarray: + """Get the ROI mask for a tile at the given location. + + Keyword Args: + grid (tuple[int, int], optional): Grid indices of the tile. + Must supply either ``grid`` or ``loc``. Defaults to None. + loc (tuple[int, int], optional): Location of the tile center. + Must supply either ``grid`` or ``loc``. Defaults to None. + mode (str, optional): 'binary', 'multiclass', or 'multilabel'. + Defaults to 'binary'. + roi_labels (list[str], optional): List of ROI labels to include. + Defaults to None. + + Returns: + np.ndarray: ROI mask for the tile, with dtype int and shape + (n, tile_px, tile_px), where n is the number of ROI labels. + + """ + if grid is None and loc is None: + raise ValueError("Either grid or loc must be provided.") + + # Definitions. + fe = self.full_extract_px + fs = self.full_stride + scale = self.tile_px / fe + + # Get the polygon vertices for the tile. 
+ if grid is not None: + # Convert from grid to top-left coordinates + gx, gy = grid + topleft = (gx * fs, gy * fs) + bottomleft = (gx * fs, (gy * fs) + fe) + bottomright = ((gx * fs) + fe, (gy * fs) + fe) + topright = ((gx * fs) + fe, gy * fs) + else: + # Convert from center to top-left coordinates + cx, cy = loc + cx -= int(fe / 2) + cy -= int(fe / 2) + topleft = (cx, cy) + bottomleft = (cx, cy + fe) + bottomright = (cx + fe, cy + fe) + topright = (cx + fe, cy) + + # Get a polygon for the tile, used for determining overlapping ROIs. + tile = sg.Polygon([topleft, bottomleft, bottomright, topright]) + + # Compute the mask from ROIs. + if len(self.rois) == 0: + if roi_labels: + mask = np.zeros((len(roi_labels), self.tile_px, self.tile_px), dtype=int) + else: + mask = np.zeros((1, self.tile_px, self.tile_px), dtype=int) + + # Handle ROIs with labels (multilabel or multiclass) + elif roi_labels: + labeled_masks = [] + for label in roi_labels: + wsi_polys = [p.poly for p in self.rois if p.label == label] + if len(wsi_polys) == 0: + mask = np.zeros((self.tile_px, self.tile_px), dtype=int) + labeled_masks.append(mask) + else: + all_polys = unary_union(wsi_polys) + polys = get_scaled_and_intersecting_polys( + all_polys, tile, scale, topleft + ) + if isinstance(polys, sg.Polygon) and polys.is_empty: + mask = np.zeros((self.tile_px, self.tile_px), dtype=int) + else: + # Rasterize to an int mask. + mask = rasterio.features.rasterize( + [polys], + out_shape=[self.tile_px, self.tile_px] + ) + mask = mask.astype(int) + labeled_masks.append(mask) + mask = np.stack(labeled_masks, axis=0) + + # Handle ROIs without labels (binary) + else: + # Determine the intersection at the given tile location. 
+ all_polys = unary_union([p.poly for p in self.rois]) + polys = get_scaled_and_intersecting_polys( + all_polys, tile, scale, topleft + ) + + if isinstance(polys, sg.Polygon) and polys.is_empty: + mask = np.zeros((self.tile_px, self.tile_px), dtype=int) + else: + # Rasterize to an int mask. + try: + mask = rasterio.features.rasterize( + [polys], + out_shape=[self.tile_px, self.tile_px] + ) + mask = mask.astype(bool).astype(np.int32) + except ValueError: + mask = np.zeros((self.tile_px, self.tile_px), dtype=int) + + # Add a dummy channel dimension. + mask = mask[None, :, :] + + # Process according to the mode. + if mode == 'multiclass': + mask = mask * np.arange(1, mask.shape[0]+1)[:, None, None] + mask = mask.max(axis=0) + elif mode == 'binary' and mask.ndim == 3: + mask = np.any(mask, axis=0)[None, :, :].astype(int) + + return mask + + def has_non_roi_qc(self) -> bool: + """Check if the slide has any non-ROI QC masks.""" + return any(not m.is_roi for m in self.qc_masks) + + def extract_tiles( + self, + tfrecord_dir: Optional[str] = None, + tiles_dir: Optional[str] = None, + img_format: str = 'jpg', + report: bool = True, + **kwargs + ) -> Optional[SlideReport]: + """Extracts tiles from slide using the build_generator() method, + saving tiles into a TFRecord file or as loose JPG tiles in a directory. + + Args: + tfrecord_dir (str): If provided, saves tiles into a TFRecord file + (named according to slide name) here. + tiles_dir (str): If provided, saves loose images in a subdirectory + (per slide name) here. + img_format (str): 'png' or 'jpg'. Format of images for internal + storage in tfrecords. PNG (lossless) format recommended for + fidelity, JPG (lossy) for efficiency. Defaults to 'jpg'. + + Keyword Args: + whitespace_fraction (float, optional): Range 0-1. Defaults to 1. + Discard tiles with this fraction of whitespace. If 1, will not + perform whitespace filtering. + whitespace_threshold (int, optional): Range 0-255. Defaults to 230. 
+ Threshold above which a pixel (RGB average) is whitespace. + grayspace_fraction (float, optional): Range 0-1. Defaults to 0.6. + Discard tiles with this fraction of grayspace. If 1, will not + perform grayspace filtering. + grayspace_threshold (float, optional): Range 0-1. Defaults to 0.05. + Pixels in HSV format with saturation below this threshold are + considered grayspace. + normalizer (str, optional): Normalization to use on image tiles. + Defaults to None. + normalizer_source (str, optional): Stain normalization preset or + path to a source image. Valid presets include 'v1', 'v2', and + 'v3'. If None, will use the default preset ('v3'). + Defaults to None. + full_core (bool, optional): Extract an entire detected core, rather + than subdividing into image tiles. Defaults to False. + shuffle (bool): Shuffle images during extraction. + yolo (bool, optional): Export yolo-formatted tile-level ROI + annotations (.txt) in the tile directory. Requires that + tiles_dir is set. Defaults to False. + draw_roi (bool, optional): Draws ROIs onto extracted tiles. + Defaults to False. + dry_run (bool, optional): Determine tiles that would be extracted, + but do not export any images. Defaults to False. + num_threads (int): If specified, will extract tiles with a + ThreadPool using the specified number of threads. Cannot + supply both `num_threads` and `num_processes`. Libvips is + particularly slow with ThreadPools. Defaults to None in the + Libvips backend, and the number of CPU cores when using cuCIM. + num_processes (int): If specified, will extract tiles with a + multiprocessing pool using the specified number of processes. + Cannot supply both `num_threads` and `num_processes`. + With the libvips backend, this defaults to half the number of + CPU cores, and with cuCIM, this defaults to None.
+ """ + if img_format not in ('png', 'jpg', 'jpeg'): + raise ValueError(f"Invalid image format {img_format}") + + dry_run = kwargs['dry_run'] if 'dry_run' in kwargs else False + + # Make base directories + if tfrecord_dir and not dry_run: + if not exists(tfrecord_dir): + os.makedirs(tfrecord_dir) + if tiles_dir and not dry_run: + tiles_dir = os.path.join(tiles_dir, self.name) + if not os.path.exists(tiles_dir): + os.makedirs(tiles_dir) + + # Log to keep track of when tiles have finished extracting + # To be used in case tile extraction is interrupted, so the slide + # can be flagged for re-extraction + + if (tfrecord_dir or tiles_dir) and not dry_run: + unfinished_marker = join( + (tfrecord_dir if tfrecord_dir else tiles_dir), # type: ignore + f'{self.name}.unfinished' + ) + with open(unfinished_marker, 'w') as marker_file: + marker_file.write(' ') + if tfrecord_dir and not dry_run: + writer = sf.io.TFRecordWriter(join( + tfrecord_dir, + self.name+".tfrecords" + )) + + generator = self.build_generator( + img_format=img_format, + **kwargs + ) + if not generator: + if tfrecord_dir: + os.remove(join(tfrecord_dir, self.name+".tfrecords")) + return None + + sample_tiles = [] # type: List + generator_iterator = generator() + locations = [] + grid_locations = [] + ws_fractions = [] + gs_fractions = [] + num_wrote_to_tfr = 0 + slide_bytes = bytes(self.name, 'utf-8') + + for index, tile_dict in enumerate(generator_iterator): + x, y = location = tile_dict['loc'] + locations += [location] + grid_locations += [tile_dict['grid']] + if 'ws_fraction' in tile_dict: + ws_fractions += [tile_dict['ws_fraction']] + if 'gs_fraction' in tile_dict: + gs_fractions += [tile_dict['gs_fraction']] + + if dry_run: + continue + + img_str = tile_dict['image'] + if len(sample_tiles) < 10: + sample_tiles += [img_str] + elif (not tiles_dir and not tfrecord_dir) and not dry_run: + break + if tiles_dir: + img_f = join( + tiles_dir, + f'{self.shortname}-{x}-{y}.{img_format}' + ) + with open(img_f, 
'wb') as outfile: + outfile.write(img_str) + if 'yolo' in tile_dict and len(tile_dict['yolo']): + yolo_f = join(tiles_dir, f'{self.shortname}-{x}-{y}.txt') + with open(yolo_f, 'w') as outfile: + for ann in tile_dict['yolo']: + yolo_str_fmt = "0 {:.3f} {:.3f} {:.3f} {:.3f}\n" + outfile.write(yolo_str_fmt.format( + ann[0], + ann[1], + ann[2], + ann[3] + )) + if tfrecord_dir: + record = sf.io.serialized_record(slide_bytes, img_str, x, y) + writer.write(record) + num_wrote_to_tfr += 1 + if tfrecord_dir and not dry_run: + writer.close() + if not num_wrote_to_tfr: + os.remove(join(tfrecord_dir, self.name+".tfrecords")) + log.info(f'No tiles extracted for [green]{self.name}') + if self.pb is None: + generator_iterator.close() + + if (tfrecord_dir or tiles_dir) and not dry_run: + try: + os.remove(unfinished_marker) + except OSError: + log.error(f"Unable to mark slide {self.name} as complete") + + # Generate extraction report + if report: + log.debug("Generating slide report") + loc_np = np.array(locations, dtype=np.int64) + grid_np = np.array(grid_locations, dtype=np.int64) + df_dict = { + 'loc_x': [] if not len(loc_np) else pd.Series(loc_np[:, 0], dtype=int), + 'loc_y': [] if not len(loc_np) else pd.Series(loc_np[:, 1], dtype=int), + 'grid_x': [] if not len(grid_np) else pd.Series(grid_np[:, 0], dtype=int), + 'grid_y': [] if not len(grid_np) else pd.Series(grid_np[:, 1], dtype=int) + } + if ws_fractions: + df_dict.update({'ws_fraction': pd.Series(ws_fractions, dtype=float)}) + if gs_fractions: + df_dict.update({'gs_fraction': pd.Series(gs_fractions, dtype=float)}) + report_data = dict( + blur_burden=self.blur_burden, + num_tiles=len(locations), + qc_mask=self.qc_mask, + locations=pd.DataFrame(df_dict), + num_rois=(0 if self.roi_method == 'ignore' else len(self.rois)), + tile_px=self.tile_px, + tile_um=self.tile_um, + ) + slide_report = SlideReport( + sample_tiles, + self.slide.path, + data=report_data, + thumb_coords=locations, + tile_px=self.tile_px, + 
tile_um=self.tile_um, + ) + return slide_report + else: + log.debug("Skipping slide report") + return None + + def extract_cells( + self, + tfrecord_dir: Optional[str] = None, + tiles_dir: Optional[str] = None, + img_format: str = 'jpg', + report: bool = True, + apply_masks: bool = True, + **kwargs + ) -> Optional[SlideReport]: + """Extract tiles from cell segmentation centroids. + + Args: + tfrecord_dir (str): If provided, saves tiles into a TFRecord file + (named according to slide name) here. + tiles_dir (str): If provided, saves loose images into a + subdirectory (per slide name) here. + img_format (str): 'png' or 'jpg'. Format of images for internal + storage in tfrecords. PNG (lossless) format recommended for + fidelity, JPG (lossy) for efficiency. Defaults to 'jpg'. + report (bool): Generate and return a report of tile extraction. + Defaults to True. + apply_masks (bool): Apply cell segmentation masks to the extracted + tiles. Defaults to True. + + Keyword Args: + **kwargs: All keyword arguments are passed to :meth:`WSI.extract_tiles()`. + """ + if self.segmentation is None: + raise ValueError( + "Cannot build generator from segmentation centroids; " + "segmentation not yet applied. Use WSI.apply_segmentation()." + ) + return self.extract_tiles( + tfrecord_dir, + tiles_dir, + img_format, + report, + apply_masks=apply_masks, + from_centroids=True, + **kwargs + ) + + def get_tile_roi( + self, + coord: Optional[Tuple[int, int]] = None, + grid: Optional[Tuple[int, int]] = None, + ) -> Tuple[Optional[int], Optional[ROI]]: + """Find the ROI that contains a given tile. + + Args: + coord (Tuple[int, int], optional): Base-level coordinates of the + tile. Cannot supply both ``coord`` and ``grid``. Defaults to None. + grid (Tuple[int, int], optional): Grid index of the tile. + Cannot supply both ``coord`` and ``grid``. Defaults to None. + + Returns: + Tuple[int, ROI]: ROI index (index of WSI.rois) and + the :class:`slideflow.slide.ROI` that contains the tile.
+ If no ROI contains the tile, returns (None, None). + + """ + if coord is not None and grid is not None: + raise ValueError("Cannot specify both coord and grid") + if coord is not None: + grid = self.coord_to_grid(*coord) + elif grid is None: + raise ValueError("Must specify either coord or grid") + if self.roi_grid is None: + return None, None + grid_x, grid_y = grid + roi_idx = self.roi_grid[grid_x, grid_y] - 1 + if roi_idx == -1: + return None, None + else: + return roi_idx, self.rois[roi_idx] + + def grid_to_coord( + self, + grid_x: int, + grid_y: int, + *, + anchor: str = 'center' + ) -> Tuple[int, int]: + """Find the base-level coordinates of a tile by its grid index. + + Args: + grid_x (int): x-index of the tile in the grid. + grid_y (int): y-index of the tile in the grid. + + Keyword args: + anchor (str): Anchor point for the coordinates. Either 'topleft' + or 'center'. Defaults to 'center'. + + Returns: + Tuple[int, int]: Base-level coordinates of the tile. + + Raises: + ValueError: If anchor is not 'topleft' or 'center'. + IndexError: If tile is not found at the given coordinates. + + """ + if anchor not in ('topleft', 'center'): + raise ValueError("anchor must be 'topleft' or 'center'") + grid_idx, = np.where(( + (self.coord[:, 2] == grid_x) + & (self.coord[:, 3] == grid_y) + )) + if not len(grid_idx): + raise IndexError(f"Tile at grid=({grid_x}, {grid_y}) not found") + assert len(grid_idx) == 1 + x, y, grid_x, grid_y = self.coord[grid_idx[0]] + if anchor == 'center': + x += int(self.full_extract_px/2) + y += int(self.full_extract_px/2) + return x, y + + def get_tile_mask(self, index, sparse_mask) -> np.ndarray: + """Get a mask for a tile, given a sparse mask. + + Examples + Get a mask for a tile, given a sparse mask. + + >>> from slideflow.cellseg import seg_utils, Segmentation + >>> segmentation = Segmentation(...) + >>> wsi = sf.WSI(...) 
+ >>> wsi.apply_segmentation(segmentation) + >>> sparse_mask = seg_utils.sparse_mask(segmentation.masks) + >>> wsi.get_tile_mask(0, sparse_mask) + <numpy.ndarray> + + Args: + index (int): Index of tile. + sparse_mask (scipy.sparse.csr_matrix): Sparse mask. + + Returns: + numpy.ndarray: Mask for tile. + + """ + # Get the corresponding segmentation mask, reading from the sparse matrix + seg = self.segmentation + if seg is None: + raise ValueError("Segmentation not yet applied to slide.") + mask_idx = self.seg_coord[index][2] + 1 # sparse mask index starts at 1 + mask_y, mask_x = np.unravel_index(sparse_mask[mask_idx].data, seg.masks.shape) + + # This is the top-left coordinate, in WSI base dimension, + # of the tile extraction window. + wsi_tile_top_left = self.seg_coord[index][0:2] + + # Determine the mask array offset (top-left), in mask coordinate space. + wsi_mask_x_offset = np.round(seg.wsi_offset[0] / seg.wsi_ratio).astype(np.int32) + wsi_mask_y_offset = np.round(seg.wsi_offset[1] / seg.wsi_ratio).astype(np.int32) + + # Offset the mask to reflect WSI space (but still in mask coordinates). + wsi_mask_x = mask_x + wsi_mask_x_offset + wsi_mask_y = mask_y + wsi_mask_y_offset + + # Determine the tile window offset (top-left), in mask coordinate space. + tile_offset_x_in_mask_space = np.round(wsi_tile_top_left[0] / seg.wsi_ratio).astype(np.int32) + tile_offset_y_in_mask_space = np.round(wsi_tile_top_left[1] / seg.wsi_ratio).astype(np.int32) + + # Adjust the mask coordinate space, using the tile window offset as origin. + tile_mask_x = (wsi_mask_x - tile_offset_x_in_mask_space) + tile_mask_y = (wsi_mask_y - tile_offset_y_in_mask_space) + + # Calculate the size of the tile window, in mask coordinate space. + mask_tile_size = int(self.full_extract_px / seg.wsi_ratio) + + # Clip the mask to the tile window view. 
+ tile_mask_x = tile_mask_x.clip(0, mask_tile_size-1) + tile_mask_y = tile_mask_y.clip(0, mask_tile_size-1) + + # Convert mask coordinates (in sparse format) to 2D array. + unsized = np.zeros((mask_tile_size, mask_tile_size), dtype=np.int32) + unsized[tile_mask_y, tile_mask_x] = 1 + + # Resize mask from mask coordinates to tile extraction WSI coordinates. + return unsized + + def has_rois(self) -> bool: + """Checks if the slide has loaded ROIs and they are not being ignored.""" + return (self.roi_method != 'ignore' + and len(self.rois)) + + def get_next_roi_name(self) -> str: + """Get the next available name for an ROI.""" + existing = [ + int(r.name[4:]) for r in self.rois + if r.name.startswith('ROI_') and r.name[4:].isnumeric() + ] + hole_ids = [ + int(hole.name[4:]) for r in self.rois + for hole in r.holes.values() + if hole.name.startswith('ROI_') and hole.name[4:].isnumeric() + ] + existing += hole_ids + roi_id = max(existing) + 1 if existing else 0 + name = f'ROI_{roi_id}' + return name + + def load_roi_array( + self, + array: np.ndarray, + *, + process: bool = True, + label: Optional[str] = None, + name: Optional[str] = None, + allow_errors: bool = False, + simplify_tolerance: Optional[float] = None + ) -> int: + """Load an ROI from a numpy array. + + Args: + array (np.ndarray): Array of shape (n_points, 2) containing + the coordinates of the ROI shape, in base (level=0) dimension. + + Keyword Args: + process (bool): Process ROIs after loading. Defaults to True. 
+ + """ + name = name or self.get_next_roi_name() + try: + roi = ROI(name, array, label=label) + except errors.InvalidROIError as e: + if allow_errors: + log.warn("Unable to load ROI: {}".format(e)) + return + else: + raise + if simplify_tolerance is not None: + roi.simplify(simplify_tolerance) + self.rois.append(roi) + if self.roi_method == 'auto': + self.roi_method = 'inside' + if process: + self.process_rois() + for i, _roi in enumerate(self.rois): + if _roi == roi: + return i + for hole in _roi.holes.values(): + if hole == roi: + return i + return None + + def load_csv_roi( + self, + path: str, + *, + process: bool = True, + scale: int = 1, + skip_invalid: bool = True, + simplify_tolerance: Optional[float] = None + ) -> int: + """Load ROIs from a CSV file. + + CSV file must contain headers 'ROI_name', 'X_base', and 'Y_base'. + + Any previously loaded ROIs are cleared prior to loading. + + Args: + path (str): Path to CSV file. + + Keyword Args: + process (bool): Process ROIs after loading. Defaults to True. + scale (int): Scale factor to apply to ROI coordinates. Defaults to 1. + + """ + # Clear any previously loaded ROIs. + self.rois = [] + + roi_dict = {} + with open(path, "r") as csvfile: + reader = csv.reader(csvfile, delimiter=',') + try: + headers = next(reader, None) + if headers is None: + raise Exception + headers = [h.lower() for h in headers] + index_name = headers.index("roi_name") + index_x = headers.index("x_base") + index_y = headers.index("y_base") + except Exception: + raise errors.ROIError( + f'Unable to read CSV ROI [green]{path}[/]. Please ensure ' + 'headers contain "ROI_name", "X_base and "Y_base".' 
+ ) + index_label = None if not "label" in headers else headers.index("label") + for row in reader: + roi_name = row[index_name] + x_coord = int(float(row[index_x]) * scale) + y_coord = int(float(row[index_y]) * scale) + label = None if index_label is None else row[index_label] + + if roi_name not in roi_dict: + roi_dict[roi_name] = { + 'coords': [], + 'label': label + } + roi_dict[roi_name]['coords'].append((x_coord, y_coord)) + + for roi_name in roi_dict: + try: + roi = ROI( + roi_name, + np.array(roi_dict[roi_name]['coords']), + label=roi_dict[roi_name]['label'] + ) + except errors.InvalidROIError as e: + if skip_invalid: + log.warn("Skipping invalid ROI ({}): {}".format(roi_name, e)) + continue + else: + raise + else: + if simplify_tolerance is not None: + roi.simplify(simplify_tolerance) + self.rois.append(roi) + if process: + self.process_rois() + log.debug(f"Loaded ROIs from {path}") + return len(self.rois) + + def load_json_roi( + self, + path: str, + *, + scale: int = 1, + process: bool = True, + skip_invalid: bool = True + ) -> int: + """Load ROIs from a JSON file. + + JSON file must contain a 'shapes' key, with a list of dictionaries + containing a 'points' key, whose value is a list of (x, y) coordinates. + + Args: + path (str): Path to JSON file. + scale (int): Scale factor to apply to ROI coordinates. Defaults to 1. + process (bool): Process ROIs after loading. Defaults to True. + + """ + # Clear any previously loaded ROIs. 
+ self.rois = [] + + with open(path, "r") as json_file: + json_data = json.load(json_file)['shapes'] + for shape in json_data: + area_reduced = np.multiply(shape['points'], scale).astype(np.int64) + roi_name = self.get_next_roi_name() + try: + self.rois.append(ROI(roi_name, area_reduced)) + except errors.InvalidROIError as e: + if skip_invalid: + log.warn("Skipping invalid ROI ({}): {}".format(roi_name, e)) + + if process: + self.process_rois() + if self.roi_method == 'auto': + self.roi_method = 'inside' + return len(self.rois) + + def masked_thumb(self, background: str = 'white', **kwargs) -> np.ndarray: + """Return a masked thumbnail of a slide, using QC and/or ROI masks. + + Args: + background (str, optional): Background color. Defaults to 'white'. + + Keyword args: + **kwargs: Keyword arguments passed to :meth:`WSI.thumb()`. + + Returns: + np.ndarray: Masked thumbnail image. + + """ + if background not in ('white', 'black'): + raise ValueError( + f"Unexpected background option: '{background}'. Expected " + "'black' or 'white'." + ) + qc_mask = self.qc_mask + roi_mask = self.roi_mask + image = np.asarray(self.thumb(**kwargs)) + if qc_mask is None and roi_mask is None: + # Apply Otsu's threshold to background area + # to prevent whitespace from interfering with normalization + from slideflow.slide.qc import Otsu, GaussianV2 + sf.log.debug( + "Applying Otsu's thresholding & Gaussian blur filter " + "to stain norm context" + ) + _blur_mask = GaussianV2()(image) + qc_mask = Otsu()(image, mask=_blur_mask) + # Mask by ROI and QC, if applied. + # Use white as background for masked areas. 
+ if qc_mask is not None: + qc_img = img_as_ubyte(qc_mask) + mask = ~cv2.resize(qc_img, (image.shape[1], image.shape[0])) + if roi_mask is not None: + roi_img = img_as_ubyte(roi_mask) + roi_mask = cv2.resize(roi_img, (image.shape[1], image.shape[0])) + if qc_mask is not None: + mask = mask & roi_mask + else: + mask = roi_mask + if background == 'white': + white_bg = np.full(image.shape, 255, dtype=np.uint8) + white_mask = cv2.bitwise_or(white_bg, white_bg, mask=~mask) + return cv2.bitwise_or(image, white_mask) + else: + return cv2.bitwise_or(image, image, mask=mask) + + def mpp_to_dim(self, mpp: float) -> Tuple[int, int]: + width = int((self.mpp * self.dimensions[0]) / mpp) + height = int((self.mpp * self.dimensions[1]) / mpp) + return (width, height) + + def predict( + self, + model: str, + **kwargs + ) -> Tuple[np.ndarray, Optional[np.ndarray]]: + """Generate a whole-slide prediction from a saved model. + + Args: + model (str): Path to saved model trained in Slideflow. + + Keyword args: + batch_size (int, optional): Batch size for calculating predictions. + Defaults to 32. + num_threads (int, optional): Number of tile worker threads. Cannot + supply both ``num_threads`` (uses thread pool) and + ``num_processes`` (uses multiprocessing pool). Defaults to + CPU core count. + num_processes (int, optional): Number of child processes to spawn + for multiprocessing pool. Defaults to None (does not use + multiprocessing). + img_format (str, optional): Image format (png, jpg) to use when + extracting tiles from slide. Must match the image format + the model was trained on. If 'auto', will use the format + logged in the model params.json. Defaults to 'auto'. + device (torch.device, optional): PyTorch device. Defaults to + initializing a new CUDA device. + generator_kwargs (dict, optional): Keyword arguments passed to + the :meth:`slideflow.WSI.build_generator()`. 
+ + Returns: + np.ndarray: Predictions for each outcome, with shape = (num_classes, ) + + np.ndarray, optional: Uncertainty for each outcome, if the model was + trained with uncertainty, with shape = (num_classes,) + + """ + from slideflow import Heatmap + + config = sf.util.get_model_config(model) + _compatible = sf.util.is_tile_size_compatible( + config['tile_px'], + config['tile_um'], + self.tile_px, + self.tile_um + ) + if not _compatible: + raise errors.IncompatibleTileSizeError( + "Slide tile size (tile_px={}, tile_um={}) does not match the " + "model (tile_px={}, tile_um={}).".format( + self.tile_px, self.tile_um, + config['tile_px'], config['tile_um'] + )) + log.info("Calculating whole-slide prediction...") + heatmap = Heatmap(self, model, generate=True, **kwargs) + preds = heatmap.predictions.reshape(-1, heatmap.predictions.shape[-1]) + preds = np.nanmean(preds, axis=0).filled() + if heatmap.uncertainty is not None: + unc = heatmap.uncertainty.reshape(-1, heatmap.uncertainty.shape[-1]) + unc = np.nanmean(unc, axis=0).filled() + return preds, unc + else: + return preds + + def preview( + self, + rois: bool = True, + thumb_kwargs: Optional[Dict] = None, + low_res: bool = True, + **kwargs + ) -> Optional[Image.Image]: + """Performs a dry run of tile extraction without saving any images, + returning a PIL image of the slide thumbnail annotated with a grid of + tiles that were marked for extraction. + + Args: + rois (bool, optional): Draw ROI annotation(s) onto the image. + Defaults to True. + + Keyword Args: + whitespace_fraction (float, optional): Range 0-1. Defaults to 1. + Discard tiles with this fraction of whitespace. If 1, will not + perform whitespace filtering. + whitespace_threshold (int, optional): Range 0-255. Defaults to 230. + Threshold above which a pixel (RGB average) is considered + whitespace. + grayspace_fraction (float, optional): Range 0-1. Defaults to 0.6. + Discard tiles with this fraction of grayspace. 
If 1, will not + perform grayspace filtering. + grayspace_threshold (float, optional): Range 0-1. Defaults to 0.05. + Pixels in HSV format with saturation below this threshold are + considered grayspace. + full_core (bool, optional): Extract an entire detected core, rather + than subdividing into image tiles. Defaults to False. + num_threads (int): Number of threads to allocate to workers. + yolo (bool, optional): Export yolo-formatted tile-level ROI + annotations (.txt) in the tile directory. Requires that + tiles_dir is set. Defaults to False. + thumb_kwargs (Optional[Dict], optional): Keyword arguments to pass + to the thumb method. Defaults to None. + low_res (bool, optional): Use low resolution thumbnail. Defaults to + True. + """ + if 'show_progress' not in kwargs: + kwargs['show_progress'] = (self.pb is None) + generator = self.build_generator( + dry_run=True, + deterministic=False, + **kwargs + ) + if thumb_kwargs is None: + thumb_kwargs = dict(low_res=low_res) + if generator is None: + return self.thumb(rois=rois, **thumb_kwargs) + locations = [] + for tile_dict in generator(): + locations += [tile_dict['loc']] + log.debug(f"Previewing with {len(locations)} extracted tile locations.") + return self.thumb( + coords=locations, rois=rois, **thumb_kwargs + ) + + def process_rois(self): + """Process loaded ROIs and apply to the slide grid. + + Returns: + int: Number of ROIs processed. + + """ + # Load annotations as shapely.geometry objects. + if self.roi_method != 'ignore': + self._find_and_process_holes() + + # Regenerate the grid to reflect the newly-loaded ROIs. + self._build_coord() + + # Re-apply any existing QC mask, now that the coordinates have changed. 
+ if self.has_non_roi_qc(): + self.apply_qc_mask() + + return len(self.rois) + + def _find_and_process_holes(self): + """Find and process holes in ROIs.""" + + from shapely.strtree import STRtree + + self.rois.sort(key=lambda x: x.poly.area, reverse=True) + + outer_rois = [] + + labels = list(set([roi.label for roi in self.rois])) + + for label in labels: + + rois = [roi for roi in self.rois if roi.label == label] + polygons = [roi.poly for roi in self.rois if roi.label == label] + strtree = STRtree(polygons) + + for roi, poly in zip(rois, polygons): + + if version.parse(shapely_version) < version.parse('2.0.0'): + possible_containers = strtree.query(poly) + else: + possible_containers_idx = strtree.query(poly) + possible_containers = [polygons[i] for i in possible_containers_idx] + + # Filter out the polygon itself + possible_containers = [p for p in possible_containers if p != poly] + + # Check if the polygon is contained by another + contained_by = [p for p in possible_containers if p.contains(poly)] + + if not contained_by: + # Polygon is an outer polygon + outer_rois.append(roi) + else: + # Polygon is a hole, find its immediate outer polygon + # Sort by area (smallest to largest) to find the closets outer. + contained_by.sort(key=lambda x: x.area) + immediate_outer_poly = contained_by[0] + immediate_outer_roi = rois[polygons.index(immediate_outer_poly)] + + # If the immediate outer is not already listed as an outer, + # then the immediate outer is a hole and this polygon is a nested + # polygon within a hole and should be treated as an outer. + if immediate_outer_roi not in outer_rois: + outer_rois.append(roi) + else: + # Otherwise, add the polygon to the immediate outer as a hole + immediate_outer_roi.add_hole(roi) + + # Restrict the ROIs to only outer polygons, which have now had the holes applied. 
+ self.rois = outer_rois + + def qc( + self, + method: Union[str, Callable, List[Callable]], + *, + blur_radius: int = 3, + blur_threshold: float = 0.02, + filter_threshold: float = 0.6, + blur_mpp: Optional[float] = None, + pool: Optional["mp.pool.Pool"] = None + ) -> Optional[Image.Image]: + """Applies quality control to a slide, performing filtering based on + a whole-slide image thumbnail. + + 'blur' method filters out blurry or out-of-focus slide sections. + 'otsu' method filters out background based on automatic saturation + thresholding in the HSV colorspace. + 'both' applies both methods of filtering. + + Args: + method (str, Callable, list(Callable)): Quality control method(s). + If a string, may be 'blur', 'otsu', or 'both'. + If a callable (or list of callables), each must accept a sf.WSI + object and return a np.ndarray (dtype=np.bool). + blur_radius (int, optional): Blur radius. Only used if method is + 'blur' or 'both'. + blur_threshold (float, optional): Blur threshold. Only used if + method is 'blur' or 'both.' + filter_threshold (float): Percent of a tile detected as + background that will trigger a tile to be discarded. + Defaults to 0.6. + blur_mpp (float, optional): Size of WSI thumbnail on which to + perform blur QC, in microns-per-pixel. Defaults to 4 times the + tile extraction MPP (e.g. for a tile_px/tile_um combination + at 10X effective magnification, where tile_px=tile_um, the + default blur_mpp would be 4, or effective magnification 2.5x). + Only used if method is 'blur' or 'both'. + + Returns: + Image: Image of applied QC mask. + """ + + # Prepare known QC methods - 'blur', 'otsu', and 'both'. 
+ if not isinstance(method, list): + method = [method] # type: ignore + if 'both' in method: + idx = method.index('both') # type: ignore + method.remove('both') # type: ignore + method.insert(idx, 'otsu') # type: ignore + # Blur should be performed before Otsu's thresholding + method.insert(idx, 'blur') # type: ignore + if 'blur' in method: + idx = method.index('blur') # type: ignore + method.remove('blur') # type: ignore + method.insert(idx, sf.slide.qc.GaussianV2(mpp=blur_mpp, + sigma=blur_radius, + threshold=blur_threshold)) + if 'otsu' in method: + idx = method.index('otsu') # type: ignore + method.remove('otsu') # type: ignore + method.insert(idx, sf.slide.qc.Otsu()) + + starttime = time.time() + img = None + log.debug(f"Applying QC: {method}") + for qc in method: + # Any string still present here is an unrecognized method name. + if isinstance(qc, str): + raise errors.QCError(f"Unknown QC method {qc}") + if pool is not None: + try: + qc.pool = pool # type: ignore + except Exception as e: + log.debug(f"Unable to set pool for QC method {qc}") + mask = qc(self) + if mask is not None: + img = self.apply_qc_mask(mask, filter_threshold=filter_threshold) + dur = f'(time: {time.time()-starttime:.2f}s)' + log.debug(f'QC ({method}) complete for slide {self.shortname} {dur}') + return img + + def remove_qc(self) -> None: + self.qc_masks = [m for m in self.qc_masks if m.is_roi] + self._build_coord() + log.debug(f'QC removed from slide {self.shortname}') + + def remove_roi_qc(self) -> None: + """Remove ROI-based QC from the slide.""" + self.qc_masks = [m for m in self.qc_masks if not m.is_roi] + if len(self.qc_masks): + self.apply_qc_mask() + + def remove_roi( + self, + idx: Union[int, List[int]], + *, + process: bool = True + ) -> None: + """Remove an ROI from the slide. + + Args: + idx (int, list(int)): Index or indices of the ROI(s) to remove. + + Keyword Args: + process (bool): Process ROIs after removing. Defaults to True. 
+ + """ + if isinstance(idx, int): + idx = [idx] + for i in sorted(idx, reverse=True): + del self.rois[i] + if process: + self.process_rois() + + def set_artifacts( + self, + artifact_labels: Optional[Union[str, List[str]]] + ) -> None: + """Set artifact labels for all ROIs in the slide. + + Rebuilds the ROI grid after setting the artifacts. + + Args: + artifact_labels (str, list(str)): Artifact label(s) to set. + ROIs with these labels will be marked as artifacts. + + """ + if isinstance(artifact_labels, str): + artifact_labels = [artifact_labels] + if artifact_labels is not None and not all(isinstance(label, str) for label in artifact_labels): + raise TypeError("Artifact labels must be strings.") + self.artifact_labels = artifact_labels if artifact_labels is not None else [] + self.process_rois() + + def show_alignment( + self, + slide: "WSI", + mpp: float = 4 + ) -> Image.Image: + """Show aligned thumbnail of another slide.""" + if not isinstance(slide, WSI): + raise TypeError("Can only align to another slide.") + + # Calculate thumbnails for alignment. + our_thumb = np.array(self.thumb(mpp=mpp)) + their_thumb = np.array(slide.thumb(mpp=mpp)) + + # Return an image of a thumbnail of the given slide, + # aligned to this slide. + return Image.fromarray(align_image(their_thumb, our_thumb)) + + def square_thumb( + self, + width: int = 512, + use_associated_image: bool = True, + **kwargs + ) -> Image.Image: + '''Returns a square thumbnail of the slide, with black bar borders. + + Args: + width (int): Width/height of thumbnail in pixels. 
+ + Returns: + PIL image + ''' + thumb = self.thumb( + width=width, + use_associated_image=use_associated_image, + **kwargs) + height = int(width / (thumb.width / thumb.height)) + thumb = thumb.resize((width, height)) + square_thumb = Image.new("RGB", (width, width)) + square_thumb.paste(thumb, (0, int((width-height)/2))) + return square_thumb + + def thumb( + self, + mpp: Optional[float] = None, + width: Optional[int] = None, + *, + coords: Optional[List[int]] = None, + rect_linewidth: int = 2, + rect_color: str = 'black', + rois: bool = False, + linewidth: int = 2, + color: str = 'black', + use_associated_image: bool = False, + low_res: bool = False, + ) -> Image.Image: + """Generate a PIL Image of the slide thumbnail, with ROI overlay. + + Args: + mpp (float, optional): Microns-per-pixel, used to determine + thumbnail size. + width (int, optional): Goal thumbnail width (alternative to mpp). + coords (list(int), optional): List of tile extraction coordinates + to show as rectangles on the thumbnail, in [(x_center, + y_center), ...] format. Defaults to None. + rois (bool, optional): Draw ROIs onto thumbnail. Defaults to False. + linewidth (int, optional): Width of ROI line. Defaults to 2. + color (str, optional): Color of ROI. Defaults to black. + use_associated_image (bool): Use the associated thumbnail image + in the slide, rather than reading from a pyramid layer. + low_res (bool): Create thumbnail from the lowest-mangnification + pyramid layer. Defaults to False. 
+ + Returns: + PIL image + + """ + if rois and len(self.rois): + if (mpp is not None and width is not None): + raise ValueError( + "Either mpp or width must be given, but not both" + f" (got mpp={mpp}, width={width})" + ) + # If no values provided, create thumbnail of width 1024 + if mpp is None and width is None: + width = 1024 + if mpp is not None: + roi_scale = (self.dimensions[0] + / (int((self.mpp * self.dimensions[0]) / mpp))) + else: + roi_scale = self.dimensions[0] / width # type: ignore + + # If no values provided, create thumbnail of width 1024 + if mpp is None and width is None: + width = 1024 + if (mpp is not None and width is not None): + raise ValueError( + "Either mpp or width must be given, but not both" + f" (got mpp={mpp}, width={width})" + ) + + # Calculate goal width/height according to specified microns-per-pixel + if mpp: + width = int((self.mpp * self.dimensions[0]) / mpp) + # Otherwise, calculate approximate mpp based on provided width + # (to generate proportional height) + else: + assert width is not None + mpp = (self.mpp * self.dimensions[0]) / width + # Calculate appropriate height + height = int((self.mpp * self.dimensions[1]) / mpp) + + if use_associated_image: + log.debug("Requesting thumbnail using associated image") + thumb_kw = dict(associated='thumbnail') + elif low_res: + log.debug("Requesting thumbnail at level={}, width={}".format( + self.slide.level_count-1, width + )) + thumb_kw = dict(level=self.slide.level_count-1, width=width) + else: + ds = self.dimensions[0] / width + level = self.slide.best_level_for_downsample(ds) + log.debug("Requesting thumbnail at level={}, width={}".format( + level, width + )) + thumb_kw = dict(level=level, width=width) + + np_thumb = self.slide.thumbnail(**thumb_kw) + thumb = Image.fromarray(np_thumb).resize((width, height)) + + if coords: + draw = ImageDraw.Draw(thumb) + ratio = width / self.dimensions[0] + wh = (self.full_extract_px * ratio) / 2 + for (x, y) in coords: # type: ignore + x, y = x 
* ratio, y * ratio # type: ignore + coords = (x-wh, y-wh, x+wh, y+wh) # type: ignore + draw.rectangle(coords, outline=rect_color, width=rect_linewidth) + + if rois and len(self.rois): + draw = ImageDraw.Draw(thumb) + roi_polys = [r.scaled_poly(roi_scale) for r in self.rois] + for roi in self.rois: + for hole in roi.holes.values(): + roi_polys.append(hole.scaled_poly(roi_scale)) + for i, poly in enumerate(roi_polys): + if poly.geom_type == 'Polygon': + x, y = poly.exterior.coords.xy + zipped = list(zip(x.tolist(), y.tolist())) + draw.line(zipped, joint='curve', fill=color, width=linewidth) + elif poly.geom_type in ('MultiPolygon', 'GeometryCollection'): + for part in poly.geoms: + if part.is_empty or part.geom_type != 'Polygon': + continue + x, y = part.exterior.coords.xy + zipped = list(zip(x.tolist(), y.tolist())) + draw.line(zipped, joint='curve', fill=color, width=linewidth) + else: + sf.log.error(f"Unable to plot ROI {i}, unknown geometry type: {poly.geom_type}") + return thumb + else: + return thumb + + def tensorflow( + self, + img_format: str = 'numpy', + incl_slidenames: bool = False, + incl_loc: Optional[str] = None, + shuffle: bool = True, + **kwargs + ) -> Any: + """Create a Tensorflow Dataset which extractes tiles from this slide. + + Args: + img_format (str, optional): Image format for returned image tiles. + Options include 'png', 'jpg', and 'numpy'. Defaults to 'numpy'. + incl_slidenames (bool, optional): Yield slide names for each + image tile. Defaults to False. + incl_loc (Optional[str], optional): Yield image tile location + with each image tile. Options include True, 'coord', or 'grid'. + If True or 'coord', will return X/Y coordinates of the tile center + in the slide's highest magnification layer. If 'grid', returns + the grid indices for the tile. Defaults to None. + shuffle (bool, optional): Shuffle image tiles. Defaults to True. 
+ + Returns: + tf.data.Dataset + + Yields: + Iterator[Any]: Items yielded by the Dataset are in dictionary + format, with the keys: + + 'image_raw': Contains the image (jpg, png, or numpy) + 'slide': Slide name (if ``incl_slidenames=True``) + 'loc_x' Image tile center x location (if ``incl_loc`` provided) + 'loc_y' Image tile center y location (if ``incl_loc`` provided) + """ + + import tensorflow as tf + + def tile_generator(): + for image_dict in self.build_generator( + shuffle=shuffle, + show_progress=False, + img_format=img_format, + **kwargs + )(): + if not (incl_slidenames or incl_loc): + yield image_dict['image'] + else: + to_return = { + 'image_raw': image_dict['image'] + } + if incl_slidenames: + to_return['slide'] = self.name + if incl_loc == 'coord' or incl_loc == True: + to_return['loc_x'] = image_dict['loc'][0] + to_return['loc_y'] = image_dict['loc'][1] + if incl_loc == 'grid': + to_return['loc_x'] = image_dict['grid'][0] + to_return['loc_y'] = image_dict['grid'][1] + yield to_return + + # Generate dataset from the generator + with tf.name_scope('dataset_input'): + # Signatures for imaging data + if img_format == 'numpy': + image_sig = tf.TensorSpec( + shape=(self.tile_px, self.tile_px, 3), + dtype=tf.uint8 + ) + else: + image_sig = tf.TensorSpec(shape=(), dtype=tf.string) + + # Rest of the signatures + if incl_slidenames or incl_loc: + sig = {'image_raw': image_sig} + if incl_slidenames: + sig['slide'] = tf.TensorSpec(shape=(), dtype=tf.string) + if incl_loc: + sig['loc_x'] = tf.TensorSpec(shape=(), dtype=tf.int32) + sig['loc_y'] = tf.TensorSpec(shape=(), dtype=tf.int32) + else: + sig = image_sig + + # Assemble dataset + dataset = tf.data.Dataset.from_generator( + tile_generator, + output_signature=sig + ) + + return dataset + + def torch( + self, + img_format: str = 'numpy', + incl_slidenames: bool = False, + incl_loc: Optional[str] = None, + shuffle: bool = True, + infinite: bool = False, + to_tensor: bool = True, + **kwargs + ) -> Any: + """Create 
a PyTorch iterator which extractes tiles from this slide. + + Args: + img_format (str, optional): Image format for returned image tiles. + Options include 'png', 'jpg', and 'numpy'. Defaults to 'numpy'. + incl_slidenames (bool, optional): Yield slide names for each + image tile. Defaults to False. + incl_loc (Optional[str], optional): Yield image tile location + with each image tile. Options include True, 'coord', or 'grid'. + If True or 'coord', will return X/Y coordinates of the tile center + in the slide's highest magnification layer. If 'grid', returns + the grid indices for the tile. Defaults to None. + shuffle (bool, optional): Shuffle image tiles. Defaults to True. + + Returns: + An iterator which yields image tiles as Torch tensors. + + Yields: + Iterator[Any]: Items yielded by the Dataset are in dictionary + format, with the keys: + + 'image_raw': Contains the image as a Tensor (jpg, png, or numpy) + 'slide': Slide name (if ``incl_slidenames=True``) + 'loc_x' Image tile center x location (if ``incl_loc`` provided) + 'loc_y' Image tile center y location (if ``incl_loc`` provided) + """ + import torch + + def tile_generator(): + while True: + for image_dict in self.build_generator( + shuffle=shuffle, + show_progress=False, + img_format=img_format, + **kwargs + )(): + if not (incl_slidenames or incl_loc): + if to_tensor: + yield torch.from_numpy(image_dict['image']) + else: + yield image_dict['image'] + else: + if to_tensor: + to_return = {'image_raw': torch.from_numpy(image_dict['image'])} + else: + to_return = {'image_raw': image_dict['image']} + if incl_slidenames: + to_return['slide'] = self.name + if incl_loc == 'coord' or incl_loc == True: + to_return['loc_x'] = image_dict['loc'][0] + to_return['loc_y'] = image_dict['loc'][1] + if incl_loc == 'grid': + to_return['loc_x'] = image_dict['grid'][0] + to_return['loc_y'] = image_dict['grid'][1] + yield to_return + if not infinite: + break + + return tile_generator() + + def verify_alignment( + self, + slide: 
"WSI", + mpp: float = 4 + ) -> float: + """Verify alignment to another slide by calculating MSE.""" + if not isinstance(slide, WSI): + raise TypeError("Can only align to another slide.") + + # Calculate thumbnails for alignment. + our_thumb = np.array(self.thumb(mpp=mpp)) + their_thumb = np.array(slide.thumb(mpp=mpp)) + + aligned_theirs = align_image(their_thumb, our_thumb) + + theirs_gray = cv2.cvtColor(aligned_theirs, cv2.COLOR_BGR2GRAY) + ours_gray = cv2.cvtColor(our_thumb, cv2.COLOR_BGR2GRAY) + + return compute_alignment_mse(theirs_gray, ours_gray) + + def view(self): + """Open the slide in Slideflow Studio for interactive display. + + See :ref:`studio` for more information. + + """ + from slideflow.studio import Studio + + studio = Studio() + studio.load_slide(self) + studio.run()
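The `mpp_to_dim()` and `thumb()` methods above derive thumbnail dimensions from a target microns-per-pixel (mpp) value. The following is a minimal standalone sketch of that arithmetic, not Slideflow code: the function name `dims_at_mpp` and the slide values are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def dims_at_mpp(
    slide_mpp: float,
    base_dims: Tuple[int, int],
    target_mpp: float
) -> Tuple[int, int]:
    """Return (width, height) in pixels of a slide rendered at target_mpp.

    The physical extent of the slide (mpp * pixels) is constant, so the
    pixel count scales inversely with the requested microns-per-pixel.
    """
    width = int((slide_mpp * base_dims[0]) / target_mpp)
    height = int((slide_mpp * base_dims[1]) / target_mpp)
    return width, height


# Hypothetical slide: 0.25 um/px at base level, 40000 x 30000 px.
print(dims_at_mpp(0.25, (40000, 30000), 4.0))
```

Because both dimensions are scaled by the same ratio, the aspect ratio is preserved up to integer truncation.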
+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/stats/metrics/index.html b/docs/_modules/slideflow/stats/metrics/index.html new file mode 100644 index 000000000..1c049c2ab --- /dev/null +++ b/docs/_modules/slideflow/stats/metrics/index.html @@ -0,0 +1,1407 @@ + + + + + + + + + + + + slideflow.stats.metrics — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Source code for slideflow.stats.metrics

+"""Classification, regression, and survival metrics for predictions."""
+
+import multiprocessing as mp
+import warnings
+import numpy as np
+import pandas as pd
+from pandas.core.frame import DataFrame
+from sklearn import metrics
+from os.path import join
+from types import SimpleNamespace
+from typing import (TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union,
+                    Callable)
+
+
+import slideflow as sf
+from slideflow import errors
+from slideflow.util import log
+
+from .delong import delong_roc_variance
+from .concordance import concordance_index as c_index
+
+if TYPE_CHECKING:
+    import neptune.new as neptune
+    import tensorflow as tf
+    import torch
+
+
+class ClassifierMetrics:
+    """ROC and precision-recall metrics for binary classifier predictions."""
+
+    def __init__(self, y_true, y_pred, neptune_run=None, autofit=True):
+        self.y_true = y_true
+        self.y_pred = y_pred
+        self.neptune_run = neptune_run
+
+        self.fpr = None
+        self.tpr = None
+        self.threshold = None
+        self.opt_thresh = None
+        self.auroc = None
+        self.precision = None
+        self.recall = None
+        self.ap = None
+
+        if autofit:
+            self.roc_fit()
+            self.prc_fit()
+
+    def roc_fit(self):
+        self.fpr, self.tpr, self.threshold = metrics.roc_curve(
+            self.y_true,
+            self.y_pred
+        )
+        self.auroc = metrics.auc(self.fpr, self.tpr)
+        try:
+            # The optimal threshold maximizes Youden's J statistic (TPR - FPR).
+            opt_thresh_index = int(np.argmax(self.tpr - self.fpr))
+            self.opt_thresh = self.threshold[opt_thresh_index]
+        except Exception:
+            self.opt_thresh = None
+
+    def auroc_ci(self, alpha=0.05):
+        from scipy import stats
+        delong_auc, auc_cov = delong_roc_variance(self.y_true, self.y_pred)
+        auc_std = np.sqrt(auc_cov)
+        lower_upper_q = np.abs(np.array([0, 1]) - alpha / 2)
+        ci = stats.norm.ppf(lower_upper_q, loc=delong_auc, scale=auc_std)
+        # AUROC is bounded to [0, 1]; clip the normal-approximation interval.
+        ci = np.clip(ci, 0, 1)
+        return tuple(ci)
+
+    def auroc_pval(self, mu=0.5, alpha=0.05):
+        from scipy.stats import norm
+        lo, up = self.auroc_ci(alpha=alpha)
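+        # Recover the standard error from the CI width, using the critical
+        # value that matches alpha (1.96 is only correct for alpha=0.05).
+        se = (up - lo) / (2 * norm.ppf(1 - alpha / 2))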
+        z = (self.auroc - mu) / se
+        return 2 * norm.cdf(-abs(z))
+
+    def prc_fit(self):
+        self.precision, self.recall, _ = metrics.precision_recall_curve(
+            self.y_true,
+            self.y_pred
+        )
+        self.ap = metrics.average_precision_score(self.y_true, self.y_pred)
+
+    def save_roc(self, outdir, name):
+        with sf.util.matplotlib_backend('Agg'):
+            import matplotlib.pyplot as plt
+
+            auroc_str = 'NA' if not self.auroc else f'{self.auroc:.2f}'
+            sf.stats.plot.roc(self.fpr, self.tpr, f'AUC = {auroc_str}')
+            full_path = join(outdir, f'{name}.png')
+            plt.savefig(full_path)
+            plt.close()
+            if self.neptune_run:
+                self.neptune_run[f'results/graphs/{name}'].upload(full_path)
+
+    def save_prc(self, outdir, name):
+        with sf.util.matplotlib_backend('Agg'):
+            import matplotlib.pyplot as plt
+
+            ap_str = 'NA' if not self.ap else f'{self.ap:.2f}'
+            sf.stats.plot.prc(self.precision, self.recall, label=f'AP = {ap_str}')
+            full_path = join(outdir, f'{name}.png')
+            plt.savefig(full_path)
+            plt.close()
+            if self.neptune_run:
+                self.neptune_run[f'results/graphs/{name}'].upload(full_path)
+
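The `auroc_ci()` and `auroc_pval()` methods above rely on a normal approximation: the confidence interval half-width is divided by the critical z-value to recover a standard error, which then gives a two-sided p-value against a null AUROC. A minimal standard-library sketch of that arithmetic follows. `pvalue_from_ci` is a hypothetical helper, not part of Slideflow, and it assumes the interval is symmetric and has not been clipped at 1.

```python
from statistics import NormalDist


def pvalue_from_ci(estimate, lower, upper, mu=0.5, alpha=0.05):
    """Two-sided p-value for `estimate` vs. `mu`, given a (1 - alpha) normal CI.

    Hypothetical helper for illustration only.
    """
    std_normal = NormalDist()
    # Critical value matching the CI level (about 1.96 when alpha=0.05).
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # A symmetric normal CI spans 2 * z_crit standard errors.
    se = (upper - lower) / (2 * z_crit)
    z = (estimate - mu) / se
    return 2 * std_normal.cdf(-abs(z))


p = pvalue_from_ci(0.70, 0.60, 0.80)
print(f"p = {p:.2e}")
```

Deriving the critical value from `alpha` keeps the recovered standard error consistent when the interval was built at a level other than 95%.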
+
+def _assert_model_type(model_type: str) -> None:
+    """Raises a ValueError if the model type is invalid."""
+    if model_type not in ('classification', 'regression', 'survival'):
+        raise ValueError(f"Unrecognized model_type {model_type}, must be "
+                         "'classification', 'regression', or 'survival'")
+
+
+def _generate_tile_roc(
+    yt_and_yp: Tuple[np.ndarray, np.ndarray],
+    neptune_run: Optional["neptune.Run"] = None
+) -> ClassifierMetrics:
+    """Generate tile-level ROC. Defined separately for multiprocessing.
+
+    Args:
+        yt_and_yp (Tuple[np.ndarray, np.ndarray]): y_true and y_pred.
+        neptune_run (neptune.Run, optional): Neptune run. Defaults to None.
+
+    Returns:
+        ClassifierMetrics: Contains metrics (AUROC, AP).
+    """
+    y_true, y_pred = yt_and_yp
+    class_metrics = ClassifierMetrics(y_true, y_pred, neptune_run=neptune_run)
+    return class_metrics
+
+
+def _merge_metrics(metrics_by_level: Dict[str, Dict]) -> Dict[str, Dict]:
+    """Merge dictionary of levels into a dictionary by metric.
+
+    Function accepts a dictionary organized as such:
+
+    {
+        'tile':  {'auc': [...], 'ap': [...]},
+        'slide': {'auc': [...], 'ap': [...]},
+        ...
+    }
+
+    and converts it to:
+
+    {
+        'auc': {'tile': [...], 'slide': [...]},
+        'ap':  {'tile': [...], 'slide': [...]},
+        ...
+    }
+    """
+    levels = list(metrics_by_level.keys())
+    metrics = list(metrics_by_level[levels[0]].keys())
+    return {
+        metric: {
+            level: metrics_by_level[level][metric]
+            for level in levels
+        } for metric in metrics
+    }
+
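The transposition described in the `_merge_metrics` docstring can be tried on a small dictionary. This is an illustrative sketch with made-up values; `merge_by_metric` is a hypothetical stand-in that mirrors the logic above and assumes every level reports the same metric names.

```python
def merge_by_metric(metrics_by_level):
    """Turn {level: {metric: value}} into {metric: {level: value}}."""
    levels = list(metrics_by_level)
    # Metric names are taken from the first level only.
    metric_names = list(metrics_by_level[levels[0]])
    return {
        name: {level: metrics_by_level[level][name] for level in levels}
        for name in metric_names
    }


by_level = {
    'tile':  {'auc': [0.81], 'ap': [0.64]},
    'slide': {'auc': [0.88], 'ap': [0.71]},
}
merged = merge_by_metric(by_level)
print(merged['auc'])
```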
+
+def basic_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> Dict[str, float]:
+    """Generates metrics, including sensitivity, specificity, and accuracy.
+
+    Args:
+        y_true (np.ndarray): True labels.
+        y_pred (np.ndarray): Predictions.
+
+    Returns:
+        Dict[str, float]: Dict with metrics including accuracy, sensitivity,
+        specificity, precision, recall, f1_score, and kappa.
+    """
+    assert len(y_true) == len(y_pred)
+    assert all(y in (0, 1) for y in y_true)
+    assert all(y in (0, 1) for y in y_pred)
+
+    TP = 0  # True positive
+    TN = 0  # True negative
+    FP = 0  # False positive
+    FN = 0  # False negative
+
+    for i, yt in enumerate(y_true):
+        yp = y_pred[i]
+        if yt == 1 and yp == 1:
+            TP += 1
+        elif yt == 1 and yp == 0:
+            FN += 1
+        elif yt == 0 and yp == 1:
+            FP += 1
+        elif yt == 0 and yp == 0:
+            TN += 1
+
+    results = {}
+    results['accuracy'] = (TP + TN) / (TP + TN + FP + FN)
+    results['sensitivity'] = TP / (TP + FN)
+    results['specificity'] = TN / (TN + FP)
+    results['precision'] = metrics.precision_score(y_true, y_pred)
+    results['recall'] = metrics.recall_score(y_true, y_pred)
+    results['f1_score'] = metrics.f1_score(y_true, y_pred)
+    results['kappa'] = metrics.cohen_kappa_score(y_true, y_pred)
+    return results
+
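The confusion-matrix arithmetic in `basic_metrics()` can be checked by hand on a five-element example. The sketch below uses only the standard library; `confusion_summary` is a hypothetical name, and it assumes binary 0/1 labels with at least one positive and one negative case so that no denominator is zero.

```python
from collections import Counter


def confusion_summary(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary 0/1 labels."""
    counts = Counter(zip(y_true, y_pred))
    tp = counts[(1, 1)]  # true positives
    fn = counts[(1, 0)]  # false negatives
    fp = counts[(0, 1)]  # false positives
    tn = counts[(0, 0)]  # true negatives
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (tn + fp),
    }


# TP=2, FN=1, FP=1, TN=1
summary = confusion_summary([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(summary)
```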
+
+def classification_metrics(
+    df: DataFrame,
+    label: str = '',
+    level: str = 'tile',
+    data_dir: Optional[str] = '',
+    neptune_run: Optional["neptune.Run"] = None
+) -> Dict[str, Dict[str, float]]:
+    """Generates categorical metrics (AUC/AP) from a set of predictions.
+
+    Args:
+        df (pd.DataFrame): Pandas DataFrame containing labels, predictions,
+            and optionally uncertainty, as returned by sf.stats.df_from_pred()
+
+    Keyword args:
+        label (str, optional): Label prefix/suffix for ROCs.
+            Defaults to an empty string.
+        level (str, optional): Group-level for the predictions. Used for
+            labeling plots. Defaults to 'tile'.
+        data_dir (str, optional): Path to data directory for saving plots.
+            If None, plots are not saved. Defaults to the current directory.
+
+    Returns:
+        Dict containing metrics, with the keys 'auc' and 'ap'.
+    """
+
+    label_start = "" if label == '' else f"{label}_"
+
+    # Detect outcome names from the '-y_pred0' prediction columns
+    outcome_names = [c[:-8] for c in df.columns if c.endswith('-y_pred0')]
+
+    if not len(outcome_names):
+        raise errors.StatsError("No outcomes detected from dataframe.")
+
+    all_auc = {outcome: [] for outcome in outcome_names}  # type: Dict
+    all_ap = {outcome: [] for outcome in outcome_names}  # type: Dict
+
+    def y_true_onehot(_df, i):
+        return (_df.y_true == i).astype(int)
+
+    def y_pred_onehot(_df, i):
+        return (_df.y_pred_cat == i).astype(int)
+
+    # Perform analysis separately for each outcome column
+    for outcome in outcome_names:
+        outcome_cols = [c for c in df.columns if c.startswith(f'{outcome}-')]
+
+        # Remove the outcome name from the dataframe temporarily
+        outcome_df = df[outcome_cols].rename(
+            columns={
+                orig_col: orig_col.replace(f'{outcome}-', '', 1)
+                for orig_col in outcome_cols
+            }
+        )
+        log.info(f"Validation metrics for outcome [green]{outcome}[/]:")
+        y_pred_cols = [c for c in outcome_df.columns if c.startswith('y_pred')]
+        num_cat = len(y_pred_cols)
+        if not num_cat:
+            raise errors.StatsError(
+                f"Could not find predictions column for outcome {outcome}"
+            )
+
+        # Sort the prediction columns so that argmax will work as expected
+        y_pred_cols = [f'y_pred{i}' for i in range(num_cat)]
+        if len(y_pred_cols) != num_cat:
+            raise errors.StatsError(
+                "Malformed dataframe, unable to find all prediction columns"
+            )
+        if not all(col in outcome_df.columns for col in y_pred_cols):
+            raise errors.StatsError("Malformed dataframe, invalid column names")
+
+        # Determine the predicted category (argmax across prediction columns)
+        outcome_df['y_pred_cat'] = outcome_df[y_pred_cols].values.argmax(1)
+
+        log.debug("Calculating metrics with a thread pool")
+        p = mp.dummy.Pool(8)
+        yt_and_yp = [
+            ((outcome_df.y_true == i).astype(int), outcome_df[f'y_pred{i}'])
+            for i in range(num_cat)
+        ]
+        try:
+            for i, fit in enumerate(p.imap(_generate_tile_roc, yt_and_yp)):
+                if data_dir is not None:
+                    fit.save_roc(data_dir, f"{label_start}{outcome}_{level}_ROC{i}")
+                    fit.save_prc(data_dir, f"{label_start}{outcome}_{level}_PRC{i}")
+                all_auc[outcome] += [fit.auroc]
+                all_ap[outcome] += [fit.ap]
+                auroc_str = 'NA' if not fit.auroc else f'{fit.auroc:.3f}'
+                ap_str = 'NA' if not fit.ap else f'{fit.ap:.3f}'
+                thresh = 'NA' if not fit.opt_thresh else f'{fit.opt_thresh:.3f}'
+                log.info(
+                    f"{level}-level AUC (cat #{i:>2}): {auroc_str} "
+                    f"AP: {ap_str} (opt. threshold: {thresh})"
+                )
+        except ValueError as e:
+            # Occurs when predictions contain NaN
+            log.error(f'Error encountered when generating AUC: {e}')
+            all_auc[outcome] = -1
+            all_ap[outcome] = -1
+        p.close()
+
+        # Calculate tile-level accuracy.
+        # Category-level accuracy is determined by comparing
+        # one-hot predictions to one-hot y_true.
+        for i in range(num_cat):
+            try:
+                yt_in_cat = y_true_onehot(outcome_df, i)
+                n_in_cat = yt_in_cat.sum()
+                correct = y_pred_onehot(outcome_df.loc[yt_in_cat == 1], i).sum()
+                category_accuracy = correct / n_in_cat
+                perc = category_accuracy * 100
+                log.info(f"Category {i} acc: {perc:.1f}% ({correct}/{n_in_cat})")
+            except IndexError:
+                log.warning(f"Error with category accuracy for cat # {i}")
+    return {
+        'auc': all_auc,
+        'ap': all_ap,
+    }
+
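The per-category accuracy loop in `classification_metrics()` takes the argmax of the prediction columns, then counts how many rows whose true label is category `i` were also predicted as `i`. A small pure-Python sketch of that idea follows. `per_category_accuracy` is a hypothetical helper, and the probabilities are made up for illustration.

```python
def per_category_accuracy(y_true, probs):
    """Fraction of rows in each true category whose argmax matches it."""
    n_cat = len(probs[0])
    # Predicted category is the index of the largest value (first on ties).
    preds = [max(range(n_cat), key=row.__getitem__) for row in probs]
    result = {}
    for cat in range(n_cat):
        preds_in_cat = [p for t, p in zip(y_true, preds) if t == cat]
        if not preds_in_cat:
            result[cat] = None  # no rows with this true label
            continue
        result[cat] = sum(p == cat for p in preds_in_cat) / len(preds_in_cat)
    return result


y_true = [0, 1, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],  # true category 1, predicted 0
    [0.2, 0.2, 0.6],
]
acc = per_category_accuracy(y_true, probs)
print(acc)
```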
+
+def concordance_index(y_true: np.ndarray, y_pred: np.ndarray) -> float:
+    '''Calculates concordance index from a given y_true and y_pred.'''
+    E = y_pred[:, -1]
+    y_pred = y_pred[:, :-1]
+    y_pred = y_pred.flatten()
+    E = E.flatten()
+    y_true = y_true.flatten()
+    return c_index(y_true, y_pred, E)
+
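For intuition about what a concordance index measures, here is a naive O(n²) sketch. It is not the implementation imported from `.concordance`, which may differ in tie handling and in the direction of the score. This version assumes that a higher score means longer predicted survival, and that a pair is comparable only when the subject with the shorter time experienced the event. `naive_c_index` is a hypothetical name.

```python
from itertools import combinations


def naive_c_index(times, scores, events):
    """Naive concordance index; higher score is assumed to mean longer survival."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # the shorter time is censored, so the pair is not comparable
        comparable += 1
        if scores[first] < scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ZeroDivisionError("no comparable pairs")
    return concordant / comparable


print(naive_c_index([1, 2, 3], [1, 2, 3], [1, 1, 1]))
```

Raising `ZeroDivisionError` when no pair is comparable mirrors the failure mode that `survival_metrics()` guards against.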
+
+def survival_metrics(
+    df: DataFrame,
+    level: str = 'tile',
+    label: str = '',
+    data_dir: str = '',
+    neptune_run: Optional["neptune.Run"] = None
+) -> Dict[str, float]:
+    """Generates survival metrics (concordance index) from a set of predictions.
+
+    Args:
+        df (pd.DataFrame): Pandas DataFrame containing labels, predictions,
+            and optionally uncertainty, as returned by sf.stats.df_from_pred().
+            The dataframe columns should be appropriately named using
+            sf.stats.name_columns().
+
+    Keyword args:
+        label (str, optional): Label prefix/suffix for ROCs.
+            Defaults to an empty string.
+        level (str, optional): Group-level for the predictions. Used for
+            labeling plots. Defaults to 'tile'.
+        data_dir (str, optional): Path to data directory for saving plots.
+            Defaults to an empty string.
+
+    Returns:
+        Dict containing metrics, with the key 'c_index'.
+    """
+    survival_cols = ('time-y_true', 'time-y_pred', 'event-y_true')
+    if any(c not in df.columns for c in survival_cols):
+        raise ValueError(
+            "Improperly formatted dataframe to survival_metrics(), "
+            f"must have columns {survival_cols}. Got: {list(df.columns)}"
+        )
+
+    # Calculate metrics
+    try:
+        c_index = concordance_index(
+            df['time-y_true'].values,
+            df[['time-y_pred', 'event-y_true']].values,
+        )
+        c_str = 'NA' if not c_index else f'{c_index:.3f}'
+        log.info(f"C-index ({level}-level): {c_str}")
+    except ZeroDivisionError as e:
+        log.error(f"Error calculating concordance index: {e}")
+        c_index = -1
+    return {
+        'c_index': c_index
+    }
+
+
+
[docs]def df_from_pred( + y_true: Optional[List[Any]], + y_pred: List[Any], + y_std: Optional[List[Any]], + tile_to_slides: Union[List, np.ndarray], + locations: Optional[Union[List, np.ndarray]] = None +) -> DataFrame: + """Converts arrays of model predictions to a pandas dataframe. + + Args: + y_true (list(np.ndarray)): List of y_true numpy arrays, one array for + each outcome. For continuous outcomes, the length of the outer + list should be one, and the second shape dimension of the numpy + array should be the number of continuous outcomes. + y_pred (list(np.ndarray)): List of y_pred numpy arrays, one array for + each outcome. For continuous outcomes, the length of the outer + list should be one, and the second shape dimension of the numpy + array should be the number of continuous outcomes. + y_std (list(np.ndarray)): List of uncertainty numpy arrays, formatted + in the same way as y_pred. + tile_to_slides (np.ndarray): Array of slide names for each tile. Length + should match the numpy arrays in y_true, y_pred, and y_std. + + Returns: + DataFrame: DataFrame of predictions. 
+ """ + len_err_msg = "{} must be a list of length equal to number of outcomes" + if y_true is not None and not isinstance(y_true, (list, tuple)): + raise ValueError(len_err_msg.format('y_true')) + if y_true is not None and not len(y_true) == len(y_pred): + raise ValueError('Length of y_pred and y_true must be equal') + if not isinstance(y_pred, (list, tuple)): + raise ValueError(len_err_msg.format('y_pred')) + if y_std is not None and not isinstance(y_std, (list, tuple)): + raise ValueError(len_err_msg.format('y_std')) + if y_std is not None and len(y_std) != len(y_pred): + raise ValueError('If y_std is provided, length must equal y_pred') + if locations is not None and len(locations) != len(tile_to_slides): + raise ValueError( + 'If locations is provided, length must equal tile_to_slides ' + f'(got: {len(locations)} and {len(tile_to_slides)})') + + n_outcomes = len(y_pred) + series = { + 'slide': pd.Series(tile_to_slides) + } + if locations is not None: + if not isinstance(locations, np.ndarray): + locations = np.array(locations) + series.update({ + 'loc_x': locations[:, 0], + 'loc_y': locations[:, 1] + }) + # Iterate through each outcome in y_pred + for oi in range(n_outcomes): + # Add y_pred columns + series.update({ + f'out{oi}-y_pred{n}': y_pred[oi][:, n] + for n in range(y_pred[oi].shape[1]) + }) + # Add y_true columns + if y_true is not None: + if len(y_true[oi].shape) == 1: + series.update({ + f'out{oi}-y_true': y_true[oi] + }) + else: + series.update({ + f'out{oi}-y_true{n}': y_true[oi][:, n] + for n in range(y_true[oi].shape[1]) + }) + # Add uncertainty columns + if y_std is not None: + series.update({ + f'out{oi}-uncertainty{n}': y_std[oi][:, n] + for n in range(y_std[oi].shape[1]) + }) + return pd.DataFrame(series)
+
+
+def eval_from_dataset(*args, **kwargs):
+    warnings.warn(
+        "`sf.stats.metrics.eval_from_dataset()` is deprecated. Please use "
+        "`sf.stats.metrics.eval_dataset()` instead.",
+        DeprecationWarning)
+    return eval_dataset(*args, **kwargs)
+
+
+
[docs]def eval_dataset( + model: Union["tf.keras.Model", "torch.nn.Module"], + dataset: Union["tf.data.Dataset", "torch.utils.data.DataLoader"], + model_type: str, + num_tiles: int = 0, + uq: bool = False, + uq_n: int = 30, + reduce_method: Union[str, Callable] = 'average', + patients: Optional[Dict[str, str]] = None, + outcome_names: Optional[List[str]] = None, + loss: Optional[Callable] = None, + torch_args: Optional[SimpleNamespace] = None, +) -> Tuple[DataFrame, float, float]: + """Generates predictions and accuracy/loss from a given model and dataset. + + Args: + model (str): Path to PyTorch model. + dataset (tf.data.Dataset): PyTorch dataloader. + model_type (str, optional): 'classification', 'regression', or 'survival'. + If multiple continuous outcomes are present, y_true is stacked into a + single vector for each image. Defaults to 'classification'. + num_tiles (int, optional): Used for progress bar with Tensorflow. + Defaults to 0. + uq_n (int, optional): Number of forward passes to perform + when calculating MC Dropout uncertainty. Defaults to 30. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + patients (dict, optional): Dictionary mapping slide names to patient + names. Required for generating patient-level metrics. + outcome_names (list, optional): List of str, names for outcomes. + Defaults to None (outcomes will not be named). 
+ torch_args (namespace): Used for PyTorch models. Namespace containing + num_slide_features, slide_input, update_corrects, and + update_loss functions. + + Returns: + pd.DataFrame, accuracy, loss + """ + if model_type != 'classification' and reduce_method == 'proportion': + raise ValueError( + f'Reduction method {reduce_method} incompatible with ' + f'model_type {model_type}' + ) + if sf.model.is_tensorflow_model(model): + from slideflow.model import tensorflow_utils + df, acc, total_loss = tensorflow_utils.eval_from_model( + model, + dataset, + model_type, + loss=loss, + num_tiles=num_tiles, + uq=uq, + uq_n=uq_n, + ) + else: + from slideflow.model import torch_utils + df, acc, total_loss = torch_utils.eval_from_model( + model, + dataset, + model_type, + torch_args=torch_args, + uq=uq, + uq_n=uq_n, + ) + + if outcome_names or model_type == 'survival': + df = name_columns(df, model_type, outcome_names) + dfs = group_reduce(df, method=reduce_method, patients=patients) + return dfs, acc, total_loss
+ + +
[docs]def group_reduce( + df: DataFrame, + method: Union[str, Callable] = 'average', + patients: Optional[Dict[str, str]] = None +) -> Dict[str, DataFrame]: + """Reduces tile-level predictions to group-level predictions. + + Args: + df (DataFrame): Tile-level predictions. + method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical outcomes. + Options include 'average', 'mean', 'proportion', 'median', 'sum', + 'min', 'max', or a callable function. 'average' and 'mean' are + synonymous, with both options kept for backwards compatibility. If + 'average' or 'mean', will reduce with average of each logit across + tiles. If 'proportion', will convert tile predictions into onehot + encoding then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + patients (dict, optional): Dictionary mapping slide names to patient + names. Required for generating patient-level metrics. + """ + # Validation. + if (method not in [ + 'average', 'proportion', 'mean', 'median', 'sum', 'min', 'max' + ] and not callable(method)): + raise ValueError( + f"Unknown method {method}. Expected 'average', 'proportion', " + "'mean', 'median', 'sum', 'min', 'max', or a callable function." 
+ ) + log.debug(f"Using reduce_method={method}") + + if patients is not None: + df['patient'] = df['slide'].map(patients) + groups = ['slide', 'patient'] + else: + groups = ['slide'] + + group_dfs = { + 'tile': df + } + _df = df[[c for c in df.columns if c not in ('loc_x', 'loc_y')]].copy() + if method == 'proportion': + outcome_names = [c[:-8] for c in df.columns if c.endswith('-y_pred0')] + if not len(outcome_names): + raise errors.StatsError("No outcomes detected from dataframe.") + for outcome in outcome_names: + y_pred_cols = [c for c in df.columns + if c.startswith(f"{outcome}-y_pred")] + num_cat = len(y_pred_cols) + if not num_cat: + raise errors.StatsError( + f"Could not find predictions column for outcome {outcome}" + ) + if num_cat != df[f'{outcome}-y_true'].max()+1: + raise errors.StatsError( + "Model predictions have a different number of outcome " + f"categories ({df[f'{outcome}-y_true'].max()+1}) " + f"than provided annotations ({num_cat})" + ) + y_pred_cols = [f'{outcome}-y_pred{i}' for i in range(num_cat)] + if len(y_pred_cols) != num_cat: + raise errors.StatsError( + "Malformed dataframe, unable to find all prediction columns" + ) + if not all(col in df.columns for col in y_pred_cols): + raise errors.StatsError( + "Malformed dataframe, invalid column names" + ) + + outcome_pred_cat = df[y_pred_cols].values.argmax(1) + for i in range(num_cat): + _df[f'{outcome}-y_pred{i}'] = (outcome_pred_cat == i).astype(int) + + # Both 'average' and 'proportion' methods perform the same reduction, + # so we can use the same method for both + if method in ('average', 'proportion'): + method = 'mean' + + def _apply_reduce(_df, method, group): + nonlocal groups + if method in ['mean', 'median', 'sum', 'min', 'max']: + return _df.groupby(group, as_index=False).agg(method, numeric_only=True) + elif callable(method): + _numeric = _df.drop(columns=[g for g in groups if g != group]) + return _numeric.groupby(group, as_index=False).agg(method) + else: + raise 
ValueError(f"Unknown method {method}") + + + for group in groups: + group_dfs.update({ + group: _apply_reduce(_df, method, group) + }) + + return group_dfs
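The `'proportion'` reduction in `group_reduce()` differs from `'average'` in that each tile first casts a one-hot vote for its argmax category, and the votes are then averaged per slide. A pure-Python sketch of that behavior, without pandas, is given below. `reduce_by_proportion` is a hypothetical helper and the tile predictions are made up.

```python
from collections import defaultdict


def reduce_by_proportion(slides, probs):
    """Per-slide fraction of tiles whose argmax falls in each category."""
    n_cat = len(probs[0])
    votes = defaultdict(lambda: [0] * n_cat)
    for slide, row in zip(slides, probs):
        # One-hot vote for the category with the highest prediction.
        winner = max(range(n_cat), key=row.__getitem__)
        votes[slide][winner] += 1
    return {
        slide: [v / sum(counts) for v in counts]
        for slide, counts in votes.items()
    }


slides = ['a', 'a', 'a', 'b']
probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.3, 0.7]]
proportions = reduce_by_proportion(slides, probs)
print(proportions)
```

For slide `a`, averaging the raw predictions would give roughly `[0.7, 0.3]`, while the proportion method reports that two of three tiles voted for category 0.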
+ + +def regression_metrics( + df: DataFrame, + label: str = '', + level: str = 'tile', + data_dir: str = '', + neptune_run: Optional["neptune.Run"] = None +) -> Dict[str, List[float]]: + """Generates metrics (R^2, coefficient of determination) from predictions. + + Args: + df (pd.DataFrame): Pandas DataFrame containing labels, predictions, + and optionally uncertainty, as returned by sf.stats.df_from_pred() + + Keyword args: + label (str, optional): Label prefix/suffix for ROCs. + Defaults to an empty string. + level (str, optional): Group-level for the predictions. Used for + labeling plots. Defaults to 'tile'. + data_dir (str, optional): Path to data directory for saving. + Defaults to None. + neptune_run (:class:`neptune.Run`, optional): Neptune run in which to + log results. Defaults to None. + + Returns: + Dict containing metrics, with the key 'r_squared'. + """ + + label_end = "" if label == '' else f"_{label}" + + # Detect the outcome names + outcome_names = [c[:-7] for c in df.columns if c.endswith('-y_pred')] + _outcomes_by_true = [c[:-7] for c in df.columns if c.endswith('-y_true')] + if ((sorted(outcome_names) != sorted(_outcomes_by_true)) + or not len(outcome_names)): + raise ValueError("Improperly formatted dataframe to regression_metrics(); " + "could not detect outcome names. Ensure that " + "prediction columns end in '-y_pred' and ground-truth " + "columns end in '-y_true'. Try setting column names " + "with slideflow.stats.name_columns(). 
" + f"DataFrame columns: {list(df.columns)}") + + # Calculate metrics + y_pred_cols = [f'{o}-y_pred' for o in outcome_names] + y_true_cols = [f'{o}-y_true' for o in outcome_names] + r_squared = sf.stats.plot.scatter( + df[y_true_cols].values, + df[y_pred_cols].values, + data_dir, + f"{label_end}_by_{level}", + neptune_run=neptune_run + ) + + # Show results + for o, r in zip(outcome_names, r_squared): + r_str = "NA" if not r else f'{r:.3f}' + log.info(f"[green]{o}[/]: R-squared ({level}-level): {r_str}") + + return { + 'r_squared': r_squared, + } + + +
[docs]def metrics_from_dataset( + model: Union["tf.keras.Model", "torch.nn.Module"], + model_type: str, + patients: Dict[str, str], + dataset: Union["tf.data.Dataset", "torch.utils.data.DataLoader"], + num_tiles: int = 0, + outcome_names: Optional[List[str]] = None, + reduce_method: Union[str, Callable] = 'average', + label: str = '', + save_predictions: Union[str, bool] = False, + data_dir: str = '', + uq: bool = False, + loss: Optional[Callable] = None, + torch_args: Optional[SimpleNamespace] = None, + **kwargs +) -> Tuple[Dict, float, float]: + + """Evaluate performance of a given model on a given TFRecord dataset, + generating a variety of statistical outcomes and graphs. + + Args: + model (tf.keras.Model or torch.nn.Module): Keras/Torch model to eval. + model_type (str): 'classification', 'regression', or 'survival'. + patients (dict): Dictionary mapping slidenames to patients. + dataset (tf.data.Dataset or torch.utils.data.DataLoader): Dataset. + num_tiles (int, optional): Number of total tiles expected in dataset. + Used for progress bar. Defaults to 0. + + Keyword args: + outcome_names (list, optional): List of str, names for outcomes. + Defaults to None. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + label (str, optional): Label prefix/suffix for saving. + Defaults to None. 
+ save_predictions (bool, optional): Save tile, slide, and patient-level + predictions to CSV. Defaults to True. + data_dir (str): Path to data directory for saving. + Defaults to empty string (current directory). + neptune_run (:class:`neptune.Run`, optional): Neptune run in which to + log results. Defaults to None. + + Returns: + metrics [dict], accuracy [float], loss [float] + """ + _assert_model_type(model_type) + dfs, acc, total_loss = eval_dataset( + model, + dataset, + model_type, + uq=uq, + loss=loss, + num_tiles=num_tiles, + patients=patients, + outcome_names=outcome_names, + reduce_method=reduce_method, + torch_args=torch_args, + ) + + # Save predictions + if save_predictions: + if isinstance(save_predictions, str): + fmt_kw = dict(format=save_predictions) + else: + fmt_kw = {} # type: ignore + save_dfs(dfs, outdir=data_dir, label=label, **fmt_kw) + + # Calculate metrics + def metrics_by_level(metrics_function): + return _merge_metrics({ + level: metrics_function( + _df, + level=level, + data_dir=data_dir, + label=label, + **kwargs + ) for level, _df in dfs.items() + }) + + if model_type == 'classification': + metrics = metrics_by_level(classification_metrics) + elif model_type == 'regression': + metrics = metrics_by_level(regression_metrics) + else: + metrics = metrics_by_level(survival_metrics) + + log.debug(f'Metrics generation complete.') + return metrics, acc, total_loss
+ + +
[docs]def name_columns( + df: DataFrame, + model_type: str, + outcome_names: Optional[List[str]] = None +): + """Renames columns in a DataFrame to correspond to the given outcome names. + + Assumes the DataFrame supplied was generated by sf.stats.df_from_pred(). + + Args: + df (DataFrame): DataFrame from sf.stats.df_from_pred(), containing + predictions and labels. + model_type (str): Type of model ('classification', 'regression', or 'survival'). + outcome_names (list(str)), optional): Outcome names to apply to the + DataFrame. If this is from a survival model, the standard names "time" + and "event" will be used. + + Raises: + ValueError: If outcome_names are not supplied and it is not a survival model. + errors.StatsError: If the length of outcome_names is incompatible + with the DataFrame. + + Returns: + DataFrame: DataFrame with renamed columns. + """ + _assert_model_type(model_type) + + if outcome_names is None and model_type != 'survival': + raise ValueError("Must supply outcome names for classification " + "or regression models.") + if (not isinstance(outcome_names, (list, tuple)) + and outcome_names is not None): + outcome_names = [outcome_names] + + if model_type == 'classification' and outcome_names is not None: + # Update dataframe column names with outcome names + outcome_cols_to_replace = {} + for oi, outcome in enumerate(outcome_names): + outcome_cols_to_replace.update({ + c: c.replace(f'out{oi}', outcome) + for c in df.columns + if c.startswith(f'out{oi}-') + }) + df = df.rename(columns=outcome_cols_to_replace) + + elif model_type == 'regression': + n_outcomes = len([c for c in df.columns if c.startswith('out0-y_pred')]) + if not outcome_names: + outcome_names = [f"Outcome {i}" for i in range(n_outcomes)] + elif len(outcome_names) != n_outcomes: + raise errors.StatsError( + f"Number of outcome names {len(outcome_names)} does not " + f"match y_true {n_outcomes}" + ) + + # Rename columns + outcome_cols_to_replace = {} + def replace_dict(target, oi, 
ending_not_needed=False): + return { + c: f'{outcome}-{target}' + for c in df.columns + if c.startswith(f'out0-{target}') and (c.endswith(str(oi)) + or ending_not_needed) + } + for oi, outcome in enumerate(outcome_names): + outcome_cols_to_replace.update(replace_dict( + 'y_true', oi, ending_not_needed=(len(outcome_names) == 1) + )) + outcome_cols_to_replace.update(replace_dict('y_pred', oi)) + outcome_cols_to_replace.update(replace_dict('uncertainty', oi)) + df = df.rename(columns=outcome_cols_to_replace) + + else: + df = df.rename(columns={ + 'out0-y_pred0': 'time-y_pred', + 'out0-y_pred1': 'event-y_true', + 'out0-y_true0': 'time-y_true', + + }) + return df
+
+
+def predict_from_dataset(*args, **kwargs):
+    warnings.warn(
+        "`sf.stats.metrics.predict_from_dataset()` is deprecated. Please use "
+        "`sf.stats.metrics.predict_dataset()` instead.",
+        DeprecationWarning)
+    return predict_dataset(*args, **kwargs)
+
+
+
[docs]def predict_dataset( + model: Union["tf.keras.Model", "torch.nn.Module"], + dataset: Union["tf.data.Dataset", "torch.utils.data.DataLoader"], + model_type: str, + num_tiles: int = 0, + uq: bool = False, + uq_n: int = 30, + reduce_method: Union[str, Callable] = 'average', + patients: Optional[Dict[str, str]] = None, + outcome_names: Optional[List[str]] = None, + torch_args: Optional[SimpleNamespace] = None, +) -> Dict[str, DataFrame]: + """Generates predictions from model and dataset. + + Args: + model (str): Path to PyTorch model. + dataset (tf.data.Dataset): PyTorch dataloader. + model_type (str, optional): 'classification', 'regression', or 'survival'. + If multiple continuous outcomes are present, y_true is stacked into a + single vector for each image. Defaults to 'classification'. + num_tiles (int, optional): Used for progress bar with Tensorflow. + Defaults to 0. + uq_n (int, optional): Number of forward passes to perform + when calculating MC Dropout uncertainty. Defaults to 30. + reduce_method (str, optional): Reduction method for calculating + slide-level and patient-level predictions for categorical + outcomes. Options include 'average', 'mean', 'proportion', + 'median', 'sum', 'min', 'max', or a callable function. + 'average' and 'mean' are synonymous, with both options kept + for backwards compatibility. If 'average' or 'mean', will + reduce with average of each logit across tiles. If + 'proportion', will convert tile predictions into onehot encoding + then reduce by averaging these onehot values. For all other + values, will reduce with the specified function, applied via + the pandas ``DataFrame.agg()`` function. Defaults to 'average'. + patients (dict, optional): Dictionary mapping slide names to patient + names. Required for generating patient-level metrics. + outcome_names (list, optional): List of str, names for outcomes. + Defaults to None (outcomes will not be named). + torch_args (namespace): Used for PyTorch backend. 
Namespace containing + num_slide_features and slide_input. + + Returns: + Dict[str, pd.DataFrame]: Dictionary with keys 'tile', 'slide', and + 'patient', and values containing DataFrames with tile-, slide-, + and patient-level predictions. + """ + if model_type != 'classification' and reduce_method == 'proportion': + raise ValueError( + f'Reduction method {reduce_method} incompatible with ' + f'model_type {model_type}' + ) + + if sf.model.is_tensorflow_model(model): + from slideflow.model import tensorflow_utils + df = tensorflow_utils.predict_from_model( + model, + dataset, + num_tiles=num_tiles, + uq=uq, + uq_n=uq_n, + ) + else: + from slideflow.model import torch_utils + df = torch_utils.predict_from_model( + model, + dataset, + model_type, + torch_args=torch_args, + uq=uq, + uq_n=uq_n, + ) + if outcome_names is not None or model_type == 'survival': + df = name_columns(df, model_type, outcome_names) + return group_reduce(df, method=reduce_method, patients=patients)
+ + +def save_dfs( + dfs: Dict[str, DataFrame], + format: str = 'parquet', + outdir: str = '', + label: str = '' +) -> None: + """Save DataFrames of predictions to files.""" + label_end = f'_{label}' if label else '' + for level, _df in dfs.items(): + path = join(outdir, f"{level}_predictions{label_end}") + + # Convert half-floats to float32 + half_floats = _df.select_dtypes(include='float16') + _df[half_floats.columns] = half_floats.astype('float32') + + if format == 'csv': + _df.to_csv(path+'.csv') + elif format == 'feather': + try: + import pyarrow.feather as feather + except ImportError: + raise ImportError("Saving to a feather file requires the package `pyarrow`. " + "Please install with `pip install pyarrow`") + feather.write_feather(_df, path+'.feather') + else: + _df.to_parquet(path+'.parquet.gzip', compression='gzip') +
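The `reduce_method` options described in the `predict_dataset` docstring can be illustrated with a small pandas sketch. This is not Slideflow's `group_reduce` implementation: the column names (`pred0`, `pred1`), the toy values, and the helpers `reduce_average` and `reduce_proportion` are assumptions made only to show how 'average' and 'proportion' differ.

```python
import numpy as np
import pandas as pd

# Hypothetical tile-level predictions for two slides (two-class outputs).
tiles = pd.DataFrame({
    'slide': ['s1', 's1', 's1', 's2', 's2'],
    'pred0': [0.9, 0.6, 0.2, 0.3, 0.4],
    'pred1': [0.1, 0.4, 0.8, 0.7, 0.6],
})
pred_cols = ['pred0', 'pred1']


def reduce_average(df: pd.DataFrame) -> pd.DataFrame:
    """'average': mean of each prediction column across a slide's tiles."""
    return df.groupby('slide')[pred_cols].mean()


def reduce_proportion(df: pd.DataFrame) -> pd.DataFrame:
    """'proportion': one-hot encode each tile's top class, then average."""
    top_class = df[pred_cols].values.argmax(axis=1)
    onehot = np.eye(len(pred_cols))[top_class]
    onehot_df = pd.DataFrame(onehot, columns=pred_cols, index=df.index)
    onehot_df['slide'] = df['slide']
    return onehot_df.groupby('slide')[pred_cols].mean()


slide_avg = reduce_average(tiles)
slide_prop = reduce_proportion(tiles)
print(slide_avg)
print(slide_prop)
```

With 'proportion', each slide-level value is the fraction of tiles whose top prediction was that class, so it only makes sense for categorical outcomes. This matches the `ValueError` raised above when it is combined with a non-classification `model_type`.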

+ © Copyright 2023, James M Dolezal. + +

\ No newline at end of file
diff --git a/docs/_modules/slideflow/stats/slidemap/index.html b/docs/_modules/slideflow/stats/slidemap/index.html
new file mode 100644
index 000000000..62fb68bcd
--- /dev/null
+++ b/docs/_modules/slideflow/stats/slidemap/index.html
@@ -0,0 +1,1549 @@
+ slideflow.stats.slidemap — slideflow 3.0.0 documentation

Source code for slideflow.stats.slidemap

+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
+
+import os
+import pickle
+import numpy as np
+import pandas as pd
+import slideflow as sf
+import warnings
+from os.path import join, exists, isdir
+from pandas.core.frame import DataFrame
+from sklearn.cluster import KMeans
+from slideflow import errors
+from slideflow.stats import stats_utils
+from slideflow.util import log
+
+if TYPE_CHECKING:
+    import umap
+    from matplotlib.axes import Axes
+    from matplotlib.figure import Figure
+    from slideflow.model import DatasetFeatures
+
+
+
+class SlideMap:
+    """Two-dimensional slide map for visualization & backend for mosaic maps.
+
+    Slides are mapped in 2D either explicitly with pre-specified coordinates,
+    or with dimensionality reduction from post-convolutional layer weights,
+    provided from :class:`slideflow.DatasetFeatures`.
+    """
+
+    def __init__(
+        self,
+        *,
+        parametric_umap: bool = False
+    ) -> None:
+        """Backend for mapping slides into two dimensional space. Can use a
+        DatasetFeatures object to map slides according to UMAP of features, or
+        map according to pre-specified coordinates.
+
+        Can be initialized with three methods: from precalculated X/Y
+        coordinates, from a DatasetFeatures object, or from a saved map.
+
+        Examples
+            Build a SlideMap from a DatasetFeatures object
+
+                .. code-block:: python
+
+                    dts_ftrs = sf.DatasetFeatures(model, dataset)
+                    slidemap = sf.SlideMap.from_features(dts_ftrs)
+
+            Build a SlideMap from prespecified coordinates
+
+                .. code-block:: python
+
+                    x = np.array(...)
+                    y = np.array(...)
+                    slides = ['slide1', 'slide1', 'slide5', ...]
+                    tfr_index = np.array(...)
+                    slidemap = sf.SlideMap.from_xy(
+                        x=x, y=y, slides=slides, tfr_index=tfr_index
+                    )
+
+            Load a saved SlideMap
+
+                .. code-block:: python
+
+                    slidemap = sf.SlideMap.load('map.parquet')
+
+        Args:
+            parametric_umap (bool, optional): Fit a Parametric UMAP
+                (``umap.ParametricUMAP``) instead of a standard UMAP.
+                Defaults to False.
+        """
+        assert isinstance(parametric_umap, bool), "Expected <bool> for argument 'parametric_umap'"
+        self.data = None  # type: DataFrame
+        self.ftrs = None  # type: Optional[DatasetFeatures]
+        self.slides = None  # type: List[str]
+        self.tfrecords = None  # type: List[str]
+        self.parametric_umap = parametric_umap
+        # Initialize so that save() and umap_transform() can safely test
+        # `self.umap is None` on maps built without a fitted UMAP.
+        self.umap = None  # type: Any
+        self._umap_normalized_range = None
+        self._umap_normalized_clip = None
+        self.map_meta = {}  # type: Dict[str, Any]
+
+    @classmethod
+    def load(cls, path: str):
+        """Load a previously saved SlideMap (UMAP and coordinates).
+
+        Loads a ``SlideMap`` previously saved with ``SlideMap.save()``.
+ + Expects a directory with ``slidemap.parquet``, ``range_clip.npz``, + and either ``umap.pkl`` (non-parametric models) or a folder named + ``parametric_model``. + + Examples + Save a SlideMap, then load it. + + .. code-block:: python + + slidemap.save('/directory/') + new_slidemap = sf.SlideMap.load('/directory/') + + Args: + path (str): Directory from which to load a previously saved UMAP. + + """ + log.debug(f"Loading SlideMap from {path}") + obj = cls() + if isdir(path): + # Load coordinates + if exists(join(path, 'slidemap.parquet')): + obj.load_coordinates(join(path, 'slidemap.parquet')) + else: + log.warn("Could not find slidemap.parquet; no data loaded.") + # Load UMAP + if exists(join(path, 'parametric_model')): + obj.parametric_umap = True + obj.load_umap(path) + elif exists(join(path, 'umap.pkl')): + obj.load_umap(join(path, 'umap.pkl')) + else: + log.warn(f"Could not find a valid umap model at {path}. Ensure " + "the path is a valid directory with either 'parametric_umap' " + "subdirectory or a valid 'umap.pkl'.") + # Load range/clip + try: + obj.load_range_clip(path) + except FileNotFoundError: + log.warn("Could not find range_clip.npz; results from " + "umap_transform() will not be normalized.") + if exists(join(path, 'tfrecords.json')): + obj.tfrecords = sf.util.load_json(join(path, 'tfrecords.json')) + elif path.endswith('.parquet'): + obj.load_coordinates(path) + else: + raise ValueError( + f"Unable to determine how to load {path}. Expected " + "a path to a directory, or a slidemap.parquet file." + ) + obj.slides = obj.data.slide.unique() + return obj + + @classmethod + def from_xy( + cls, + x: Union[np.ndarray, List[int], str], + y: Union[np.ndarray, List[int], str], + slides: Union[np.ndarray, List[str], str], + tfr_index: Union[np.ndarray, List[int], str], + data: Optional[DataFrame] = None, + parametric_umap: bool = False, + cache: Optional[str] = None + ) -> "SlideMap": + """Initializes map from precalculated (x, y) coordinates. 
+
+        Args:
+            x (list(int)): X coordinates for each point on the map. Can either
+                be a list of int, or the name of a column in the DataFrame
+                provided to the argument 'data'.
+            y (list(int)): Y coordinates for each point on the map. Can either
+                be a list of int, or the name of a column in the DataFrame
+                provided to the argument 'data'.
+            slides (list(str)): Slide names for each point on the map. Can
+                either be a list of str, or the name of a column in the
+                DataFrame provided to the argument 'data'.
+            tfr_index (list(int)): TFRecord indices for each point on
+                the map. Can either be a list of int, or the name of a column
+                in the DataFrame provided to the argument 'data'.
+            data (DataFrame, optional): Optional DataFrame which can be used
+                to supply the 'x', 'y', 'slides', and 'tfr_index' data.
+            parametric_umap (bool, optional): Whether the associated UMAP is
+                a Parametric UMAP. Defaults to False.
+            cache (str, optional): Deprecated.
+        """
+        if cache is not None:
+            warnings.warn(
+                'Argument "cache" is deprecated for SlideMap. '
+                'Instead of using/recalculating SlideMaps with cache, manually '
+                'save and load maps with SlideMap.save() and SlideMap.load()',
+                DeprecationWarning
+            )
+        # Read and verify provided input
+        cols = {'x': x, 'y': y, 'slides': slides, 'tfr_index': tfr_index}
+        for col, col_val in cols.items():
+            if isinstance(col_val, str) and data is None:
+                raise ValueError(
+                    f"Could not interpret input {col_val} for arg {col}. "
+                    "Did you mean to supply a DataFrame via 'data'?")
+            elif data is not None:
+                if isinstance(col_val, str) and col_val not in data.columns:
+                    raise ValueError(f"Could not find column {col_val}.")
+                elif isinstance(col_val, str):
+                    cols[col] = data[col_val].values
+                else:
+                    cols[col] = col_val
+
+        # Verify lengths of provided input
+        if not all(len(cols[c]) == len(cols['x']) for c in cols):
+            raise ValueError(
+                "Length of x, y, slides, and tfr_index must all be equal."
+ ) + + obj_data = pd.DataFrame({ + 'x': pd.Series(cols['x']), + 'y': pd.Series(cols['y']), + 'slide': pd.Series(cols['slides']), + 'tfr_index': pd.Series(cols['tfr_index']) + }) + obj = cls() + obj.slides = obj_data.slide.unique() + obj.data = obj_data + obj.parametric_umap = parametric_umap + return obj + + @classmethod + def from_features( + cls, + ftrs: "DatasetFeatures", + *, + exclude_slides: Optional[List[str]] = None, + map_slide: Optional[str] = None, + parametric_umap: bool = False, + umap_dim: int = 2, + umap: Optional[Any] = None, + recalculate: Optional[bool] = None, # Deprecated + cache: Optional[str] = None, # Deprecated + **umap_kwargs: Any + ) -> "SlideMap": + """Initializes map from dataset features. + + Args: + ftrs (:class:`slideflow.DatasetFeatures`): DatasetFeatures. + exclude_slides (list, optional): List of slides to exclude. + map_slide (str, optional): Either None, 'centroid', or 'average'. + If None, will map all tiles from each slide. Defaults to None. + umap_dim (int, optional): Number of dimensions for UMAP. Defaults + to 2. + umap (umap.UMAP, optional): Fit UMAP, to be used instead of fitting + a new UMAP. + cache (str, optional): Deprecated. + recalculate (bool, optional): Deprecated + """ + if recalculate or cache: + warnings.warn( + 'Arguments "recalculate" and "cache" are deprecated for SlideMap. 
' + 'Instead of using/recalculating SlideMaps with cache, manually ' + 'save and load maps with SlideMap.save() and SlideMap.load()', + DeprecationWarning + ) + if map_slide is not None and map_slide not in ('centroid', 'average'): + raise errors.SlideMapError( + "map_slide must be None, 'centroid' or 'average', (got " + f"{map_slide})" + ) + if not exclude_slides: + slides = ftrs.slides + else: + slides = [s for s in ftrs.slides if s not in exclude_slides] + + obj = cls() + obj.slides = slides + obj.ftrs = ftrs + obj.umap = umap # type: ignore + obj.parametric_umap = parametric_umap + if map_slide: + obj._calculate_from_slides( + method=map_slide, + **umap_kwargs + ) + else: + obj._calculate_from_tiles( + dim=umap_dim, + **umap_kwargs + ) + return obj + + @classmethod + def from_precalculated(cls, *args, **kwargs) -> "SlideMap": + """Deprecated class initializer.""" + warnings.warn( + "sf.SlideMap.from_precalculated() deprecated. Please use " + "sf.SlideMap.from_xy() instead.", + DeprecationWarning + ) + return cls.from_xy(*args, **kwargs) + + @property + def x(self): + """X coordinates of map.""" + return self.data.x.values + + @property + def y(self): + """Y coordinates of map.""" + return self.data.y.values + + def _calculate_from_tiles( + self, + **umap_kwargs: Any + ) -> None: + """Internal function to guide calculation of UMAP from final layer + features / activations, as provided by DatasetFeatures. + + Keyword Args: + dim (int): Number of dimensions for UMAP. Defaults to 2. + n_neighbors (int): Number of neighbors for UMAP. Defaults to 50. + min_dist (float): Minimum distance for UMAP. Defaults to 0.1. + metric (str): UMAP metric. Defaults to 'cosine'. + **umap_kwargs (optional): Additional keyword arguments for the + UMAP function. 
+ """ + assert self.ftrs is not None + + # Calculate UMAP + node_activations = np.concatenate([ + self.ftrs.activations[slide] for slide in self.slides + ]) + + self.map_meta['num_features'] = self.ftrs.num_features + log.info("Calculating UMAP...") + + coordinates = self.umap_transform(node_activations, **umap_kwargs) + + # Assemble dataframe + tfrecord_indices = np.concatenate([ + np.arange(self.ftrs.activations[slide].shape[0]) + for slide in self.slides + ]) + slides = np.array([ + slide + for slide in self.slides + for _ in range(self.ftrs.activations[slide].shape[0]) + ]) + data_dict = { + 'slide': pd.Series(slides), + 'x': pd.Series(coordinates[:, 0]), + 'tfr_index': pd.Series(tfrecord_indices), + } + if self.ftrs.locations: + locations = np.concatenate([ + self.ftrs.locations[slide] for slide in self.slides + ]) + data_dict['location'] = pd.Series([l for l in locations]).astype(object) + + if self.ftrs.predictions and isinstance(self.ftrs, sf.DatasetFeatures): + predictions = np.concatenate([ + self.ftrs.predictions[slide] for slide in self.slides + ]) + data_dict.update({ + 'predicted_class': pd.Series(np.argmax(predictions, axis=1)), + 'predictions': pd.Series([l for l in predictions]).astype(object), + }) + if self.ftrs.uq and self.ftrs.uncertainty != {}: # type: ignore + uncertainty = np.concatenate([ + self.ftrs.uncertainty[slide] for slide in self.slides + ]) + data_dict.update({ + 'uncertainty': pd.Series( + [u for u in uncertainty] + ).astype(object) + }) + if 'dim' not in umap_kwargs or umap_kwargs['dim'] > 1: + data_dict.update({ + 'y': pd.Series(coordinates[:, 1]), + }) + self.data = pd.DataFrame(data_dict) + + def _calculate_from_slides( + self, + method: str = 'centroid', + **umap_kwargs: Any + ) -> None: + """ Internal function to guide calculation of UMAP from final layer + activations for each tile, as provided via DatasetFeatures, and + then map only the centroid tile for each slide. 
+
+        Args:
+            method (str, optional): 'centroid' or 'average'. If centroid, will
+                calculate UMAP only from centroid tiles for each slide.
+                If average, will calculate UMAP based on average node
+                activations across all tiles within the slide, then display the
+                centroid tile for each slide.
+
+        Keyword Args:
+            dim (int): Number of dimensions for UMAP. Defaults to 2.
+            n_neighbors (int): Number of neighbors for UMAP. Defaults to 50.
+            min_dist (float): Minimum distance for UMAP. Defaults to 0.1.
+            metric (str): UMAP metric. Defaults to 'cosine'.
+            **umap_kwargs (optional): Additional keyword arguments for the
+                UMAP function.
+        """
+        if method not in ('centroid', 'average'):
+            _m = f'Method must be either "centroid" or "average", not {method}'
+            raise errors.SlideMapError(_m)
+        assert self.ftrs is not None
+
+        # Calculate optimal slide indices and centroid activations
+        log.info("Calculating centroid indices...")
+        opt_idx, centroid_activations = stats_utils.calculate_centroid(self.ftrs.activations)
+
+        # Restrict mosaic to only slides that had enough tiles to calculate
+        # an optimal index from centroid
+        successful_slides = list(opt_idx.keys())
+        num_warned = 0
+        for slide in self.ftrs.slides:
+            if slide not in successful_slides:
+                log.debug(f"No centroid for [green]{slide}[/]; skipping")
+                # Count skipped slides so the summary warning below is emitted
+                num_warned += 1
+        if num_warned:
+            log.warning(f"No centroid for {num_warned} slides.")
+        log.info(f"Calculating UMAP from slide-level {method}...")
+
+        if method == 'centroid':
+            umap_input = np.array([
+                centroid_activations[slide] for slide in self.slides
+            ])
+        elif method == 'average':
+            umap_input = np.array([
+                np.mean(self.ftrs.activations[slide], axis=0)
+                for slide in self.slides
+            ])
+
+        # Calculate UMAP
+        coordinates = self.umap_transform(
+            umap_input,
+            **umap_kwargs
+        )
+
+        # Create dataframe
+        locations = np.stack([
+            self.ftrs.locations[slide][opt_idx[slide]] for slide in self.slides
+        ])
+        data_dict = {
+            'slide': pd.Series(self.slides),
+            'x': pd.Series(coordinates[:, 0]),
'tfr_index': pd.Series(opt_idx[slide] for slide in self.slides), + 'location': pd.Series([l for l in locations]).astype(object) + } + if self.ftrs.predictions: + predictions = np.stack([ + self.ftrs.predictions[slide][opt_idx[slide]] for slide in self.slides + ]) + data_dict.update({ + 'predictions': pd.Series([l for l in predictions]).astype(object), + 'predicted_class': pd.Series(np.argmax(predictions, axis=1)), + }) + if self.ftrs.uq and self.ftrs.uncertainty != {}: # type: ignore + uncertainty = np.stack([ + self.ftrs.uncertainty[slide][opt_idx[slide]] + for slide in self.slides + ]) + data_dict.update({ + 'uncertainty': pd.Series( + [u for u in uncertainty] + ).astype(object) + }) + if 'dim' not in umap_kwargs or umap_kwargs['dim'] > 1: + data_dict.update({ + 'y': pd.Series(coordinates[:, 1]), + }) + self.data = pd.DataFrame(data_dict) + + def activations(self) -> np.ndarray: + """Return associated DatasetFeatures activations as a numpy array + corresponding to the points on this SlideMap.""" + if self.ftrs is None: + raise ValueError( + "No associated DatasetFeatures object for reading activations." + ) + return np.array([ + self.ftrs.activations[row.slide][row.tfr_index] + for row in self.data.itertuples() + ]) + + def build_mosaic( + self, + tfrecords: Optional[List[str]] = None, + **kwargs + ) -> "sf.Mosaic": + """Build a mosaic map. + + Args: + tfrecords (list(str), optional): List of tfrecord paths. If SlideMap + was created using DatasetFeatures, this argument is not required. + + Keyword args: + num_tiles_x (int, optional): Mosaic map grid size. Defaults to 50. + tile_select (str, optional): 'first', 'nearest', or 'centroid'. + Determines how to choose a tile for display on each grid space. + If 'first', will display the first valid tile in a grid space + (fastest; recommended). If 'nearest', will display tile nearest + to center of grid space. If 'centroid', for each grid, will + calculate which tile is nearest to centroid tile_meta. 
+ Defaults to 'nearest'. + tile_meta (dict, optional): Tile metadata, used for tile_select. + Dictionary should have slide names as keys, mapped to list of + metadata (length of list = number of tiles in slide). + Defaults to None. + normalizer ((str or :class:`slideflow.norm.StainNormalizer`), optional): + Normalization strategy to use on image tiles. Defaults to None. + normalizer_source (str, optional): Stain normalization preset or + path to a source image. Valid presets include 'v1', 'v2', and + 'v3'. If None, will use the default present ('v3'). + Defaults to None. + + """ + if self.ftrs is None and tfrecords is None: + raise ValueError( + "If SlideMap was not created using DatasetFeatures, then the " + "`tfrecords` argument (list of TFRecord paths) must be supplied " + "to `SlideMap.build_mosaic()`" + ) + elif ((self.ftrs is not None and not len(self.ftrs.tfrecords)) + and tfrecords is None): + raise ValueError( + "The DatasetFeatures object used to create this SlideMap " + "did not have paths to TFRecords stored. Please supply a list " + "of TFRecord paths to the `tfrecords` argument " + "of `SlideMap.build_mosaic()`" + ) + elif (tfrecords is None + and self.ftrs is not None + and len(self.ftrs.tfrecords)): + return sf.Mosaic(self, tfrecords=self.ftrs.tfrecords, **kwargs) + else: + return sf.Mosaic(self, tfrecords=tfrecords, **kwargs) + + def cluster(self, n_clusters: int) -> None: + """Performs K-means clustering on data and adds to metadata labels. + + Clusters are saved to self.data['cluster']. Requires that SlideMap + was generated via DatasetFeatures. + + Examples + Perform K-means clustering and apply cluster labels. + + slidemap.cluster(n_clusters=5) + slidemap.plot() + + Args: + n_clusters (int): Number of clusters for K means clustering. 
+ """ + + if self.ftrs is None: + raise errors.SlideMapError( + "Unable to cluster; no DatasetFeatures provided" + ) + activations = [ + self.ftrs.activations[row.slide][row.tfr_index] + for row in self.data.itertuples() + ] + log.info(f"Calculating K-means clustering (n={n_clusters})") + kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(activations) + self.data['cluster'] = kmeans.labels_ + self.label('cluster') + + def neighbors( + self, + slide_categories: Optional[Dict] = None, + algorithm: str = 'kd_tree', + method: str = 'map', + pca_dim: int = 100 + ) -> None: + """Calculates neighbors among tiles in this map, assigning neighboring + statistics to tile metadata 'num_unique_neighbors' and + 'percent_matching_categories'. + + Args: + slide_categories (dict, optional): Maps slides to categories. + Defaults to None. If provided, will be used to calculate + 'percent_matching_categories' statistic. + algorithm (str, optional): NearestNeighbor algorithm, either + 'kd_tree', 'ball_tree', or 'brute'. Defaults to 'kd_tree'. + method (str, optional): Either 'map', 'pca', or 'features'. How + neighbors are determined. If 'map', calculates neighbors based + on UMAP coordinates. If 'features', calculates neighbors on the + full feature space. If 'pca', reduces features into `pca_dim` + space. Defaults to 'map'. 
+ """ + from sklearn.decomposition import PCA + from sklearn.neighbors import NearestNeighbors + if self.ftrs is None: + raise errors.SlideMapError( + "Unable perform neighbor search; no DatasetFeatures provided" + ) + log.info(f"Initializing neighbor search (method={method})...") + if method == 'map': + X = np.stack((self.data.x.values, self.data.y.values), axis=-1) + elif method == 'features': + X = self.activations() + elif method == 'pca': + log.info(f"Reducing dimensionality with PCA (dim={pca_dim})...") + pca = PCA(n_components=pca_dim) + features = self.activations() + pca.fit(features) + X = pca.transform(features) + + else: + raise ValueError(f'Unknown neighbor method {method}.') + nbrs = NearestNeighbors( + n_neighbors=100, + algorithm=algorithm, + n_jobs=-1 + ).fit(X) + log.info("Calculating nearest neighbors...") + _, indices = nbrs.kneighbors(X) + + def num_category_matching(idx_list, idx): + list_cat = np.array([ + slide_categories[self.data.loc[_i].slide] for _i in idx_list + ]) + idx_cat = slide_categories[self.data.loc[idx].slide] + return (list_cat == idx_cat).sum() + + log.info('Matching neighbors...') + #TODO: accelerate this step with multiprocessing + self.data['num_unique_neighbors'] = [ + len(self.data.loc[ind].slide.unique()) + for ind in indices + ] + if slide_categories: + self.data['percent_matching_categories'] = [ + num_category_matching(ind, i) / len(ind) + for i, ind in enumerate(indices) + ] + + def filter(self, slides: List[str]) -> None: + """Filters map to only show tiles from the given slides. + + Args: + slides (list(str)): List of slide names. + """ + + self.data = self.data.loc[self.data.slide.isin(slides)] + + def umap_transform( + self, + array: np.ndarray, + *, + dim: int = 2, + n_neighbors: int = 50, + min_dist: float = 0.1, + metric: str = 'cosine', + **kwargs: Any + ) -> np.ndarray: + """Transforms a given array using UMAP projection. If a UMAP has not + yet been fit, this will fit a new UMAP on the given data. 
+ + Args: + array (np.ndarray): Array to transform with UMAP dimensionality + reduction. + + Keyword Args: + dim (int, optional): Number of dimensions for UMAP. Defaults to 2. + n_neighbors (int, optional): Number of neighbors for UMAP + algorithm. Defaults to 50. + min_dist (float, optional): Minimum distance argument for UMAP + algorithm. Defaults to 0.1. + metric (str, optional): Metric for UMAP algorithm. Defaults to + 'cosine'. + **kwargs (optional): Additional keyword arguments for the + UMAP function. + """ + import umap # Imported in this function due to long import time + if not len(array): + raise errors.StatsError("Unable to perform UMAP on empty array.") + if self.umap is None: # type: ignore + fn = umap.UMAP if not self.parametric_umap else umap.ParametricUMAP + self.umap = fn( + n_components=dim, + verbose=(sf.getLoggingLevel() <= 20), + n_neighbors=n_neighbors, + min_dist=min_dist, + metric=metric, + **kwargs + ) + layout = self.umap.fit_transform(array) # type: ignore + (normalized, + self._umap_normalized_range, + self._umap_normalized_clip) = stats_utils.normalize_layout(layout) + else: + layout = self.umap.transform(array) # type: ignore + if self._umap_normalized_range is not None: + normalized = stats_utils.normalize( + layout, + norm_range=self._umap_normalized_range, + norm_clip=self._umap_normalized_clip) + else: + log.info("No range/clip information available; unable to " + "normalize UMAP output.") + return layout + + return normalized + + def label_by_uncertainty(self, index: int = 0) -> None: + """Labels each point with the tile-level uncertainty, if available. + + Args: + index (int, optional): Uncertainty index. Defaults to 0. 
+ """ + if 'label' in self.data.columns: + self.data.drop(columns='label', inplace=True) + if self.ftrs is None: + raise errors.SlideMapError("DatasetFeatures not provided.") + if not self.ftrs.uq or self.ftrs.uncertainty == {}: # type: ignore + raise errors.DatasetError( + 'Unable to label by uncertainty; UQ estimates not available.' + ) + else: + uq_labels = np.stack(self.data['uncertainty'].values)[:, index] + self.data['label'] = uq_labels + + def label_by_preds(self, index: int) -> None: + """Displays each point with label equal to the prediction value (from 0-1) + + Args: + index (int): Logit index. + """ + if 'label' in self.data.columns: + self.data.drop(columns='label', inplace=True) + self.data['label'] = np.stack(self.data['predictions'].values)[:, index] + + def label_by_slide(self, slide_labels: Optional[Dict] = None) -> None: + """Displays each point as the name of the corresponding slide. + If slide_labels is provided, will use this dict to label slides. + + Args: + slide_labels (dict, optional): Dict mapping slide names to labels. + """ + if 'label' in self.data.columns: + self.data.drop(columns='label', inplace=True) + if slide_labels: + self.data['label'] = self.data.slide.map(slide_labels) + else: + self.data['label'] = self.data.slide.values + + def label(self, meta: str, translate: Optional[Dict] = None) -> None: + """Displays each point labeled by tile metadata (e.g. 'predicted_class') + + Args: + meta (str): Data column from which to assign labels. + translate (dict, optional): If provided, will translate the + read metadata through this dictionary. 
+ """ + if 'label' in self.data.columns: + self.data.drop(columns='label', inplace=True) + self.data['label'] = self.data[meta].values + if translate: + self.data['label'] = self.data['label'].map(translate) + + def plot( + self, + subsample: Optional[int] = None, + title: Optional[str] = None, + cmap: Optional[Dict] = None, + xlim: Tuple[float, float] = (-0.05, 1.05), + ylim: Tuple[float, float] = (-0.05, 1.05), + xlabel: Optional[str] = None, + ylabel: Optional[str] = None, + legend: Optional[str] = None, + ax: Optional["Axes"] = None, + loc: Optional[str] = 'center right', + ncol: Optional[int] = 1, + categorical: Union[str, bool] = 'auto', + legend_kwargs: Optional[Dict] = None, + **scatter_kwargs: Any, + ) -> None: + """Plots calculated map. + + Args: + subsample (int, optional): Subsample to only include this many + tiles on plot. Defaults to None. + title (str, optional): Title for plot. + cmap (dict, optional): Dict mapping labels to colors. + xlim (list, optional): List of float indicating limit for x-axis. + Defaults to (-0.05, 1.05). + ylim (list, optional): List of float indicating limit for y-axis. + Defaults to (-0.05, 1.05). + xlabel (str, optional): Label for x axis. Defaults to None. + ylabel (str, optional): Label for y axis. Defaults to None. + legend (str, optional): Title for legend. Defaults to None. + ax (matplotlib.axes.Axes, optional): Figure axis. If not supplied, + will prepare a new figure axis. + loc (str, optional): Location for legend, as defined by + matplotlib.axes.Axes.legend(). Defaults to 'center right'. + ncol (int, optional): Number of columns in legend, as defined + by matplotlib.axes.Axes.legend(). Defaults to 1. + categorical (str, optional): Specify whether labels are categorical. + Determines the colormap. Defaults to 'auto' (will attempt to + automatically determine from the labels). + legend_kwargs (dict, optional): Dictionary of additional keyword + arguments to the matplotlib.axes.Axes.legend() function. 
+ **scatter_kwargs (optional): Additional keyword arguments to the + seaborn scatterplot function. + """ + import seaborn as sns + import matplotlib.pyplot as plt + + if legend_kwargs is None: + legend_kwargs = dict() + + # Make plot + if ax is None: + fig = plt.figure(figsize=(6, 4.5)) + ax = fig.add_subplot(111) + + # Subsampling + if subsample: + plot_df = self.data.sample(subsample) + else: + plot_df = self.data + + x = plot_df.x + y = plot_df.y + + if 'label' in self.data.columns: + labels = plot_df.label + + # Check for categorical labels + if (categorical is True + or not pd.to_numeric(labels, errors='coerce').notnull().all()): + + log.debug("Interpreting labels as categorical") + scatter_kwargs.update( + dict(hue=labels.astype('category')) + ) + unique = list(labels.unique()) + try: + unique.sort() + except TypeError: + log.error( + "Unable to sort categories; are some values NaN?" + ) + if len(unique) >= 12: + sns_pal = sns.color_palette("Paired", len(unique)) + else: + sns_pal = sns.color_palette('hls', len(unique)) + if cmap is None: + cmap = {unique[i]: sns_pal[i] for i in range(len(unique))} + else: + log.debug("Interpreting labels as continuous") + scatter_kwargs.update(dict(hue=labels)) + + umap_2d = sns.scatterplot( + x=x, + y=y, + palette=cmap, + ax=ax, + **scatter_kwargs + ) + ax.set_ylim(*((None, None) if not ylim else ylim)) + ax.set_xlim(*((None, None) if not xlim else xlim)) + if 'hue' in scatter_kwargs: + ax.legend( + loc=loc, + ncol=ncol, + title=legend, + **legend_kwargs + ) + umap_2d.set(xlabel=xlabel, ylabel=ylabel) + if title: + ax.set_title(title) + + def plot_3d( + self, + z: Optional[np.ndarray] = None, + feature: Optional[int] = None, + subsample: Optional[int] = None, + fig: Optional["Figure"] = None, + ) -> None: + """Saves a plot of a 3D umap, with the 3rd dimension representing + values provided by argument "z". + + Args: + z (list, optional): Values for z axis. Must supply z or feature. + Defaults to None. 
+ feature (int, optional): Int, feature to plot on 3rd axis. + Must supply z or feature. Defaults to None. + subsample (int, optional): Subsample to only include this many + tiles on plot. Defaults to None. + fig (matplotlib.figure.Figure, optional): Figure. If not supplied, + will prepare a new figure. + """ + import matplotlib.pyplot as plt + from mpl_toolkits.mplot3d import Axes3D + + if fig is None: + fig = plt.figure() + + title = f"UMAP with feature {feature} focus" + if self.ftrs is None: + raise errors.SlideMapError("DatasetFeatures not provided.") + if (z is None) and (feature is None): + raise errors.SlideMapError("Must supply either 'z' or 'feature'.") + + # Subsampling + if subsample: + plot_df = self.data.sample(subsample) + else: + plot_df = self.data + + # Get feature activations for 3rd dimension + if z is None: + z = np.array([ + self.ftrs.activations[row.slide][row.tfr_index][feature] + for row in plot_df.itertuples() + ]) + + # Plot tiles on a 3D coordinate space with 2 coordinates from UMAP + # and 3rd from the value of the excluded feature + ax = Axes3D(fig, auto_add_to_figure=False) + fig.add_axes(ax) + scatter_kw = dict(c=z, cmap='viridis', linewidth=0.5, edgecolor="k") + ax.scatter(plot_df.x, plot_df.y, z, **scatter_kw) + ax.set_title(title) + + def save( + self, + path: str, + dpi: int = 300, + **kwargs, + ): + """Save UMAP, plot, coordinates, and normalization values to a directory. + + The UMAP, plot, coordinates, and normalization values can all be + loaded from this directory after saving with ``sf.SlideMap.load(path)``. + + Args: + path (str): Directory in which to save the plot and UMAP. + The UMAP image will be saved with the filename "slidemap.png". + dpi (int, optional): DPI for final image. Defaults to 300. + + Keyword args: + subsample (int, optional): Subsample to only include this many + tiles on plot. Defaults to None. + title (str, optional): Title for plot. + cmap (dict, optional): Dict mapping labels to colors. 
+ xlim (list, optional): List of float indicating limit for x-axis. + Defaults to (-0.05, 1.05). + ylim (list, optional): List of float indicating limit for y-axis. + Defaults to (-0.05, 1.05). + xlabel (str, optional): Label for x axis. Defaults to None. + ylabel (str, optional): Label for y axis. Defaults to None. + legend (str, optional): Title for legend. Defaults to None. + **scatter_kwargs (optional): Additional keyword arguments to the + seaborn scatterplot function. + + """ + if not exists(path): + os.makedirs(path) + if path.endswith('.png') or path.endswith('.jpg') or path.endswith('.jpeg'): + log.warning( + "Path provided to `SlideMap.save()` is a file name, " + "not a directory. Will save the figure plot to this location, " + "but will not save the associated UMAP. To save both plot and " + "UMAP, provide a path to a directory instead." + ) + self.save_plot(path, dpi=dpi, **kwargs) + else: + self.save_plot(join(path, "slidemap.png"), dpi=dpi, **kwargs) + if self.umap is not None: + self.save_umap(path) + + def save_plot( + self, + filename: str, + dpi: int = 300, + **kwargs + ): + """Save plot of slide map. + + Args: + filename (str): File path to save the image. + dpi (int, optional): DPI for final image. Defaults to 300. + + Keyword args: + subsample (int, optional): Subsample to only include this many + tiles on plot. Defaults to None. + title (str, optional): Title for plot. + cmap (dict, optional): Dict mapping labels to colors. + xlim (list, optional): List of float indicating limit for x-axis. + Defaults to (-0.05, 1.05). + ylim (list, optional): List of float indicating limit for y-axis. + Defaults to (-0.05, 1.05). + xlabel (str, optional): Label for x axis. Defaults to None. + ylabel (str, optional): Label for y axis. Defaults to None. + legend (str, optional): Title for legend. Defaults to None. + **scatter_kwargs (optional): Additional keyword arguments to the + seaborn scatterplot function. 
+ + """ + import matplotlib.pyplot as plt + + with sf.util.matplotlib_backend('Agg'): + self.plot(**kwargs) + plt.savefig(filename, bbox_inches='tight', dpi=dpi) + plt.close() + log.info(f"Saved 2D UMAP to [green]{filename}") + + def save_3d( + self, + filename: str, + dpi: int = 300, + **kwargs + + ): + """Save 3D plot of slide map. + + Args: + filename (str): File path to save the image. + dpi (int, optional): DPI for final image. Defaults to 300. + + Keyword args: + z (list, optional): Values for z axis. Must supply z or feature. + Defaults to None. + feature (int, optional): Int, feature to plot on 3rd axis. + Must supply z or feature. Defaults to None. + subsample (int, optional): Subsample to only include this many + tiles on plot. Defaults to None. + + """ + import matplotlib.pyplot as plt + + with sf.util.matplotlib_backend('Agg'): + self.plot_3d(**kwargs) + plt.savefig(filename, bbox_inches='tight', dpi=dpi) + plt.close() + log.info(f"Saved 3D UMAP to [green]{filename}") + + def save_coordinates(self, path: str) -> None: + """Save coordinates only to parquet file. + + Args: + path (str): Save coordinates to this location. + """ + self.data.to_parquet(path) + log.info(f"Wrote slide map coordinates to [green]{path}") + + def save_umap(self, path: str) -> None: + """Save UMAP, coordinates, and normalization information to a directory. + + Args: + path (str): Save UMAP and coordinates to this directory. + Coordinates will be saved in this directory with the filename + ``slidemap.parquet``. Model will be saved as umap.pkl (non-parametric) + or model.pkl (parametric).
+ """ + if self.parametric_umap: + self.umap.save(path) + else: + with open(join(path, 'umap.pkl'), 'wb') as f: + pickle.dump(self.umap, f) + log.info(f"Wrote UMAP coordinates to [green]{path}") + self.save_coordinates(join(path, 'slidemap.parquet')) + self.save_range_clip(path) + + def save_encoder(self, path: str) -> None: + """Save Parametric UMAP encoder only.""" + if not self.parametric_umap: + raise ValueError("SlideMap not built with Parametric UMAP.") + self.umap.encoder.save(join(path, 'encoder')) + self.save_coordinates(join(path, 'slidemap.parquet')) + self.save_range_clip(path) + + def save_range_clip(self, dest: str) -> None: + """Save range/clip information. + + If ZIP saving is enabled, will save to range_clip.npz, with the + attributes ``"range"`` and ``"clip"``. + + If ZIP saving is disabled (SF_ALLOW_ZIP=0, for Databricks compatibility), + will save these attributes to range.npy and clip.npy, separately. + + Args: + dest (str): Destination directory. + + """ + if sf.util.zip_allowed(): + np.savez( + join(dest, 'range_clip.npz'), + range=self._umap_normalized_range, + clip=self._umap_normalized_clip + ) + else: + np.save(join(dest, 'range.npy'), self._umap_normalized_range) + np.save(join(dest, 'clip.npy'), self._umap_normalized_clip) + + def load_range_clip(self, path: str) -> None: + """Load saved range/clip information for normalizing raw UMAP output. + + Args: + path (str): Path to numpy file (\*.npz) with 'clip' and 'range' keys, + or to a directory containing range_clip.npz (or range.npy and + clip.npy), as generated from ``SlideMap.save()``. + + """ + rc_path, r_path, c_path = None, None, None + if exists(path) and path.endswith('.npz'): + rc_path = path + elif exists(join(path, 'range_clip.npz')): + rc_path = join(path, 'range_clip.npz') + elif exists(join(path, 'range.npy')) and exists(join(path, 'clip.npy')): + r_path = join(path, 'range.npy') + c_path = join(path, 'clip.npy') + else: + raise FileNotFoundError( + f"Unable to find range/clip information at {path}."
+ ) + if rc_path: + loaded = np.load(rc_path) + if not ('range' in loaded and 'clip' in loaded): + raise ValueError(f"Unable to load {rc_path}; did not find values " + "'range' and 'clip'.") + self._umap_normalized_clip = loaded['clip'] + self._umap_normalized_range = loaded['range'] + else: + self._umap_normalized_clip = np.load(c_path) + self._umap_normalized_range = np.load(r_path) + log.info("Loaded range={}, clip={}".format( + self._umap_normalized_range, + self._umap_normalized_clip + )) + + def load_umap(self, path: str) -> "umap.UMAP": + """Load only a UMAP model and not slide coordinates or range_clip.npz. + + Args: + path (str): Path to either umap.pkl or directory with saved + parametric UMAP. + + """ + log.debug(f"Loading UMAP at {path}") + if self.parametric_umap: + from umap.parametric_umap import load_ParametricUMAP + self.umap = load_ParametricUMAP(path) + else: + with open(path, 'rb') as f: + self.umap = pickle.load(f) + log.info(f"Loaded UMAP from [green]{path}") + + def load_coordinates(self, path: str) -> None: + """Load coordinates from parquet file. + + Args: + path (str): Path to parquet file (.parquet) with SlideMap + coordinates. + + """ + log.debug(f"Loading coordinates at {path}") + self.data = pd.read_parquet(path) + log.info(f"Loaded coordinates from [green]{path}")
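The range/clip persistence handled by `save_range_clip()` and `load_range_clip()` can be sketched in isolation. The block below is a minimal standalone illustration, not Slideflow's implementation: the free functions, the example arrays, and the temporary directory are assumptions made for the example, and only the ZIP-enabled (`range_clip.npz`) case is covered. It builds the file name with `os.path.join` so the archive lands inside the destination directory, and it accepts either the `.npz` file or its parent directory when loading.

```python
import os
import tempfile

import numpy as np


def save_range_clip(dest, norm_range, norm_clip):
    """Write both arrays to <dest>/range_clip.npz and return the file path."""
    out = os.path.join(dest, "range_clip.npz")
    np.savez(out, range=norm_range, clip=norm_clip)
    return out


def load_range_clip(path):
    """Accept either the .npz file itself or the directory containing it."""
    if path.endswith(".npz"):
        rc_path = path
    else:
        rc_path = os.path.join(path, "range_clip.npz")
    if not os.path.exists(rc_path):
        raise FileNotFoundError(
            f"Unable to find range/clip information at {path}."
        )
    # Always open the resolved file path, never the (possibly directory) input.
    with np.load(rc_path) as loaded:
        if not {"range", "clip"} <= set(loaded.files):
            raise ValueError(f"{rc_path} is missing 'range' and/or 'clip'.")
        return loaded["range"], loaded["clip"]


# Illustrative values only: (min, max) rows for a 2D embedding.
example_range = np.array([[-4.0, -3.0], [6.0, 5.0]])
example_clip = np.array([[-5.0, -4.0], [7.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_range_clip(tmp, example_range, example_clip)
    loaded_range, loaded_clip = load_range_clip(tmp)
```

Loading by directory and loading by file path should return the same arrays, which is the property the directory-based `SlideMap.save()` / `SlideMap.load()` round trip relies on.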

+ © Copyright 2023, James M Dolezal. + +

\ No newline at end of file
diff --git a/docs/_modules/slideflow/stats/stats_utils/index.html b/docs/_modules/slideflow/stats/stats_utils/index.html
new file mode 100644
index 000000000..87f3102db
--- /dev/null
+++ b/docs/_modules/slideflow/stats/stats_utils/index.html
@@ -0,0 +1,514 @@
+ slideflow.stats.stats_utils — slideflow 3.0.0 documentation

Source code for slideflow.stats.stats_utils

+from typing import Dict, Tuple
+
+import numpy as np
+from sklearn.cluster import KMeans
+from sklearn.metrics import pairwise_distances_argmin_min
+
+
+
[docs]def calculate_centroid( + act: Dict[str, np.ndarray] +) -> Tuple[Dict[str, int], Dict[str, np.ndarray]]: + """Calculates slide-level centroid indices for a provided activations dict. + + Args: + act (dict): Dict mapping slide names to ndarray of activations + across tiles, of shape (n_tiles, n_features) + + Returns: + A tuple containing + + dict: Dict mapping slides to index of tile nearest to centroid + + dict: Dict mapping slides to activations of tile nearest to centroid + """ + + optimal_indices = {} + centroid_activations = {} + for slide in act: + if not len(act[slide]): + continue + km = KMeans(n_clusters=1, n_init=10).fit(act[slide]) + closest, _ = pairwise_distances_argmin_min( + km.cluster_centers_, + act[slide] + ) + closest_index = closest[0] + closest_activations = act[slide][closest_index] + optimal_indices.update({slide: closest_index}) + centroid_activations.update({slide: closest_activations}) + return optimal_indices, centroid_activations
+ + +
[docs]def get_centroid_index(arr: np.ndarray) -> int: + """Calculate index nearest to centroid from a given 2D input array.""" + km = KMeans(n_clusters=1, n_init=10).fit(arr) + closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, arr) + return closest[0]
+ + +def normalize_layout( + layout: np.ndarray, + min_percentile: int = 1, + max_percentile: int = 99, + relative_margin: float = 0.1 +) -> Tuple[np.ndarray, Tuple[float, float], Tuple[float, float]]: + """Removes outliers and scales layout to between [0,1]. + + Args: + layout (np.ndarray): 2D array containing data to be scaled. + min_percentile (int, optional): Percentile for scaling. Defaults to 1. + max_percentile (int, optional): Percentile for scaling. Defaults to 99. + relative_margin (float, optional): Add an additional margin (fraction + of total plot width). Defaults to 0.1. + + Returns: + np.ndarray: layout array, re-scaled and clipped. + + tuple(float, float): Range in original space covered by this layout. + + tuple(float, float): Clipping values (min, max) used for this layout + """ + + # Compute percentiles + mins = np.percentile(layout, min_percentile, axis=(0)) + maxs = np.percentile(layout, max_percentile, axis=(0)) + # Add margins + mins -= relative_margin * (maxs - mins) + maxs += relative_margin * (maxs - mins) + # Clip outliers; `np.clip` broadcasts `mins` and `maxs` across rows + clipped = np.clip(layout, mins, maxs) + # embed within [0,1] along both axes + _min = clipped.min(axis=0) + _max = clipped.max(axis=0) + clipped -= _min + clipped /= (_max - _min) + return clipped, (_min, _max), (mins, maxs) + +def normalize( + array: np.ndarray, + norm_range: Tuple[np.ndarray, np.ndarray], + norm_clip: Tuple[np.ndarray, np.ndarray], +) -> np.ndarray: + """Normalize and clip an array.""" + _min, _max = norm_range + mins, maxs = norm_clip + clipped = np.clip(array, mins, maxs) + clipped -= _min + clipped /= (_max - _min) + return clipped + +def denormalize( + array: np.ndarray, + norm_range: Tuple[np.ndarray, np.ndarray], +) -> np.ndarray: + """De-normalize an array.""" + _min, _max = norm_range + transformed = array * (_max - _min) + transformed += _min + return transformed +
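The relationship between `normalize()` and `denormalize()` can be checked with a small round trip. This is a hedged sketch that re-implements the two helpers locally so it runs without Slideflow installed; the range, clip, and point values are made up for illustration. It shows that points inside the clip bounds are recovered exactly, while points outside them come back as the clipped value rather than the original.

```python
import numpy as np


def normalize(array, norm_range, norm_clip):
    """Clip to norm_clip, then rescale so norm_range maps onto [0, 1]."""
    _min, _max = norm_range
    mins, maxs = norm_clip
    # Cast to float so integer inputs can be divided safely.
    clipped = np.clip(array, mins, maxs).astype(float)
    return (clipped - _min) / (_max - _min)


def denormalize(array, norm_range):
    """Map values in [0, 1] back to the original coordinate range."""
    _min, _max = norm_range
    return array * (_max - _min) + _min


# Illustrative values: per-axis (min, max) for a 2D embedding.
norm_range = (np.array([0.0, 10.0]), np.array([4.0, 20.0]))
norm_clip = (np.array([0.0, 10.0]), np.array([4.0, 20.0]))

points = np.array([
    [1.0, 12.5],   # inside the clip bounds
    [3.0, 20.0],   # on the upper bound of the second axis
    [9.0, 15.0],   # first axis exceeds the clip maximum of 4.0
])

scaled = normalize(points, norm_range, norm_clip)
restored = denormalize(scaled, norm_range)
```

Because clipping discards information, `denormalize()` is only an exact inverse for points that were already within `norm_clip`; the third point above is restored as `[4.0, 15.0]`.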
\ No newline at end of file
diff --git a/docs/_modules/slideflow/studio/index.html b/docs/_modules/slideflow/studio/index.html
new file mode 100644
index 000000000..32fee6aa1
--- /dev/null
+++ b/docs/_modules/slideflow/studio/index.html
@@ -0,0 +1,2811 @@
+ slideflow.studio — slideflow 3.0.0 documentation

Source code for slideflow.studio

+import os
+import time
+import numpy as np
+import webbrowser
+import pyperclip
+import imgui
+import glfw
+import OpenGL.GL as gl
+from contextlib import contextmanager
+from typing import List, Any, Optional, Dict, Union, Tuple
+from os.path import join, dirname, abspath
+from PIL import Image
+from tkinter import Tk
+from tkinter.filedialog import askopenfilename, askdirectory
+
+import slideflow as sf
+from slideflow import log
+
+from .gui import imgui_utils
+from .gui import gl_utils
+from .gui import text_utils
+from .gui.theme import StudioTheme
+from .gui.window import ImguiWindow
+from .gui.viewer import SlideViewer
+from .widgets import (
+    ProjectWidget, SlideWidget, ModelWidget, HeatmapWidget, PerformanceWidget,
+    CaptureWidget, SettingsWidget, ExtensionsWidget, Widget
+)
+from .utils import EasyDict, prediction_to_string, StatusMessage
+from ._renderer import Renderer
+from ._render_manager import AsyncRenderManager, Renderer, CapturedException
+
+OVERLAY_GRID    = 0
+OVERLAY_WSI     = 1
+OVERLAY_VIEW    = 2
+
+# -----------------------------------------------------------------------------
+
+
[docs]class Studio(ImguiWindow): + +
[docs] def __init__( + self, + low_memory: bool = False, + widgets: Optional[List[Any]] = None, + skip_tk_init: bool = False, + theme: Optional[StudioTheme] = None, + ) -> None: + """Create the main Studio window. + + Slideflow Studio is started by running the studio module. + + .. code-block:: bash + + python -m slideflow.studio + + Args: + low_memory (bool): Enable low memory mode, which uses thread pools + instead of multiprocessing pools when applicable to reduce + memory footprint, at the cost of decreased performance. + widgets (list(Any), optional): Additional widgets to render. + """ + + # Initialize TK window in background (for file dialogs) + if not skip_tk_init: + Tk().withdraw() + + if theme is None: + theme = StudioTheme() + + super().__init__( + title=f'Slideflow Studio', + background=theme.main_background + ) + + # Internals. + self._dx = 0 + self._dy = 0 + self._last_error_print = None + self._render_manager = AsyncRenderManager() + self._addl_renderers = dict() + self._defer_rendering = 0 + self._tex_img = None + self._tex_obj = None + self._norm_tex_img = None + self._norm_tex_obj = None + self._heatmap_tex_img = None + self._heatmap_tex_obj = None + self._wsi_tex_obj = None + self._wsi_tex_img = None + self._about_tex_obj = None + self._predictions = None + self._model_path = None + self._model_config = None + self._normalizer = None + self._normalize_wsi = False + self._uncertainty = None + self._content_width = None + self._content_height = None + self._pane_w = None + self._refresh_view = False + self._overlay_wsi_dim = None + self._overlay_offset_wsi_dim = (0, 0) + self._thumb_params = None + self._use_model = None + self._use_uncertainty = None + self._use_saliency = None + self._use_model_img_fmt = False + self._tex_to_delete = [] + self._defer_tile_refresh = None + self._should_close_slide = False + self._should_close_model = False + self._bg_logo = None + self._message = None + self._pred_message = None + self.low_memory = low_memory + 
self._suspend_mouse_input = False + self._suspend_keyboard_input = False + self._status_message = None + self._force_enable_tile_preview = False + + # Interface. + self._show_about = False + self._show_tile_preview = True + self._tile_preview_is_new = True + self._tile_preview_image_is_new = True + self._show_overlays = True + self._show_mpp_zoom_popup = False + self._input_mpp = 1. + self._box_color = [1, 0, 0] + self.theme = theme + + # Widget interface. + self.wsi = None + self.wsi_thumb = None + self.viewer = None + self.saliency = None + self.box_x = None + self.box_y = None + self.tile_px = None + self.tile_um = None + self.tile_zoom = 1 + self.heatmap = None + self.rendered_heatmap = None + self.overlay = None + self.overlay_original = None + self.rendered_qc = None + self.overlay_qc = None + self.args = EasyDict(use_model=False, use_uncertainty=False, use_saliency=False) + self.result = EasyDict(predictions=None, uncertainty=None) + self.message = None + self.pane_w = 0 + self.label_w = 0 + self.button_w = 0 + self.x = None + self.y = None + self.mouse_x = None + self.mouse_y = None + self._mouse_screen_x = 0 + self._mouse_screen_y = 0 + self.menu_bar_height = self.font_size + self.spacing + + # Control sidebar. + self.sidebar = Sidebar(self) + + # Core widgets. + self.project_widget = ProjectWidget(self) + self.slide_widget = SlideWidget(self) + self.model_widget = ModelWidget(self) + self.heatmap_widget = HeatmapWidget(self) + self.performance_widget = PerformanceWidget(self) + self.capture_widget = CaptureWidget(self) + self.settings_widget = SettingsWidget(self) + + # User-defined widgets. + self.widgets = [] + if widgets is None: + widgets = self.get_default_widgets() + self.add_widgets(widgets) + + # Extensions widget. + self.extensions_widget = ExtensionsWidget(self) + + # Initialize window. 
+ self.set_window_icon(imgui_utils.logo_image()) + self.set_position(0, 0) + self._update_window_limits() + self._set_default_font_size() + self.skip_frame() # Layout may change after first frame. + self.load_slide('')
+ + @property + def show_overlay(self): + """An overlay (e.g. tile filter or heatmap) is currently being shown + over the main view. + """ + return ((self.slide_widget.show_overlay or self.heatmap_widget.show) + and self._show_overlays) + + @property + def model(self): + """Tensorflow/PyTorch model currently in use.""" + return self._render_manager._model + + @property + def P(self): + """Slideflow project currently in use.""" + return self.project_widget.P + + @property + def offset_x(self): + """Main window offset (x), in points.""" + return self.pane_w + + @property + def offset_y(self): + """Main window offset (y), in points.""" + return self.menu_bar_height + + @property + def offset_x_pixels(self): + """Main window offset (x), in pixels.""" + return int(self.offset_x * self.pixel_ratio) + + @property + def offset_y_pixels(self): + """Main window offset (y), in pixels.""" + return int(self.offset_y * self.pixel_ratio) + + @property + def status_bar_height(self): + return self.font_size + self.spacing + + @property + def mouse_is_over_viewer(self): + """Mouse is currently over the main viewer.""" + cx, cy = imgui.get_mouse_pos() + cx -= self.offset_x + cy -= self.offset_y + return (self.viewer is not None + and self.viewer.is_in_view(cx, cy)) + + @property + def tile_preview_enabled(self): + """Show a tile preview when right clicking.""" + return self._model_path or self._force_enable_tile_preview + + # --- Internals ----------------------------------------------------------- + + def _set_default_font_size(self) -> None: + """Change the interface font size.""" + old = self.font_size + self.set_font_size(int(18 / self.pixel_ratio)) + if self.font_size != old: + self.skip_frame() # Layout changed. 
+ + def _clear_textures(self) -> None: + """Remove all textures.""" + for tex in self._tex_to_delete: + tex.delete() + self._tex_to_delete = [] + + def _close_model_now(self) -> None: + """Close the currently loaded model now.""" + self._render_manager.clear_result() + self._use_model = False + self._use_uncertainty = False + self._use_saliency = False + self._model_path = None + self._model_config = None + self._normalizer = None + self.tile_px = None + self.tile_um = None + self.heatmap = None + self.x = None + self.y = None + self._render_manager.clear_model() + self.clear_model_results() + self.heatmap_widget.reset() + + def _close_slide_now(self) -> None: + """Close the currently loaded slide now.""" + self.wsi = None + self.viewer = None + self.wsi_thumb = None + self.x = None + self.y = None + self.mouse_x = None + self.mouse_y = None + self.clear_result() + self._render_manager._live_updates = False + self._heatmap_tex_img = None + self._heatmap_tex_obj = None + self.heatmap_widget.reset() + self.set_title("Slideflow Studio") + + def _draw_about_dialog(self) -> None: + """Draw the About dialog.""" + if self._show_about: + import platform + try: + import pyvips + from pyvips.base import version as lv + libvips_version = f'{lv(0)}.{lv(1)}.{lv(2)}' + pyvips_version = pyvips.__version__ + except Exception: + libvips_version = 'NA' + pyvips_version = 'NA' + + imgui.open_popup('about_popup') + version_width = imgui.calc_text_size("Version: " + sf.__version__).x + width = max(200, version_width + self.spacing) + height = 315 + imgui.set_next_window_content_size(width, 0) + imgui.set_next_window_position(self.content_width/2 - width/2, self.content_height/2 - height/2) + + about_text = f"Version: {sf.__version__}\n" + about_text += f"Python: {platform.python_version()}\n" + about_text += f"Slide Backend: {sf.slide_backend()}\n" + about_text += f"Libvips: {libvips_version}\n" + about_text += f"Pyvips: {pyvips_version}\n" + about_text += f"OS: {platform.system()} 
{platform.release()}\n" + + if imgui.begin_popup('about_popup'): + + if self._about_tex_obj is None: + about_img = imgui_utils.logo_image().resize((96, 96)) + self._about_tex_obj = gl_utils.Texture(image=about_img) + imgui.text('') + imgui.text('') + imgui.same_line(imgui.get_content_region_max()[0]/2 - 48 + self.spacing) + imgui.image(self._about_tex_obj.gl_id, 96, 96) + + imgui.text('') + with self.bold_font(): + self.center_text('Slideflow Studio') + imgui.text('') + + for line in about_text.split('\n'): + self.center_text(line) + imgui.text('') + imgui.same_line(self.spacing) + if imgui_utils.button('Copy', width=self.button_w/2): + pyperclip.copy(about_text) + imgui.same_line(imgui.get_content_region_max()[0] + self.spacing - self.button_w/2) + if imgui_utils.button('Close', width=self.button_w/2): + self._show_about = False + imgui.end_popup() + + def _draw_mpp_zoom_dialog(self): + """Show a dialog that prompts the user to specify microns-per-pixel.""" + if not self._show_mpp_zoom_popup: + return + + window_size = (self.font_size * 18, self.font_size * 7) + self.center_next_window(*window_size) + imgui.set_next_window_size(*window_size) + _, opened = imgui.begin('Zoom to Microns-Per-Pixel (MPP)', closable=True, flags=imgui.WINDOW_NO_RESIZE) + if not opened: + self._show_mpp_zoom_popup = False + + imgui.text("Zoom the current view to a given MPP.") + imgui.separator() + imgui.text('') + imgui.same_line(self.font_size*4) + with imgui_utils.item_width(self.font_size*4): + _changed, self._input_mpp = imgui.input_float('MPP##input_mpp', self._input_mpp, format='%.3f') + imgui.same_line() + if self._input_mpp: + mag = f'{10/self._input_mpp:.1f}x' + else: + mag = '-' + imgui.text(mag) + if self.sidebar.full_button("Zoom", width=-1): + self.viewer.zoom_to_mpp(window_size[0] / 2, window_size[1] / 2, self._input_mpp) + self._show_mpp_zoom_popup = False + imgui.end() + + def _draw_control_pane(self) -> None: + """Draw the control pane and widgets.""" + 
self.sidebar.draw() + + def _draw_empty_background(self): + """Render an empty background with the Studio logo.""" + if self._bg_logo is None: + bg_path = join(dirname(abspath(__file__)), 'gui', 'logo_dark_outline.png') + img = np.array(Image.open(bg_path)) + self._bg_logo = gl_utils.Texture(image=img, bilinear=True) + self._bg_logo.draw( + pos=(self.content_frame_width//2, self.content_frame_height//2), + zoom=0.75, + align=0.5, + rint=True, + anchor='center' + ) + + def _draw_main_view(self, inp: EasyDict, window_changed: bool) -> None: + """Update the main window view. + + Draws the slide / picam view, overlay heatmap, overlay box, and ROIs. + + Args: + inp (EasyDict): Dictionary of user input. + window_changed (bool): Window size has changed (force refresh). + """ + + max_w = self.content_frame_width - self.offset_x_pixels + max_h = self.content_frame_height - self.offset_y_pixels + + # Update the viewer in response to user input. + if self.viewer and self.viewer.movable: + # Update WSI focus location & zoom values + # If shift-dragging or scrolling. + dz = None + if not inp.dragging: + inp.dx, inp.dy = None, None + if inp.wheel > 0: + dz = 1/1.5 + if inp.wheel < 0: + dz = 1.5 + if inp.wheel or inp.dragging or self._refresh_view: + if inp.dx is not None: + self.viewer.move(inp.dx, inp.dy) + if inp.wheel: + self.viewer.zoom(inp.cx, inp.cy, dz) + if self._refresh_view and inp.dx is None and not inp.wheel: + self.viewer.refresh_view() + self._refresh_view = False + self.mouse_x, self.mouse_y = self.viewer.display_coords_to_wsi_coords(inp.cx, inp.cy, offset=False) + + # Render slide view. + self.viewer.render(max_w, max_h) + + # Render overlay heatmap. + if self.overlay is not None and self.show_overlay: + self.viewer.render_overlay( + self.overlay, + dim=self._overlay_wsi_dim, + offset=self._overlay_offset_wsi_dim) + + # Render overlay tooltip, if hovered. 
+ if self.overlay_original is not None and self.show_overlay: + self.viewer.render_overlay_tooltip(self.overlay_original) + + # Calculate location for model display. + if (self.tile_preview_enabled + and inp.clicking + and not inp.dragging + and self.viewer.is_in_view(inp.cx, inp.cy)): + + wsi_x, wsi_y = self.viewer.display_coords_to_wsi_coords(inp.cx, inp.cy, offset=False) + self.x = wsi_x - (self.viewer.full_extract_px/2) + self.y = wsi_y - (self.viewer.full_extract_px/2) + + # Show box around location that a tile is being extracted for preview. + if self.x is not None and self.y is not None: + if inp.clicking or inp.dragging or inp.wheel or window_changed: + self.box_x, self.box_y = self.viewer.wsi_coords_to_display_coords(self.x, self.y) + tw = self.viewer.full_extract_px / self.viewer.view_zoom + + # Draw box on main display. + gl.glPolygonMode(gl.GL_FRONT_AND_BACK, gl.GL_LINE) + gl.glLineWidth(3) + box_pos = np.array([self.box_x, self.box_y]) + gl_utils.draw_rect(pos=box_pos, size=np.array([tw, tw]), color=self._box_color, mode=gl.GL_LINE_LOOP) + gl.glPolygonMode(gl.GL_FRONT_AND_BACK, gl.GL_FILL) + gl.glLineWidth(1) + + # Render ROIs. + self.viewer.late_render() + + def _draw_menu_bar(self) -> None: + """Draw the main menu bar (File, View, Help)""" + + if imgui.begin_main_menu_bar(): + # --- File -------------------------------------------------------- + if imgui.begin_menu('File', True): + if imgui.menu_item('New Project...', 'Ctrl+N')[1]: + self.project_widget.new_project() + imgui.separator() + if imgui.menu_item('Open Project...', 'Ctrl+P')[1]: + self.ask_load_project() + if imgui.menu_item('Open Slide...', 'Ctrl+O')[1]: + self.ask_load_slide() + if imgui.menu_item('Load Model...', 'Ctrl+M')[1]: + self.ask_load_model() + if imgui.menu_item('Load Heatmap...', 'Ctrl+H', enabled=self._model_path is not None)[1]: + self.ask_load_heatmap() + + # Widgets with "Open" menu options. 
+ for w in self.widgets: + if hasattr(w, 'open_menu_options'): + w.open_menu_options() + + imgui.separator() + if imgui.begin_menu('Export...', True): + if imgui.menu_item('Main view')[1]: + self.capture_widget.save_view() + if imgui.menu_item('Tile view')[1]: + self.capture_widget.save_tile() + if imgui.menu_item('GUI view')[1]: + self.capture_widget.save_gui() + if imgui.menu_item('Heatmap (PNG)', enabled=(self.rendered_heatmap is not None))[0]: + h_img = Image.fromarray(self.rendered_heatmap) + h_img.resize(np.array(h_img.size) * 16, Image.NEAREST).save(f'{self.heatmap.slide.name}.png') + self.create_toast(f"Saved heatmap image to {self.heatmap.slide.name}.png", icon='success') + if imgui.menu_item('Heatmap (NPZ)', enabled=(self.heatmap is not None))[0]: + loc = self.heatmap.save_npz() + self.create_toast(f"Saved heatmap .npz to {loc}", icon='success') + imgui.end_menu() + imgui.separator() + if imgui.menu_item('Close Slide')[1]: + self.close_slide(True) + if imgui.menu_item('Close Model')[1]: + self.close_model(True) + + # Widgets with "File" menu. 
+ for w in self.widgets: + if hasattr(w, 'file_menu_options'): + imgui.separator() + w.file_menu_options() + + imgui.separator() + if imgui.menu_item('Exit', 'Ctrl+Q')[1]: + self._exit_trigger = True + imgui.end_menu() + + # --- View -------------------------------------------------------- + has_wsi = self.viewer and isinstance(self.viewer, SlideViewer) + if imgui.begin_menu('View', True): + if imgui.menu_item('Fullscreen', 'Alt+Enter')[0]: + self.toggle_fullscreen() + imgui.separator() + + # --- Show sub-menu ------------------------------------------- + if imgui.begin_menu('Show', True): + if imgui.menu_item('Tile Preview', 'Ctrl+Shift+T', selected=self._show_tile_preview)[0]: + self._show_tile_preview = not self._show_tile_preview + imgui.separator() + if imgui.menu_item('Thumbnail', selected=(has_wsi and self.viewer.show_thumbnail), enabled=has_wsi)[0]: + self.viewer.show_thumbnail = not self.viewer.show_thumbnail + if imgui.menu_item('Scale', selected=(has_wsi and self.viewer.show_scale), enabled=has_wsi)[0]: + self.viewer.show_scale = not self.viewer.show_scale + + # Widgets with "View" menu. + for w in self.widgets: + if hasattr(w, 'show_menu_options'): + imgui.separator() + w.show_menu_options() + + imgui.end_menu() + # ------------------------------------------------------------- + + imgui.separator() + if imgui.menu_item('Increase Font Size', 'Ctrl+=')[1]: + self.increase_font_size() + if imgui.menu_item('Decrease Font Size', 'Ctrl+-')[1]: + self.decrease_font_size() + + imgui.separator() + if imgui.menu_item('Increase Tile Zoom', 'Ctrl+]')[1]: + self.increase_tile_zoom() + if imgui.menu_item('Decrease Tile Zoom', 'Ctrl+[')[1]: + self.decrease_tile_zoom() + if imgui.menu_item('Zoom to MPP', 'Ctrl+/')[1]: + self.ask_zoom_to_mpp() + if imgui.menu_item('Reset Tile Zoom', 'Ctrl+\\')[1]: + self.reset_tile_zoom() + + # Widgets with "View" menu. 
+ for w in self.widgets: + if hasattr(w, 'view_menu_options'): + imgui.separator() + w.view_menu_options() + + imgui.end_menu() + + # --- Help -------------------------------------------------------- + if imgui.begin_menu('Help', True): + if imgui.menu_item('Get Started')[1]: + webbrowser.open('https://slideflow.dev/studio') + if imgui.menu_item('Documentation')[1]: + webbrowser.open('https://slideflow.dev') + + # Widgets with "Help" menu. + for w in self.widgets: + if hasattr(w, 'help_menu_options'): + imgui.separator() + w.help_menu_options() + + imgui.separator() + if imgui.menu_item('Release Notes')[1]: + webbrowser.open(join(sf.__github__, 'releases/tag', sf.__version__)) + if imgui.menu_item('Report Issue')[1]: + webbrowser.open(join(sf.__github__, 'issues')) + imgui.separator() + if imgui.menu_item('View License')[1]: + webbrowser.open(join(sf.__github__, 'blob/master/LICENSE')) + if imgui.menu_item('About')[1]: + self._show_about = True + imgui.end_menu() + + version_text = f'slideflow {sf.__version__}' + imgui_utils.right_aligned_text(version_text, spacing=self.spacing) + imgui.end_main_menu_bar() + + def _draw_status_bar(self) -> None: + """Draw the bottom status bar.""" + + h = self.status_bar_height + r = self.pixel_ratio + y_pos = int((self.content_frame_height - (h * r)) / r) + imgui.set_next_window_position(0-2, y_pos) + imgui.set_next_window_size(self.content_width+4, h) + imgui.push_style_color(imgui.COLOR_WINDOW_BACKGROUND, *self.theme.main_background) + imgui.push_style_var(imgui.STYLE_WINDOW_PADDING, [10, 5]) + + imgui.begin('Status bar', closable=True, flags=(imgui.WINDOW_NO_RESIZE + | imgui.WINDOW_NO_COLLAPSE + | imgui.WINDOW_NO_TITLE_BAR + | imgui.WINDOW_NO_MOVE + | imgui.WINDOW_NO_SCROLLBAR)) + + # Backend + backend = sf.slide_backend() + if backend == 'cucim': + tex = self.sidebar._button_tex[f'small_cucim'].gl_id + imgui.image(tex, self.font_size, self.font_size) + imgui.same_line() + imgui.text_colored('cuCIM', 0.55, 1, 0.47, 1) + elif 
backend == 'libvips': + tex = self.sidebar._button_tex[f'small_vips'].gl_id + imgui.image(tex, self.font_size, self.font_size) + imgui.same_line() + imgui.text_colored('VIPS', 0.47, 0.65, 1, 1) + else: + imgui.text(backend) + if imgui.is_item_hovered(): + imgui.set_tooltip("Slide backend") + + # Low memory mode + if self.low_memory: + tex = self.sidebar._button_tex[f'small_lowmem'].gl_id + imgui.same_line() + imgui.image(tex, self.font_size, self.font_size) + imgui.same_line() + imgui.text_colored("Low memory mode", 0.99, 0.75, 0.42, 1) + + # Status messages + if self._status_message: + self._status_message.render() + + # Location / MPP + if self.viewer and hasattr(self.viewer, 'mpp') and self.mouse_x is not None: + imgui_utils.right_aligned_text('x={:<8} y={:<8} mpp={:.3f}'.format( + int(self.mouse_x), int(self.mouse_y), self.viewer.mpp) + ) + elif self.viewer and self.mouse_x is not None: + imgui_utils.right_aligned_text( + 'x={:<8} y={:<8}'.format(int(self.mouse_x), int(self.mouse_y)) + ) + + imgui.end() + imgui.pop_style_color(1) + imgui.pop_style_var(1) + + def _draw_tile_view(self): + """Draw the tile view window, displaying the currently rendered tile(s). + + This window will show images rendered by a whole-slide viewer (image + tile extracted at some x/y location from the slide), or potentially an + image rendered via some other rendering mechanism as determined through + renderers set via ``.add_to_render_pipeline()``. For example, images + rendered by StyleGAN will be shown in this view. This view also shows + a post-processed, post-normalized rendered image, if available. + + Rendered images are expected to be stored in the OpenGL objects + ``._tex_obj`` and ``._norm_tex_obj``.
+ """ + if self._show_tile_preview: + has_raw_image = self._tex_obj is not None + has_norm_image = 'normalized' in self.result and self._norm_tex_obj is not None # self.model_widget.use_model and self._normalizer is not None and self._norm_tex_obj is not None and self.tile_px + if not has_raw_image: + return + + if not (has_raw_image or has_norm_image): + width = self.font_size * 8 + height = self.font_size * 3 + else: + raw_img_w = 0 if not has_raw_image else self._tex_img.shape[0] * self.tile_zoom + norm_img_w = 0 if not has_norm_image else self._norm_tex_img.shape[0] * self.tile_zoom + height = self.font_size * 2 + max(raw_img_w, norm_img_w) + width = raw_img_w + norm_img_w + self.spacing*2 + + imgui.set_next_window_size(width, height) + + if self._tile_preview_is_new: + imgui.set_next_window_position( + self.content_width - width - self.spacing, + self.content_height - height - self.spacing - self.status_bar_height + ) + self._tile_preview_is_new = False + + if self._tile_preview_image_is_new and (has_raw_image or has_norm_image): + imgui.set_next_window_position( + self.content_width - width - self.spacing, + self.content_height - height - self.spacing - self.status_bar_height + ) + self._tile_preview_image_is_new = False + + _, self._show_tile_preview = imgui.begin( + "##tile view", + closable=True, + flags=(imgui.WINDOW_NO_COLLAPSE + | imgui.WINDOW_NO_RESIZE + | imgui.WINDOW_NO_SCROLLBAR) + ) + + # Image preview =================================================== + dim_color = list(imgui.get_style().colors[imgui.COLOR_TEXT]) + dim_color[-1] *= 0.5 + imgui.begin_child('##pred_image', border=False) + imgui.image(self._tex_obj.gl_id, raw_img_w, raw_img_w) + if imgui.is_item_hovered(): + imgui.set_tooltip("Raw image") + imgui.same_line() + if has_norm_image: + imgui.image(self._norm_tex_obj.gl_id, norm_img_w, norm_img_w) + if imgui.is_item_hovered(): + imgui.set_tooltip("Stain-normalized image") + elif self._tex_obj is not None and self.tile_px: + 
imgui.text_colored('Normalizer not used', *dim_color) + imgui.end_child() + imgui.same_line() + imgui.end() + + def _glfw_key_callback(self, _window, key, _scancode, action, _mods): + """Callback for handling keyboard input.""" + super()._glfw_key_callback(_window, key, _scancode, action, _mods) + + if self._suspend_keyboard_input: + return + + if self._control_down and action == glfw.PRESS and key == glfw.KEY_N: + self.project_widget.new_project() + if self._control_down and self._shift_down and action == glfw.PRESS and key == glfw.KEY_T: + self._show_tile_preview = not self._show_tile_preview + if self._control_down and action == glfw.PRESS and key == glfw.KEY_Q: + self._exit_trigger = True + if self._control_down and action == glfw.PRESS and key == glfw.KEY_O: + self.ask_load_slide() + if self._control_down and not self._shift_down and action == glfw.PRESS and key == glfw.KEY_P: + self.ask_load_project() + if self._control_down and action == glfw.PRESS and key == glfw.KEY_M: + self.ask_load_model() + if self._control_down and action == glfw.PRESS and key == glfw.KEY_H: + self.ask_load_heatmap() + if self._control_down and action == glfw.PRESS and key == glfw.KEY_SPACE: + self.heatmap_widget.show = True + if self._control_down and action == glfw.RELEASE and key == glfw.KEY_SPACE: + self.heatmap_widget.show = False + if self._control_down and action == glfw.PRESS and key == glfw.KEY_LEFT_BRACKET: + self.decrease_tile_zoom() + if self._control_down and action == glfw.PRESS and key == glfw.KEY_RIGHT_BRACKET: + self.increase_tile_zoom() + if self._control_down and action == glfw.PRESS and key == glfw.KEY_SLASH: + self.ask_zoom_to_mpp() + if self._control_down and action == glfw.PRESS and key == glfw.KEY_BACKSLASH: + self.reset_tile_zoom() + + self.slide_widget.keyboard_callback(key, action) + self.project_widget.keyboard_callback(key, action) + for widget in self.widgets: + if hasattr(widget, 'keyboard_callback'): + widget.keyboard_callback(key, action) + +
[docs] def suspend_mouse_input_handling(self): + """Suspend mouse input handling.""" + self._suspend_mouse_input = True
+ +
[docs] def resume_mouse_input_handling(self): + """Resume mouse input handling.""" + self._suspend_mouse_input = False
+ +
[docs] def suspend_keyboard_input(self) -> None: + """Suspend keyboard input handling.""" + self._suspend_keyboard_input = True
+ +
[docs] def resume_keyboard_input(self) -> None: + """Resume keyboard input handling.""" + self._suspend_keyboard_input = False
+ +
[docs] def mouse_input_is_suspended(self) -> bool: + """Check if mouse input handling is suspended.""" + return self._suspend_mouse_input
+ +
[docs] def is_mouse_down(self, mouse_idx: int = 0) -> bool: + """Check if the mouse is currently down.""" + if self._suspend_mouse_input: + return False + return imgui.is_mouse_down(mouse_idx)
+ +
[docs] def is_mouse_released(self, mouse_idx: int = 0) -> bool: + """Check if the mouse was released.""" + if self._suspend_mouse_input: + return False + return imgui.is_mouse_released(mouse_idx)
+ + def _handle_user_input(self): + """Handle user input to support clicking/dragging the main viewer.""" + + self._mouse_screen_x, self._mouse_screen_y = imgui.get_mouse_pos() + # Detect right mouse click in the main display. + clicking, cx, cy, wheel = imgui_utils.click_hidden_window( + '##result_area', + x=self.offset_x, + y=self.offset_y, + width=self.content_width - self.offset_x, + height=self.content_height - self.offset_y, + mouse_idx=1) + + # Ignore right click if the slide widget + # is capturing an ROI. + if self.slide_widget.editing_rois: + clicking = False + + # Detect dragging with left mouse in the main display. + dragging, dx, dy = imgui_utils.drag_hidden_window( + '##result_area', + x=self.offset_x, + y=self.offset_y, + width=self.content_width - self.offset_x, + height=self.content_height - self.offset_y) + + # Suspend mouse input handling if the user is interacting with a widget. + if self._suspend_mouse_input: + clicking, dragging, wheel = False, False, 0 + dx, dy = 0, 0 + + return EasyDict( + clicking=clicking, + dragging=dragging, + wheel=wheel, + cx=int(cx * self.pixel_ratio), + cy=int(cy * self.pixel_ratio), + dx=int(dx * self.pixel_ratio), + dy=int(dy * self.pixel_ratio) + ) + + def _load_and_return_wsi( + self, + path: Optional[str] = None, + stride: Optional[int] = None, + use_rois: bool = True, + tile_px: Optional[int] = None, + tile_um: Optional[Union[str, int]] = None, + **kwargs + ) -> Optional[sf.WSI]: + """Load and return a Whole-Slide Image, with modified parameters. + + Args: + path (str, optional): Path to the slide to reload. If not provided, + will reload the currently loaded slide. + stride (int, optional): Stride to use for the loaded slide. If not + provided, will use the stride value from the currently loaded + slide. + use_rois (bool): Use ROIs from the loaded project, if available. + + Returns: + slideflow.WSI: Reloaded slide. + + """ + if self.wsi is None and path is None: + return None + + # Path to slide. 
+ if path is None: + path = self.wsi.path + + # Stride. + if stride is None and self.wsi is None: + stride = 1 + elif stride is None: + stride = self.wsi.stride_div + + # ROI filter method. + if self.wsi is None: + roi_filter_method = 'center' + else: + roi_filter_method = self.wsi.roi_filter_method + + # ROIs. + if self.wsi is not None and path == self.wsi.path: + roi_method = self.wsi.roi_method + prior_rois = self.wsi.rois + rois = None + elif self.P and use_rois: + rois = self.P.dataset().rois() + roi_method, prior_rois = None, None + else: + roi_method, prior_rois, rois = None, None, None + + # Cap the number of workers in the CUCIM backend. + if sf.slide_backend() == 'cucim': + kwargs['num_workers'] = sf.util.num_cpu(default=4) + + # Tile size. + if tile_px is None: + tile_px = (self.tile_px if self.tile_px else 256) + if tile_um is None: + tile_um = (self.tile_um if self.tile_um else 512) + + # Pass through QC mask if the slide is already loaded. + qc_mask = None if not self.wsi else self.wsi.get_qc_mask(roi=False) + + try: + wsi = sf.WSI( + path, + tile_px=tile_px, + tile_um=tile_um, + stride_div=stride, + rois=rois, + cache_kw=dict( + tile_width=512, + tile_height=512, + max_tiles=-1, + threaded=True, + persistent=True + ), + verbose=False, + mpp=self.slide_widget.manual_mpp, + use_bounds=self.settings_widget.use_bounds, + roi_filter_method=roi_filter_method, + simplify_roi_tolerance=self.settings_widget.simplify_tolerance, + **kwargs) + except sf.errors.IncompatibleBackendError: + self.create_toast( + title="Incompatible slide", + message='Slide type "{}" is incompatible with the {} backend.'.format( + sf.util.path_to_ext(path), sf.slide_backend()), + icon='error' + ) + return None + else: + # Reapply QC + if qc_mask is not None: + wsi.apply_qc_mask(qc_mask) + + # Reapply ROIs + if prior_rois is not None: + wsi.rois = prior_rois + wsi.roi_method = roi_method + wsi.process_rois() + return wsi + +
[docs] def reload_wsi( + self, + slide: Optional[Union[str, sf.WSI]] = None, + stride: Optional[int] = None, + use_rois: bool = True, + tile_px: Optional[int] = None, + tile_um: Optional[Union[str, int]] = None, + **kwargs + ) -> bool: + """Reload the currently loaded Whole-Slide Image. + + Args: + slide (str or sf.WSI, optional): Slide to reload. May be a path + or an sf.WSI object. If not provided, will reload the + currently loaded slide. + stride (int, optional): Stride to use for the loaded slide. If not + provided, will use the stride value from the currently loaded + slide. + use_rois (bool): Use ROIs from the loaded project, if available. + + Returns: + bool: True if slide loaded successfully, False otherwise. + + """ + if isinstance(slide, sf.WSI): + wsi = slide + else: + wsi = self._load_and_return_wsi( + slide, stride, use_rois, tile_px, tile_um, **kwargs + ) + + if wsi: + self.wsi = wsi + old_viewer = self.viewer + self.set_viewer(SlideViewer(wsi, **self._viewer_kwargs())) + self.set_title(os.path.basename(wsi.path)) + if isinstance(old_viewer, SlideViewer): + self.viewer.show_thumbnail = old_viewer.show_thumbnail + self.viewer.show_scale = old_viewer.show_scale + return True + else: + return False
+ + def _render_prediction_message(self, message: str) -> None: + """Render a prediction string below the tile bounding box. + + Args: + message (str): Message to render. + """ + max_w = self.content_frame_width - self.offset_x_pixels + max_h = self.content_frame_height - self.offset_y_pixels + tex = text_utils.get_texture( + message, + size=self.gl_font_size, + max_width=max_w, + max_height=max_h, + outline=2 + ) + box_w = self.viewer.full_extract_px / self.viewer.view_zoom + text_pos = np.array([self.box_x + (box_w/2), self.box_y + box_w + self.font_size]) + tex.draw(pos=text_pos, align=0.5, rint=True, color=1) + + def _render_control_pane_contents(self) -> None: + """Perform rendering of control panel contents, such as WSI thumbnails, + widgets, and heatmaps.""" + + # Render WSI thumbnail in the widget. + if self.wsi_thumb is not None: + if self._wsi_tex_img is not self.wsi_thumb: + self._wsi_tex_img = self.wsi_thumb + if self._wsi_tex_obj is None or not self._wsi_tex_obj.is_compatible(image=self._wsi_tex_img): + if self._wsi_tex_obj is not None: + self._tex_to_delete += [self._wsi_tex_obj] + self._wsi_tex_obj = gl_utils.Texture(image=self._wsi_tex_img, bilinear=True, mipmap=True) + else: + self._wsi_tex_obj.update(self._wsi_tex_img) + + # Display rendered (non-transparent) heatmap in widget. + # Render overlay heatmap. 
+ if self.heatmap: + if self._heatmap_tex_img is not self.rendered_heatmap: + self._heatmap_tex_img = self.rendered_heatmap + if self._heatmap_tex_obj is None or not self._heatmap_tex_obj.is_compatible(image=self._heatmap_tex_img): + if self._heatmap_tex_obj is not None: + self._tex_to_delete += [self._heatmap_tex_obj] + self._heatmap_tex_obj = gl_utils.Texture(image=self._heatmap_tex_img, bilinear=False, mipmap=False) + else: + self._heatmap_tex_obj.update(self._heatmap_tex_img) + + def _viewer_kwargs(self) -> Dict[str, Any]: + """Keyword arguments to use for loading a Viewer.""" + + return dict( + width=self.content_frame_width - self.offset_x_pixels, + height=self.content_frame_height - self.offset_y_pixels, + x_offset=self.offset_x_pixels, + y_offset=self.offset_y_pixels, + normalizer=(self._normalizer if self._normalize_wsi else None), + viz=self + ) + + def _update_window_limits(self): + """Update the minimum window size limits based on loaded widgets.""" + + minheight = (((len(self.sidebar.navbuttons) + 3) + * (self.sidebar.navbutton_width / (self.font_size / 22))) + + self.status_bar_height + + self.menu_bar_height) + + glfw.set_window_size_limits( + self._glfw_window, + minwidth=int(self.sidebar.content_width+100), + minheight=int(minheight), + maxwidth=-1, + maxheight=-1) + + # --- Imgui methods ------------------------------------------------------- + +
[docs] @contextmanager + def dim_text(self, dim=True): + """Render dim text. + + Examples + Render dim text. + + .. code-block:: python + + with studio.dim_text(): + imgui.text('This is dim') + + """ + if dim: + imgui.push_style_color(imgui.COLOR_TEXT, *self.theme.dim) + yield + if dim: + imgui.pop_style_color(1)
+ +
[docs] @contextmanager + def highlighted(self, enable: bool = True): + """Render highlighted text. + + Args: + enable (bool): Whether to enable highlighting. + + Examples + Render highlighted text. + + .. code-block:: python + + with studio.highlighted(True): + imgui.text('This is highlighted') + + """ + if enable: + imgui.push_style_color(imgui.COLOR_BUTTON, *self.theme.button_active) + yield + if enable: + imgui.pop_style_color(1)
+ +
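The two context managers above (``dim_text`` and ``highlighted``) share one pattern: push a style color on entry, yield, and pop on exit, all guarded by a flag. A minimal sketch of that pattern follows. It uses a plain list named ``style_stack`` as a hypothetical stand-in for imgui's style stack, and the helper name ``styled`` is an assumption for illustration, not part of Studio. It also wraps the ``yield`` in ``try``/``finally``, which the methods above do not do, so the stack stays balanced if the body raises.

```python
from contextlib import contextmanager

# Hypothetical stand-in for imgui's style-color stack (illustration only).
style_stack = []


@contextmanager
def styled(color, enable=True):
    """Push a style color on entry and pop it on exit, if enabled."""
    if enable:
        style_stack.append(color)
    try:
        yield
    finally:
        # Variation on the methods above: pop even if the body raises.
        if enable:
            style_stack.pop()


# Enabled: one color is on the stack inside the block, none afterwards.
with styled((0.5, 0.5, 0.5, 1.0)):
    depth_inside = len(style_stack)
depth_after = len(style_stack)

# Disabled: the stack is never touched.
with styled((1.0, 1.0, 1.0, 1.0), enable=False):
    depth_disabled = len(style_stack)
```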
[docs] def collapsing_header(self, text, **kwargs): + """Render a collapsing header using the active theme. + + Examples + Render a collapsing header that is open by default. + + .. code-block:: python + + if viz.collapsing_header("Header", default=True): + imgui.text("Text underneath") + + Args: + text (str): Header text. + + """ + imgui.push_style_color(imgui.COLOR_HEADER, *self.theme.header) + imgui.push_style_color(imgui.COLOR_HEADER_HOVERED, *self.theme.header_hovered) + imgui.push_style_color(imgui.COLOR_HEADER_ACTIVE, *self.theme.header_hovered) + imgui.push_style_color(imgui.COLOR_TEXT, *self.theme.header_text) + expanded = imgui_utils.collapsing_header(text.upper(), **kwargs)[0] + imgui.pop_style_color(4) + return expanded
+ +
[docs] def collapsing_header2(self, text, **kwargs): + """Render a second-level collapsing header using the active theme. + + Examples + Render a second-level collapsing header that is open by default. + + .. code-block:: python + + if viz.collapsing_header2("Header", default=True): + imgui.text("Text underneath") + + Args: + text (str): Header text. + + """ + imgui.push_style_color(imgui.COLOR_HEADER, *self.theme.header2) + imgui.push_style_color(imgui.COLOR_HEADER_HOVERED, *self.theme.header2_hovered) + imgui.push_style_color(imgui.COLOR_HEADER_ACTIVE, *self.theme.header2_hovered) + imgui.push_style_color(imgui.COLOR_TEXT, *self.theme.header2_text) + expanded = imgui_utils.collapsing_header(text.upper(), **kwargs)[0] + imgui.pop_style_color(4) + return expanded
+ +
[docs] def header(self, text): + """Render a header using the active theme. + + Args: + text (str): Text for the header. Text will be rendered in + uppercase. + + """ + with imgui_utils.header( + text.upper(), + hpad=self.font_size, + vpad=(int(self.font_size*0.4), int(self.font_size*0.75)) + ): + pass
+ +
[docs] @contextmanager + def header_with_buttons(self, text): + """Render a widget header with ability to add buttons. + + Examples + Render a header with a gear icon. + + .. code-block:: python + + with studio.header_with_buttons('Button'): + # Right align the button + x_width = imgui.get_content_region_max()[0] + imgui.same_line(x_width - 30) + cx, cy = imgui.get_cursor_pos() + imgui.set_cursor_position((cx, cy-5)) + + # Render the button + if sidebar.small_button('gear'): + do_something() + + Args: + text (str): Text for the header. Text will be rendered in + uppercase. + + """ + with imgui_utils.header( + text.upper(), + hpad=self.font_size, + vpad=(int(self.font_size*0.4), int(self.font_size*0.75)) + ): + yield
+ +
[docs] def center_next_window(self, width, height): + """Center the next imgui window. + + Args: + width (int): Width of the next window. + height (int): Height of the next window. + + """ + + imgui.set_next_window_position( + (self.content_width - width) / 2, + (self.content_height - height - self.status_bar_height) / 2 + )
+ + # --- Public methods ------------------------------------------------------ + +
[docs] def reset_background(self): + """Reset the Studio background to the default theme color.""" + self._background_color = self.theme.main_background
+ +
[docs] def add_widgets(self, widgets: Widget) -> None: + """Add widget extension(s). + + Add widgets to Studio and the sidebar. The ``.tag`` property is used + as a unique identifier for the widget. The ``.icon`` property should + be a path to an image file used for rendering the sidebar navigation + icon. The ``.icon_highlighted`` property should be a path to an image file + used for rendering a hovered navigation icon. + + The widget should implement ``__call__()`` and ``.close()`` methods + for rendering the imgui GUI and cleanup, respectively. + + Args: + widgets (list(:class:`slideflow.studio.widgets.Widget`)): List of + widgets to add as extensions. These should be classes, not + instantiated objects. + + """ + if not isinstance(widgets, list): + widgets = [widgets] + for widget in widgets: + self.widgets += [widget(self)] + self.sidebar.add_widgets(widgets) + self._update_window_limits()
+ +
[docs] def remove_widget(self, widget: Widget) -> None: + """Remove a widget from Studio. + + Args: + widget (:class:`slideflow.studio.widgets.Widget`): Widget to remove. + This should be a class, not an instantiated object. + + """ + widget_obj = None + for w_idx, w in enumerate(self.widgets): + if isinstance(w, widget): + widget_obj = w + self.widgets.remove(w) + break + if widget_obj is None: + raise ValueError(f'Could not find widget "{widget}"') + widget_obj.close() + self.sidebar.remove_widget(widget_obj.tag) + self._update_window_limits()
+ +
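Both ``add_widgets`` and ``remove_widget`` above take widget classes rather than instances: the class is instantiated on add, and located again with ``isinstance`` on removal. The sketch below mirrors that flow under simplifying assumptions. ``Registry`` and ``ExampleWidget`` are hypothetical names, and the sidebar and window-limit updates are left out.

```python
class Registry:
    """Hypothetical stand-in for the Studio widget list (illustration only)."""

    def __init__(self):
        self.widgets = []

    def add_widgets(self, widgets):
        # Accept either a single class or a list of classes.
        if not isinstance(widgets, list):
            widgets = [widgets]
        for widget_cls in widgets:
            # Each class is instantiated with the owning object.
            self.widgets.append(widget_cls(self))

    def remove_widget(self, widget_cls):
        # Find the first instance of the class, remove it, and clean up.
        for w in self.widgets:
            if isinstance(w, widget_cls):
                self.widgets.remove(w)
                w.close()
                return
        raise ValueError(f'Could not find widget "{widget_cls}"')


class ExampleWidget:
    """Hypothetical widget with the attributes the docstring asks for."""
    tag = "example"

    def __init__(self, viz):
        self.viz = viz
        self.closed = False

    def close(self):
        self.closed = True


registry = Registry()
registry.add_widgets(ExampleWidget)      # pass the class, not an instance
instance = registry.widgets[0]
registry.remove_widget(ExampleWidget)    # again identified by class
```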
[docs] def add_to_render_pipeline( + self, + renderer: Any, + name: Optional[str] = None + ) -> None: + """Add a renderer to the rendering pipeline.""" + if name is not None: + self._addl_renderers[name] = renderer + self._render_manager.add_to_render_pipeline(renderer)
+ +
[docs] def remove_from_render_pipeline(self, name: str): + """Remove a renderer from the render pipeline. + + Remove a renderer added with ``.add_to_render_pipeline()``. + + Args: + name (str): Name of the renderer to remove. + + """ + if name not in self._addl_renderers: + raise ValueError(f'Could not find renderer "{name}"') + renderer = self._addl_renderers[name] + if self._render_manager is not None: + self._render_manager.remove_from_render_pipeline(renderer) + del self._addl_renderers[name]
+ +
[docs] def ask_load_heatmap(self): + """Prompt user for location of exported heatmap (\*.npz) and load.""" + npz_path = askopenfilename(title="Load heatmap...", filetypes=[("*.npz", "*.npz")]) + if npz_path: + self.load_heatmap(npz_path)
+ +
[docs] def ask_load_model(self): + """Prompt user for location of a model and load.""" + if sf.backend() == 'tensorflow': + model_path = askdirectory(title="Load model (directory)...") + else: + model_path = askopenfilename(title="Load model...", filetypes=[("zip", ".zip"), ("All files", ".*")]) + if model_path: + self.load_model(model_path, ignore_errors=True)
+ +
[docs] def ask_load_project(self): + """Prompt user for location of a project and load.""" + project_path = askdirectory(title="Load project (directory)...") + if project_path: + self.load_project(project_path, ignore_errors=True)
+ +
[docs] def ask_load_slide(self): + """Prompt user for location of a slide and load.""" + slide_path = askopenfilename(title="Load slide...", filetypes=[("All files", ".*"), + ("Aperio ScanScope", ("*.svs", "*.svslide")), + ("Hamamatsu", ("*.ndpi", "*.vms", "*.vmu")), + ("Leica", "*.scn"), + ("MIRAX", "*.mrxs"), + ("Roche, Ventana", "*.bif"), + ("Pyramid TIFF", ("*.tiff", "*.tif")), + ("JPEG", ("*.jpg", "*.jpeg"))]) + if slide_path: + self.load_slide(slide_path, ignore_errors=True)
+ +
[docs] def autoload(self, path, ignore_errors=False): + """Automatically load a path, detecting the type of object to load. + + Supports slides, models, projects, and other items supported by + widgets if the widget has implemented a `.drag_and_drop_hook` function. + + Args: + path (str): Path to file to load. + ignore_errors (bool): Gracefully handle errors. + + """ + sf.log.info(f"Auto-loading [green]{path}[/]") + if sf.util.is_project(path): + self.load_project(path, ignore_errors=ignore_errors) + elif sf.util.is_slide(path): + self.load_slide(path, ignore_errors=ignore_errors) + elif sf.util.is_model(path) or path.endswith('tflite'): + self.load_model(path, ignore_errors=ignore_errors) + elif path.endswith('npz'): + self.load_heatmap(path) + else: + # See if any widgets implement a drag_and_drop_hook() method + handled = False + for widget in self.widgets: + sf.log.info(f"Attempting load with widget {widget}") + if hasattr(widget, 'drag_and_drop_hook'): + if widget.drag_and_drop_hook(path): + handled = True + break + if not handled: + self.create_toast(f"No loading handler found for {path}", icon="error")
+ +
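``autoload`` above tries a fixed sequence of type checks and, if none match, falls through to any widget that offers a ``drag_and_drop_hook``. A rough sketch of that ordering is below. The ``dispatch`` function and the suffix-based predicates are hypothetical stand-ins for ``sf.util.is_slide()`` and the other checks, and the sketch returns a label instead of calling a loader.

```python
def dispatch(path, handlers, fallback_hooks):
    """Return the label of the first handler or hook that accepts `path`."""
    # Built-in handlers are tried first, in order.
    for accepts, label in handlers:
        if accepts(path):
            return label
    # Otherwise, give each fallback hook a chance to claim the path.
    for hook in fallback_hooks:
        if hook(path):
            return "widget"
    # Nothing handled the path (Studio shows an error toast in this case).
    return None


# Hypothetical predicates; the real checks inspect more than the suffix.
handlers = [
    (lambda p: p.endswith(".svs"), "slide"),
    (lambda p: p.endswith(".zip") or p.endswith("tflite"), "model"),
    (lambda p: p.endswith("npz"), "heatmap"),
]
hooks = [lambda p: p.endswith(".csv")]

paths = ("a.svs", "m.zip", "h.npz", "x.csv", "y.txt")
results = [dispatch(p, handlers, hooks) for p in paths]
```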
[docs] def clear_overlay(self) -> None: + """Remove the current overlay image, including heatmaps and masks.""" + self.overlay = None + self.overlay_original = None + if self.viewer is not None: + self.viewer.clear_overlay()
+ +
[docs] def clear_result(self) -> None: + """Clear all shown results and images.""" + self.clear_model_results() + self.clear_overlay() + self.result = EasyDict() + self.args = EasyDict() + self._wsi_tex_img = None + if self.viewer: + self.viewer.clear()
+ +
[docs] def clear_message(self, msg: str = None) -> bool: + """Clear a specific message from display, if the message is being shown. + + Args: + msg (str): Message to clear. + + Returns: + bool: Whether message was cleared from display. + """ + if msg is None or self._message == msg: + self._message = None + return True + return False
+ + def clear_prediction_message(self) -> None: + self._pred_message = None + +
[docs] def clear_model_results(self) -> None: + """Clear all model results and associated images.""" + if self._render_manager is not None: + self._render_manager.clear_result() + self._predictions = None + self._norm_tex_img = None + self._norm_tex_obj = None + self._heatmap_tex_img = None + self._heatmap_tex_obj = None + if self.viewer is not None: + self.viewer.clear_overlay()
+ +
[docs] def close(self) -> None: + """Close the application and renderer.""" + super().close() + if self._render_manager is not None: + self._render_manager.close() + self._render_manager = None + if hasattr(self.viewer, 'close'): + self.viewer.close() + for w in self.widgets: + if hasattr(w, 'close'): + w.close()
+ +
[docs] def close_model(self, now: bool = False) -> None: + """Close the currently loaded model. + + Args: + now (bool): Close the model now, instead of at the end of the frame. + Defaults to False (closes model at frame end). + """ + if now: + self._close_model_now() + self._should_close_model = False + else: + self._should_close_model = True
+ +
[docs] def close_slide(self, now: bool = False) -> None: + """Close the currently loaded slide. + + Args: + now (bool): Close the slide now, instead of at the end of the frame. + Defaults to False (closes slide at frame end). + """ + if now: + self._close_slide_now() + self._should_close_slide = False + else: + self._should_close_slide = True
+ +
[docs] def defer_rendering(self, num_frames: int = 1) -> None: + """Defer rendering for a number of frames.""" + self._defer_rendering = max(self._defer_rendering, num_frames)
+ +
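``defer_rendering`` above keeps the larger of the current and requested frame counts, and ``draw_frame`` decrements the counter once per frame before rendering resumes. The sketch below isolates that counter logic. ``DeferCounter`` and its method names are hypothetical and only model the arithmetic, not the render manager.

```python
class DeferCounter:
    """Hypothetical model of the frame-deferral counter (illustration only)."""

    def __init__(self):
        self.remaining = 0

    def defer(self, num_frames=1):
        # A shorter request never shortens an existing, longer deferral.
        self.remaining = max(self.remaining, num_frames)

    def should_render(self):
        # Called once per frame: consume one deferred frame, or render.
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


counter = DeferCounter()
counter.defer(2)
counter.defer(1)  # ignored, since 2 frames are already pending
pending = counter.remaining
frames = [counter.should_render() for _ in range(3)]
```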
[docs] def draw_frame(self) -> None: + """Main draw loop.""" + + self.begin_frame() + + self.args = EasyDict(use_model=False, use_uncertainty=False, use_saliency=False) + self.button_w = self.font_size * 5 + self.label_w = round(self.font_size * 4.5) + self.menu_bar_height = self.font_size + self.spacing/2 + + max_w = self.content_frame_width - self.offset_x_pixels + max_h = self.content_frame_height - self.offset_y_pixels + window_changed = (self._content_width != self.content_width + or self._content_height != self.content_height + or self._pane_w != self.pane_w) + + # Process drag-and-drop files + paths = self.pop_drag_and_drop_paths() + if paths is not None and len(paths) >= 1: + self.autoload(paths[0], ignore_errors=True) + + self._clear_textures() + self._draw_control_pane() + self._draw_menu_bar() + self._draw_about_dialog() + self._draw_mpp_zoom_dialog() + + user_input = self._handle_user_input() + + # Re-generate WSI view if the window size changed, or if we don't + # yet have a SlideViewer initialized. + if window_changed: + self._content_width = self.content_width + self._content_height = self.content_height + self._pane_w = self.pane_w + + for widget in self.widgets: + if hasattr(widget, '_on_window_change'): + widget._on_window_change() + + # Main display. 
+ if self.viewer: + self.viewer.update(**self._viewer_kwargs()) + self._draw_main_view(user_input, window_changed) + else: + self._draw_empty_background() + + # --- Render arguments ------------------------------------------------ + self.args.x = self.x + self.args.y = self.y + self.args.tile_px = self.tile_px + self.args.tile_um = self.tile_um + if (self._model_config is not None and self._use_model): + self.args.tile_px = self._model_config['tile_px'] + self.args.tile_um = self._model_config['tile_um'] + if 'img_format' in self._model_config and self._use_model_img_fmt: + self.args.img_format = self._model_config['img_format'] + self.args.use_model = self._use_model + self.args.use_uncertainty = (self.has_uq() and self._use_uncertainty) + self.args.use_saliency = self._use_saliency + self.args.normalizer = self._normalizer + + # Buffer tile view if using a live viewer. + if self.has_live_viewer() and self.args.x and self.args.y: + + if (self._render_manager.is_async + and self._render_manager._args_queue.qsize() > 2): + if self._defer_tile_refresh is None: + self._defer_tile_refresh = time.time() + self.defer_rendering() + elif time.time() - self._defer_tile_refresh < 2: + self.defer_rendering() + else: + self._defer_tile_refresh = None + + self.viewer.x = self.x + self.viewer.y = self.y + self.args.full_image = self.viewer.tile_view + self.args.tile_px = self.viewer.tile_px + self.args.tile_um = self.viewer.tile_um + self.viewer.apply_args(self.args) + + if self.has_live_viewer(): + self.args.viewer = None + else: + self.args.viewer = self.viewer + # --------------------------------------------------------------------- + + # Render control pane contents. 
+ self._render_control_pane_contents() + + + if self.is_skipping_frames(): + pass + elif self._defer_rendering > 0: + self._defer_rendering -= 1 + else: + self._render_manager.set_args(**self.args) + result = self._render_manager.get_result() + if result is not None: + self.result = result + if 'predictions' in result: + self._predictions = result.predictions + self._uncertainty = result.uncertainty + + # Update input image textures (tile view). + middle_pos = np.array([self.offset_x_pixels + max_w/2, max_h/2]) + if 'image' in self.result: + if self._tex_img is not self.result.image: + self._tex_img = self.result.image + if self._tex_obj is None or not self._tex_obj.is_compatible(image=self._tex_img): + if self._tex_obj is not None: + self._tex_to_delete += [self._tex_obj] + self._tex_obj = gl_utils.Texture(image=self._tex_img, bilinear=False, mipmap=False) + else: + self._tex_obj.update(self._tex_img) + if 'normalized' in self.result: + if self._norm_tex_img is not self.result.normalized: + self._norm_tex_img = self.result.normalized + if self._norm_tex_obj is None or not self._norm_tex_obj.is_compatible(image=self._norm_tex_img): + if self._norm_tex_obj is not None: + self._tex_to_delete += [self._norm_tex_obj] + self._norm_tex_obj = gl_utils.Texture(image=self._norm_tex_img, bilinear=False, mipmap=False) + else: + self._norm_tex_obj.update(self._norm_tex_img) + if 'error' in self.result: + self.print_error(self.result.error) + if 'message' not in self.result: + self.result.message = str(self.result.error) + if 'message' in self.result or self.message: + _msg = self.message if 'message' not in self.result else self.result['message'] + tex = text_utils.get_texture(_msg, size=self.gl_font_size, max_width=max_w, max_height=max_h, outline=2) + tex.draw(pos=middle_pos, align=0.5, rint=True, color=1) + + # Render user widgets. 
+ for widget in self.widgets: + if hasattr(widget, 'render'): + widget.render() + + # Render slide widget tile boxes (for tile extraction preview) + self.slide_widget.early_render() + + # Render the tile view and status bar. + self._draw_tile_view() + self._draw_status_bar() + + # Draw prediction message next to box on main display. + if self._pred_message and self.viewer is not None: + self._render_prediction_message(self._pred_message) + elif (self._use_model + and self._predictions is not None + and not isinstance(self._predictions, list) + and self.viewer is not None): + if not hasattr(self.result, 'in_focus') or self.result.in_focus: + pred_str = prediction_to_string( + predictions=self._predictions, + outcomes=self._model_config['outcome_labels'], + is_classification=(self._model_config['model_type'] == 'classification') + ) + self._render_prediction_message(pred_str) + + # End frame. + if self._should_close_model: + self.close_model(True) + if self._should_close_slide: + self.close_slide(True) + + self.end_frame()
+ +
[docs] @staticmethod + def get_default_widgets() -> List[Any]: + """Returns a list of the default non-mandatory extension widgets.""" + return []
+ +
[docs] def get_renderer(self, name: Optional[str] = None) -> Optional[Renderer]: + """Check for the given additional renderer in the rendering pipeline. + + Args: + name (str): Name of the renderer to check for. If None, + returns the main renderer. + + Returns: + Renderer if name is a recognized renderer, otherwise None + + """ + if name is None: + if (self._render_manager is not None + and self._render_manager._renderer_obj is not None): + return self._render_manager._renderer_obj + else: + return None + elif name in self._addl_renderers: + return self._addl_renderers[name] + else: + return None
+ +
[docs] def get_extension(self, tag: str) -> Optional[Widget]: + """Returns a given widget (extension) by tag. + + Args: + tag (str): Tag of the widget to search for. + + Returns: + slideflow.studio.widgets.Widget if found, else None + + """ + for w in self.widgets: + if w.tag == tag: + return w + return None
+ +
[docs] def get_widget(self, name: str) -> Widget: + """Returns a given widget by class name. + + Args: + name (str): Name of the widget to search for. + + Returns: + slideflow.studio.widgets.Widget + + Raises: + ValueError: If the widget could not be found. + + """ + for w in self.widgets: + if w.__class__.__name__ == name: + return w + raise ValueError(f"Unable to find widget with class name {name}")
+ +
[docs] def has_live_viewer(self) -> bool: + """Check if the current viewer is a live viewer (e.g. camera feed).""" + return (self.viewer is not None and self.viewer.live)
+ +
[docs] def has_uq(self) -> bool: + """Check if the current model supports uncertainty quantification.""" + return (self._model_path is not None + and self._model_config is not None + and 'uq' in self._model_config['hp'] + and self._model_config['hp']['uq'])
+ +
[docs] def ask_zoom_to_mpp(self) -> None: + """Prompt the user to zoom to a specific microns-per-pixel (MPP).""" + if self.viewer and isinstance(self.viewer, SlideViewer): + self._show_mpp_zoom_popup = True
+ +
[docs] def increase_tile_zoom(self) -> None: + """Increase zoom of tile view two-fold.""" + self.tile_zoom *= 2
+ +
[docs] def decrease_tile_zoom(self) -> None: + """Decrease zoom of tile view by half.""" + self.tile_zoom /= 2
+ +
[docs] def reset_tile_zoom(self) -> None: + """Reset tile zoom level.""" + self.tile_zoom = 1
+ +
[docs] def load_heatmap(self, path: Union[str, "sf.Heatmap"]) -> None: + """Load a saved heatmap (\*.npz). + + Args: + path (str): Path to exported heatmap in \*.npz format, as generated + by Heatmap.save() or Heatmap.save_npz(). + + """ + if self._model_config is None: + self.create_toast( + "Unable to load heatmap; model must also be loaded.", + icon="error" + ) + return + try: + self.heatmap_widget.load(path) + self.create_toast(f"Loaded heatmap at {path}", icon="success") + + except Exception as e: + log.warn("Exception raised loading heatmap: {}".format(e)) + self.create_toast(f"Error loading heatmap at {path}", icon="error")
+ +
[docs] def load_model(self, model: str, ignore_errors: bool = False) -> None: + """Load the given model. + + Args: + model (str): Path to Slideflow model (in either backend). + ignore_errors (bool): Do not fail if an error is encountered. + Defaults to False. + + """ + log.debug("Loading model from Studio") + self.close_model(True) + log.debug("Model closed") + self.clear_result() + log.debug("Model result cleared") + self.skip_frame() # The input field will change on next frame. + self._render_manager.get_result() # Flush prior result + self._render_manager.clear_result() + try: + + # Trigger user widgets + for widget in self.widgets: + if hasattr(widget, '_before_model_load'): + widget._before_model_load() + + self.defer_rendering() + self.model_widget.user_model = model + + # Read model configuration + config = sf.util.get_model_config(model) + normalizer = sf.util.get_model_normalizer(model) + self.result.message = f'Loading {config["model_name"]}...' + self.defer_rendering() + self._use_model = True + self._model_path = model + self._model_config = config + self._normalizer = normalizer + self._predictions = None + self._uncertainty = None + self._use_uncertainty = 'uq' in config['hp'] and config['hp']['uq'] + self.tile_um = config['tile_um'] + self.tile_px = config['tile_px'] + self._render_manager.load_model(model) + if sf.util.torch_available and sf.util.path_to_ext(model) == 'zip': + self.model_widget.backend = 'torch' + else: + self.model_widget.backend = 'tensorflow' + + # Update widgets + log.debug("Updating widgets") + self.model_widget.reset() + self.model_widget.cur_model = model + self.model_widget.use_model = True + self.model_widget.use_uncertainty = 'uq' in config['hp'] and config['hp']['uq'] + if normalizer is not None and hasattr(self, 'slide_widget'): + self.slide_widget.add_model_normalizer_option() + self.slide_widget.norm_idx = len(self.slide_widget._normalizer_methods)-1 + if self.wsi: + log.debug(f"Loading slide... 
tile_px={self.tile_px}, tile_um={self.tile_um}") + self.slide_widget.load( + self.wsi.path, + mpp=self.slide_widget.manual_mpp, + ignore_errors=ignore_errors + ) + if hasattr(self, 'heatmap_widget'): + log.debug("Resetting heatmap") + self.heatmap_widget.reset() + if not self.sidebar.expanded: + self.sidebar.selected = 'model' + self.sidebar.expanded = True + + # Update viewer + self._show_tile_preview = True + log.debug("Updating viewer with tile_px={}, tile_um={}".format(self.tile_px, self.tile_um)) + if self.viewer and not isinstance(self.viewer, SlideViewer): + self.viewer.set_tile_px(self.tile_px) + self.viewer.set_tile_um(self.tile_um) + + # Trigger user widgets + for widget in self.widgets: + if hasattr(widget, '_on_model_load'): + widget._on_model_load() + + self.create_toast(f"Loaded model at {model}", icon="success") + + except Exception as e: + self.model_widget.cur_model = None + if model == '': + log.debug("Exception raised: no model loaded.") + self.result = EasyDict(message='No model loaded') + else: + log.warn("Exception raised (ignore_errors={}): {}".format(ignore_errors, e)) + self.create_toast(f"Error loading model at {model}", icon="error") + self.result = EasyDict(error=CapturedException()) + if not ignore_errors: + raise + log.debug("Model loading complete (path={})".format(self._model_path))
+ +
[docs] def load_project(self, project: str, ignore_errors: bool = False) -> None: + """Load the given project. + + Args: + project (str): Path to Slideflow project. + ignore_errors (bool): Do not fail if an error is encountered. + Defaults to False. + """ + self.project_widget.load(project, ignore_errors=ignore_errors)
+ +
[docs] def load_slide(self, slide: str, **kwargs) -> None: + """Load the given slide. + + Args: + slide (str): Path to whole-slide image. + stride (int, optional): Stride for tiles. 1 is non-overlapping + tiles, 2 is tiles with 50% overlap, etc. Defaults to 1. + ignore_errors (bool): Do not fail if an error is encountered. + Defaults to False. + """ + self.slide_widget.load(slide, **kwargs) + + # Trigger user widgets + for widget in self.widgets: + if hasattr(widget, '_on_slide_load'): + widget._on_slide_load()
+ +
[docs] def print_error(self, error: str) -> None: + """Print the given error message.""" + error = str(error) + if error != self._last_error_print: + print('\n' + error + '\n') + self._last_error_print = error
+ +
[docs] def reload_model(self) -> None: + """Reload the current model.""" + self._render_manager.load_model(self._model_path)
+ +
[docs] def reload_viewer(self) -> None: + """Reload the current main viewer.""" + if self.viewer is not None: + self.viewer.close() + if isinstance(self.viewer, SlideViewer): + old_viewer = self.viewer + self.set_viewer(SlideViewer(self.wsi, **self._viewer_kwargs())) + self.viewer.show_thumbnail = old_viewer.show_thumbnail + self.viewer.show_scale = old_viewer.show_scale + else: + self.viewer.reload(**self._viewer_kwargs())
+ +
[docs] def set_message(self, msg: str) -> None: + """Set a message for display.""" + self._message = msg
+ +
+    def set_status_message(
+        self,
+        message: str,
+        description: Optional[str] = None,
+        *,
+        color: Optional[Tuple[float, float, float, float]] = (0.7, 0, 0, 1),
+        text_color: Tuple[float, float, float, float] = (1, 1, 1, 1),
+        rounding: int = 0
+    ) -> None:
+        """Set the status message to display in the status bar."""
+        if not message:
+            self.clear_status_message()
+            return
+        self._status_message = StatusMessage(
+            self,
+            message,
+            description=description,
+            color=color,
+            text_color=text_color,
+            rounding=rounding
+        )
+ +
[docs] def clear_status_message(self) -> None: + """Clear the status message from the status bar.""" + self._status_message = None
+ +
[docs] def set_prediction_message(self, msg: str) -> None: + """Set the prediction message to display under the tile outline.""" + self._pred_message = msg
+ +
+    def set_overlay(
+        self,
+        overlay: np.ndarray,
+        method: int,
+        *,
+        original: Optional[np.ndarray] = None
+    ) -> None:
+        """Configure the overlay to be applied to the current view screen.
+
+        Overlay is a numpy array, and method is a flag indicating the
+        method to use when showing the overlay.
+
+        If ``method`` is ``sf.studio.OVERLAY_WSI``, the array will be mapped
+        to the entire whole-slide image, without offsets.
+
+        If ``method`` is ``sf.studio.OVERLAY_GRID``, the array is interpreted
+        as having been generated from the slide's grid, meaning that an offset
+        will be applied to ensure that the overlay is aligned properly.
+
+        If ``method`` is ``sf.studio.OVERLAY_VIEW``, the array is interpreted
+        as an overlay that is applied only to the area of the slide
+        currently in view.
+
+        Args:
+            overlay (np.ndarray): Overlay to render.
+            method (int): Mapping method for linking the overlay to the
+                whole-slide image.
+
+        Keyword args:
+            original (np.ndarray, optional): Original grid values before any
+                colorization or other modifications. Used for displaying the
+                tooltip when alt-hovering. Defaults to None.
+
+        """
+        if self.viewer is None:
+            raise ValueError("Unable to set overlay; viewer not loaded.")
+        if original is not None and original.shape != overlay.shape:
+            raise ValueError("Unable to set overlay; original shape "
+                             "does not match overlay shape.")
+        self.overlay = overlay
+        self.overlay_original = original
+        if method == OVERLAY_WSI:
+            # Overlay maps to the entire whole-slide image,
+            # with no offset needed.
+            self._overlay_wsi_dim = self.wsi.dimensions
+            self._overlay_offset_wsi_dim = (0, 0)
+        elif method == OVERLAY_GRID:
+            # Overlay was generated from the slide's grid, meaning
+            # that we need to apply an offset to ensure the overlay
+            # lines up appropriately. Pass ``original`` through, since
+            # set_grid_overlay() otherwise resets it to None.
+            self.set_grid_overlay(overlay, original=original)
+        elif method == OVERLAY_VIEW:
+            # Overlay should only apply to the area of the WSI
+            # currently in view.
+            self._overlay_wsi_dim = self.viewer.wsi_window_size
+            self._overlay_offset_wsi_dim = self.viewer.origin
+        else:
+            raise ValueError(f"Unrecognized method {method}")
+ +
[docs] def set_grid_overlay( + self, + grid: np.ndarray, + *, + tile_um: Optional[int] = None, + stride_div: Optional[int] = None, + mpp: Optional[float] = None, + original: Optional[np.ndarray] = None + ) -> None: + """Set the grid overlay to the given grid. + + Args: + grid (np.ndarray): Grid to render as an overlay. + + Keyword args: + tile_um (int, optional): Tile size, in microns. If None, uses + the tile size of the currently loaded slide. + stride_div (int, optional): Stride divisor. If None, uses + the stride divisor of the currently loaded slide. + mpp (float, optional): Microns per pixel. If None, uses + the MPP of the currently loaded slide. + original (np.ndarray, optional): Original grid values before any + colorization or other modifications. Used for displaying the + tooltip when alt-hovering. Defaults to None. + + """ + if self.viewer is None: + raise ValueError("Unable to set grid overlay; viewer not loaded.") + if any(x is None for x in (tile_um, stride_div, mpp)) and self.wsi is None: + raise ValueError("Unable to set grid overlay; no slide loaded.") + if original is not None and original.shape[0:2] != grid.shape[0:2]: + raise ValueError("Unable to set grid overlay; original grid shape " + "({}) does not match grid shape ({}).".format( + original.shape, grid.shape + )) + + self.overlay = grid + self.overlay_original = original + if tile_um is None: + tile_um = self.wsi.tile_um + if stride_div is None: + stride_div = self.wsi.stride_div + if mpp is None: + mpp = self.wsi.mpp + full_extract = int(tile_um / mpp) + wsi_stride = int(full_extract / stride_div) + self._overlay_wsi_dim = (wsi_stride * (grid.shape[1]), + wsi_stride * (grid.shape[0])) + self._overlay_offset_wsi_dim = (full_extract/2 - wsi_stride/2, + full_extract/2 - wsi_stride/2)
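The overlay geometry computed at the end of `set_grid_overlay` can be checked with a small standalone calculation. This is only a sketch that mirrors the arithmetic shown above; the helper name `grid_overlay_geometry` and the example values are assumptions for illustration and are not part of Slideflow.

```python
from typing import Tuple


def grid_overlay_geometry(
    grid_shape: Tuple[int, int],
    tile_um: int,
    stride_div: int,
    mpp: float,
) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Hypothetical helper mirroring the arithmetic in set_grid_overlay."""
    full_extract = int(tile_um / mpp)            # tile width in slide pixels
    wsi_stride = int(full_extract / stride_div)  # pixel step between grid cells
    # Grid arrays are (rows, cols); overlay dimensions are (width, height).
    dim = (wsi_stride * grid_shape[1], wsi_stride * grid_shape[0])
    # Each grid cell is centered on its tile, so shift by half the difference
    # between the tile width and the stride.
    offset = (full_extract / 2 - wsi_stride / 2,
              full_extract / 2 - wsi_stride / 2)
    return dim, offset


# Example values (assumed): a 10x20 grid, 302 um tiles, 50% overlap, 0.5 um/px.
dim, offset = grid_overlay_geometry((10, 20), tile_um=302, stride_div=2, mpp=0.5)
print(dim, offset)
```

With `stride_div=1` the stride equals the tile width, so the offset is zero and the overlay starts at the slide origin.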
+ +
[docs] def set_viewer(self, viewer: Any) -> None: + """Set the main viewer. + + Args: + viewer (:class:`slideflow.studio.gui.viewer.Viewer`): Viewer to use. + + """ + log.debug("Setting viewer to {}".format(viewer)) + if self.viewer is not None: + self.viewer.close() + self.viewer = viewer + self._render_manager._live_updates = viewer.live + self._render_manager.set_async(viewer.live)
+ +# ----------------------------------------------------------------------------- + + + +# ----------------------------------------------------------------------------- +

+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/util/index.html b/docs/_modules/slideflow/util/index.html new file mode 100644 index 000000000..165db3aeb --- /dev/null +++ b/docs/_modules/slideflow/util/index.html @@ -0,0 +1,2178 @@ + + + + + + + + + + + + slideflow.util — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.util

+import atexit
+import csv
+import importlib.util
+import json
+import logging
+import os
+import re
+import shutil
+import sys
+import requests
+import tarfile
+import hashlib
+import pandas as pd
+import tempfile
+import threading
+import multiprocessing as mp
+import time
+from rich import progress
+from rich.logging import RichHandler
+from rich.highlighter import NullHighlighter
+from rich.panel import Panel
+from rich.console import Console
+from rich.progress import Progress, TextColumn, BarColumn
+from contextlib import contextmanager
+from functools import partial
+from glob import glob
+from os.path import dirname, exists, isdir, join
+from packaging import version
+from tqdm import tqdm
+from typing import (
+    Any, Callable, Dict, Iterable, List, Optional, Tuple, Union, Iterator
+)
+
+import numpy as np
+import slideflow as sf
+from slideflow import errors
+from . import example_pb2, log_utils
+from .colors import *  # noqa F403,F401 - Here for compatibility
+from .smac_utils import (broad_search_space, shallow_search_space,
+                         create_search_space)
+
+tf_available = importlib.util.find_spec('tensorflow')
+torch_available = importlib.util.find_spec('torch')
+
+# Enable color sequences on Windows
+try:
+    import ctypes
+    kernel32 = ctypes.windll.kernel32
+    kernel32.SetConsoleMode(kernel32.GetStdHandle(-11), 7)
+except Exception:
+    pass
+
+
+# --- Global vars -------------------------------------------------------------
+
+SUPPORTED_FORMATS = ['svs', 'tif', 'ndpi', 'vms', 'vmu', 'scn', 'mrxs',
+                     'tiff', 'svslide', 'bif', 'jpg', 'jpeg', 'png',
+                     'ome.tif', 'ome.tiff']
+EMPTY = ['', ' ', None, np.nan]
+CPLEX_AVAILABLE = (importlib.util.find_spec('cplex') is not None)
+try:
+    import pyomo.environ as pyo
+    from pyomo.opt import SolverFactory
+    opt = SolverFactory('bonmin', validate=False)
+    if not opt.available():
+        raise errors.SolverNotFoundError
+except Exception:
+    BONMIN_AVAILABLE = False
+else:
+    BONMIN_AVAILABLE = True
+
+
+# --- Commonly used types -----------------------------------------------------
+
+# Outcome labels
+Labels = Union[Dict[str, str], Dict[str, int], Dict[str, List[float]]]
+
+# Normalizer fit keyword arguments
+NormFit = Union[Dict[str, np.ndarray], Dict[str, List]]
+
+# --- Detect CPU cores --------------------------------------------------------
+
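+def num_cpu(default: Optional[int] = None) -> Optional[int]:
+    """Return the number of CPU cores available to this process.
+
+    Falls back to ``os.cpu_count()`` where ``os.sched_getaffinity`` is
+    unavailable, and to ``default`` if the count cannot be determined.
+    """
+    try:
+        return len(os.sched_getaffinity(0))
+    except Exception:
+        count = os.cpu_count()
+        if count is None and default is not None:
+            return default
+        else:
+            return count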
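The fallback order used by `num_cpu` (affinity mask first, then the total core count, then a caller-supplied default) can be tried outside of Slideflow. The sketch below uses only the standard library; the name `available_cpus` is hypothetical.

```python
import os
from typing import Optional


def available_cpus(default: Optional[int] = None) -> Optional[int]:
    """Hypothetical stand-in for num_cpu(), using only the standard library."""
    if hasattr(os, "sched_getaffinity"):
        # Available on some Unix platforms; respects affinity masks
        # such as those set with taskset.
        return len(os.sched_getaffinity(0))
    count = os.cpu_count()  # may be None if the count is undetermined
    return default if count is None else count


workers = available_cpus(default=1)
print(workers)
```

Passing a default is useful when the result sizes a worker pool, since `None` is not a valid pool size.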
+
+# --- Configure logging -------------------------------------------------------
+
+log = logging.getLogger('slideflow')
+log.setLevel(logging.DEBUG)
+
+
+
[docs]def setLoggingLevel(level): + """Set the logging level. + + Uses standard python logging levels: + + - 50: CRITICAL + - 40: ERROR + - 30: WARNING + - 20: INFO + - 10: DEBUG + - 0: NOTSET + + Args: + level (int): Logging level numeric value. + + """ + log.handlers[0].setLevel(level)
+ + +
[docs]def getLoggingLevel(): + """Return the current logging level.""" + return log.handlers[0].level
+ + +@contextmanager +def logging_level(level: int): + _initial = getLoggingLevel() + setLoggingLevel(level) + try: + yield + finally: + setLoggingLevel(_initial) + + +def addLoggingFileHandler(path): + fh = logging.FileHandler(path) + fh.setFormatter(log_utils.FileFormatter()) + handler = log_utils.MultiProcessingHandler( + "mp-file-handler-{0}".format(len(log.handlers)), + sub_handler=fh + ) + log.addHandler(handler) + atexit.register(handler.close) + + +# Add tqdm-friendly stream handler +#ch = log_utils.TqdmLoggingHandler() +ch = RichHandler( + markup=True, + log_time_format="[%X]", + show_path=False, + highlighter=NullHighlighter(), + rich_tracebacks=True +) +ch.setFormatter(log_utils.LogFormatter()) +if 'SF_LOGGING_LEVEL' in os.environ: + try: + intLevel = int(os.environ['SF_LOGGING_LEVEL']) + ch.setLevel(intLevel) + except ValueError: + pass +else: + ch.setLevel(logging.INFO) +log.addHandler(ch) + +# Add multiprocessing-friendly file handler +addLoggingFileHandler("slideflow.log") + +# Workaround for duplicate logging with TF 2.9 +log.propagate = False + + +
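The `logging_level` context manager above changes the level of the first handler and restores it on exit. A minimal standalone sketch of the same pattern follows; the logger name `logging_level_demo` and the function name `handler_level` are assumptions made for this example.

```python
import logging
from contextlib import contextmanager

demo_log = logging.getLogger("logging_level_demo")  # hypothetical logger name
demo_log.setLevel(logging.DEBUG)
demo_log.propagate = False
handler = logging.StreamHandler()
handler.setLevel(logging.INFO)
demo_log.addHandler(handler)


@contextmanager
def handler_level(level: int):
    """Temporarily change the first handler's level, as logging_level() does."""
    initial = demo_log.handlers[0].level
    demo_log.handlers[0].setLevel(level)
    try:
        yield
    finally:
        # Restore the previous level even if the body raises.
        demo_log.handlers[0].setLevel(initial)


with handler_level(logging.ERROR):
    demo_log.info("suppressed")  # below ERROR, so the handler drops it
    level_inside = demo_log.handlers[0].level
level_after = demo_log.handlers[0].level
print(level_inside, level_after)
```

Because the level is set on the handler rather than on the logger, other handlers attached to the same logger (such as a file handler) keep their own thresholds.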
[docs]class TileExtractionSpeedColumn(progress.ProgressColumn): + """Renders human readable transfer speed.""" + +
[docs] def render(self, task: "progress.Task") -> progress.Text: + """Show data transfer speed.""" + speed = task.finished_speed or task.speed + if speed is None: + return progress.Text("?", style="progress.data.speed") + data_speed = f'{int(speed)} img' + return progress.Text(f"{data_speed}/s", style="progress.data.speed")
+ + +
[docs]class LabeledMofNCompleteColumn(progress.MofNCompleteColumn): + """Renders a completion column with labels.""" + +
[docs] def __init__(self, unit: str, *args, **kwargs): + super().__init__(*args, **kwargs) + self.unit = unit
+ +
[docs] def render(self, task: "progress.Task") -> progress.Text: + """Show completion status with labels.""" + if task.total is None: + return progress.Text("?", style="progress.spinner") + return progress.Text( + f"{task.completed}/{task.total} {self.unit}", + style="progress.spinner" + )
+ + +
[docs]class ImgBatchSpeedColumn(progress.ProgressColumn): + """Renders human readable transfer speed.""" + +
[docs] def __init__(self, batch_size=1, *args, **kwargs): + super().__init__(*args, **kwargs) + self.batch_size = batch_size
+ +
[docs] def render(self, task: "progress.Task") -> progress.Text: + """Show data transfer speed.""" + speed = task.finished_speed or task.speed + if speed is None: + return progress.Text("?", style="progress.data.speed") + data_speed = f'{int(speed * self.batch_size)} img' + return progress.Text(f"{data_speed}/s", style="progress.data.speed")
+ + +
[docs]class TileExtractionProgress(Progress): +
[docs] def get_renderables(self): + for task in self.tasks: + if task.fields.get("progress_type") == 'speed': + self.columns = ( + TextColumn("[progress.description]{task.description}"), + TileExtractionSpeedColumn() + ) + if task.fields.get("progress_type") == 'slide_progress': + self.columns = ( + TextColumn("[progress.description]{task.description}"), + BarColumn(), + progress.TaskProgressColumn(), + progress.MofNCompleteColumn(), + "●", + progress.TimeRemainingColumn(), + ) + yield self.make_tasks_table([task])
+ + +
[docs]class FeatureExtractionProgress(Progress): +
[docs] def get_renderables(self): + for task in self.tasks: + if task.fields.get("progress_type") == 'speed': + self.columns = ( + TextColumn("[progress.description]{task.description}"), + TileExtractionSpeedColumn(), + LabeledMofNCompleteColumn('tiles'), + "●", + progress.TimeRemainingColumn(), + ) + if task.fields.get("progress_type") == 'slide_progress': + self.columns = ( + TextColumn("[progress.description]{task.description}"), + BarColumn(), + progress.TaskProgressColumn(), + LabeledMofNCompleteColumn('slides') + ) + yield self.make_tasks_table([task])
+ + +
[docs]def set_ignore_sigint(): + """Ignore keyboard interrupts.""" + import signal + signal.signal(signal.SIGINT, signal.SIG_IGN)
+ + +
[docs]class MultiprocessProgressTracker: + """Wrapper for a rich.progress tracker that can be shared across processes.""" + +
[docs] def __init__(self, tasks): + ctx = mp.get_context('spawn') + self.mp_values = { + task.id: ctx.Value('i', task.completed) + for task in tasks + }
+ + def advance(self, id, amount): + with self.mp_values[id].get_lock(): + self.mp_values[id].value += amount + + def __getitem__(self, id): + return self.mp_values[id].value
+ +
[docs]class MultiprocessProgress: + """Wrapper for a rich.progress bar that can be shared across processes.""" + +
[docs] def __init__(self, pb): + self.pb = pb + self.tracker = MultiprocessProgressTracker(self.pb.tasks) + self.should_stop = False
+ + def _update_progress(self): + while not self.should_stop: + for task in self.pb.tasks: + self.pb.update(task.id, completed=self.tracker[task.id]) + time.sleep(0.1) + + def __enter__(self): + self._thread = threading.Thread(target=self._update_progress) + self._thread.start() + return self + + def __exit__(self, *args): + self.should_stop = True + self._thread.join()
+ + +# --- Slideflow header -------------------------------------------------------- + +
[docs]def about(console=None) -> None: + """Print a summary of the slideflow version and active backends. + + Example + >>> sf.about() + ╭=======================╮ + │ Slideflow │ + │ Version: 3.0.0 │ + │ Backend: torch │ + │ Slide Backend: cucim │ + │ https://slideflow.dev │ + ╰=======================╯ + + Args: + console (rich.console.Console, optional): Active console, if one exists. + Defaults to None. + """ + if console is None: + console = Console() + col1 = 'yellow' if sf.backend() == 'tensorflow' else 'purple' + if sf.slide_backend() == 'libvips': + try: + import pyvips + _version = '{}.{}.{}'.format( + pyvips.major, pyvips.minor, pyvips.micro + ) + except Exception: + _version = 'unknown' + col2 = 'cyan' + slide_backend = 'libvips ({})'.format(_version) + else: + slide_backend = sf.slide_backend() + col2 = 'green' + console.print( + Panel(f"[white bold]Slideflow[/]" + f"\nVersion: {sf.__version__}" + f"\nBackend: [{col1}]{sf.backend()}[/]" + f"\nSlide Backend: [{col2}]{slide_backend}[/]" + "\n[blue]https://slideflow.dev[/]", + border_style='purple'), + justify='left')
+ + +# --- Data download functions ------------------------------------------------- + +
+def download_from_tcga(
+    uuid: str,
+    dest: str,
+    message: str = 'Downloading...'
+) -> None:
+    """Download a file from TCGA (GDC) by UUID."""
+    data_endpt = "https://api.gdc.cancer.gov/data/"
+    response = requests.post(
+        data_endpt,
+        data=json.dumps({'ids': [uuid]}),
+        headers={"Content-Type": "application/json"},
+        stream=True
+    )
+    response_head_cd = response.headers["Content-Disposition"]
+    block_size = 4096
+    block_per_mb = block_size / 1000000
+    file_size = int(response.headers.get('Content-Length', ''))
+    file_size_mb = file_size / 1000000
+    running_total_mb = 0
+    file_name = join(dest, re.findall("filename=(.+)", response_head_cd)[0])
+    pbar = tqdm(desc=message,
+                total=file_size_mb, unit='MB',
+                bar_format="{desc}: {percentage:3.0f}%|{bar}| "
+                           "{n:.2f}/{total:.2f} [{elapsed}<{remaining}] "
+                           "{rate_fmt}{postfix}")
+
+    with open(file_name, "wb") as output_file:
+        for chunk in response.iter_content(chunk_size=block_size):
+            output_file.write(chunk)
+            if block_per_mb + running_total_mb < file_size_mb:
+                running_total_mb += block_per_mb  # type: ignore
+                pbar.update(block_per_mb)
+            else:
+                # Final (partial) block: compute the remainder before
+                # updating the running total, so the bar reaches 100%.
+                remaining_mb = file_size_mb - running_total_mb
+                running_total_mb += remaining_mb  # type: ignore
+                pbar.update(remaining_mb)
+ + +def make_cache_dir_path(path: str) -> str: + if 'HOME' in os.environ: + dest = os.path.join(os.environ['HOME'], '.cache', 'slideflow', path) + elif 'USERPROFILE' in os.environ: + dest = os.path.join(os.environ['USERPROFILE'], '.cache', 'slideflow', path) + else: + dest = os.path.join(tempfile.gettempdir(), '.cache', 'slideflow', path) + os.makedirs(dest, exist_ok=True) + return dest + + +def get_gdc_manifest() -> pd.DataFrame: + sf_cache = make_cache_dir_path('gdc') + manifest = join(sf_cache, 'gdc_manifest.tsv') + if not exists(manifest): + tar = 'gdc_manifest.tar.xz' + r = requests.get(f'https://raw.githubusercontent.com/slideflow/slideflow/1.4.0/datasets/{tar}') + open(join(sf_cache, tar), 'wb').write(r.content) + tarfile.open(join(sf_cache, tar)).extractall(sf_cache) + os.remove(join(sf_cache, tar)) + if not exists(manifest): + log.error("Failed to download GDC manifest.") + return pd.read_csv(manifest, delimiter='\t') + + +# --- Utility functions and classes ------------------------------------------- + +class no_scope(): + def __enter__(self): + return None + + def __exit__(self, exc_type, exc_value, traceback): + return False + + +
[docs]class EasyDict(dict): + """Convenience class that behaves like a dict but allows access + with the attribute syntax.""" + + def __getattr__(self, name: str) -> Any: + try: + return self[name] + except KeyError: + raise AttributeError(name) + + def __setattr__(self, name: str, value: Any) -> None: + self[name] = value + + def __delattr__(self, name: str) -> None: + del self[name]
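The attribute-access behavior of `EasyDict` can be seen in isolation with a short sketch. The class below repeats the pattern shown above under the hypothetical name `AttrDict`, so it runs without importing Slideflow.

```python
from typing import Any


class AttrDict(dict):
    """Hypothetical re-implementation of the EasyDict pattern shown above."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError:
            # Raise AttributeError so hasattr() and getattr() defaults work.
            raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value

    def __delattr__(self, name: str) -> None:
        del self[name]


result = AttrDict(message="No model loaded")
result.error = None          # stored as result["error"]
had_error_key = "error" in result
del result.error             # removes the dictionary key
print(result.message, had_error_key, hasattr(result, "error"))
```

Converting `KeyError` to `AttributeError` is what lets callers probe optional fields with `hasattr` instead of catching exceptions.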
+
+def zip_allowed() -> bool:
+    return not ('SF_ALLOW_ZIP' in os.environ and os.environ['SF_ALLOW_ZIP'] == '0')
+
+@contextmanager
+def enable_zip(enable: bool) -> Iterator[None]:
+    _zip_allowed = zip_allowed()
+    os.environ['SF_ALLOW_ZIP'] = '1' if enable else '0'
+    try:
+        yield
+    finally:
+        # Restore the previous setting even if the body raises.
+        os.environ['SF_ALLOW_ZIP'] = '0' if not _zip_allowed else '1'
+
+
[docs]def md5(path: str) -> str: + """Calculate and return MD5 checksum for a file.""" + m = hashlib.md5() + with open(path, 'rb') as f: + chunk = f.read(4096) + # No walrus for Python 3.7 :( + while chunk: + m.update(chunk) + chunk = f.read(4096) + return m.hexdigest()
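The chunked read in `md5` keeps memory use constant regardless of file size. A sketch of the same idea follows, written with the two-argument form of `iter()` in place of the explicit `while` loop; the name `file_md5` and the temporary file are assumptions for this example.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 4096) -> str:
    """Hypothetical equivalent of md5(): hash a file in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


payload = b"slideflow" * 10000  # larger than one chunk
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
checksum = file_md5(tmp.name)
os.remove(tmp.name)
print(checksum == hashlib.md5(payload).hexdigest())
```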
+ +def allow_gpu_memory_growth() -> None: + import tensorflow as tf + gpus = tf.config.experimental.list_physical_devices('GPU') + for gpu in gpus: + try: + tf.config.experimental.set_memory_growth(gpu, True) + except RuntimeError: + pass + +def model_backend(model): + if sf.util.torch_available and 'torch' in sys.modules: + import torch + if isinstance(model, torch.nn.Module): + return 'torch' + if sf.util.tf_available and 'tensorflow' in sys.modules: + import tensorflow as tf + if isinstance(model, tf.keras.Model): + return 'tensorflow' + from tensorflow.lite.python.interpreter import SignatureRunner + if isinstance(model, SignatureRunner): + return 'tflite' + raise ValueError(f"Unable to interpret model {model}") + + +def detuple(arg1: Any, args: tuple) -> Any: + if len(args): + return tuple([arg1] + list(args)) + else: + return arg1 + +def _as_list(arg1: Any) -> List[Any]: + if isinstance(arg1, np.ndarray): + return arg1.tolist() + else: + return arg1 + +
+def batch(iterable: List, n: int = 1) -> Iterable:
+    """Separates an iterable into batches of maximum size `n`."""
+    l = len(iterable)
+    for ndx in range(0, l, n):
+        yield iterable[ndx:min(ndx + n, l)]
+ + +
+def batch_generator(iterable: Iterable, n: int = 1) -> Iterable:
+    """Separates an iterable into batches of maximum size `n`."""
+    batch = []
+    for item in iterable:
+        batch.append(item)
+        if len(batch) == n:
+            yield batch
+            batch = []
+    if len(batch):
+        yield batch
+    return
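The difference between `batch` and `batch_generator` is that the former slices a sequence with a known length, while the latter accumulates items from any iterable. The sketch below copies the accumulate-and-yield logic under the hypothetical name `batched` to show it working on a generator.

```python
from typing import Iterable, Iterator, List


def batched(items: Iterable, n: int = 1) -> Iterator[List]:
    """Hypothetical copy of batch_generator(): yield lists of at most n items."""
    current: List = []
    for item in items:
        current.append(item)
        if len(current) == n:
            yield current
            current = []
    if current:
        # Emit the final, possibly shorter, batch.
        yield current


# Works on generators, which have no len() and cannot be sliced.
batches = list(batched((i * i for i in range(7)), n=3))
print(batches)  # [[0, 1, 4], [9, 16, 25], [36]]
```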
+ + +def as_list(arg1: Any) -> List[Any]: + if not isinstance(arg1, list): + return [arg1] + else: + return arg1 + + +
[docs]def isnumeric(val: Any) -> bool: + """Check if the given value is numeric (numpy or python). + + Tensors will return False. + + Specifically checks if the value is a python int or float, + or if the value is a numpy array with a numeric dtype (int or float). + + """ + float_np_types = (np.int32, np.int64, np.uint8, np.float16, + np.float32, np.float64) + if isinstance(val, (int, float)): + return True + if isinstance(val, np.ndarray): + return val.dtype in float_np_types + return type(val) in float_np_types
+ + +def is_mag(arg1: str) -> bool: + arg1_split = arg1.lower().split('x') + if (len(arg1_split) != 2) or (arg1_split[1] != ''): + return False + try: + mag = float(arg1_split[0]) + except ValueError: + return False + return True + + +
[docs]def is_model(path: str) -> bool: + """Checks if the given path is a valid Slideflow model.""" + return is_tensorflow_model_path(path) or is_torch_model_path(path)
+ + +
[docs]def is_project(path: str) -> bool: + """Checks if the given path is a valid Slideflow project.""" + return isdir(path) and exists(join(path, 'settings.json'))
+ + +
[docs]def is_slide(path: str) -> bool: + """Checks if the given path is a supported slide.""" + return (os.path.isfile(path) + and sf.util.path_to_ext(path).lower() in SUPPORTED_FORMATS)
+ + +
[docs]def is_tensorflow_model_path(path: str) -> bool: + """Checks if the given path is a valid Slideflow/Tensorflow model.""" + return (isdir(path) + and (exists(join(path, 'params.json')) + or exists(join(dirname(path), 'params.json'))))
+ + +
[docs]def is_torch_model_path(path: str) -> bool: + """Checks if the given path is a valid Slideflow/PyTorch model.""" + return (os.path.isfile(path) + and sf.util.path_to_ext(path).lower() == 'zip' + and exists(join(dirname(path), 'params.json')))
+ + +
[docs]def is_simclr_model_path(path: Any) -> bool: + """Checks if the given path is a valid SimCLR model or checkpoint.""" + is_model = (isinstance(path, str) + and isdir(path) + and exists(join(path, 'args.json'))) + is_checkpoint = (isinstance(path, str) + and path.endswith('.ckpt') + and exists(join(dirname(path), 'args.json'))) + return is_model or is_checkpoint
+ + +
[docs]def is_uq_model(model_path: str) -> bool: + """Checks if the given model path points to a UQ-enabled model.""" + is_model_path = (is_tensorflow_model_path(model_path) + or is_torch_model_path(model_path)) + if not is_model_path: + return False + config = get_model_config(model_path) + return config['hp']['uq']
+ + +def assert_is_mag(arg1: str): + if not isinstance(arg1, str) or not is_mag(arg1): + raise ValueError( + f'Invalid magnification {arg1}. Must be of format' + f' [int/float]x, such as "10x", "20X", or "2.5x"' + ) + + +def to_mag(arg1: str) -> Union[int, float]: + assert_is_mag(arg1) + try: + return int(arg1.lower().split('x')[0]) + except ValueError: + return float(arg1.lower().split('x')[0]) + + +
[docs]def is_tile_size_compatible( + tile_px1: int, + tile_um1: Union[str, int], + tile_px2: int, + tile_um2: Union[str, int] +) -> bool: + """Check whether tile sizes are compatible. + + Compatibility is defined as: + - Equal size in pixels + - If tile width (tile_um) is defined in microns (int) for both, these must be equal + - If tile width (tile_um) is defined as a magnification (str) for both, these must be equal + - If one is defined in microns and the other as a magnification, the calculated magnification must be +/- 2. + + Example 1: + - tile_px1=299, tile_um1=302 + - tile_px2=299, tile_um2=304 + - Incompatible (unequal micron width) + + Example 2: + - tile_px1=299, tile_um1=10x + - tile_px2=299, tile_um2=9x + - Incompatible (unequal magnification) + + Example 3: + - tile_px1=299, tile_um1=302 + - tile_px2=299, tile_um2=10x + - Compatible (first has an equivalent magnification of 9.9x, which is +/- 2 compared to 10x) + + + Args: + tile_px1 (int): Tile size (in pixels) of first slide. + tile_um1 (int or str): Tile size (in microns) of first slide. + Can also be expressed as a magnification level, e.g. ``'10x'`` + tile_px2 (int): Tile size (in pixels) of second slide. + tile_um2 (int or str): Tile size (in microns) of second slide. + Can also be expressed as a magnification level, e.g. ``'10x'`` + + Returns: + bool: Whether the tile sizes are compatible. 
+ + """ + # Type checks + if not isinstance(tile_px1, int): + raise ValueError("Expected tile_px1 to be an int, got: {}".format(type(tile_px1))) + if not isinstance(tile_um1, (str, int)): + raise ValueError("Expected tile_um1 to be a str or int, got: {}".format(type(tile_um1))) + if not isinstance(tile_px2, int): + raise ValueError("Expected tile_px2 to be an int, got: {}".format(type(tile_px2))) + if not isinstance(tile_um2, (str, int)): + raise ValueError("Expected tile_um2 to be a str or int, got: {}".format(type(tile_um2))) + + # Enforce equivalent pixel size + if tile_px1 != tile_px2: + return False + # If both are defined as a magnification, check if these are equal + if isinstance(tile_um1, str) and isinstance(tile_um2, str): + return tile_um1 == tile_um2 + # If both are defined in microns, check if these are equal + if isinstance(tile_um1, int) and isinstance(tile_um2, int): + return tile_um1 == tile_um2 + # If one is defined in microns and the other as magnification, + # check if they are compatible. + if isinstance(tile_um1, str) and isinstance(tile_um2, int): + mag2 = 10 / (tile_um2 / tile_px2) + return abs(mag2 - to_mag(tile_um1)) <= 2 + if isinstance(tile_um1, int) and isinstance(tile_um2, str): + mag1 = 10 / (tile_um1 / tile_px1) + return abs(mag1 - to_mag(tile_um2)) <= 2 + else: + raise ValueError("Error assessing tile size compatibility between px={}, um={} and px={}, um={}".format( + tile_px1, tile_um1, tile_px2, tile_um2 + ))
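The mixed micron/magnification branch of `is_tile_size_compatible` estimates a magnification as `10 / (tile_um / tile_px)` and accepts a difference of up to 2. The arithmetic from Example 3 in the docstring can be reproduced as a sketch; `approx_magnification` and `parse_mag` are hypothetical names, and `parse_mag` is a simplified stand-in for `to_mag` that skips validation.

```python
def approx_magnification(tile_px: int, tile_um: int) -> float:
    """Magnification implied by a tile, using the 10 / (um per px) rule above."""
    return 10 / (tile_um / tile_px)


def parse_mag(label: str) -> float:
    """Hypothetical simplified to_mag(): '10x' -> 10.0 (no validation)."""
    return float(label.lower().rstrip("x"))


# Example 3 from the docstring: 299 px at 302 um versus 299 px at "10x".
mag = approx_magnification(tile_px=299, tile_um=302)
compatible = abs(mag - parse_mag("10x")) <= 2

# A tile twice as wide in microns implies roughly half the magnification.
mag_wide = approx_magnification(tile_px=299, tile_um=604)
compatible_wide = abs(mag_wide - parse_mag("10x")) <= 2
print(round(mag, 2), compatible, round(mag_wide, 2), compatible_wide)
```

The tolerance of 2 is fairly permissive: under this rule a tile with an implied magnification anywhere between 8x and 12x is treated as compatible with `'10x'`.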
+ + +
+def multi_warn(arr: List, compare: Callable, msg: Union[Callable, str]) -> int:
+    """Log multiple warnings.
+
+    The first three matches are logged as warnings; any further matches
+    are logged at the debug level.
+
+    Args:
+        arr (List): Array to compare.
+        compare (Callable): Comparison to perform on array. If True, will warn.
+        msg (str or Callable): Warning message.
+
+    Returns:
+        int: Number of warnings.
+    """
+    num_warned = 0
+    warn_threshold = 3
+    for item in arr:
+        if compare(item):
+            fn = log.warning if num_warned < warn_threshold else log.debug
+            if isinstance(msg, str):
+                fn(msg.format(item))
+            elif callable(msg):
+                fn(msg(item))
+            num_warned += 1
+    if num_warned >= warn_threshold:
+        log.warning(f'...{num_warned} total warnings, see log for details')
+    return num_warned
+ + +
[docs]def to_onehot(val: int, max: int) -> np.ndarray: + """Converts value to one-hot encoding + + Args: + val (int): Value to encode + max (int): Maximum value (length of onehot encoding) + """ + + onehot = np.zeros(max, dtype=np.int64) + onehot[val] = 1 + return onehot
+ + +def clear_console() -> None: + sys.stdout.write("\r\033[K") + sys.stdout.flush() + + +
[docs]def make_dir(_dir: str) -> None: + """Makes a directory if one does not already exist, + in a manner compatible with multithreading. + """ + if not exists(_dir): + try: + os.makedirs(_dir, exist_ok=True) + except FileExistsError: + pass
+ + +
[docs]def relative_path(path: str, root: str): + """Returns a relative path, from a given root directory.""" + if path[0] == '.': + return join(root, path[2:]) + elif path.startswith('$ROOT'): + raise ValueError("Invalid path prefix $ROOT; update project settings") + else: + return path
+ + +
[docs]def global_path(root: str, path_string: str): + '''Returns global path from a local path.''' + if not root: + root = "" + if path_string and (len(path_string) > 2) and path_string[:2] == "./": + return os.path.join(root, path_string[2:]) + elif path_string and (path_string[0] != "/"): + return os.path.join(root, path_string) + else: + return path_string
+ + +def _shortname(string: str): + if len(string) == 60: + # May be TCGA slide with long name; convert to + # patient name by returning first 12 characters + return string[:12] + else: + return string + + +
[docs]def yes_no_input(prompt: str, default: str = 'no') -> bool: + '''Prompts user for yes/no input.''' + while True: + response = input(prompt) + if not response and default: + return (default in ('yes', 'y')) + elif response.lower() in ('yes', 'no', 'y', 'n'): + return (response.lower() in ('yes', 'y')) + else: + print("Invalid response.")
+ + +
[docs]def path_input( + prompt: str, + root: str, + default: Optional[str] = None, + create_on_invalid: bool = False, + filetype: Optional[str] = None, + verify: bool = True +) -> str: + '''Prompts user for a file or directory path.''' + while True: + relative_response = input(f"{prompt}") + response = global_path(root, relative_response) + if not relative_response and default: + relative_response = default + response = global_path(root, relative_response) + if verify and not os.path.exists(response): + if not filetype and create_on_invalid: + prompt = f'Path "{response}" does not exist. Create? [Y/n] ' + if yes_no_input(prompt, default='yes'): + os.makedirs(response) + return relative_response + else: + continue + elif filetype: + print(f'Unable to locate file "{response}"') + continue + elif not filetype and not os.path.exists(response): + print(f'Unable to locate directory "{response}"') + continue + resp_type = path_to_ext(response) + if filetype and (resp_type != filetype): + print(f'Incorrect filetype "{resp_type}", expected "{filetype}"') + continue + return relative_response
+ + +
[docs]def choice_input(prompt, valid_choices, default=None, multi_choice=False, + input_type=str): + '''Prompts user for multi-choice input.''' + while True: + response = input(f"{prompt}") + if not response and default: + return default + if not multi_choice and response not in valid_choices: + print("Invalid option.") + continue + elif multi_choice: + try: + replaced = response.replace(" ", "") + response = [input_type(r) for r in replaced.split(',')] + except ValueError: + print(f"Invalid selection (response: {response})") + continue + invalid = [r not in valid_choices for r in response] + if any(invalid): + print(f'Invalid selection (response: {response})') + continue + return response
+ + +
[docs]def load_json(filename: str) -> Any: + '''Reads JSON data from file.''' + with open(filename, 'r') as data_file: + return json.load(data_file)
+ + +
[docs]class ValidJSONEncoder(json.JSONEncoder): +
[docs] def default(self, obj): + try: + return super().default(obj) + except TypeError: + return "<unknown>"
+ + +
[docs]def write_json(data: Any, filename: str) -> None: + """Write data to JSON file. + + Args: + data (Any): Data to write. + filename (str): Path to JSON file. + + """ + # First, remove any invalid entries that are not serializable + with open(filename, "w") as data_file: + json.dump(data, data_file, indent=1, cls=ValidJSONEncoder)
+ + +
[docs]def log_manifest( + train_tfrecords: Optional[List[str]] = None, + val_tfrecords: Optional[List[str]] = None, + *, + labels: Optional[Dict[str, Any]] = None, + filename: Optional[str] = None, + remove_extension: bool = True +) -> str: + """Saves the training manifest in CSV format and returns it as a string. + + Args: + train_tfrecords (list(str), optional): List of training TFRecords. + Defaults to None. + val_tfrecords (list(str), optional): List of validation TFRecords. + Defaults to None. + + Keyword args: + labels (dict, optional): TFRecord outcome labels. Defaults to None. + filename (str, optional): Path to CSV file to save. Defaults to None. + remove_extension (bool, optional): Remove file extension from slide + names. Defaults to True. + + Returns: + str: Saved manifest in str format. + """ + out = '' + has_labels = (isinstance(labels, dict) and len(labels)) + if filename: + save_file = open(os.path.join(filename), 'w') + writer = csv.writer(save_file) + writer.writerow(['slide', 'dataset', 'outcome_label']) + if train_tfrecords or val_tfrecords: + if train_tfrecords: + for tfrecord in train_tfrecords: + if remove_extension: + slide = sf.util.path_to_name(tfrecord) + else: + slide = tfrecord + outcome_label = labels[slide] if has_labels else 'NA' + out += ' '.join([slide, 'training', str(outcome_label)]) + if filename: + writer.writerow([slide, 'training', outcome_label]) + if val_tfrecords: + for tfrecord in val_tfrecords: + if remove_extension: + slide = sf.util.path_to_name(tfrecord) + else: + slide = tfrecord + outcome_label = labels[slide] if has_labels else 'NA' + out += ' '.join([slide, 'validation', str(outcome_label)]) + if filename: + writer.writerow([slide, 'validation', outcome_label]) + if filename: + save_file.close() + return out
+ + +
[docs]def get_slides_from_model_manifest( + model_path: str, + dataset: Optional[str] = None +) -> List[str]: + """Get list of slides from a model manifest. + + Args: + model_path (str): Path to model from which to load the model manifest. + dataset (str): 'training' or 'validation'. Will return only slides + from this dataset. Defaults to None (all). + + Returns: + list(str): List of slide names. + """ + + slides = [] + if exists(join(model_path, 'slide_manifest.csv')): + manifest = join(model_path, 'slide_manifest.csv') + elif exists(join(dirname(model_path), 'slide_manifest.csv')): + log.debug("Slide manifest not found in model directory") + log.debug("Loading manifest from parent directory.") + manifest = join(dirname(model_path), 'slide_manifest.csv') + else: + log.error('Slide manifest not found in model folder') + return [] + with open(manifest, 'r') as manifest_file: + reader = csv.reader(manifest_file) + header = next(reader) + dataset_index = header.index('dataset') + slide_index = header.index('slide') + for row in reader: + dataset_name = row[dataset_index] + slide_name = row[slide_index] + if dataset_name == dataset or not dataset: + slides += [slide_name] + return slides
+ + +
[docs]def get_gan_config(model_path: str) -> Dict: + """Loads a GAN training_options.json for an associated network PKL.""" + + if exists(join(dirname(model_path), 'training_options.json')): + return load_json(join(dirname(model_path), 'training_options.json')) + else: + raise errors.ModelParamsNotFoundError
+ + +
[docs]def get_model_config(model_path: str) -> Dict: + """Loads model configuration JSON file.""" + + if model_path.endswith('params.json'): + config = load_json(model_path) + elif exists(join(model_path, 'params.json')): + config = load_json(join(model_path, 'params.json')) + elif exists(model_path) and exists(join(dirname(model_path), 'params.json')): + if not (sf.util.torch_available + and sf.util.path_to_ext(model_path) == 'zip'): + log.warning( + "Hyperparameters not in model directory; loading from parent" + " directory. Please move params.json into model folder." + ) + config = load_json(join(dirname(model_path), 'params.json')) + else: + raise errors.ModelParamsNotFoundError + # Compatibility for pre-1.1 + if 'norm_mean' in config: + config['norm_fit'] = { + 'target_means': config['norm_mean'], + 'target_stds': config['norm_std'], + } + if 'outcome_label_headers' in config: + log.debug("Replacing outcome_label_headers in params.json -> outcomes") + config['outcomes'] = config.pop('outcome_label_headers') + # Compatibility for pre-3.0 + if 'model_type' in config and config['model_type'] == 'categorical': + config['model_type'] = 'classification' + if 'model_type' in config and config['model_type'] == 'linear': + config['model_type'] = 'regression' + return config
+ + +
[docs]def get_ensemble_model_config(model_path: str) -> Dict: + """Loads ensemble model configuration JSON file.""" + + if exists(join(model_path, 'ensemble_params.json')): + config = load_json(join(model_path, 'ensemble_params.json')) + elif exists(join(dirname(model_path), 'ensemble_params.json')): + if not (sf.util.torch_available + and sf.util.path_to_ext(model_path) == 'zip'): + log.warning( + "Hyperparameters not in model directory; loading from parent" + " directory. Please move ensemble_params.json into model folder." + ) + config = load_json(join(dirname(model_path), 'ensemble_params.json')) + else: + raise errors.ModelParamsNotFoundError + # Compatibility for pre-1.1 + if 'norm_mean' in config: + config['norm_fit'] = { + 'target_means': config['norm_mean'], + 'target_stds': config['norm_std'], + } + if 'outcome_label_headers' in config: + log.debug("Replacing outcome_label_headers in ensemble_params.json -> outcomes") + config['outcomes'] = config.pop('outcome_label_headers') + return config
+ + +
[docs]def get_model_normalizer( + model_path: str +) -> Optional["sf.norm.StainNormalizer"]: + """Loads and fits normalizer using configuration at a model path.""" + + config = sf.util.get_model_config(model_path) + if is_torch_model_path(model_path): + backend = 'torch' + elif is_tensorflow_model_path(model_path): + backend = 'tensorflow' + else: + log.warn(f"Unable to determine backend for model at {model_path}") + backend = None + + if not config['hp']['normalizer']: + return None + + if ('slideflow_version' in config + and version.parse(config['slideflow_version']) <= version.parse("1.2.2") + and config['hp']['normalizer'] in ('vahadane', 'macenko')): + log.warn("Detected model trained with Macenko or Vahadane " + "normalization with Slideflow version <= 1.2.2. Macenko " + "and Vahadane algorithms were optimized in 1.2.3 and may " + "now yield slightly different results. ") + + normalizer = sf.norm.autoselect( + config['hp']['normalizer'], + config['hp']['normalizer_source'], + backend=backend + ) + if 'norm_fit' in config and config['norm_fit'] is not None: + normalizer.set_fit(**config['norm_fit']) + return normalizer
+ + +
[docs]def get_preprocess_fn(model_path: str): + """Returns a function which preprocesses a uint8 image for a model. + + Args: + model_path (str): Path to a saved Slideflow model. + + Returns: + A function which accepts a single image or batch of uint8 images, + and returns preprocessed (and stain normalized) float32 images. + + """ + normalizer = get_model_normalizer(model_path) + if is_torch_model_path(model_path): + from slideflow.io.torch import preprocess_uint8 + return partial(preprocess_uint8, normalizer=normalizer) + elif is_tensorflow_model_path(model_path): + from slideflow.io.tensorflow import preprocess_uint8 + return partial(preprocess_uint8, normalizer=normalizer, as_dict=False) + else: + raise ValueError(f"Unrecognized model: {model_path}")
+ + +
[docs]def get_slide_paths(slides_dir: str) -> List[str]: + '''Get all slide paths from a given directory containing slides.''' + slide_list = [i for i in glob(join(slides_dir, '**/*.*')) if is_slide(i)] + slide_list.extend([i for i in glob(join(slides_dir, '*.*')) if is_slide(i)]) + return slide_list
+ + +
[docs]def read_annotations(path: str) -> Tuple[List[str], List[Dict]]: + '''Read an annotations file.''' + results = [] + with open(path, 'r') as csv_file: + csv_reader = csv.reader(csv_file, delimiter=',') + # First, try to open file + try: + header = next(csv_reader, None) + except OSError: + raise OSError( + f"Failed to open annotations file {path}" + ) + assert isinstance(header, list) + for row in csv_reader: + row_dict = {} + for i, key in enumerate(header): + row_dict[key] = row[i] + results += [row_dict] + return header, results
+ + +
[docs]def get_relative_tfrecord_paths(root: str, directory: str = "") -> List[str]: + '''Returns relative tfrecord paths with respect to the given directory.''' + + tfrecords = [ + join(directory, f) for f in os.listdir(join(root, directory)) + if (not isdir(join(root, directory, f)) + and len(f) > 10 and f[-10:] == ".tfrecords") + ] + subdirs = [ + f for f in os.listdir(join(root, directory)) + if isdir(join(root, directory, f)) + ] + for sub in subdirs: + tfrecords += get_relative_tfrecord_paths(root, join(directory, sub)) + return tfrecords
+ + +def contains_nested_subdirs(directory: str) -> bool: + subdirs = [ + _dir for _dir in os.listdir(directory) + if isdir(join(directory, _dir)) + ] + for subdir in subdirs: + contents = os.listdir(join(directory, subdir)) + for c in contents: + if isdir(join(directory, subdir, c)): + return True + return False + + +
[docs]def path_to_name(path: str) -> str: + '''Returns name of a file, without extension, + from a given full path string.''' + _file = os.path.basename(path) + dot_split = _file.split('.') + if len(dot_split) == 1: + return _file + elif len(dot_split) > 2 and '.'.join(dot_split[-2:]) in SUPPORTED_FORMATS: + return '.'.join(dot_split[:-2]) + else: + return '.'.join(dot_split[:-1])
+ + +
[docs]def path_to_ext(path: str) -> str: + '''Returns extension of a file path string.''' + _file = os.path.basename(path) + dot_split = _file.split('.') + if len(dot_split) == 1: + return '' + elif len(dot_split) > 2 and '.'.join(dot_split[-2:]) in SUPPORTED_FORMATS: + return '.'.join(dot_split[-2:]) + else: + return dot_split[-1]
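`path_to_name` and `path_to_ext` treat a two-part suffix as a single extension when it appears in `SUPPORTED_FORMATS`. The sketch below restates both functions with a hypothetical `SUPPORTED_FORMATS` tuple; the real constant is defined elsewhere in the module and its contents are not shown in this listing, so the values here are assumptions chosen only to exercise each branch.

```python
import os

# Hypothetical stand-in for the module-level constant (assumption).
SUPPORTED_FORMATS = ("svs", "ome.tiff")


def path_to_name(path: str) -> str:
    """Restatement of the listing above: file name without extension."""
    _file = os.path.basename(path)
    dot_split = _file.split('.')
    if len(dot_split) == 1:
        return _file
    elif len(dot_split) > 2 and '.'.join(dot_split[-2:]) in SUPPORTED_FORMATS:
        return '.'.join(dot_split[:-2])
    else:
        return '.'.join(dot_split[:-1])


def path_to_ext(path: str) -> str:
    """Restatement of the listing above: extension of a file path."""
    _file = os.path.basename(path)
    dot_split = _file.split('.')
    if len(dot_split) == 1:
        return ''
    elif len(dot_split) > 2 and '.'.join(dot_split[-2:]) in SUPPORTED_FORMATS:
        return '.'.join(dot_split[-2:])
    else:
        return dot_split[-1]


# A recognized two-part suffix is split off as a unit.
compound = (path_to_name("slides/case1.ome.tiff"), path_to_ext("slides/case1.ome.tiff"))
# An unrecognized two-part suffix only loses its last component.
unknown = (path_to_name("archive.tar.gz"), path_to_ext("archive.tar.gz"))
# A name without any dot has an empty extension.
bare = (path_to_name("README"), path_to_ext("README"))
```

Under these assumed formats, `case1.ome.tiff` yields the name `case1`, whereas `archive.tar.gz` yields `archive.tar`, because `tar.gz` is not in the tuple.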
+ + +
[docs]def update_results_log( + results_log_path: str, + model_name: str, + results_dict: Dict +) -> None: + '''Dynamically update results_log when recording training metrics.''' + # First, read current results log into a dictionary + results_log = {} # type: Dict[str, Any] + if exists(results_log_path): + with open(results_log_path, "r") as results_file: + reader = csv.reader(results_file) + try: + headers = next(reader) + except StopIteration: + pass + else: + try: + model_name_i = headers.index('model_name') + result_keys = [k for k in headers if k != 'model_name'] + except ValueError: + model_name_i = headers.index('epoch') + result_keys = [k for k in headers if k != 'epoch'] + for row in reader: + name = row[model_name_i] + results_log[name] = {} + for result_key in result_keys: + result = row[headers.index(result_key)] + results_log[name][result_key] = result + # Move the current log file into a temporary file + shutil.move(results_log_path, f"{results_log_path}.temp") + + # Next, update the results log with the new results data + for epoch in results_dict: + results_log.update({f'{model_name}-{epoch}': results_dict[epoch]}) + + # Finally, create a new log file incorporating the new data + with open(results_log_path, "w") as results_file: + writer = csv.writer(results_file) + result_keys = [] + # Search through results to find all results keys + for model in results_log: + result_keys += list(results_log[model].keys()) + # Remove duplicate result keys + result_keys = list(set(result_keys)) + result_keys.sort() + # Write header labels + writer.writerow(['model_name'] + result_keys) + # Iterate through model results and record + for model in results_log: + row = [model] + # Include all saved metrics + for result_key in result_keys: + if result_key in results_log[model]: + row += [results_log[model][result_key]] + else: + row += [""] + writer.writerow(row) + + # Delete the old results log file + if exists(f"{results_log_path}.temp"): + 
os.remove(f"{results_log_path}.temp")
+ + +
[docs]def map_values_to_slide_grid( + locations: np.ndarray, + values: np.ndarray, + wsi: "sf.WSI", + background: str = 'min', + *, + interpolation: Optional[str] = 'bicubic', +) -> np.ndarray: + """Map heatmap values to a slide grid, using tile location information. + + Args: + locations (np.ndarray): Array of shape ``(n_tiles, 2)`` containing x, y + coordinates for all image tiles. Coordinates represent the center + for an associated tile, and must be in a grid. + values (np.ndarray): Array of shape ``(n_tiles,)`` containing heatmap + values for each tile. + wsi (slideflow.wsi.WSI): WSI object. + + Keyword args: + background (str, optional): Background strategy for heatmap. Can be + 'min', 'mean', 'median', 'max', or 'mask'. Defaults to 'min'. + interpolation (str, optional): Interpolation strategy for smoothing + heatmap. Defaults to 'bicubic'. + + """ + no_interpolation = (interpolation is None or interpolation == 'nearest') + + # Slide coordinate information + loc_grid_dict = {(c[0], c[1]): (c[2], c[3]) for c in wsi.coord} + + # Determine the heatmap background + grid = np.empty((wsi.grid.shape[1], wsi.grid.shape[0])) + if background == 'mask' and not no_interpolation: + raise ValueError( + "'mask' background is not compatible with interpolation method " + "'{}'. Expected: None or 'nearest'".format(interpolation) + ) + elif background == 'mask': + grid[:] = np.nan + elif background == 'min': + grid[:] = np.min(values) + elif background == 'mean': + grid[:] = np.mean(values) + elif background == 'median': + grid[:] = np.median(values) + elif background == 'max': + grid[:] = np.max(values) + else: + raise ValueError(f"Unrecognized value for background: {background}") + + if not isinstance(locations, np.ndarray): + locations = np.array(locations) + + # Transform from coordinates as center locations to top-left locations. 
+ locations = locations - int(wsi.full_extract_px/2) + + for i, wsi_dim in enumerate(locations): + try: + idx = loc_grid_dict[tuple(wsi_dim)] + except (IndexError, KeyError): + raise errors.CoordinateAlignmentError( + "Error plotting value at location {} for slide {}. The heatmap " + "grid is not aligned to the slide coordinate grid. Ensure " + "that tile_px (got: {}) and tile_um (got: {}) match the given " + "location values. If you are using data stored in TFRecords, " + "verify that the TFRecord was generated using the same " + "tile_px and tile_um.".format( + tuple(wsi_dim), wsi.path, wsi.tile_px, wsi.tile_um + ) + ) + grid[idx[1]][idx[0]] = values[i] + + # Mask out background, if interpolation is not used and background == 'mask' + if no_interpolation and background == 'mask': + masked_grid = np.ma.masked_invalid(grid) + else: + masked_grid = grid + return masked_grid
+ + +
[docs]def bin_values_to_slide_grid( + locations: np.ndarray, + values: np.ndarray, + wsi: "sf.WSI", + background: str = 'min', +) -> np.ndarray: + """Bin heatmap values to a slide grid, using tile location information. + + Args: + locations (np.ndarray): Array of shape ``(n_tiles, 2)`` containing x, y + coordinates for all image tiles. Coordinates represent the center + for an associated tile, and must be in a grid. + values (np.ndarray): Array of shape ``(n_tiles,)`` containing heatmap + values for each tile. + wsi (slideflow.wsi.WSI): WSI object. + + Keyword args: + background (str, optional): Background strategy for heatmap. Can be + 'min', 'mean', 'median', 'max', or 'mask'. Defaults to 'min'. + + """ + from scipy.stats import binned_statistic_2d + masked_grid, *_ = binned_statistic_2d( + locations[:, 0], + locations[:, 1], + values, + bins=wsi.grid.shape, + range=[[0, wsi.dimensions[0]], [0, wsi.dimensions[1]]] + ) + masked_grid = masked_grid.T + nan_idx = np.where(np.isnan(masked_grid)) + + if background == 'mask': + # No action needed + pass + elif background == 'min': + masked_grid[nan_idx] = np.min(values) + elif background == 'mean': + masked_grid[nan_idx] = np.mean(values) + elif background == 'median': + masked_grid[nan_idx] = np.median(values) + elif background == 'max': + masked_grid[nan_idx] = np.max(values) + else: + raise ValueError(f"Unrecognized value for background: {background}") + + return masked_grid
+ + +
[docs]def infer_stride(locations, wsi): + """Infer the stride of a grid of locations from a set of locations. + + Args: + locations (np.ndarray): Nx2 array of locations + wsi (slideflow.wsi.WSI): WSI object + + Returns: + float: inferred stride divisor in pixels + + """ + sort_unique_x = np.sort(np.unique(locations[:, 0])) + sort_unique_y = np.sort(np.unique(locations[:, 1])) + min_stride_x = (sort_unique_x[1:] - sort_unique_x[:-1]).min() + min_stride_y = (sort_unique_y[1:] - sort_unique_y[:-1]).min() + inferred_stride_px = min(min_stride_x, min_stride_y) + return wsi.full_extract_px / inferred_stride_px
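The arithmetic in `infer_stride` can be checked by hand on a small grid. The sketch below is a pure-Python restatement that takes the extraction width as a plain number; `infer_stride_divisor` and its `full_extract_px` argument are hypothetical stand-ins for the NumPy version above and for `wsi.full_extract_px`, and the coordinates are made up for illustration.

```python
def infer_stride_divisor(locations, full_extract_px):
    """Pure-Python stand-in for infer_stride (illustrative assumption).

    locations: iterable of (x, y) tile-center coordinates.
    full_extract_px: tile extraction width in pixels at full resolution.
    """
    xs = sorted({x for x, _ in locations})
    ys = sorted({y for _, y in locations})
    # Smallest gap between neighboring unique coordinates on each axis.
    min_dx = min(b - a for a, b in zip(xs, xs[1:]))
    min_dy = min(b - a for a, b in zip(ys, ys[1:]))
    return full_extract_px / min(min_dx, min_dy)


# Tiles extracted at 512 px whose centers sit 256 px apart overlap by half,
# which corresponds to a stride divisor of 2.
centers = [(256, 256), (512, 256), (256, 512), (768, 512)]
divisor = infer_stride_divisor(centers, full_extract_px=512)
```

As in the original, at least two distinct coordinates are needed on each axis; otherwise there is no gap to measure and the minimum is taken over an empty sequence.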
+ + +
[docs]def location_heatmap( + locations: np.ndarray, + values: np.ndarray, + slide: str, + tile_px: int, + tile_um: Union[int, str], + filename: str, + *, + interpolation: Optional[str] = 'bicubic', + cmap: str = 'inferno', + norm: Optional[str] = None, + background: str = 'min' +) -> None: + """Generate a heatmap for a slide. + + Args: + locations (np.ndarray): Array of shape ``(n_tiles, 2)`` containing x, y + coordinates for all image tiles. Coordinates represent the center + for an associated tile, and must be in a grid. + values (np.ndarray): Array of shape ``(n_tiles,)`` containing heatmap + values for each tile. + slide (str): Path to corresponding slide. + tile_px (int): Tile pixel size. + tile_um (int, str): Tile micron or magnification size. + filename (str): Destination filename for heatmap. + + Keyword args: + interpolation (str, optional): Interpolation strategy for smoothing + heatmap. Defaults to 'bicubic'. + cmap (str, optional): Matplotlib colormap for heatmap. Can be any + valid matplotlib colormap. Defaults to 'inferno'. + norm (str, optional): Normalization strategy for assigning heatmap + values to colors. Either 'two_slope', or any other valid value + for the ``norm`` argument of ``matplotlib.pyplot.imshow``. + If 'two_slope', normalizes values less than 0 and greater than 0 + separately. Defaults to None. + + """ + if not isinstance(values, np.ndarray): + raise ValueError( + "Error generating heatmap. 'values' should be a numpy array " + "with shape (n_tiles, )" + ) + if (len(values.shape) > 1) and (values.shape[1] != 1): + raise ValueError( + "Error generating heatmap. Expected 'values' to have (n_tiles,) " + "but got shape {}".format(values.shape) + ) + + log.info(f'Generating heatmap for [green]{slide}[/]...') + log.debug(f"Plotting {len(values)} values") + wsi = sf.WSI(slide, tile_px, tile_um, verbose=False) + stride = infer_stride(locations, wsi) + if stride > 32: + # Large inferred strides are likely due to unaligned grid. 
+ # Rather than attempting to build a coordinate grid for verifying + # grid alignment, we will assume that the grid is unaligned and + # use the default stride (1). This will cause map_values_to_slide_grid + # to recognize that the grid is unaligned, and the heatmap will be built + # using histogram2d. + log.debug(f"Failed sanity check for inferred stride ({stride})") + elif stride != 1: + log.debug(f"Inferred stride: {stride}") + wsi = sf.WSI(slide, tile_px, tile_um, stride_div=stride, verbose=False) + + try: + masked_grid = map_values_to_slide_grid( + locations, values, wsi, background=background, interpolation=interpolation + ) + except errors.CoordinateAlignmentError as e: + log.debug("Coordinate alignment error: {}".format(e)) + log.info("Unable to align grid for plotting heatmap. Heatmap will be " + "binned with a stride of 1.") + masked_grid = bin_values_to_slide_grid( + locations, values, wsi, background=background + ) + + import matplotlib.pyplot as plt + import matplotlib.colors as mcol + with matplotlib_backend('Agg'): + thumb = wsi.thumb(mpp=5) + fig = plt.figure(figsize=(18, 16)) + ax = fig.add_subplot(111) + fig.subplots_adjust(bottom=0.25, top=0.95) + gca = plt.gca() + gca.tick_params( + axis='x', + top=True, + labeltop=True, + bottom=False, + labelbottom=False + ) + ax.imshow(thumb, zorder=0) + + # Calculate overlay offset + extent = sf.heatmap.calculate_heatmap_extent(wsi, thumb, masked_grid) + + # Plot + if norm == 'two_slope': + norm = mcol.TwoSlopeNorm( + vmin=min(-0.01, min(values)), + vcenter=0, + vmax=max(0.01, max(values)) + ) + ax.imshow( + masked_grid, + zorder=10, + alpha=0.6, + extent=extent, + interpolation=interpolation, + cmap=cmap, + norm=norm + ) + ax.set_xlim(0, thumb.size[0]) + ax.set_ylim(thumb.size[1], 0) + log.debug('Saving figure...') + plt.savefig(filename, bbox_inches='tight') + plt.close()
+ + +
[docs]def tfrecord_heatmap( + tfrecord: str, + slide: str, + tile_px: int, + tile_um: Union[int, str], + tile_dict: Dict[int, float], + filename: str, + **kwargs +) -> None: + """Creates a tfrecord-based WSI heatmap using a dictionary of tile values + for heatmap display. + + Args: + tfrecord (str): Path to tfrecord. + slide (str): Path to whole-slide image. + tile_dict (dict): Dictionary mapping tfrecord indices to a + tile-level value for display in heatmap format. + tile_px (int): Tile width in pixels. + tile_um (int or str): Tile width in microns (int) or magnification + (str, e.g. "20x"). + filename (str): Destination filename for heatmap. + + """ + locations = sf.io.get_locations_from_tfrecord(tfrecord) + if len(tile_dict) != len(locations): + raise errors.TFRecordsError( + f'tile_dict length ({len(tile_dict)}) != TFRecord length ' + f'({len(locations)}).' + ) + + return location_heatmap( + locations=np.array(locations), + values=np.array([tile_dict[loc] for loc in range(len(locations))]), + slide=slide, + tile_px=tile_px, + tile_um=tile_um, + filename=filename, + **kwargs + )
+ + +
[docs]def tile_size_label(tile_px: int, tile_um: Union[str, int]) -> str: + """Return the string label of the given tile size.""" + if isinstance(tile_um, str): + return f"{tile_px}px_{tile_um.lower()}" + else: + return f"{tile_px}px_{tile_um}um"
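A short usage sketch for `tile_size_label`, restating the function as listed above. The tile sizes passed in are illustrative values, not settings taken from the project.

```python
from typing import Union


def tile_size_label(tile_px: int, tile_um: Union[str, int]) -> str:
    """Restatement of the listing above, for illustration only."""
    if isinstance(tile_um, str):
        # Magnification strings such as "20X" are lowercased, no unit suffix.
        return f"{tile_px}px_{tile_um.lower()}"
    else:
        # Integer sizes are interpreted as microns.
        return f"{tile_px}px_{tile_um}um"


micron_label = tile_size_label(299, 302)
magnification_label = tile_size_label(299, "20X")
```

The string branch lowercases the magnification, so `"20X"` and `"20x"` produce the same label.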
+ + +
[docs]def get_valid_model_dir(root: str) -> Tuple[List[int], List[str]]: + ''' + Lists the subdirectories of root and the numeric run IDs parsed from them. + A run ID is the leading run of digits in a directory name, such as the + "00000" in "00000-foldername". Directories without a leading number + contribute no ID. + + Examples + If the root has 3 directories: + root/00000-foldername/ + root/00001-foldername/ + root/00002-foldername/ + + The function returns the IDs 0, 1, and 2, together with the three + directory names, in the order given by os.listdir. + ''' + + prev_run_dirs = [ + x for x in os.listdir(root) + if isdir(join(root, x)) + ] + prev_run_ids = [re.match(r'^\d+', x) for x in prev_run_dirs] + prev_run_ids = [int(x.group()) for x in prev_run_ids if x is not None] + return prev_run_ids, prev_run_dirs
+ + +def get_new_model_dir(root: str, model_name: str) -> str: + prev_run_ids, prev_run_dirs = get_valid_model_dir(root) + cur_id = max(prev_run_ids, default=-1) + 1 + model_dir = os.path.join(root, f'{cur_id:05d}-{model_name}') + assert not os.path.exists(model_dir) + os.makedirs(model_dir) + return model_dir + + +def create_new_model_dir(root: str, model_name: str) -> str: + path = get_new_model_dir(root, model_name) + if not os.path.exists(path): + os.makedirs(path) + return path + + +
[docs]def split_list(a: List, n: int) -> List[List]: + '''Function to split a list into n components''' + k, m = divmod(len(a), n) + return [a[i * k + min(i, m): (i + 1) * k + min(i + 1, m)] + for i in range(n)]
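`split_list` distributes the remainder of `len(a) / n` over the first chunks, so chunk lengths differ by at most one. The sketch below restates the function as listed above and runs it on an illustrative list.

```python
from typing import List


def split_list(a: List, n: int) -> List[List]:
    """Restatement of the listing above, for illustration only."""
    # k is the base chunk length; the first m chunks get one extra item.
    k, m = divmod(len(a), n)
    return [a[i * k + min(i, m): (i + 1) * k + min(i + 1, m)]
            for i in range(n)]


chunks = split_list(list(range(10)), 3)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

With ten items and three chunks, `divmod(10, 3)` gives `k = 3` and `m = 1`, so only the first chunk receives an extra element. If `n` exceeds the list length, the trailing chunks are empty lists.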
+ + +# --- TFRecord utility functions ---------------------------------------------- + +def process_feature( + feature: example_pb2.Feature, # type: ignore + typename: str, + typename_mapping: Dict, + key: str +) -> np.ndarray: + # NOTE: We assume that each key in the example has only one field + # (either "bytes_list", "float_list", or "int64_list")! + field = feature.ListFields()[0] # type: ignore + inferred_typename, value = field[0].name, field[1].value + + if typename is not None: + tf_typename = typename_mapping[typename] + if tf_typename != inferred_typename: + reversed_mapping = {v: k for k, v in typename_mapping.items()} + raise TypeError( + f"Incompatible type '{typename}' for `{key}` " + f"(should be '{reversed_mapping[inferred_typename]}')." + ) + + if inferred_typename == "bytes_list": + value = np.frombuffer(value[0], dtype=np.uint8) + elif inferred_typename == "float_list": + value = np.array(value, dtype=np.float32) + elif inferred_typename == "int64_list": + value = np.array(value, dtype=np.int64) + return value + + +def extract_feature_dict( + features: Union[example_pb2.FeatureLists, # type: ignore + example_pb2.Features], # type: ignore + description: Optional[Union[List, Dict]], + typename_mapping: Dict +) -> Dict[str, Any]: + if isinstance(features, example_pb2.FeatureLists): + features = features.feature_list # type: ignore + + def get_value(typename, typename_mapping, key): + feature = features[key].feature + fn = partial( + process_feature, + typename=typename, + typename_mapping=typename_mapping, + key=key + ) + return list(map(fn, feature)) + elif isinstance(features, example_pb2.Features): + features = features.feature # type: ignore + + def get_value(typename, typename_mapping, key): + return process_feature(features[key], typename, + typename_mapping, key) + else: + raise TypeError(f"Incompatible type: features should be either of type " + f"example_pb2.Features or example_pb2.FeatureLists and " + f"not {type(features)}") + + all_keys 
= list(features.keys()) # type: ignore + + if description is None or len(description) == 0: + description = dict.fromkeys(all_keys, None) + elif isinstance(description, list): + description = dict.fromkeys(description, None) + + processed_features = {} + for key, typename in description.items(): + if key not in all_keys: + raise KeyError(f"Key {key} doesn't exist (select from {all_keys})!") + + processed_features[key] = get_value(typename, typename_mapping, key) + + return processed_features + + +
[docs]def load_predictions(path: str, **kwargs) -> pd.DataFrame: + """Loads a 'csv', 'parquet' or 'feather' file to a pandas dataframe. + + Args: + path (str): Path to the file to be read. + + Returns: + df (pd.DataFrame): The dataframe read from the path. + """ + if path.endswith("csv"): + return pd.read_csv(f"{path}", **kwargs) + elif path.endswith("parquet") or path.endswith("gzip"): + return pd.read_parquet(f"{path}", **kwargs) + elif path.endswith("feather"): + return pd.read_feather(f"{path}", **kwargs) + else: + raise ValueError(f'Unrecognized extension "{path_to_ext(path)}"')
+ + +@contextmanager +def cleanup_progress(pb: Optional["Progress"]): + try: + yield + finally: + if pb is not None: + pb.refresh() + pb.stop() + + +@contextmanager +def matplotlib_backend(backend): + import matplotlib + original_backend = matplotlib.get_backend() + try: + matplotlib.use(backend) + yield + finally: + matplotlib.use(original_backend) + + +
[docs]def create_triangles(vertices, hole_vertices=None, hole_points=None): + """ + Tessellate a complex polygon, possibly with holes. + + :param vertices: A list of vertices [(x1, y1), (x2, y2), ...] defining the polygon boundary. + :param hole_vertices: An optional list of hole boundaries, each a list of vertices [(x1, y1), (x2, y2), ...]. + :param hole_points: An optional list of points [(hx1, hy1), (hx2, hy2), ...], one inside each hole in the polygon. + :return: A numpy array of vertices for the tessellated triangles, or None if no triangles were produced. + """ + import triangle as tr + + # Prepare the segment information for the exterior boundary + segments = np.array([[i, (i + 1) % len(vertices)] for i in range(len(vertices))]) + + # Prepare the polygon for Triangle + polygon = {'vertices': np.array(vertices), 'segments': segments} + + # If there are holes and hole boundaries, add them to the polygon definition + if hole_points is not None and hole_vertices is not None and len(hole_vertices): + polygon['holes'] = np.array(hole_points).astype(np.float32) + + # Start adding hole segments after the exterior segments + start_idx = len(vertices) + for hole in hole_vertices: + hole_segments = [[start_idx + i, start_idx + (i + 1) % len(hole)] for i in range(len(hole))] + segments = np.vstack([segments, hole_segments]) + start_idx += len(hole) + + # Update the vertices and segments in the polygon + all_vertices = np.vstack([vertices] + hole_vertices) + polygon['vertices'] = all_vertices + polygon['segments'] = segments + + # Tessellate the polygon + tess = tr.triangulate(polygon, 'pF') + + # Extract tessellated triangle vertices + if 'triangles' not in tess: + return None + + tessellated_vertices = np.array([tess['vertices'][t] for t in tess['triangles']]).reshape(-1, 2) + + # Convert to float32 + tessellated_vertices = tessellated_vertices.astype('float32') + + return tessellated_vertices
+

+ © Copyright 2023, James M Dolezal. + +

+
+ +
+ Built with Sphinx using a theme provided by Read the Docs. +
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_modules/slideflow/util/tfrecord2idx/index.html b/docs/_modules/slideflow/util/tfrecord2idx/index.html new file mode 100644 index 000000000..f81fc5820 --- /dev/null +++ b/docs/_modules/slideflow/util/tfrecord2idx/index.html @@ -0,0 +1,779 @@ + + + + + + + + + + + + slideflow.util.tfrecord2idx — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Source code for slideflow.util.tfrecord2idx

+from __future__ import print_function
+
+import io
+import gzip
+import os
+import struct
+import sys
+import numpy as np
+import slideflow as sf
+from typing import Optional, Dict, Tuple
+from os.path import dirname, join, exists
+from slideflow import errors
+
+
+TYPENAME_MAPPING = {
+    "byte": "bytes_list",
+    "float": "float_list",
+    "int": "int64_list"
+}
+
+FEATURE_DESCRIPTION = {
+    'image_raw': 'byte',
+    'slide': 'byte',
+    'loc_x': 'int',
+    'loc_y': 'int'
+}
+
+# -----------------------------------------------------------------------------
+
+def _build_index_from_tfrecord(file_path: str) -> Tuple[np.ndarray, np.ndarray]:
+    """Build an index from a TFRecord file.
+
+    Args:
+        file_path (str): Path to the TFRecord file.
+
+    Returns:
+        Tuple[np.ndarray, np.ndarray]: A tuple containing two arrays:
+            - The first array contains the starting byte and length of each
+              record in the TFRecord file.
+            - The second array contains the location information of each record.
+
+    """
+    infile = open(file_path, "rb")
+    start_bytes_array = []
+    loc_array = []
+    idx = 0
+    datum_bytes = bytearray(1024 * 1024)
+
+    while True:
+        cur = infile.tell()
+        byte_len = infile.read(8)
+        if len(byte_len) == 0:
+            break
+        infile.read(4)
+        proto_len = struct.unpack("<Q", byte_len)[0]
+
+        if proto_len > len(datum_bytes):
+            try:
+                _fill = int(proto_len * 1.5)
+                datum_bytes = datum_bytes.zfill(_fill)
+            except OverflowError:
+                raise OverflowError(
+                    f'Error reading tfrecord {file_path}'
+                )
+        datum_bytes_view = memoryview(datum_bytes)[:proto_len]
+        if infile.readinto(datum_bytes_view) != proto_len:
+            raise RuntimeError(
+                f"Failed to read record {idx} of file {file_path}"
+            )
+        infile.read(4)
+        start_bytes_array += [[cur, infile.tell() - cur]]
+
+        # Process record bytes, to read location information.
+        try:
+            record = process_record_from_bytes(datum_bytes_view)
+        except errors.TFRecordsError:
+            raise errors.TFRecordsError(
+                f'Unable to detect TFRecord format: {file_path}'
+            )
+        if 'loc_x' in record and 'loc_y' in record:
+            loc_array += [[record['loc_x'], record['loc_y']]]
+        elif 'loc_x' in record:
+            loc_array += [[record['loc_x']]]
+        idx += 1
+
+    infile.close()
+    if loc_array:
+        loc_array = np.array(loc_array)
+
+    return np.array(start_bytes_array), loc_array
+
+
+def create_index(
+    tfrecord_file: str,
+    index_file: Optional[str] = None
+) -> str:
+    """Create index from the tfrecords file.
+
+    Stores starting location (byte) and length (in bytes) of each
+    serialized record.
+
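+    Args:
+        tfrecord_file (str): Path to the TFRecord file.
+        index_file (str, optional): Path where the index file is stored.
+            If None, the index is saved alongside the TFRecord, using the
+            TFRecord name with an ``.index`` suffix. ``save_index`` then
+            appends ``.npz`` or ``.npy``. Defaults to None.
+
+    Returns:
+        str: Path to the saved index file.
+
+    """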
+    if index_file is None:
+        index_file = join(dirname(tfrecord_file),
+                          sf.util.path_to_name(tfrecord_file) + '.index')
+    start_bytes, locations = _build_index_from_tfrecord(tfrecord_file)
+    return save_index(start_bytes, index_file, locations=locations)
+
+
+def save_index(
+    index_array: np.ndarray,
+    index_file: str,
+    locations: Optional[np.ndarray] = None
+) -> str:
+    """Save an array as an index file."""
+    if sf.util.zip_allowed():
+        loc_kw = dict()
+        if locations is not None:
+            loc_kw['locations'] = locations
+        np.savez(
+            index_file,
+            arr_0=index_array,
+            **loc_kw
+        )
+        return index_file + '.npz'
+    else:
+        np.save(index_file + '.npy', index_array)
+        return index_file + '.npy'
+
+
+def find_index(tfrecord: str) -> Optional[str]:
+    """Find the index file for a TFRecord."""
+    name = sf.util.path_to_name(tfrecord)
+    if exists(join(dirname(tfrecord), name+'.index')):
+        return join(dirname(tfrecord), name+'.index')
+    elif exists(join(dirname(tfrecord), name+'.index.npz')):
+        return join(dirname(tfrecord), name+'.index.npz')
+    elif exists(join(dirname(tfrecord), name+'.index.npy')):
+        return join(dirname(tfrecord), name+'.index.npy')
+    else:
+        return None
+
+
+def load_index(tfrecord: str) -> Optional[np.ndarray]:
+    """Find and load the index associated with a TFRecord."""
+    index_path = find_index(tfrecord)
+    if index_path is None:
+        raise OSError(f"Could not find index path for TFRecord {tfrecord}")
+    if os.stat(index_path).st_size == 0:
+        return None
+    elif index_path.endswith('npz'):
+        return np.load(index_path)['arr_0']
+    elif index_path.endswith('npy'):
+        return np.load(index_path)
+    else:
+        return np.loadtxt(index_path, dtype=np.int64)
+
+
+def index_has_locations(index: str) -> bool:
+    """Check if an index file has tile location information stored."""
+    if index.endswith('npy'):
+        return False
+    else:
+        try:
+            return 'locations' in np.load(index).files
+        except ValueError as e:
+            raise ValueError(
+                f"Failed to load TFRecord index. Try regenerating index files "
+                f"with Dataset.rebuild_index(). Error received: {e}"
+            )
+
+
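+def get_locations_from_index(index: str):
+    """Load tile locations from a TFRecord index file.
+
+    Args:
+        index (str): Path to the index file.
+
+    Returns:
+        List of tuples, one per record, with the stored location values
+        (e.g., ``(loc_x, loc_y)``).
+
+    Raises:
+        slideflow.errors.TFRecordsIndexError: If the index file does not
+            contain location information.
+
+    """
+    if index.endswith('npy'):
+        raise errors.TFRecordsIndexError(
+            f"Index file {index} does not contain location information."
+        )
+    loaded = np.load(index)
+    if 'locations' not in loaded:
+        raise errors.TFRecordsIndexError(
+            f"Index file {index} does not contain location information."
+        )
+    return [tuple(loc) for loc in loaded['locations']]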
+
+
+
[docs]def get_tfrecord_length(tfrecord: str) -> int:
+    """Return the number of records in a TFRecord file.
+
+    Uses an index file if available, otherwise iterates through
+    the file to count the records.
+
+    Args:
+        tfrecord (str): Path to TFRecord.
+
+    Returns:
+        int: Number of records.
+
+    """
+    index_path = find_index(tfrecord)
+    if index_path is None:
+        return read_tfrecord_length(tfrecord)
+    if os.stat(index_path).st_size == 0:
+        return 0
+    else:
+        index_array = load_index(tfrecord)
+        if index_array is None:
+            return 0
+        else:
+            return index_array.shape[0]
+
+
+def read_tfrecord_length(tfrecord: str) -> int:
+    """Returns number of records stored in the given tfrecord file."""
+    infile = open(tfrecord, "rb")
+    num_records = 0
+    while True:
+        infile.tell()
+        try:
+            byte_len = infile.read(8)
+            if len(byte_len) == 0:
+                break
+            infile.read(4)
+            proto_len = struct.unpack("<Q", byte_len)[0]
+            infile.read(proto_len)
+            infile.read(4)
+            num_records += 1
+        except Exception:
+            sf.log.error(f"Failed to parse TFRecord at {tfrecord}")
+            infile.close()
+            return 0
+    infile.close()
+    return num_records
+
+
+
[docs]def get_tfrecord_by_index(
+    tfrecord: str,
+    index: int,
+    *,
+    compression_type: Optional[str] = None,
+    index_array: Optional[np.ndarray] = None
+) -> Dict:
+    """Read a specific record in a TFRecord file.
+
+    Args:
+        tfrecord (str): TFRecord file to read.
+        index (int): Index of record to read from the file.
+        compression_type (str): Type of compression in the TFRecord file.
+            Either 'gzip' or None. Defaults to None.
+        index_array (np.ndarray, optional): Preloaded TFRecord index.
+            If None, the index is loaded with ``load_index()``.
+            Defaults to None.
+
+    Returns:
+        A dictionary mapping record names (e.g., ``'slide'``, ``'image_raw'``,
+        ``'loc_x'``, and ``'loc_y'``) to their values. ``'slide'`` will be a
+        string, ``'image_raw'`` will be bytes, and ``'loc_x'`` and ``'loc_y'``
+        will be `int`.
+
+    Raises:
+        slideflow.errors.EmptyTFRecordsError: If the file is empty.
+
+        slideflow.errors.InvalidTFRecordIndex: If the given index cannot be found.
+    """
+
+    # Load the TFRecord file.
+    if compression_type == "gzip":
+        file = gzip.open(tfrecord, 'rb')
+    elif compression_type is None:
+        file = io.open(tfrecord, 'rb')  # type: ignore
+    else:
+        raise ValueError("compression_type should be 'gzip' or None")
+    if not os.path.getsize(tfrecord):
+        raise errors.EmptyTFRecordsError(f"{tfrecord} is empty.")
+
+    # Load the TFRecord index file.
+    # Index 0 needs no seek, as the first record starts at byte 0.
+    if index:
+        idx = index_array if index_array is not None else load_index(tfrecord)
+        if idx is None:
+            raise ValueError(f"Could not find tfrecord index for {tfrecord}")
+        if index >= idx.shape[0]:
+            raise errors.InvalidTFRecordIndex(
+                f"Index {index} is invalid for tfrecord {tfrecord} "
+                f"(size: {idx.shape[0]})"
+            )
+        start_offset = idx[index, 0]
+        file.seek(start_offset)
+
+    # Read the designated record.
+    length_bytes = bytearray(8)
+    crc_bytes = bytearray(4)
+    datum_bytes = bytearray(1024 * 1024)
+    if file.readinto(length_bytes) != 8:
+        raise RuntimeError("Failed to read the record size.")
+    if file.readinto(crc_bytes) != 4:
+        raise RuntimeError("Failed to read the start token.")
+    length, = struct.unpack("<Q", length_bytes)
+    if length > len(datum_bytes):
+        try:
+            _fill = int(length * 1.5)
+            datum_bytes = datum_bytes.zfill(_fill)
+        except OverflowError:
+            raise OverflowError('Error reading tfrecords; please '
+                                'try regenerating index files')
+    datum_bytes_view = memoryview(datum_bytes)[:length]
+    if file.readinto(datum_bytes_view) != length:
+        raise RuntimeError("Failed to read the record.")
+    if file.readinto(crc_bytes) != 4:
+        raise RuntimeError("Failed to read the end token.")
+
+    # Process record bytes.
+    try:
+        record = process_record_from_bytes(datum_bytes_view)
+    except errors.TFRecordsError:
+        raise errors.TFRecordsError(
+            f'Unable to detect TFRecord format: {tfrecord}'
+        )
+
+    file.close()
+    return record
+
+
+def process_record_from_bytes(bytes_view):
+    try:
+        record = process_record(bytes_view)
+    except KeyError:
+        feature_description = {
+            k: v for k, v in FEATURE_DESCRIPTION.items()
+            if k in ('slide', 'image_raw')
+        }
+        try:
+            record = process_record(bytes_view, description=feature_description)
+        except KeyError:
+            raise errors.TFRecordsError
+
+    # Final parsing.
+    if 'slide' in record:
+        record['slide'] = bytes(record['slide']).decode('utf-8')
+    if 'image_raw' in record:
+        record['image_raw'] = bytes(record['image_raw'])
+    if 'loc_x' in record:
+        record['loc_x'] = record['loc_x'][0]
+    if 'loc_y' in record:
+        record['loc_y'] = record['loc_y'][0]
+    return record
+
+
+def process_record(record, description=None):
+    if description is None:
+        description = FEATURE_DESCRIPTION
+    example = sf.util.example_pb2.Example()
+    example.ParseFromString(record)
+    return sf.util.extract_feature_dict(
+        example.features,
+        description,
+        TYPENAME_MAPPING)
+
+
+def main():
+    if len(sys.argv) < 3:
+        print("Usage: tfrecord2idx <tfrecord path> <index path>")
+        sys.exit()
+
+    create_index(sys.argv[1], sys.argv[2])
+
+
+if __name__ == "__main__":
+    main()
+
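The indexing functions above rely on the TFRecord framing: an 8-byte little-endian length, a 4-byte CRC, the payload, and a trailing 4-byte CRC. The following is a minimal sketch of that layout using only the standard library. It is an illustration rather than the Slideflow implementation: `frame_record` and `build_index` are hypothetical names, and the CRC fields are zero-filled placeholders instead of the masked CRC32C values that real TFRecords contain.

```python
import io
import struct


def frame_record(payload: bytes) -> bytes:
    """Frame a payload as length (8 bytes), CRC (4), payload, CRC (4).

    Assumption for illustration: CRC fields are zero-filled placeholders.
    """
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


def build_index(stream):
    """Return [start_byte, total_length] for each framed record."""
    index = []
    while True:
        start = stream.tell()
        header = stream.read(8)
        if len(header) == 0:
            break
        (length,) = struct.unpack("<Q", header)
        # Skip the length CRC, the payload, and the payload CRC.
        stream.seek(4 + length + 4, io.SEEK_CUR)
        index.append([start, stream.tell() - start])
    return index


payloads = [b"first", b"second record", b""]
stream = io.BytesIO(b"".join(frame_record(p) for p in payloads))
index = build_index(stream)

# Random access: seek to the second record and read its payload back.
start, _ = index[1]
stream.seek(start)
(length,) = struct.unpack("<Q", stream.read(8))
stream.read(4)  # discard the length CRC
second = stream.read(length)
print(index, second)
```

Storing both the start byte and the total framed length per record is what lets a reader seek directly to record `i` without scanning the preceding records.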
\ No newline at end of file
diff --git a/docs/_modules/slideflow_gpl/clam/config/index.html b/docs/_modules/slideflow_gpl/clam/config/index.html
new file mode 100644
index 000000000..d211defcb
--- /dev/null
+++ b/docs/_modules/slideflow_gpl/clam/config/index.html
@@ -0,0 +1,765 @@
+slideflow_gpl.clam.config — slideflow 3.0.0 documentation

Source code for slideflow_gpl.clam.config

+# Slideflow-GPL - Add-ons for the deep learning library Slideflow
+# Copyright (C) 2024 James Dolezal
+#
+# This file is part of Slideflow-GPL.
+#
+# Slideflow-GPL is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# Slideflow-GPL is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with Slideflow-GPL. If not, see <https://www.gnu.org/licenses/>.
+
+import slideflow as sf
+from typing import Union, List, Tuple, Optional, TYPE_CHECKING
+from slideflow import log, errors, Dataset
+from slideflow.mil import MILModelConfig, TrainerConfig
+
+if TYPE_CHECKING:
+    import torch
+
+# -----------------------------------------------------------------------------
+
+
[docs]class CLAMModelConfig(MILModelConfig):
+
+    valid_models = ['clam_sb', 'clam_mb', 'mil_fc_mc', 'mil_fc']
+
+    def __init__(
+        self,
+        model: str = 'clam_sb',
+        *,
+        model_size: str = 'small',
+        bag_loss: str = 'ce',
+        bag_weight: float = 0.7,
+        dropout: bool = False,
+        opt: str = 'adam',
+        inst_loss: str = 'ce',
+        no_inst_cluster: bool = False,
+        B: int = 8,
+        model_kwargs: Optional[dict] = None,
+        validate: bool = True,
+        **kwargs
+    ):
+        """Model configuration for CLAM models.
+
+        These configuration options are identical to the options in the
+        `original CLAM paper <https://arxiv.org/abs/2004.09666>`_.
+
+        Keyword args:
+            model (str): Model. Either ``'clam_sb'``, ``'clam_mb'``,
+                ``'mil_fc'``, or ``'mil_fc_mc'``. Defaults to ``'clam_sb'``.
+            model_size (str): Size of the model. Available sizes include:
+
+                ``clam_sb``
+
+                .. list-table::
+                    :header-rows: 0
+
+                    * - small
+                      - [1024, 512, 256]
+                    * - big
+                      - [1024, 512, 384]
+                    * - multiscale
+                      - [2048, 512, 256]
+                    * - xception
+                      - [2048, 256, 128]
+                    * - xception_multi
+                      - [1880, 128, 64]
+                    * - xception_3800
+                      - [3800, 512, 256]
+
+                ``clam_mb``
+
+                .. list-table::
+                    :header-rows: 0
+
+                    * - small
+                      - [1024, 512, 256]
+                    * - big
+                      - [1024, 512, 384]
+                    * - multiscale
+                      - [2048, 512, 256]
+
+                ``mil_fc``
+
+                .. list-table::
+                    :header-rows: 0
+
+                    * - small
+                      - [1024, 512]
+
+                ``mil_fc_mc``
+
+                .. list-table::
+                    :header-rows: 0
+
+                    * - small
+                      - [1024, 512]
+
+            bag_loss (str): Primary loss function. Either 'ce' or 'svm'.
+                If 'ce', the model loss function is a cross entropy loss.
+                If 'svm', the model loss is topk.SmoothTop1SVM.
+                Defaults to 'ce'.
+            bag_weight (float): Weight of the bag loss. The total loss is
+                defined as ``W * loss + (1 - W) * instance_loss``, where
+                ``W`` is the bag weight. Defaults to 0.7.
+            dropout (bool): Add dropout (p=0.25) after the attention layers.
+                Defaults to False.
+            opt (str): Optimizer. Either 'adam' (Adam optimizer) or 'sgd'
+                (Stochastic Gradient Descent). Defaults to 'adam'.
+            inst_loss (str): Instance loss function. Either 'ce' or 'svm'.
+                If 'ce', the instance loss is a cross entropy loss.
+                If 'svm', the loss is topk.SmoothTop1SVM.
+                Defaults to 'ce'.
+            no_inst_cluster (bool): Disable instance-level clustering.
+                Defaults to False.
+            B (int): Number of positive/negative patches to sample for
+                instance-level training. Defaults to 8.
+            validate (bool): Validate the hyperparameter configuration.
+                Defaults to True.
+
+        """
+
+        for argname, argval in dict(locals()).items():
+            if argname not in ('kwargs', 'validate'):
+                setattr(self, argname, argval)
+        if kwargs and validate:
+            raise errors.UnrecognizedHyperparameterError("Unrecognized parameters: {}".format(
+                ', '.join(list(kwargs.keys()))
+            ))
+        elif kwargs:
+            log.warning("Ignoring unrecognized parameters: {}".format(
+                ', '.join(list(kwargs.keys()))
+            ))
+
+    @property
+    def model_fn(self):
+        from .model import CLAM_MB, CLAM_SB, MIL_fc_mc, MIL_fc
+        model_dict = {
+            'clam_sb': CLAM_SB,
+            'clam_mb': CLAM_MB,
+            'mil_fc_mc': MIL_fc_mc,
+            'mil_fc': MIL_fc
+        }
+        return model_dict[self.model]
+
+    @property
+    def loss_fn(self):
+        from .legacy.utils import loss_utils
+        if self.bag_loss == 'ce':
+            if self.model.startswith('clam'):
+                return loss_utils.CrossEntropyWithInstanceLoss
+            else:
+                return loss_utils.CrossEntropyLoss
+        else:
+            raise ValueError("Unrecognized bag loss: {}".format(self.bag_loss))
+
+    @property
+    def model_type(self):
+        return 'classification'
+
+    def get_metrics(self):
+        from .legacy.utils import loss_utils
+        return [loss_utils.RocAuc()]
+
+    def build_model(self, n_in, n_out, **kwargs):
+        if isinstance(self.model_size, str):
+            config_size = self.model_fn.sizes[self.model_size]
+        else:
+            config_size = self.model_size
+        model_size = [n_in] + config_size[1:]
+        return self.model_fn(size=model_size, n_classes=n_out, **kwargs)
+
+    def verify_trainer(self, trainer):
+        if hasattr(trainer, 'batch_size') and trainer.batch_size > 1:
+            log.info(
+                "CLAM models do not support batch sizes > 1; setting batch_size to 1."
+            )
+            trainer.batch_size = 1
+
+    def inspect_batch(self, batch) -> Tuple[int, int]:
+        """Inspect a batch to determine the input and output dimensions."""
+        bags, targets, _ = batch[0]
+        n_in = bags.shape[-1]
+        n_out = targets.shape[-1]
+        return n_in, n_out
+
+    def _verify_eval_params(self, **kwargs):
+        """Verify evaluation parameters."""
+        super()._verify_eval_params(**kwargs)
+
+        if kwargs.get('uq'):
+            raise ValueError(
+                "Cannot calculate uncertainty quantification using CLAM models."
+            )
+
+    def _build_dataloader(
+        self,
+        bags,
+        targets,
+        encoder,
+        *,
+        dataset_kwargs = None,
+        dataloader_kwargs = None
+    ) -> "torch.utils.DataLoader":
+        from fastai.vision.all import DataLoader
+        from .data import build_clam_dataset
+
+        dataset_kwargs = dataset_kwargs or dict()
+        dataloader_kwargs = dataloader_kwargs or dict()
+
+        dataset = build_clam_dataset(bags, targets, encoder=encoder, **dataset_kwargs)
+        dataloader = DataLoader(dataset, **dataloader_kwargs)
+        return dataloader
+
+    def predict(self, model, bags, attention=False, device=None, **kwargs):
+        """Generate CLAM predictions for a list of bags."""
+        from .inference import run_inference
+
+        self._verify_eval_params(**kwargs)
+        return run_inference(model, bags, attention=attention)
+
+    def batched_predict(self, *args, **kwargs):
+        """CLAM models do not support batched predictions with batch_size > 1.
+
+        Thus, this method is equivalent to :meth:`predict`, which generates
+        predictions for each bag individually.
+
+        """
+        return self.predict(*args, **kwargs)
+
+# -----------------------------------------------------------------------------
+
+class LegacyCLAMTrainerConfig(TrainerConfig):
+
+    tag = 'legacy_clam'
+
+    def __init__(
+        self,
+        *,
+        num_splits: int = 1,  # Unused; kept for backwards compatibility
+        k: int = 3,
+        k_start: int = -1,
+        k_end: int = -1,
+        max_epochs: int = 20,
+        lr: float = 1e-4,
+        reg: float = 1e-5,
+        label_frac: float = 1,
+        weighted_sample: bool = False,
+        log_data: bool = False,
+        testing: bool = False,
+        early_stopping: bool = False,
+        subtyping: bool = False,
+        seed: int = 1,
+        results_dir: Optional[str] = None,  # Unused; kept for compatibility
+        n_classes: Optional[int] = None,
+        split_dir=None,
+        data_root_dir=None,
+        micro_average=False,
+        **kwargs
+    ):
+        """Training configuration for the legacy CLAM trainer.
+
+        This configures the legacy CLAM trainer. The FastAI trainer is
+        preferred for all models, including CLAM.
+
+        The configuration options for the legacy CLAM trainer are identical to
+        the options in the `original CLAM paper <https://arxiv.org/abs/2004.09666>`_.
+
+        Keyword args:
+            k (int): Number of cross-fold splits. Defaults to 3.
+            k_start (int): Starting cross-fold. Defaults to first cross-fold.
+            k_end (int): Ending cross-fold. Defaults to ending after last
+                cross-fold is done.
+            max_epochs (int): Number of epochs to train. Defaults to 20.
+            lr (float): Learning rate. Defaults to 1e-4.
+            reg (float): Weight decay. Defaults to 1e-5.
+            weighted_sample (bool): Equally sample from all outcome classes.
+                Defaults to False.
+            log_data (bool): Log to tensorboard. Defaults to False.
+            early_stopping (bool): Stop the training if validation loss doesn't
+                improve after 5 epochs. Will not trigger early stopping
+                until epoch 50. Defaults to False.
+            subtyping (bool): Whether this is a subtyping problem.
+                Defaults to False.
+            seed (int): Set the random seed. Defaults to 1.
+            n_classes (int): Number of outcome classes. Defaults to None.
+            micro_average (bool): Use micro averaging when calculating AUROC.
+                Defaults to False.
+            **kwargs: All additional keyword arguments are passed to
+                :class:`slideflow.mil.CLAMModelConfig`.
+        """
+        for argname, argval in dict(locals()).items():
+            if argname != 'kwargs':
+                setattr(self, argname, argval)
+        self.model_config = CLAMModelConfig(**kwargs)
+
+    def _to_clam_args(self):
+        """Convert into CLAM_Args format (legacy support)."""
+        from .legacy import CLAM_Args
+        all_kw = self.to_dict()
+        all_kw.update(self.model_config.to_dict())
+        all_kw['model_type'] = all_kw['model']
+        all_kw['drop_out'] = all_kw['dropout']
+        del all_kw['model']
+        del all_kw['dropout']
+        del all_kw['model_kwargs']
+        return CLAM_Args(**all_kw)
+
+    def train(
+        self,
+        train_dataset: Dataset,
+        val_dataset: Optional[Dataset],
+        outcomes: Union[str, List[str]],
+        bags: Union[str, List[str]],
+        *,
+        outdir: str = 'mil',
+        exp_label: Optional[str] = None,
+        **kwargs
+    ):
+        from .legacy.trainer import train_clam
+
+        # Prepare output directory
+        outdir = self.prepare_training(outcomes, exp_label, outdir)
+
+        # Use training data as validation if no validation set is provided
+        if val_dataset is None:
+            sf.log.info(
+                "Training without validation; metrics will be calculated on training data."
+            )
+            val_dataset = train_dataset
+
+        return train_clam(
+            self,
+            train_dataset,
+            val_dataset,
+            outcomes,
+            bags,
+            outdir=outdir,
+            **kwargs
+        )
+
+    def _verify_eval_params(self, **kwargs):
+        """Verify evaluation parameters."""
+        super()._verify_eval_params(**kwargs)
+
+        if kwargs.get('aggregation_level') == 'patient':
+            raise ValueError(
+                "Cannot aggregate bags by patient using the legacy CLAM trainer."
+            )
+        if kwargs.get('uq'):
+            raise ValueError(
+                "Cannot calculate uncertainty quantification using the legacy CLAM trainer."
+            )
+
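Two pieces of arithmetic in `CLAMModelConfig` are easy to check by hand: the weighted loss described for `bag_weight`, and the way `build_model` swaps the first layer size for the observed input dimension. The sketch below restates both with plain Python numbers. The function names `combined_loss` and `resolve_layer_sizes` are hypothetical and chosen only for this illustration; the actual loss is computed inside the CLAM loss classes, which are not shown here.

```python
def combined_loss(bag_loss: float, instance_loss: float, bag_weight: float = 0.7) -> float:
    """Total loss as W * loss + (1 - W) * instance_loss, with W = bag_weight."""
    return bag_weight * bag_loss + (1 - bag_weight) * instance_loss


def resolve_layer_sizes(n_in: int, config_size: list) -> list:
    """Replace the first configured layer size with the bag feature dimension.

    Mirrors the `[n_in] + config_size[1:]` expression in build_model.
    """
    return [n_in] + config_size[1:]


# With the default bag weight, the bag loss contributes 70% of the total.
total = combined_loss(bag_loss=1.0, instance_loss=0.5)

# The 'small' clam_sb preset with 768-dimensional bag features (an assumed
# feature size for this example).
sizes = resolve_layer_sizes(768, [1024, 512, 256])
print(total, sizes)
```

The second function shows why the first entry of each size preset is effectively a placeholder: it is always overridden by the feature dimension found in the training bags.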
\ No newline at end of file
diff --git a/docs/_modules/slideflow_noncommercial/biscuit/experiment/index.html b/docs/_modules/slideflow_noncommercial/biscuit/experiment/index.html
new file mode 100644
index 000000000..c5614aff2
--- /dev/null
+++ b/docs/_modules/slideflow_noncommercial/biscuit/experiment/index.html
@@ -0,0 +1,1511 @@
+slideflow_noncommercial.biscuit.experiment — slideflow 3.0.0 documentation

Source code for slideflow_noncommercial.biscuit.experiment

+import shutil
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+import matplotlib.ticker as plticker
+import numpy as np
+from skmisc.loess import loess
+from scipy import stats
+from tqdm import tqdm
+from statistics import mean
+from os.path import join, exists
+
+import slideflow as sf
+from slideflow.util import log
+from . import utils, threshold
+from . import hp as biscuit_hp
+from .errors import MatchError, ModelNotFoundError, ThresholdError
+
+# -----------------------------------------------------------------------------
+
+ALL_EXP = {
+    'AA': 'full',
+    'U': 800,
+    'T': 700,
+    'S': 600,
+    'R': 500,
+    'A': 400,
+    'L': 350,
+    'M': 300,
+    'N': 250,
+    'D': 200,
+    'O': 176,
+    'P': 150,
+    'Q': 126,
+    'G': 100,
+    'V': 90,
+    'W': 80,
+    'X': 70,
+    'Y': 60,
+    'Z': 50,
+    'ZA': 40,
+    'ZB': 30,
+    'ZC': 20,
+    'ZD': 10
+}
+
+# -----------------------------------------------------------------------------
+
+
[docs]class Experiment:
+    def __init__(
+        self,
+        train_project,
+        eval_projects=None,
+        outcome='cohort',
+        outcome1='LUAD',
+        outcome2='LUSC',
+        outdir='results'
+    ):
+        """Supervises uncertainty thresholding experiments."""
+
+        if eval_projects is None:
+            eval_projects = []
+
+        if isinstance(train_project, str):
+            self.train_project = sf.Project(train_project)
+        elif isinstance(train_project, sf.Project):
+            self.train_project = train_project
+        else:
+            raise ValueError(f"Unrecognized value for train_project: {train_project}")
+
+        self.eval_projects = []
+        for ep in eval_projects:
+            if isinstance(ep, str):
+                self.eval_projects += [sf.Project(ep)]
+            elif isinstance(ep, sf.Project):
+                self.eval_projects += [ep]
+            else:
+                raise ValueError(f"Unrecognized value for eval_project: {eval_projects}")
+
+        self.outcome = outcome
+        self.outcome1 = outcome1
+        self.outcome2 = outcome2
+        self.outdir = outdir
+
+    def add(self, path, label, out1, out2, order='f', order_col='order', gan=0):
+        """Adds a sample size experiment to the given project annotations file.
+
+        Args:
+            path (str): Path to project annotations file.
+            label (str): Experimental label.
+            out1 (int): Number of lung adenocarcinomas (LUAD) to include in the
+                experiment.
+            out2 (int): Number of lung squamous cell carcinomas (LUSC) to include
+                in the experiment.
+            order (str, optional): 'f' (forward) or 'r' (reverse). Indicates which
+                direction to follow when sequentially adding slides.
+                Defaults to 'f'.
+            order_col (str, optional): Annotation header column to use when
+                sequentially adding slides. Defaults to 'order'.
+            gan (float, optional): Proportion of GAN slides to include, as a
+                fraction (0 <= gan < 1) of ``out1`` and ``out2``.
+                Defaults to 0.
+
+        Returns:
+            None
+        """
+
+        assert isinstance(out1, int)
+        assert isinstance(out2, int)
+        assert isinstance(gan, (int, float)) and 0 <= gan < 1
+        assert order in ('f', 'r')
+
+        ann = pd.read_csv(path, dtype=str)
+        print(f"Setting up exp. {label} with order {order} (sort by {order_col})")
+        ann[order_col] = pd.to_numeric(ann[order_col])
+        ann.sort_values(
+            ['gan', self.outcome, order_col],
+            ascending=[True, True, (order != 'r')],
+            inplace=True
+        )
+        gan_out1 = round(gan * out1)
+        gan_out2 = round(gan * out2)
+        out1_indices = np.where((ann['site'].to_numpy() != 'GAN')
+                                & (ann[self.outcome] == self.outcome1))[0]
+        out2_indices = np.where((ann['site'].to_numpy() != 'GAN')
+                                & (ann[self.outcome] == self.outcome2))[0]
+        gan_out1_indices = np.where((ann['site'].to_numpy() == 'GAN')
+                                    & (ann[self.outcome] == self.outcome1))[0]
+        gan_out2_indices = np.where((ann['site'].to_numpy() == 'GAN')
+                                    & (ann[self.outcome] == self.outcome2))[0]
+
+        assert out1 <= out1_indices.shape[0]
+        assert out2 <= out2_indices.shape[0]
+        assert gan_out1 <= gan_out1_indices.shape[0]
+        assert gan_out2 <= gan_out2_indices.shape[0]
+
+        include = np.array(['exclude' for _ in range(len(ann))])
+        include[out1_indices[:out1]] = 'include'
+        include[out2_indices[:out2]] = 'include'
+        include[gan_out1_indices[:gan_out1]] = 'include'
+        include[gan_out2_indices[:gan_out2]] = 'include'
+        ann[f'include_{label}'] = include
+        ann.to_csv(path, index=False)
+
+    @staticmethod
+    def config(name_pattern, subset, ratio, **kwargs):
+        """Configures a set of experiments.
+
+        Args:
+            name_pattern (str): String pattern for experiment naming.
+            subset (list(str)): List of experiment ID/labels.
+            ratio (float): Ratio of outcome class sizes, n_out1 / n_out2
+                (or n_out2 / n_out1). Must be >= 1.
+        """
+
+        if not isinstance(ratio, (int, float)) or ratio < 1:
+            raise ValueError("Invalid ratio; must be float >= 1")
+        config = {}
+        for exp in ALL_EXP:
+            if exp not in subset:
+                continue
+            if exp == 'AA' and ratio != 1:
+                raise ValueError("Cannot create full dataset exp. with ratio != 1")
+
+            exp_name = name_pattern.format(exp)
+            if ratio != 1:
+                n1 = round(ALL_EXP[exp] / (1 + (1/ratio)))
+                n2 = ALL_EXP[exp] - n1
+
+                config.update({
+                    exp_name: {'out1': n1, 'out2': n2, **kwargs},
+                    exp_name+'i': {'out1': n2, 'out2': n1, **kwargs}
+                })
+            else:
+                if ALL_EXP[exp] == 'full':
+                    n_out1 = 467
+                    n_out2 = 474
+                else:
+                    n_out1 = n_out2 = int(ALL_EXP[exp] / 2)
+                config.update({
+                    exp_name: {'out1': n_out1, 'out2': n_out2, **kwargs},
+                })
+        return config
+
+    def display(self, df, eval_dfs, hue='uq', palette='tab10', relplot_uq_compare=True,
+                boxplot_uq_compare=True, ttest_uq_groups=['all', 'include'],
+                prefix=''):
+        """Creates plots from assembled results, exports results to CSV.
+
+        Args:
+            df (pandas.DataFrame): Cross-validation results metrics, as generated
+                by results()
+            eval_dfs (dict(pandas.DataFrame)): Dict of external eval dataset names
+                (keys) mapped to pandas DataFrame of result metrics (values).
+            hue (str, optional): Comparison to show with different hue on plots.
+                Defaults to 'uq'.
+            palette (str, optional): Seaborn color palette. Defaults to 'tab10'.
+            relplot_uq_compare (bool, optional): For the Relplot display, ensure
+                non-UQ and UQ results are generated from the same models/preds.
+            boxplot_uq_compare (bool, optional): For the boxplot display, ensure
+                non-UQ and UQ results are generated from the same models/preds.
+            ttest_uq_groups (list(str)): UQ groups to compare via t-test. Defaults
+                to ['all', 'include'].
+            prefix (str, optional): Prefix to use when saving figures.
+                Defaults to empty string.
+
+        Returns:
+            None
+        """
+
+        if not len(df):
+            log.error("No results to display")
+            return
+
+        # Filter out UQ results if n_slides < 100
+        df = df.loc[~ ((df['n_slides'] < 100)
+                       & (df['uq'].isin(['include', 'exclude'])))]
+
+        # --- Paired t-tests ---------------------------------------------------
+        if ttest_uq_groups and len(ttest_uq_groups) != 2:
+            raise ValueError("Length of ttest_uq_groups must be exactly 2")
+        ttest_df = df.loc[df['uq'].isin(ttest_uq_groups)].copy()
+        ttest_df = ttest_df.sort_values(['id', 'fold'])
+
+        def perform_paired_testing(level):
+            print(f"Paired t-tests ({level}-level):")
+            for n in sorted(ttest_df['n_slides'].unique()):
+                exp_df = ttest_df[ttest_df['n_slides'] == n]
+                try:
+                    ttest_result = stats.ttest_rel(
+                        exp_df.loc[exp_df['uq'] == ttest_uq_groups[0]][f'{level}_auc'],
+                        exp_df.loc[exp_df['uq'] == ttest_uq_groups[1]][f'{level}_auc'],
+                        alternative='less')
+                    print(n, '\t', 'p =', ttest_result.pvalue)
+                except ValueError:
+                    print(n, '\t', 'p = (error)')
+
+        perform_paired_testing('patient')
+        perform_paired_testing('slide')
+
+        # --- Cross-validation plots -------------------------------------------
+
+        if len(df):
+            # AUC (relplot)
+            if relplot_uq_compare:
+                rel_df = df.loc[df['uq'] != 'none']
+            else:
+                rel_df = df
+            sns.relplot(
+                x='n_slides',
+                y='slide_auc',
+                data=rel_df,
+                hue=hue,
+                marker='o',
+                kind='line',
+                palette=palette
+            )
+            plt.title('Cross-val AUC')
+            ax = plt.gca()
+            ax.set_ylim([0.5, 1])
+            ax.grid(visible=True, which='both', axis='both', color='white')
+            ax.set_facecolor('#EAEAF2')
+            ax.xaxis.set_minor_locator(plticker.MultipleLocator(100))
+            plt.subplots_adjust(top=0.9)
+            plt.savefig(join(self.outdir, f'{prefix}relplot.svg'))
+
+            f, axes = plt.subplots(1, 3)
+            f.set_size_inches(18, 6)
+
+            # AUC boxplot
+            if boxplot_uq_compare:
+                box_df = df.loc[df['uq'] != 'none']
+            else:
+                box_df = df
+            sns.boxplot(
+                x='n_slides',
+                y='slide_auc',
+                hue=hue,
+                data=box_df,
+                ax=axes[0],
+                palette=palette
+            )
+            axes[0].title.set_text('Cross-val AUC')
+            axes[0].set_ylabel('')
+            axes[0].tick_params(labelrotation=90)
+
+            # AUC scatter - LOESS & standard error
+            df = df.sort_values(by=['n_slides'])
+            x = df['n_slides'].to_numpy().astype(np.float32)
+            y = df['slide_auc'].to_numpy()
+            lo = loess(x, y)
+            try:
+                lo.fit()
+                pred = lo.predict(x, stderror=True)
+                conf = pred.confidence()
+                z = pred.values
+                ll = conf.lower
+                ul = conf.upper
+                axes[1].plot(x, y, '+', ms=6)
+                axes[1].plot(x, z)
+                axes[1].fill_between(x, ll, ul, alpha=.33)
+            except ValueError:
+                pass
+
+            axes[1].xaxis.set_minor_locator(plticker.MultipleLocator(20))
+            axes[1].spines['bottom'].set_linewidth(0.5)
+            axes[1].spines['bottom'].set_color('black')
+            axes[1].tick_params(axis='x', colors='black')
+            axes[1].grid(visible=True, which='both', axis='both', color='white')
+            axes[1].set_facecolor('#EAEAF2')
+            axes[1].set_xscale('log')
+            axes[1].title.set_text('Cross-val AUC')
+
+            # % slides included
+            sns.lineplot(
+                x='n_slides',
+                y='patient_uq_perc',
+                data=df,
+                marker='o',
+                ax=axes[2],
+                zorder=3
+            )
+            axes[2].set_ylabel('')
+            axes[2].title.set_text('% Patients Included with UQ (cross-val)')
+            axes[2].xaxis.set_minor_locator(plticker.MultipleLocator(100))
+            axes[2].tick_params(labelrotation=90)
+            axes[2].grid(visible=True, which='both', axis='both', color='white', zorder=0)
+            axes[2].set_facecolor('#EAEAF2')
+            axes[2].set_xlim(100)
+            axes[2].scatter(x=df.groupby('n_slides', as_index=False).median().n_slides.values, y=df.groupby('n_slides').median().patient_uq_perc.values, marker='x', zorder=5)
+
+            plt.subplots_adjust(bottom=0.2)
+            plt.savefig(join(self.outdir, f'{prefix}crossval.svg'))
+
+        # --- Evaluation plots ----------------------------------------------------
+
+        if eval_dfs:
+            for eval_name, eval_df in eval_dfs.items():
+                if not len(eval_df):
+                    continue
+                has_uq = len(eval_df.loc[eval_df['uq'].isin(['include', 'exclude'])])
+
+                # Prepare figure
+                sns.set(rc={"xtick.bottom": True, "ytick.left": True})
+                f, axes = plt.subplots(1, (4 if has_uq else 3))
+                f.suptitle(f'{eval_name} Evaluation Dataset')
+                f.set_size_inches(16, 4)
+
+                # AUC
+                if not len(eval_df):
+                    continue
+                eval_df = eval_df.loc[~ ((eval_df['n_slides'] < 100)
+                                         & (eval_df['uq'].isin(['include', 'exclude'])))]
+                sns.lineplot(
+                    x='n_slides',
+                    y='patient_auc',
+                    hue=hue,
+                    data=eval_df,
+                    marker="o",
+                    ax=axes[0]
+                )
+                sns.scatterplot(
+                    x='n_slides',
+                    y='slide_auc',
+                    hue=hue,
+                    data=eval_df,
+                    marker="x",
+                    ax=axes[0]
+                )
+                axes[0].get_legend().remove()
+                axes[0].title.set_text('AUC')
+
+                # Accuracy
+                sns.lineplot(
+                    x='n_slides',
+                    y='patient_acc',
+                    hue=hue,
+                    data=eval_df,
+                    marker="o",
+                    ax=axes[1]
+                )
+                sns.scatterplot(
+                    x='n_slides',
+                    y='slide_acc',
+                    hue=hue,
+                    data=eval_df,
+                    marker="x",
+                    ax=axes[1]
+                )
+                axes[1].get_legend().remove()
+                axes[1].title.set_text('Accuracy')
+
+                # Youden's index
+                sns.lineplot(
+                    x='n_slides',
+                    y='patient_youden',
+                    hue=hue,
+                    data=eval_df,
+                    marker="o",
+                    ax=axes[2]
+                )
+                sns.scatterplot(
+                    x='n_slides',
+                    y='slide_youden',
+                    hue=hue,
+                    data=eval_df,
+                    marker="x",
+                    ax=axes[2]
+                )
+                axes[2].title.set_text("Youden's J")
+                axes[2].get_legend().remove()
+
+                # % slides included
+                if has_uq:
+                    sns.lineplot(
+                        x='n_slides',
+                        y='patient_incl',
+                        data=eval_df.loc[eval_df['uq'] == 'include'],
+                        marker='o'
+                    )
+                    sns.scatterplot(
+                        x='n_slides',
+                        y='slide_incl',
+                        data=eval_df.loc[eval_df['uq'] == 'include'],
+                        marker='x'
+                    )
+                    axes[3].title.set_text('% Included')
+                for ax in axes:
+                    ax.set_ylabel('')
+                    ax.xaxis.set_major_locator(plticker.MultipleLocator(base=100))
+                    ax.tick_params(labelrotation=90)
+                plt.subplots_adjust(top=0.8)
+                plt.subplots_adjust(bottom=0.2)
+                plt.savefig(join(self.outdir, f'{prefix}eval.svg'))
+
+    def plot_uq_calibration(self, label, tile_uq, slide_uq, slide_pred, epoch=1):
+        """Plots a graph of predictions vs. uncertainty.
+
+        Args:
+            label (str): Experiment label.
+            tile_uq (float): Tile-level uncertainty threshold.
+            slide_uq (float): Slide-level uncertainty threshold.
+            slide_pred (float): Slide-level prediction threshold.
+            epoch (int, optional): Epoch of the saved validation tile
+                predictions to load. Defaults to 1.
+
+        Returns:
+            None
+        """
+
+        val_dfs = [
+            pd.read_csv(
+                join(
+                    utils.find_model(self.train_project, label, kfold=k, outcome=self.outcome),
+                    f'tile_predictions_val_epoch{epoch}.csv'),
+                dtype={'slide': str})
+            for k in range(1, 4)
+        ]
+        for v in range(len(val_dfs)):
+            utils.rename_cols(val_dfs[v], outcome=self.outcome)
+        _df = val_dfs[0]
+        _df = pd.concat([_df, val_dfs[1]], axis=0, join='outer', ignore_index=True)
+        _df = pd.concat([_df, val_dfs[2]], axis=0, join='outer', ignore_index=True)
+
+        # Plot tile-level uncertainty
+        patients = self.train_project.dataset().patients()
+        _df, _ = threshold.process_tile_predictions(_df, patients=patients)
+        threshold.plot_uncertainty(
+            _df,
+            kind='tile',
+            threshold=tile_uq,
+            title=f'CV UQ Calibration: {label}'
+        )
+        # Plot slide-level uncertainty
+        _df = _df[_df['uncertainty'] < tile_uq]
+        _s_df, _ = threshold.process_group_predictions(
+            _df,
+            pred_thresh=slide_pred,
+            level='slide'
+        )
+        threshold.plot_uncertainty(
+            _s_df,
+            kind='slide',
+            threshold=slide_uq,
+            title=f'CV UQ Calibration: {label}'
+        )
+
+    def results(self, exp_to_run, uq=True, eval=True, plot=False):
+        """Assembles results from experiments, applies UQ thresholding,
+        and returns pandas dataframes with metrics.
+
+        Args:
+            exp_to_run (list): List of experiment IDs to search for results.
+            uq (bool, optional): Apply UQ thresholds. Defaults to True.
+            eval (bool, optional): Calculate results of external evaluation models.
+                Defaults to True.
+            plot (bool, optional): Show plots. Defaults to False.
+
+        Returns:
+            pandas.DataFrame: Cross-val results,
+            pandas.DataFrame: External eval results
+        """
+
+        # === Initialize projects & prepare experiments ===========================
+
+        P = self.train_project
+        eval_Ps = self.eval_projects
+        df = pd.DataFrame()
+        eval_dfs = {val_P.name: pd.DataFrame() for val_P in eval_Ps}
+        prediction_thresholds = {}
+        slide_uq_thresholds = {}
+        tile_uq_thresholds = {}
+        pred_uq_thresholds = {}
+
+        # === Show results from designated epoch ==================================
+        for exp in exp_to_run:
+            try:
+                models = utils.find_cv(P, f'EXP_{exp}', outcome=self.outcome)
+            except MatchError:
+                log.debug(f"Unable to find cross-val results for {exp}; skipping")
+                continue
+            for i, m in enumerate(models):
+                try:
+                    results = utils.get_model_results(m, outcome=self.outcome, epoch=1)
+                except FileNotFoundError:
+                    print(f"Unable to open cross-val results for {exp}; skipping")
+                    continue
+                m_slides = sf.util.get_slides_from_model_manifest(m, dataset=None)
+                df = pd.concat([df, pd.DataFrame([{
+                    'id': exp,
+                    'n_slides': len(m_slides),
+                    'fold': i+1,
+                    'uq': 'none',
+                    'patient_auc': results['pt_auc'],
+                    'patient_ap': results['pt_ap'],
+                    'slide_auc': results['slide_auc'],
+                    'slide_ap': results['slide_ap'],
+                    'tile_auc': results['tile_auc'],
+                    'tile_ap': results['tile_ap'],
+                }])], axis=0, join='outer', ignore_index=True)
+
+        # === Add UQ Crossval results (non-thresholded) ===========================
+        for exp in exp_to_run:
+            try:
+                skip = False
+                models = utils.find_cv(P, f'EXP_{exp}_UQ', outcome=self.outcome)
+            except MatchError:
+                continue
+            all_pred_thresh = []
+            for i, m in enumerate(models):
+                try:
+                    results = utils.get_model_results(m, outcome=self.outcome, epoch=1)
+                    all_pred_thresh += [results['opt_thresh']]
+                    df = pd.concat([df, pd.DataFrame([{
+                        'id': exp,
+                        'n_slides': len(sf.util.get_slides_from_model_manifest(m, dataset=None)),
+                        'fold': i+1,
+                        'uq': 'all',
+                        'patient_auc': results['pt_auc'],
+                        'patient_ap': results['pt_ap'],
+                        'slide_auc': results['slide_auc'],
+                        'slide_ap': results['slide_ap'],
+                        'tile_auc': results['tile_auc'],
+                        'tile_ap': results['tile_ap'],
+                    }])], axis=0, join='outer', ignore_index=True)
+                except FileNotFoundError:
+                    log.debug(f"Skipping UQ crossval (non-thresholded) results for {exp}; not found")
+                    skip = True
+                    break
+            if not skip:
+                prediction_thresholds[exp] = mean(all_pred_thresh)
+
+        # === Get & Apply Nested UQ Threshold =====================================
+        if uq:
+            pb = tqdm(exp_to_run)
+            for exp in pb:
+                # Skip UQ for experiments with n_slides < 100
+                if exp in ('V', 'W', 'X', 'Y', 'Z', 'ZA', 'ZB', 'ZC', 'ZD'):
+                    continue
+                pb.set_description(f"Calculating thresholds (exp {exp})...")
+                try:
+                    _df, thresh = self.thresholds_from_nested_cv(
+                        f'EXP_{exp}_UQ', id=exp
+                    )
+                    df = pd.concat([df, _df], axis=0, join='outer', ignore_index=True)
+                except (MatchError, FileNotFoundError, ModelNotFoundError) as e:
+                    log.debug(str(e))
+                    log.debug(f"Skipping UQ crossval results for {exp}; not found")
+                    continue
+                except ThresholdError as e:
+                    log.debug(str(e))
+                    log.debug(f'Skipping UQ crossval results for {exp}; could not find thresholds in cross-validation')
+                    continue
+
+                tile_uq_thresholds[exp] = thresh['tile_uq']
+                slide_uq_thresholds[exp] = thresh['slide_uq']
+                pred_uq_thresholds[exp] = thresh['slide_pred']
+                # Show CV uncertainty calibration
+                if plot and exp == 'AA':
+                    print("Plotting UQ calibration for cross-validation (exp. AA)")
+                    self.plot_uq_calibration(
+                        label=f'EXP_{exp}_UQ',
+                        **thresh
+                    )
+                    plt.show()
+
+        # === Show external validation results ====================================
+        if eval:
+            # --- Step 7A: Show non-UQ external validation results ----------------
+            for val_P in eval_Ps:
+                name = val_P.name
+                pb = tqdm(exp_to_run, ncols=80)
+                for exp in pb:
+                    pb.set_description(f'Working on {name} eval (EXP {exp})...')
+
+                    # Read and prepare model results
+                    try:
+                        eval_dir = utils.find_eval(val_P, f'EXP_{exp}_FULL', outcome=self.outcome)
+                        results = utils.get_eval_results(eval_dir, outcome=self.outcome)
+                    except (FileNotFoundError, MatchError):
+                        log.debug(f"Skipping eval for exp {exp}; eval not found")
+                        continue
+                    if not utils.model_exists(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1):
+                        log.debug(f'Skipping eval for exp {exp}; trained model not found')
+                        continue
+                    if exp not in prediction_thresholds:
+                        log.warning(f"No predictions threshold for experiment {exp}; using slide-level pred threshold of 0.5")
+                        pred_thresh = 0.5
+                    else:
+                        pred_thresh = prediction_thresholds[exp]
+
+                    # Patient-level and slide-level predictions & metrics
+                    patient_yt, patient_yp = utils.read_group_predictions(
+                        join(
+                            eval_dir,
+                            f'patient_predictions_{self.outcome}_eval.csv'
+                        )
+                    )
+                    patient_metrics = utils.prediction_metrics(
+                        patient_yt,
+                        patient_yp,
+                        threshold=pred_thresh
+                    )
+                    patient_metrics = {
+                        f'patient_{m}': patient_metrics[m]
+                        for m in patient_metrics
+                    }
+                    # Note: slide-level metrics are read from the
+                    # patient-level predictions file.
+                    slide_yt, slide_yp = utils.read_group_predictions(
+                        join(
+                            eval_dir,
+                            f'patient_predictions_{self.outcome}_eval.csv'
+                        )
+                    )
+                    slide_metrics = utils.prediction_metrics(
+                        slide_yt,
+                        slide_yp,
+                        threshold=pred_thresh
+                    )
+                    slide_metrics = {
+                        f'slide_{m}': slide_metrics[m]
+                        for m in slide_metrics
+                    }
+                    model = utils.find_model(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1)
+                    n_slides = len(sf.util.get_slides_from_model_manifest(model, dataset=None))
+                    eval_dfs[name] = pd.concat([eval_dfs[name], pd.DataFrame([{
+                        'id': exp,
+                        'n_slides': n_slides,
+                        'uq': 'none',
+                        'incl': 1,
+                        'patient_auc': results['pt_auc'],
+                        'patient_ap': results['pt_ap'],
+                        'slide_auc': results['slide_auc'],
+                        'slide_ap': results['slide_ap'],
+                        **patient_metrics,
+                        **slide_metrics,
+                    }])], axis=0, join='outer', ignore_index=True)
+
+                    # --- [end patient-level predictions] -------------------------
+
+                    if exp not in prediction_thresholds:
+                        log.debug(f"Unable to calculate eval UQ performance; no prediction thresholds found for exp {exp}")
+                        continue
+
+                    # --- Step 7B: Show UQ external validation results ------------
+                    if uq:
+                        if exp in tile_uq_thresholds:
+                            for keep in ('high_confidence', 'low_confidence'):
+                                tile_pred_df = pd.read_csv(
+                                    join(
+                                        eval_dir,
+                                        'tile_predictions_eval.csv'
+                                    ), dtype={'slide': str}
+                                )
+                                new_cols = {
+                                    f'{self.outcome}_y_pred1': 'y_pred',
+                                    f'{self.outcome}_y_true0': 'y_true',
+                                    f'{self.outcome}_uncertainty1': 'uncertainty'
+                                }
+                                tile_pred_df.rename(columns=new_cols, inplace=True)
+                                thresh_tile = tile_uq_thresholds[exp]
+                                thresh_slide = slide_uq_thresholds[exp]
+
+                                val_patients = val_P.dataset(verification=None).patients()
+
+                                def get_metrics_by_level(level):
+                                    return threshold.apply(
+                                        tile_pred_df,
+                                        tile_uq=thresh_tile,
+                                        slide_uq=thresh_slide,
+                                        tile_pred=0.5,
+                                        slide_pred=pred_uq_thresholds[exp],
+                                        plot=(plot and level == 'slide' and keep == 'high_confidence' and exp == 'AA'),
+                                        title=f'{name}: Exp. {exp} Uncertainty',
+                                        keep=keep,  # Keeps only LOW or HIGH-confidence slide predictions
+                                        patients=val_patients,
+                                        level=level
+                                    )
+
+                                s_results, _ = get_metrics_by_level('slide')
+                                p_results, _ = get_metrics_by_level('patient')
+                                if (plot and keep == 'high_confidence' and exp == 'AA'):
+                                    plt.savefig(join(self.outdir, f'{name}_uncertainty_v_preds.svg'))
+
+                                full_model = utils.find_model(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1)
+                                n_slides = len(sf.util.get_slides_from_model_manifest(full_model, dataset=None))
+                                eval_dfs[name] = pd.concat([eval_dfs[name], pd.DataFrame([{
+                                    'id': exp,
+                                    'n_slides': n_slides,
+                                    'uq': ('include' if keep == 'high_confidence' else 'exclude'),
+                                    'slide_incl': s_results['percent_incl'],
+                                    'slide_auc': s_results['auc'],
+                                    'slide_acc': s_results['acc'],
+                                    'slide_sens': s_results['sensitivity'],
+                                    'slide_spec': s_results['specificity'],
+                                    'slide_youden': s_results['sensitivity'] + s_results['specificity'] - 1,
+                                    'patient_incl': p_results['percent_incl'],
+                                    'patient_auc': p_results['auc'],
+                                    'patient_acc': p_results['acc'],
+                                    'patient_sens': p_results['sensitivity'],
+                                    'patient_spec': p_results['specificity'],
+                                    'patient_youden': p_results['sensitivity'] + p_results['specificity'] - 1,
+                                }])], axis=0, join='outer', ignore_index=True)
+            for eval_name in eval_dfs:
+                eval_dfs[eval_name].to_csv(
+                    join(self.outdir, f'{eval_name}_results.csv'),
+                    index=False
+                )
+        else:
+            eval_dfs = None
+        df.to_csv(join(self.outdir, 'crossval_results.csv'), index=False)
+        return df, eval_dfs
+
+    def run(self, exp_to_run, steps=None, hp='nature2022'):
+        """Trains the designated experiments.
+
+        Args:
+            exp_to_run (dict): Dict containing experiment configuration,
+                as provided by config().
+            steps (list(int)): Steps to run. Defaults to all steps, 1-6.
+            hp (slideflow.ModelParams, optional): Hyperparameters object.
+                Defaults to hyperparameters used for publication.
+
+        Returns:
+            None
+        """
+
+        # === Initialize projects & prepare experiments ===========================
+        print(sf.util.bold("Initializing experiments..."))
+        P = self.train_project
+        eval_Ps = self.eval_projects
+        exp_annotations = join(P.root, 'experiments.csv')
+        if P.annotations != exp_annotations:
+            if not exists(exp_annotations):
+                shutil.copy(P.annotations, exp_annotations)
+            P.annotations = exp_annotations
+        exp_to_add = [
+            e for e in exp_to_run
+            if f'include_{e}' not in pd.read_csv(exp_annotations).columns.tolist()
+        ]
+        for exp in exp_to_add:
+            self.add(exp_annotations, label=exp, **exp_to_run[exp])
+
+        full_epoch_exp = [e for e in exp_to_run if e in ('AA', 'A', 'D', 'G')]
+
+        if hp == 'nature2022':
+            exp_hp = biscuit_hp.nature2022()
+        else:
+            exp_hp = hp
+
+        # Configure steps to run
+        if steps is None:
+            steps = range(7)
+
+        # === Step 1: Initialize full-epochs experiments ==========================
+        if 1 in steps:
+            print(sf.util.bold("[Step 1] Running full-epoch experiments..."))
+            exp_hp.epochs = [1, 3, 5, 10]
+            for exp in full_epoch_exp:
+                val_k = [
+                    k for k in range(1, 4)
+                    if not utils.model_exists(P, f'EXP_{exp}', outcome=self.outcome, kfold=k)
+                ]
+                if not len(val_k):
+                    print(f'Skipping Step 1 for experiment {exp}; already done.')
+                    continue
+                elif val_k != list(range(1, 4)):
+                    print(f'[Step 1] Some k-folds done; running {val_k} for {exp}')
+                self.train(
+                    hp=exp_hp,
+                    label=f'EXP_{exp}',
+                    filters={f'include_{exp}': ['include']},
+                    splits=f'splits_{exp}.json',
+                    val_k=val_k,
+                    val_strategy='k-fold',
+                    save_model=False
+                )
+
+        # === Step 2: Run the rest of the experiments at the designated epoch =====
+        if 2 in steps:
+            print(sf.util.bold("[Step 2] Running experiments at target epoch..."))
+            exp_hp.epochs = [1]
+            for exp in exp_to_run:
+                if exp in full_epoch_exp:
+                    continue  # Already done in Step 1
+                val_k = [
+                    k for k in range(1, 4)
+                    if not utils.model_exists(P, f'EXP_{exp}', outcome=self.outcome, kfold=k)
+                ]
+                if not len(val_k):
+                    print(f'Skipping Step 2 for experiment {exp}; already done.')
+                    continue
+                elif val_k != list(range(1, 4)):
+                    print(f'[Step 2] Some k-folds done; running {val_k} for {exp}')
+                self.train(
+                    hp=exp_hp,
+                    label=f'EXP_{exp}',
+                    filters={f'include_{exp}': ['include']},
+                    save_predictions=True,
+                    splits=f'splits_{exp}.json',
+                    val_k=val_k,
+                    val_strategy='k-fold',
+                    save_model=False
+                )
+
+        # === Step 3: Run experiments with UQ & save predictions ==================
+        if 3 in steps:
+            print(sf.util.bold("[Step 3] Running experiments with UQ..."))
+            exp_hp.epochs = [1]
+            exp_hp.uq = True
+            for exp in exp_to_run:
+                val_k = [
+                    k for k in range(1, 4)
+                    if not utils.model_exists(P, f'EXP_{exp}_UQ', outcome=self.outcome, kfold=k)
+                ]
+                if not len(val_k):
+                    print(f'Skipping Step 3 for experiment {exp}; already done.')
+                    continue
+                elif val_k != list(range(1, 4)):
+                    print(f'[Step 3] Some k-folds done; running {val_k} for {exp}')
+                self.train(
+                    hp=exp_hp,
+                    label=f'EXP_{exp}_UQ',
+                    filters={f'include_{exp}': ['include']},
+                    save_predictions=True,
+                    splits=f'splits_{exp}.json',
+                    val_k=val_k,
+                    val_strategy='k-fold',
+                    save_model=False
+                )
+
+        # === Step 4: Run nested UQ cross-validation ==============================
+        if 4 in steps:
+            print(sf.util.bold("[Step 4] Running nested UQ experiments..."))
+            exp_hp.epochs = [1]
+            exp_hp.uq = True
+            for exp in exp_to_run:
+                total_slides = exp_to_run[exp]['out2'] + exp_to_run[exp]['out1']
+                if total_slides >= 50:
+                    self.train_nested_cv(
+                        hp=exp_hp,
+                        label=f'EXP_{exp}_UQ',
+                        val_strategy='k-fold'
+                    )
+                else:
+                    print(f"[Step 4] Skipping UQ for {exp}, need >=50 slides")
+
+        # === Step 5: Train models across full datasets ===========================
+        if 5 in steps:
+            print(sf.util.bold("[Step 5] Training across full datasets..."))
+            exp_hp.epochs = [1]
+            exp_hp.uq = True
+            for exp in exp_to_run:
+                if utils.model_exists(P, f'EXP_{exp}_FULL', outcome=self.outcome):
+                    print(f'Skipping Step 5 for experiment {exp}; already done.')
+                else:
+                    stop_batch = utils.find_cv_early_stop(P, f'EXP_{exp}', outcome=self.outcome, k=3)
+                    print(f"Using detected early stop batch {stop_batch}")
+                    self.train(
+                        hp=exp_hp,
+                        label=f'EXP_{exp}_FULL',
+                        filters={f'include_{exp}': ['include']},
+                        save_model=True,
+                        val_strategy='none',
+                        steps_per_epoch_override=stop_batch
+                    )
+
+        # === Step 6: External validation ========================================
+        if 6 in steps:
+            for val_P in eval_Ps:
+                print(sf.util.bold(f"[Step 6] Running eval ({val_P.name})..."))
+                for exp in exp_to_run:
+                    full_model = utils.find_model(P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1)
+                    if utils.eval_exists(val_P, f'EXP_{exp}_FULL', outcome=self.outcome, epoch=1):
+                        print(f'Skipping eval for experiment {exp}; already done.')
+                    else:
+                        filters = {self.outcome: [self.outcome1, self.outcome2]}
+                        val_P.evaluate(
+                            full_model,
+                            self.outcome,
+                            filters=filters,
+                            save_predictions=True,
+                        )
+
+    def thresholds_from_nested_cv(self, label, outer_k=3, inner_k=5, id=None,
+                                  threshold_params=None, epoch=1,
+                                  tile_filename='tile_predictions_val_epoch1.csv',
+                                  y_true=None, y_pred=None, uncertainty=None):
+        """Detects tile- and slide-level UQ thresholds and slide-level prediction
+        thresholds from nested cross-validation."""
+
+        if id is None:
+            id = label
+        patients = self.train_project.dataset(verification=None).patients()
+        if threshold_params is None:
+            threshold_params = {
+                'tile_pred': 'detect',
+                'slide_pred': 'detect',
+                'plot': False,
+                'patients': patients
+            }
+        all_tile_uq_thresh = []
+        all_slide_uq_thresh = []
+        all_slide_pred_thresh = []
+        df = pd.DataFrame()
+        for k in range(1, outer_k+1):
+
+            try:
+                dfs = utils.df_from_cv(
+                    self.train_project,
+                    f'{label}-k{k}',
+                    outcome=self.outcome,
+                    k=inner_k,
+                    y_true=y_true,
+                    y_pred=y_pred,
+                    uncertainty=uncertainty)
+            except ModelNotFoundError:
+                log.warning(f"Could not find {label} k-fold {k}; skipping")
+                continue
+
+            val_path = join(
+                utils.find_model(self.train_project, f'{label}', kfold=k, outcome=self.outcome),
+                tile_filename
+            )
+            if not exists(val_path):
+                log.warning(f"Could not find {label} k-fold {k}; skipping")
+                continue
+            tile_uq = threshold.from_cv(
+                dfs,
+                tile_uq='detect',
+                slide_uq=None,
+                **threshold_params
+            )['tile_uq']
+            thresholds = threshold.from_cv(
+                dfs,
+                tile_uq=tile_uq,
+                slide_uq='detect',
+                **threshold_params
+            )
+            all_tile_uq_thresh += [tile_uq]
+            all_slide_uq_thresh += [thresholds['slide_uq']]
+            all_slide_pred_thresh += [thresholds['slide_pred']]
+            if sf.util.path_to_ext(val_path).lower() == 'csv':
+                tile_pred_df = pd.read_csv(val_path, dtype={'slide': str})
+            elif sf.util.path_to_ext(val_path).lower() in ('parquet', 'gzip'):
+                tile_pred_df = pd.read_parquet(val_path)
+            else:
+                raise OSError(f"Unrecognized prediction filetype {val_path}")
+            utils.rename_cols(tile_pred_df, self.outcome, y_true=y_true, y_pred=y_pred, uncertainty=uncertainty)
+
+            def uq_auc_by_level(level):
+                results, _ = threshold.apply(
+                    tile_pred_df,
+                    plot=False,
+                    patients=patients,
+                    level=level,
+                    **thresholds
+                )
+                return results['auc'], results['percent_incl']
+
+            pt_auc, pt_perc = uq_auc_by_level('patient')
+            slide_auc, slide_perc = uq_auc_by_level('slide')
+            model = utils.find_model(
+                self.train_project,
+                f'{label}',
+                kfold=k,
+                epoch=1,
+                outcome=self.outcome
+            )
+            m_slides = sf.util.get_slides_from_model_manifest(model, dataset=None)
+            df = pd.concat([df, pd.DataFrame([{
+                'id': id,
+                'n_slides': len(m_slides),
+                'fold': k,
+                'uq': 'include',
+                'patient_auc': pt_auc,
+                'patient_uq_perc': pt_perc,
+                'slide_auc': slide_auc,
+                'slide_uq_perc': slide_perc
+            }])], axis=0, join='outer', ignore_index=True)
+
+        thresholds = {
+            'tile_uq': None if not all_tile_uq_thresh else mean(all_tile_uq_thresh),
+            'slide_uq': None if not all_slide_uq_thresh else mean(all_slide_uq_thresh),
+            'slide_pred': None if not all_slide_pred_thresh else mean(all_slide_pred_thresh),
+        }
+        return df, thresholds
+
+    def train(self, hp, label, filters=None, save_predictions='csv',
+              validate_on_batch=32, validation_steps=32, **kwargs):
+        r"""Train outer cross-validation models.
+
+        Args:
+            hp (:class:`slideflow.ModelParams`): Hyperparameters object.
+            label (str): Experimental label.
+            filters (dict, optional): Dataset filters to use for
+                selecting slides. See :meth:`slideflow.Dataset.filter` for
+                more information. Defaults to None.
+            save_predictions (bool or str, optional): Save validation
+                predictions to model folder. Defaults to 'csv'.
+
+        Keyword args:
+            validate_on_batch (int): Frequency of validation checks during
+                training, in steps. Defaults to 32.
+            validation_steps (int): Number of validation steps to perform
+                during each mid-training evaluation check. Defaults to 32.
+            **kwargs: All remaining keyword arguments are passed to
+                :meth:`slideflow.Project.train`.
+
+        Returns:
+            None
+        """
+        self.train_project.train(
+            self.outcome,
+            exp_label=label,
+            filters=filters,
+            params=hp,
+            save_predictions=save_predictions,
+            validate_on_batch=validate_on_batch,
+            validation_steps=validation_steps,
+            **kwargs
+        )
+
+    def train_nested_cv(self, hp, label, outer_k=3, inner_k=5, **kwargs):
+        r"""Train models using nested cross-validation (outer_k=3, inner_k=5),
+        skipping already-generated models.
+
+        Args:
+            hp (slideflow.ModelParams): Hyperparameters object.
+            label (str): Experimental label.
+
+        Keyword args:
+            outer_k (int): Number of outer cross-folds. Defaults to 3.
+            inner_k (int): Number of inner cross-folds. Defaults to 5.
+            **kwargs: All remaining keyword arguments are passed to
+                :meth:`slideflow.Project.train`.
+
+        Returns:
+            None
+        """
+        k_models = utils.find_cv(self.train_project, label, k=outer_k, outcome=self.outcome)
+        for ki, k_model in enumerate(k_models):
+            inner_k_to_run = [
+                k for k in range(1, inner_k+1)
+                if not utils.model_exists(self.train_project, f'{label}-k{ki+1}', outcome=self.outcome, kfold=k)
+            ]
+            if not len(inner_k_to_run):
+                print(f'Skipping nested cross-val (inner k{ki+1}) for experiment '
+                      f'{label}; already done.')
+            else:
+                if inner_k_to_run != list(range(1, inner_k+1)):
+                    print(f'Only running k-folds {inner_k_to_run} for nested '
+                          f'cross-val k{ki+1} in experiment {label}; '
+                          'some k-folds already done.')
+                train_slides = sf.util.get_slides_from_model_manifest(
+                    k_model, dataset='training'
+                )
+                self.train(
+                    hp=hp,
+                    label=f"{label}-k{ki+1}",
+                    filters={'slide': train_slides},
+                    val_k_fold=inner_k,
+                    val_k=inner_k_to_run,
+                    save_predictions=True,
+                    save_model=False,
+                    **kwargs
+                )
+ © Copyright 2023, James M Dolezal. + +

\ No newline at end of file
diff --git a/docs/_modules/slideflow_noncommercial/biscuit/utils/index.html b/docs/_modules/slideflow_noncommercial/biscuit/utils/index.html
new file mode 100644
index 000000000..4251240a3
--- /dev/null
+++ b/docs/_modules/slideflow_noncommercial/biscuit/utils/index.html
@@ -0,0 +1,910 @@
+ slideflow_noncommercial.biscuit.utils — slideflow 3.0.0 documentation
Source code for slideflow_noncommercial.biscuit.utils

+import os
+from os.path import join
+from statistics import mean, variance
+
+import warnings
+import matplotlib.colors as colors
+import numpy as np
+import pandas as pd
+import slideflow as sf
+from scipy import stats
+from sklearn import metrics
+from sklearn.exceptions import UndefinedMetricWarning
+
+from .delong import delong_roc_variance
+from .errors import ModelNotFoundError, MultipleModelsFoundError
+
+# -----------------------------------------------------------------------------
+
+def uncertainty_header(outcome, underscore=False):
+    return str(outcome) + ('_' if underscore else '-') + 'uncertainty1'
+
+
+def y_true_header(outcome, underscore=False):
+    return str(outcome) + ('_' if underscore else '-') + 'y_true0'
+
+
+def y_pred_header(outcome, underscore=False):
+    return str(outcome) + ('_' if underscore else '-') + 'y_pred1'
+
+
+def rename_cols(df, outcome, *, y_true=None, y_pred=None, uncertainty=None):
+    """Renames columns of dataframe, in place."""
+    # Support for using underscore or dashes
+    if y_true is None:
+        y_true = y_true_header(
+            outcome,
+            underscore=(y_true_header(outcome, underscore=True) in df.columns))
+        if y_true not in df.columns:
+            y_true = str(outcome) + '-y_true'
+    if y_pred is None:
+        y_pred = y_pred_header(
+            outcome,
+            underscore=(y_pred_header(outcome, underscore=True) in df.columns))
+    if uncertainty is None:
+        uncertainty = uncertainty_header(
+            outcome,
+            underscore=(uncertainty_header(outcome, underscore=True) in df.columns))
+    new_cols = {
+        y_true: 'y_true',
+        y_pred: 'y_pred',
+        uncertainty: 'uncertainty'
+    }
+    df.rename(columns=new_cols, inplace=True)
+
+# --- General utility functions -----------------------------------------------
+
+def truncate_colormap(cmap, minval=0.0, maxval=1.0, n=100):
+    """Truncates matplotlib colormap."""
+
+    new_cmap = colors.LinearSegmentedColormap.from_list(
+        'trunc({n},{a:.2f},{b:.2f})'.format(n=cmap.name, a=minval, b=maxval),
+        cmap(np.linspace(minval, maxval, n)))
+    return new_cmap
+
+
+
[docs]def get_model_results(path, epoch, outcome): + """Reads results/metrics from a trained model. + + Args: + path (str): Path to model. + outcome (str): Outcome name. + + Returns: + Dict of results with the keys: pt_auc, pt_ap, slide_auc, slide_ap, + tile_auc, tile_ap, opt_thresh + """ + csv = pd.read_csv(join(path, 'results_log.csv')) + result_rows = {} + for i, row in csv.iterrows(): + try: + row_epoch = int(row['model_name'].split('epoch')[-1]) + except ValueError: + continue + result_rows.update({ + row_epoch: row + }) + if epoch not in result_rows: + raise ModelNotFoundError(f"Unable to find results for epoch {epoch}") + model_res = result_rows[epoch] + pt_ap = mean(eval(model_res['patient_ap'])[outcome]) + pt_auc = eval(model_res['patient_auc'])[outcome][0] + slide_ap = mean(eval(model_res['slide_ap'])[outcome]) + slide_auc = eval(model_res['slide_auc'])[outcome][0] + tile_ap = mean(eval(model_res['tile_ap'])[outcome]) + tile_auc = eval(model_res['tile_auc'])[outcome][0] + + pred_path = join( + path, + f'patient_predictions_{outcome}_val_epoch{epoch}.csv' + ) + if os.path.exists(pred_path): + _, opt_thresh = auc_and_threshold(*read_group_predictions(pred_path)) + else: + try: + parquet_path = join(path, 'patient_predictions_val_epoch1.parquet.gzip') + _, opt_thresh = auc_and_threshold(*read_group_predictions(parquet_path)) + except OSError: + opt_thresh = None + return { + 'pt_auc': pt_auc, + 'pt_ap': pt_ap, + 'slide_auc': slide_auc, + 'slide_ap': slide_ap, + 'tile_auc': tile_auc, + 'tile_ap': tile_ap, + 'opt_thresh': opt_thresh + }
+ + +def get_eval_results(path, outcome): + """Reads results/metrics from a trained model. + + Args: + path (str): Path to model. + outcome (str): Outcome name. + + Returns: + Dict of results with the keys: pt_auc, pt_ap, slide_auc, slide_ap, + tile_auc, tile_ap, opt_thresh + """ + csv = pd.read_csv(join(path, 'results_log.csv')) + for i, row in csv.iterrows(): + model_res = row + pt_ap = mean(eval(model_res['patient_ap'])[outcome]) + pt_auc = eval(model_res['patient_auc'])[outcome][0] + slide_ap = mean(eval(model_res['slide_ap'])[outcome]) + slide_auc = eval(model_res['slide_auc'])[outcome][0] + tile_ap = mean(eval(model_res['tile_ap'])[outcome]) + tile_auc = eval(model_res['tile_auc'])[outcome][0] + + pred_path = join( + path, + f'patient_predictions_{outcome}_eval.csv' + ) + if os.path.exists(pred_path): + _, opt_thresh = auc_and_threshold(*read_group_predictions(pred_path)) + else: + try: + parquet_path = join(path, 'patient_predictions_eval.parquet.gzip') + _, opt_thresh = auc_and_threshold(*read_group_predictions(parquet_path)) + except OSError: + opt_thresh = None + return { + 'pt_auc': pt_auc, + 'pt_ap': pt_ap, + 'slide_auc': slide_auc, + 'slide_ap': slide_ap, + 'tile_auc': tile_auc, + 'tile_ap': tile_ap, + 'opt_thresh': opt_thresh + } + + +def find_cv_early_stop(project, label, outcome, k=3): + """Detects early stop batch from cross-val trained models. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + k (int, optional): Number of k-fold iterations. Defaults to 3. + outcome (str): Outcome name. + + Returns: + int: Early stop batch. 
+ """ + cv_folders = find_cv(project, label, k=k, outcome=outcome) + early_stop_batch = [] + for cv_folder in cv_folders: + csv = pd.read_csv(join(cv_folder, 'results_log.csv')) + model_res = next(csv.iterrows())[1] + if 'early_stop_batch' in model_res: + early_stop_batch += [model_res['early_stop_batch']] + if len(early_stop_batch) == len(cv_folders): + # Only returns early stop if it was triggered in all crossfolds + return round(mean(early_stop_batch)) + else: + return None + + +def df_from_cv(project, label, outcome, epoch=None, k=3, y_true=None, + y_pred=None, uncertainty=None): + """Loads tile predictions from cross-fold models & renames columns. + + Args: + project (sf.Project): Slideflow project. + label (str): Experimental label. + epoch (int, optional): Epoch number of saved model. Defaults to None. + k (int, optional): K-fold iteration. Defaults to 3. + outcome (str, optional): Outcome name. + y_true (str, optional): Column name for ground truth labels. + Defaults to {outcome}_y_true0. + y_pred (str, optional): Column name for predictions. + Defaults to {outcome}_y_pred1. + uncertainty (str, optional): Column name for uncertainty. + Defaults to {outcome}_y_uncertainty1. + + Returns: + list(DataFrame): DataFrame for each k-fold. 
+ """ + dfs = [] + model_folders = find_cv(project, label, epoch=epoch, k=k, outcome=outcome) + patients = project.dataset().patients() + e = '' if epoch is None else '../' + + for folder in model_folders: + csv_path = join(folder, f'{e}tile_predictions_val_epoch1.csv') + parquet_path = join(folder, f'{e}tile_predictions_val_epoch1.parquet.gzip') + if os.path.exists(csv_path): + df = pd.read_csv(csv_path) + elif os.path.exists(parquet_path): + df = pd.read_parquet(parquet_path) + else: + raise OSError(f"Could not find tile predictions file at {folder}") + rename_cols(df, outcome, y_true=y_true, y_pred=y_pred, uncertainty=uncertainty) + if 'patient' not in df.columns: + df['patient'] = df['slide'].map(patients) + dfs += [df] + return dfs + + +# --- Utility functions for finding experiment models ------------------------- + +def find_model(project, label, outcome, epoch=None, kfold=None): + """Searches for a model in a project model directory. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + outcome (str): Outcome name. + epoch (int, optional): Epoch to search for. If not None, returns + path to the saved model. If None, returns path to parent model + folder. Defaults to None. + kfold (int, optional): K-fold iteration. Defaults to None. + + + Raises: + MultipleModelsFoundError: If multiple potential matches are found. + ModelNotFoundError: If no matching model is found. + + Returns: + str: Path to matching model. 
+ """ + tail = '' if kfold is None else f'-kfold{kfold}' + model_name = f'{outcome}-{label}-HP0{tail}' + matching = [ + o for o in os.listdir(project.models_dir) + if o[6:] == model_name + ] + if len(matching) > 1: + raise MultipleModelsFoundError("Multiple matching models found " + f"matching {model_name}") + elif not len(matching): + raise ModelNotFoundError("No matching model found matching " + f"{model_name}.") + elif epoch is not None: + return join( + project.models_dir, + matching[0], + f'{outcome}-{label}-HP0{tail}_epoch{epoch}' + ) + else: + return join(project.models_dir, matching[0]) + + +def model_exists(project, label, outcome, epoch=None, kfold=None): + """Check if matching model exists. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + outcome (str, optional): Outcome name. + epoch (int, optional): Epoch number of saved model. Defaults to None. + kfold (int, optional): K-fold iteration. Defaults to None. + + Returns: + bool: If model exists + """ + try: + find_model(project, label, outcome, kfold=kfold, epoch=epoch) + return True + except ModelNotFoundError: + return False + + +
def find_cv(project, label, outcome, epoch=None, k=3): + """Finds paths to cross-validation models. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + outcome (str): Outcome name. + epoch (int, optional): Epoch number of saved model. Defaults to None. + k (int, optional): Number of k-folds to search (folds 1 through k). Defaults to 3. + + Returns: + list(str): Paths to cross-validation models. + """ + return [ + find_model(project, label, outcome, epoch=epoch, kfold=_k) + for _k in range(1, k+1) + ]
+ + +def find_eval(project, label, outcome, epoch=1): + """Finds matching eval directory. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + outcome (str, optional): Outcome name. + epoch (int, optional): Epoch number of saved model. Defaults to None. + + + Raises: + MultipleModelsFoundError: If multiple matches are found. + ModelNotFoundError: If no match is found. + + Returns: + str: path to eval directory + """ + matching = [ + o for o in os.listdir(project.eval_dir) + if o[11:] == f'{outcome}-{label}-HP0_epoch{epoch}' + ] + if len(matching) > 1: + raise MultipleModelsFoundError("Multiple matching eval experiments " + f"found for label {label}") + elif not len(matching): + raise ModelNotFoundError(f"No matching eval found for label {label}") + else: + return join(project.eval_dir, matching[0]) + + +def eval_exists(project, label, outcome, epoch=1): + """Check if matching eval exists. + + Args: + project (slideflow.Project): Project. + label (str): Experimental label. + epoch (int, optional): Epoch number of saved model. Defaults to None. + + Returns: + bool: If eval exists + """ + try: + find_eval(project, label, outcome, epoch=epoch) + return True + except ModelNotFoundError: + return False + + +# --- Thresholding and metrics functions -------------------------------------- + +def read_group_predictions(path): + '''Reads patient- or slide-level predictions CSV or parquet file, + returning y_true and y_pred. + + Expects a binary categorical outcome. + + Compatible with Slideflow 1.1 and 1.2. 
+ ''' + if not os.path.exists(path): + raise OSError(f"Could not find predictions file at {path}") + if sf.util.path_to_ext(path).lower() == 'csv': + df = pd.read_csv(path) + elif sf.util.path_to_ext(path).lower() in ('parquet', 'gzip'): + df = pd.read_parquet(path) + else: + raise ValueError(f"Unrecognized extension for prediction file {path}") + if 'y_true1' in df.columns: + y_true = df['y_true1'].to_numpy() + else: + y_true_cols = [c for c in df.columns if c.endswith('y_true')] + if len(y_true_cols) == 1: + y_true = df[y_true_cols[0]].to_numpy() + else: + raise ValueError(f"Could not find y_true column at {path}") + if 'percent_tiles_positive1' in df.columns: + y_pred = df['percent_tiles_positive1'].to_numpy() + else: + y_pred_cols = [c for c in df.columns if 'y_pred' in c] + if len(y_pred_cols) == 2: + y_pred = df[y_pred_cols[1]].to_numpy() + else: + raise ValueError(f"Expected exactly 2 y_pred columns at {path}; " + f"got {len(y_pred_cols)}") + return y_true, y_pred + + +def prediction_metrics(y_true, y_pred, threshold): + """Calculate prediction metrics (AUC confidence interval, accuracy, sensitivity/specificity, Youden's J) + + Args: + y_true (np.ndarray): True labels. + y_pred (np.ndarray): Predictions. + threshold (float): Prediction threshold. + + Returns: + dict: Prediction metrics. 
+ """ + yt = y_true.astype(bool) + yp = y_pred > threshold + + alpha = 0.05 + z = stats.norm.ppf((1 - alpha/2)) + tp = np.logical_and(yt, yp).sum() + fp = np.logical_and(np.logical_not(yt), yp).sum() + tn = np.logical_and(np.logical_not(yt), np.logical_not(yp)).sum() + fn = np.logical_and(yt, np.logical_not(yp)).sum() + acc = (tp + tn) / (tp + tn + fp + fn) + sensitivity = tp / (tp + fn) + specificity = tn / (tn + fp) + + # Youden's confidence interval, via BAC (bootstrap AC estimate) + # Bootstrapping performed with sample size n = 150 and iterations B = 500 + all_jac = [] + for _ in range(500): + bootstrap_i = np.random.choice(np.arange(yt.shape[0]), size=(150,)) + _yt = yt[bootstrap_i] + _yp = yp[bootstrap_i] + _tp = np.logical_and(_yt, _yp).sum() + _fp = np.logical_and(np.logical_not(_yt), _yp).sum() + _tn = np.logical_and(np.logical_not(_yt), np.logical_not(_yp)).sum() + _fn = np.logical_and(_yt, np.logical_not(_yp)).sum() + _jac = (((_tn + 0.5 * z**2) / (_tn + _fp + z**2)) + - ((_fn + 0.5 * z**2) / (_fn + _tp + z**2))) + all_jac += [_jac] + + jac = mean(all_jac) + jac_var = variance(all_jac) + jac_low = jac - z * np.sqrt(jac_var) + jac_high = jac + z * np.sqrt(jac_var) + + # AUC confidence intervals (DeLong variance, normal approximation) + if not np.array_equal(np.unique(y_true), [0, 1]): + sf.util.log.warn("Unable to calculate CI; NaNs exist") + ci = [None, None] + else: + delong_auc, auc_cov = delong_roc_variance(y_true, y_pred) + auc_std = np.sqrt(auc_cov) + lower_upper_q = np.abs(np.array([0, 1]) - alpha / 2) + ci = stats.norm.ppf(lower_upper_q, loc=delong_auc, scale=auc_std) + ci[ci > 1] = 1 + + return { + 'auc_low': ci[0], + 'auc_high': ci[1], + 'acc': acc, + 'sens': sensitivity, + 'spec': specificity, + 'youden': sensitivity + specificity - 1, + 'youden_low': jac_low, + 'youden_high': jac_high, + } + + +def auc_and_threshold(y_true, y_pred): + """Calculates AUC and optimal threshold (via Youden's J) + + Args: + y_true (np.ndarray): Y true (labels). 
+ y_pred (np.ndarray): Y pred (predictions). + + Returns: + float: AUC + float: Optimal threshold + """ + with warnings.catch_warnings(): + warnings.simplefilter("ignore", category=UndefinedMetricWarning) + fpr, tpr, threshold = metrics.roc_curve(y_true, y_pred) + roc_auc = metrics.auc(fpr, tpr) + max_j = max(zip(tpr, fpr), key=lambda x: x[0]-x[1]) + optimal_threshold = threshold[list(zip(tpr, fpr)).index(max_j)] + return roc_auc, optimal_threshold + + +def auc(y_true, y_pred): + """Calculate Area Under Receiver Operator Curve (AUC / AUROC) + + Args: + y_true (np.ndarray): True labels. + y_pred (np.ndarray): Predictions. + + Returns: + Float: AUC + """ + with warnings.catch_warnings(): + warnings.simplefilter("ignore", category=UndefinedMetricWarning) + try: + fpr, tpr, threshold = metrics.roc_curve(y_true, y_pred) + return metrics.auc(fpr, tpr) + except ValueError: + sf.util.log.warn("Unable to calculate ROC") + return np.nan +
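The threshold selection in ``auc_and_threshold`` and the sensitivity/specificity arithmetic in ``prediction_metrics`` can be illustrated with a minimal, dependency-free sketch. This is not the module's implementation: the helper names ``confusion_counts``, ``youden_j``, and ``best_threshold`` are hypothetical, the candidate thresholds are simply the observed scores rather than the output of ``metrics.roc_curve``, and a prediction counts as positive when its score is strictly greater than the threshold (the same comparison used in ``prediction_metrics``).

```python
def confusion_counts(y_true, y_score, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score > threshold."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = score > threshold
        if truth and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_j(y_true, y_score, threshold):
    """Youden's J = sensitivity + specificity - 1.

    Assumes both classes are present, so neither denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_threshold(y_true, y_score):
    """Pick the observed score that maximizes Youden's J (hypothetical helper)."""
    candidates = sorted(set(y_score))
    return max(candidates, key=lambda t: youden_j(y_true, y_score, t))


# Toy data (made up for illustration): three negatives, three positives.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.45, 0.8, 0.2, 0.7]

threshold = best_threshold(y_true, y_score)   # 0.4 separates the classes
print(threshold, youden_j(y_true, y_score, threshold))
```

On this toy data every positive scores above 0.4 and no negative does, so the selected threshold gives a Youden's J of 1.0. Real predictions rarely separate this cleanly, which is why the module also reports bootstrap confidence bounds.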

+ © Copyright 2023, James M Dolezal. + +

+ Built with Sphinx using a theme provided by Read the Docs. +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/_sources/appendix.rst.txt b/docs/_sources/appendix.rst.txt deleted file mode 100644 index 987a7d6d2..000000000 --- a/docs/_sources/appendix.rst.txt +++ /dev/null @@ -1,28 +0,0 @@ -Appendix -======== - -Model Architecture -************************* - -All standard models available in tf.keras.applications (Tensorflow) and torchvision (PyTorch) can be trained. Custom models can also be trained by importing the model and setting the :mod:`slideflow.model.ModelParams.model` parameter equal to the model class. - -Model inputs are an X by X by 3 array of standardized image data (R, G, and B image data layers converted to floats with range 0 -> 1). If desired, the core model is initialized with pre-trained weights, either from ImageNet or from a pre-trained model specified by the user. - -The model core is then optionally connected to an additional set of fully-connected hidden layers as specified in the hyperparameter options, which then connects to outputs with softmax (categorical models) or linear (linear models) activations. - -.. _balancing: - -A Note on Input Balancing -************************* - -When training, it is important to consider whether category-level balancing should be performed on your input in order to reduce bias against sparse categories. There is no established best practice for input balancing when training on histology images; the balancing method you choose to use is up to you. - -Suppose you have five slides, labeled A through E. Slides A and B belong to category 1, while C, D, E belong to category 2. Let's suppose tumors in all the slides are roughly the same physical size, except for B which is three times as large. - -You perform tile extraction, and all the patients except B produce roughly the same number of image tiles. The training optimizer is ready for the next batch of images. Let’s say the batch size is 32. How does it select the next 32 images? 
- -If **tile-level balancing** ("tile") is used, tiles will be selected randomly. Because slide B has so many more tiles than the other slides, B will be over-represented in the batch. This means that the model will inherently learn a bias towards patient B. If patients like patient B are truly of greater prevalence in the real-world population, this is fine; the model is learning an appropriate bias. Otherwise, it is learning a bias which will hurt the model’s generalizability, which will result in poor performance on our test set. - -If **patient-based balancing** ("patient") is used, the input stream will balance tiles in a given batch across the patients. Now the model has no bias towards any given patient. However, you’ll notice that category 1 (patients A and B) only has 13 tiles, whereas category 2 (patients C, D, and E) has 19 tiles. With this type of balancing, models will learn bias towards categories with more patients (in this case category 2). - -If **category-based balancing** ("category") is used, the input stream balances tiles based on the category. There are now an equal number of tiles from category 1 and category 2, 16 from both. We are still unbalanced within category 1, as slide B has more tiles than slide A. However, because this unbalance is not occurring between categories, which is what the algorithm is training on, the bias effect is less prominent. The algorithm will expect category 1 to look more like slide B than slide A, but it is not clear whether this is avoidable. Unless you dispose of excess tiles, your model will be exposed to more tiles from B than from A, whether it happens on a per-batch basis or throughout its training across epochs. diff --git a/docs/_sources/biscuit.rst.txt b/docs/_sources/biscuit.rst.txt new file mode 100644 index 000000000..3d1ddeea3 --- /dev/null +++ b/docs/_sources/biscuit.rst.txt @@ -0,0 +1,66 @@ +.. 
currentmodule:: slideflow.biscuit + +slideflow.biscuit +================= + +This module contains an official implementation of `BISCUIT `__, an uncertainty quantification and confidence thresholding algorithm for whole-slide images. The original implementation, which includes instructions for reproducing experimental results reported in the manuscript, is available on `GitHub `__. + +This module requires the ``slideflow-noncommercial`` package, which can be installed with: + +.. code-block:: bash + + pip install slideflow-noncommercial + +See :ref:`uncertainty` for more information. + +.. autofunction:: find_cv +.. autofunction:: get_model_results + +biscuit.Experiment +****************** +.. autoclass:: Experiment +.. autofunction:: slideflow.biscuit.Experiment.display +.. autofunction:: slideflow.biscuit.Experiment.plot_uq_calibration +.. autofunction:: slideflow.biscuit.Experiment.results +.. autofunction:: slideflow.biscuit.Experiment.thresholds_from_nested_cv +.. autofunction:: slideflow.biscuit.Experiment.train +.. autofunction:: slideflow.biscuit.Experiment.train_nested_cv + +biscuit.hp +********** + +.. autofunction:: slideflow.biscuit.hp.nature2022 + +biscuit.threshold +***************** +.. autofunction:: slideflow.biscuit.threshold.apply +.. autofunction:: slideflow.biscuit.threshold.detect +.. autofunction:: slideflow.biscuit.threshold.from_cv +.. autofunction:: slideflow.biscuit.threshold.plot_uncertainty +.. autofunction:: slideflow.biscuit.threshold.process_group_predictions +.. autofunction:: slideflow.biscuit.threshold.process_tile_predictions + +biscuit.utils +************* + +.. autofunction:: slideflow.biscuit.utils.auc +.. autofunction:: slideflow.biscuit.utils.auc_and_threshold +.. autofunction:: slideflow.biscuit.utils.df_from_cv +.. autofunction:: slideflow.biscuit.utils.eval_exists +.. autofunction:: slideflow.biscuit.utils.find_cv +.. autofunction:: slideflow.biscuit.utils.find_cv_early_stop +.. 
autofunction:: slideflow.biscuit.utils.find_eval +.. autofunction:: slideflow.biscuit.utils.find_model +.. autofunction:: slideflow.biscuit.utils.get_model_results +.. autofunction:: slideflow.biscuit.utils.get_eval_results +.. autofunction:: slideflow.biscuit.utils.model_exists +.. autofunction:: slideflow.biscuit.utils.prediction_metrics +.. autofunction:: slideflow.biscuit.utils.read_group_predictions +.. autofunction:: slideflow.biscuit.utils.truncate_colormap + +biscuit.delong +************** + +.. autofunction:: slideflow.biscuit.delong.fastDeLong +.. autofunction:: slideflow.biscuit.delong.delong_roc_variance +.. autofunction:: slideflow.biscuit.delong.delong_roc_test diff --git a/docs/_sources/cellseg.rst.txt b/docs/_sources/cellseg.rst.txt new file mode 100644 index 000000000..b10794725 --- /dev/null +++ b/docs/_sources/cellseg.rst.txt @@ -0,0 +1,292 @@ +.. currentmodule:: slideflow.cellseg + +.. _cellseg: + +Cell Segmentation +================= + +Many tasks in digital pathology rely on analysis of cellular features, as opposed to higher-level architectural features. Slideflow supports whole-slide analysis of cellular features with a cell detection and segmentation pipeline based on `Cellpose `_. To start, ensure ``cellpose`` has been installed via pip: + +.. code-block:: bash + + pip install cellpose + +Approach +******** + +.. figure:: cell_segmentation.png + +The general approach for cell detection and segmentation in Slideflow is illustrated above, and will be discussed in the following sections. In short, the general approach is to tune the cell segmentation parameters on a single slide, use these parameters to detect cells in all of your slides, then extract cell images at these locations. + +Slideflow Studio +***************** + +Cellpose models have several configurable parameters which will affect the quality of your segmentation masks, namely the **pretrained model** and **cell diameter**. 
The best way to determine the optimal parameters to use for your dataset is through interactive visualization using :ref:`Slideflow Studio `. + +Use Cellpose-based cell segmentation in Slideflow Studio by :ref:`enabling the extension `, or start Studio with the ``--cellpose`` flag: + +.. code-block:: bash + + python -m slideflow.studio --cellpose + +Control panel +------------- + +Open the Cell Segmentation section in the control panel to access the segmentation controls. + +.. figure:: cellseg_workbench_panel.png + +The **Model & Cell Diameter** subsection is used to customize the segmentation model (defaults to +'cyto2') and cell diameter (defaults to 10 microns). Selecting "Auto-detect diameter" and then +clicking "Preview" will perform cell segmentation on the portion of the slide currently in view. Once complete, the diameter text box will be updated with the detected cell diameter. Any `user-trained models `_ will be listed in the model dropdown selection. + +Viewing cell segmentations +-------------------------- + +.. figure:: cellseg_workbench_masks.png + +The **View Controls** subsection provides options for customizing how cell segmentations are displayed. By default, cell segmentation masks are shown in cyan on a black background. The black +background can be removed by unchecking "Black BG". You can add a green dot at each cell's detected centroid by selecting the "Centroid" option. The "Alpha" slider controls transparency for the mask overlay. + +You can also choose to view the segmentation masks as outlines. The "Outline" button will +convert any masks currently in view to outlines, allowing you to more easily see how the +masks match cells visible on the slide. + +.. figure:: cellseg_workbench_outlines.png + +Finally, the "gradXY" option will show the flow gradients calculated during cell segmentation. + +.. 
figure:: cellseg_workbench_flows.png + +Preparing WSI segmentation +-------------------------- + +Once you are satisfied with a chosen model and cell diameter, set the cell diameter to a +manual value in microns. Once the cell diameter has been set, the middle control panel will +activate, allowing you to perform whole-slide segmentation. + +The **Otsu threshold** option will perform strict Otsu's thresholding on the whole-slide image, +only performing cell segmentation in non-background areas (reducing computational time). +You can preview the Otsu's thresholding algorithm in the :ref:`Slide section `. This option is disabled by default, as Otsu's thresholding does not +work well for all slides (particularly cytology slides). + +The **Save flows** option saves gradients during cell segmentation, allowing you to generate +visualizations as shown with the **gradXY** option above. This is disabled by default, as +saving flows requires high RAM usage and may not be practical on all systems. + +.. list-table:: + :widths: 60 40 + + * - The **Advanced** subsection provides additional options for controlling the cell segmentation process. + + **Window** controls the window size during cell segmentation; cell segmentation is performed + on images of this pixel size and then stitched together. The **Tile** option permits further + sub-tiling of each window, reducing GPU and CPU memory utilization. + + **Downscale** will scale down the final generated cell segmentation mask, reducing memory + utilization (both RAM and disk). **Enable spawn workers** enables a multiprocessing technique that improves cell segmentation speed at the cost of higher RAM usage. + + - .. image:: cellseg_workbench_advanced.png + :width: 245 + :align: right + +Running WSI segmentation +------------------------ + +Once you are satisfied with the settings, whole-slide cell segmentation can be started by +clicking **Segment**. 
You will see a notification in the bottom-right corner of the screen when +segmentation is complete. In the meantime, a progress bar will be shown in the terminal +along with ETA. + +Exporting results +----------------- + +Once segmentation is complete, masks can be saved to disk for later use with **Export**. +Masks are saved in \*.zip format, and can be loaded in Studio with drag-and-drop. + +Segmenting cells +**************** + +Single slide segmentation +------------------------- + +Once the segmentation parameters have been determined, you can run segmentation for a single slide using :func:`slideflow.cellseg.segment_slide`. + +.. code-block:: + + import slideflow as sf + from slideflow.cellseg import segment_slide + + segmentation = segment_slide( + '.../slide.svs', + model='cyto2', + diam_um=10, + ... + ) + segmentation.save('...masks.zip') + +Project-wide segmentation +------------------------- + +Cell segmentation can also be performed automatically for all slides in a Slideflow project. +Cell segmentation masks (and associated cell centroids) are calculated for all slides in the project using :meth:`slideflow.Project.cell_segmentation`. + +.. code-block:: + + import slideflow as sf + + # Load a slideflow project + P = sf.Project(...) + + # Perform cell segmentation + P.cell_segmentation( + model='cyto2', + diam_um=10 + ) + +Relevant arguments for this function include: + +- ``model`` : Cell segmentation model. All cellpose models are supported, including 'cyto', + 'cyto2', 'nuclei', and more. +- ``diam_um`` : Cell diameter, in microns. +- ``buffer`` : Path to a buffer, significantly speeds up segmentation if running from a HDD + (same as P.extract_tiles()) +- ``window_size`` : Integer. Defaults to 256. Increasing this to 512 will make things slightly + faster, but will use a bit more GPU memory. +- ``downscale`` : Factor by which to downscale the masks, to save memory. Defaults to 1 + (no downscaling, full resolution). 
Downscale of 2 is a nice balance between memory + size and fidelity. + +Depending on the size of the slide, this may take 5 to 25 minutes per slide. + +Masks will be saved in the project subfolder ``masks/``. As described above, +these masks can be loaded in Studio for interactive visualization via drag-and-drop. +They can also be used for downstream analysis and cell extraction, as described in the next +section. + +Accessing segmentation masks +---------------------------- + +Saved cell segmentation masks (in \*.zip format) can be loaded with :class:`slideflow.cellseg.Segmentation`. + +.. code-block:: python + + from slideflow.cellseg import Segmentation + seg = Segmentation.load('.../slide-masks.zip') + +The mask array, ``Segmentation.masks``, is a ``np.ndarray`` with dtype ``np.uint32``. Zero values are background, and the mask for each cell is represented by a unique integer. Flows/gradients, +if calculated, will be available in ``Segmentation.flows``. + +Centroids for detected cells can be calculated with ``Segmentation.centroids()``, which returns an array of centroid locations. By default, coordinates are returned in mask dimension space. With the argument ``wsi_dim=True``, centroid coordinates will be in the slide dimension space. + +Caveats +------- + +There are some caveats to the cell segmentation process, including: + +- **Memory usage**: Cell segmentation requires at minimum 32 GB of RAM. Larger slides (particularly cytology) may require up to 64 GB of RAM. +- **Stitching artifacts**: At present, due to the algorithm by which whole-slide cell segmentations are stitched together, you may see some cells that are not detected, missing in a grid-like pattern. Work is ongoing to reduce these stitching artifacts. +- **Cell diameter**: The quality of cell segmentation results is highly dependent on an appropriately chosen cell diameter. Use Slideflow Studio to find the best cell diameter for your application. 
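To make the mask layout described under "Accessing segmentation masks" concrete, the sketch below computes one centroid per cell from a labeled mask in which 0 is background and each cell has a unique integer. It is an illustration only, not how ``Segmentation.centroids()`` is implemented: the function name ``mask_centroids`` and its ``scale`` parameter are hypothetical, plain nested lists stand in for the ``np.ndarray``, and mapping from mask space to slide space is assumed to be a simple multiplication by the downscale factor.

```python
from collections import defaultdict


def mask_centroids(mask, scale=1):
    """Return {label: (x, y)} centroids for a 2D labeled mask.

    mask  -- rows of integers; 0 is background, other values label cells.
    scale -- hypothetical factor applied to the coordinates, e.g. the
             downscale factor used when the mask was generated.
    """
    # label -> [sum of x, sum of y, pixel count]
    sums = defaultdict(lambda: [0, 0, 0])
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label == 0:
                continue  # skip background
            acc = sums[label]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    # Mean pixel position per label, optionally rescaled.
    return {
        label: (sx / n * scale, sy / n * scale)
        for label, (sx, sy, n) in sums.items()
    }


# Toy mask (made up): a 2x2 cell labeled 1 and a single-pixel cell labeled 2.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
]
print(mask_centroids(mask))           # mask dimension space
print(mask_centroids(mask, scale=2))  # approximate full-resolution space
```

For real masks, a vectorized NumPy or SciPy routine would be far faster than this pixel loop. The loop is used here only to keep the example self-contained.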
+ +Extracting cells from slides +**************************** + +Once segmentation masks have been calculated, images of individual cells can be extracted from a whole-slide image. This can be performed for either a single slide, or all slides in a project. + +From a single slide +------------------- + +Start by loading the saved segmentation, as described above. Then, use :meth:`slideflow.WSI.apply_segmentation`, followed by :meth:`slideflow.WSI.extract_cells`. + +.. code-block:: python + + import slideflow as sf + from slideflow.cellseg import Segmentation + + # Load WSI. + wsi = sf.WSI('../slide.svs', tile_px=96, tile_um='40x') + + # Load cell segmentations. + seg = Segmentation.load('.../slide-masks.zip') + + # Apply segmentations to the slide. + wsi.apply_segmentation(seg) + + # Extract images of cells. + wsi.extract_cells(tiles_dir=...) + + +.. list-table:: + :widths: 80 20 + + * - By default, segmentation masks will be applied to the extracted cell images: + + - .. image:: cell_masked.png + + * - However, you can choose not to apply masks by using the argument ``apply_masks=False``. + + + - .. image:: cell_unmasked.png + +Tile extraction is then performed as usual. Cell images (tiles) can either be saved as loose images or in TFRecord format. See :meth:`slideflow.WSI.extract_cells` for more information. + +From all slides +--------------- + +Additionally, cell images can be extracted from all slides in a project. This should only be +done after :meth:`slideflow.Project.cell_segmentation`. + +.. code-block:: python + + P.extract_cells( + tile_px=96, + tile_um='40x', + apply_masks=True + ) + +Extracted cell images are saved by default in TFRecord format, and are otherwise handled +identically to tile images generated through :meth:`slideflow.Project.extract_tiles`. + +Complete example +**************** + +An example of a complete cell segmentation pipeline is shown below, from parameter tuning +to final tile extraction from detected cells. + +1. 
Slideflow Studio +------------------- + +Determine optimal cell segmentation parameters using Studio, as described above: + +.. code-block:: bash + + python -m slideflow.studio --cellpose + +2. Cell segmentation +-------------------- + +Segment cells for all slides in a Slideflow project. + +.. code-block:: python + + P = sf.Project(...) + P.cell_segmentation( + model='cyto2', + diam_um=10, + window_size=512, + downscale=2 + ) + +3. Cell image extraction +------------------------ + +Extract image tiles of segmented cells, in this case using segmentation masks. + +.. code-block:: python + + P.extract_cells( + tile_px=96, + tile_um='40x', + apply_masks=True, + grayspace_fraction=1 + ) diff --git a/docs/_sources/clam.rst.txt b/docs/_sources/clam.rst.txt deleted file mode 100644 index b9dcc0453..000000000 --- a/docs/_sources/clam.rst.txt +++ /dev/null @@ -1,56 +0,0 @@ -CLAM -==== - -In addition to standard Tensorflow/Keras model applications, slideflow supports training models with `CLAM `_. A slightly modified version of CLAM which supports slideflow dataset and input pipelines is included in ``slideflow.clam``. - -Creating slide activations -************************** - -The first step in the CLAM pipeline is generating tile-level activations across whole-slide images. While the original `CLAM paper `_ used features generated from an imagenet-trained model, we have found it useful to generate feature activations from models pretrained with histology images. To this end, the project function :func:`slideflow.Project.generate_features_for_clam` accepts any model as input and will generate feature vectors from the specified intermediate layers. For example: - -.. code-block:: python - - P.generate_features_for_clam( - model='/path/to/saved/model', - outdir='/clam/path', - layers=['postconv'] - ) - -Training -******** - -To train a CLAM model, use the project function :func:`slideflow.Project.train_clam`. 
Clam arguments are configured with :func:`slideflow.clam.get_args`: - -.. code-block:: python - - dataset = P.dataset(tile_px=299, tile_um=302) - P.generate_features_for_clam(..., outdir='/clam/path') - - clam_args = sf.clam.get_args(k=3, bag_loss='svm', ...) - - P.train_clam( - exp_name='test_experiment', - pt_files='/clam/path', - outcomes='category1', - dataset=dataset, - clam_args=clam_args - ) - -The training function will, by default, save heatmaps of the attention layers for each of the validation slides. This behavior can be disabled by passing ``attention_heatmaps=False``. - -Evaluation -********** - -To evaluate a saved CLAM model on an external dataset, first extract features from this dataset, then use the project function :func:`slideflow.Project.evaluate_clam`: - -.. code-block:: python - - P.generate_features_for_clam(..., outdir='/eval/clam/path') - - P.evaluate_clam( - exp_name='evaluation', - pt_files='/eval/clam/path', - outcomes='category1', - tile_px=299, - tile_um=302 - ) \ No newline at end of file diff --git a/docs/_sources/custom_extractors.rst.txt b/docs/_sources/custom_extractors.rst.txt new file mode 100644 index 000000000..200f30245 --- /dev/null +++ b/docs/_sources/custom_extractors.rst.txt @@ -0,0 +1,274 @@ +.. _custom_extractors: + +Custom Feature Extractors +========================= + +Slideflow includes several :ref:`pretrained feature extractors ` for converting image tiles into feature vectors as well as tools to assist with building your own feature extractor. In this note, we'll walk through the process of building a custom feature extractor from both a PyTorch and Tensorflow model. + +PyTorch +******* + +Feature extractors are implemented as a subclass of :class:`slideflow.model.extractors._factory_torch.TorchFeatureExtractor`. The base class provides core functionality and helper methods for generating features from image tiles (dtype uint8) or whole-slide images (type :class:`slideflow.WSI`). 
+ +The initializer should create the feature extraction model and move it to the appropriate device (*e.g.*, a GPU). The model should be a :class:`torch.nn.Module` that accepts an image tensor as input and returns a feature tensor as output. + +.. code-block:: python + + # Import your custom torch.nn.Module, + # which generates features from an image. + from my_module import MyModel + + from slideflow.model.extractors._factory_torch import TorchFeatureExtractor + + class MyFeatureExtractor(TorchFeatureExtractor): + + tag = 'my_feature_extractor' # Human-readable identifier + + def __init__(self): + super().__init__() + + # Create the model, move to GPU, and set in evaluation mode. + self.model = MyModel() + self.model.to('cuda') + self.model.eval() + +Next, the initializer should set the number of features expected to be returned by the model. + +.. code-block:: python + + ... + + def __init__(self): + ... + + self.num_features = 1024 + +The initializer is also responsible for registering image preprocessing. The image preprocessing transformation, a function which converts a raw ``uint8`` image to a ``float32`` tensor for model input, should be stored in ``self.transform``. If the transformation standardizes the images, then the parameter ``self.preprocess_kwargs`` should be set to ``{'standardize': False}``, indicating that Slideflow should not perform any additional standardization. You can use the class method ``.build_transform()`` to use the standard preprocessing pipeline. + +.. code-block:: python + + from torchvision import transforms + + ... + + def __init__(self): + ... + + # Image preprocessing. + self.transform = self.build_transform(img_size=256) + # Disable Slideflow standardization, + # as we are standardizing with transforms.Normalize + self.preprocess_kwargs = {'standardize': False} + +The final required method is ``.dump_config()``, which returns a dictionary of configuration parameters needed to regenerate this class. 
It should return a dictionary with ``"class"`` and ``"kwargs"`` keys. This configuration is saved to a JSON configuration file when generating bags for MIL training. + +.. code-block:: python + + ... + + def dump_config(self): + return self._dump_config( + class_name='my_module.MyFeatureExtractor' + ) + +The final class should look like this: + +.. code-block:: python + + from my_module import MyModel + from slideflow.model.extractors._factory_torch import TorchFeatureExtractor + from torchvision import transforms + + class MyFeatureExtractor(TorchFeatureExtractor): + + tag = 'my_feature_extractor' # Human-readable identifier + + def __init__(self): + super().__init__() + + # Create the model, move to GPU, and set in evaluation mode. + self.model = MyModel() + self.model.to('cuda') + self.model.eval() + self.num_features = 1024 + + # Image preprocessing. + self.transform = self.build_transform(img_size=256) + # Disable Slideflow standardization, + # as we are standardizing with transforms.Normalize + self.preprocess_kwargs = {'standardize': False} + + def dump_config(self): + return self._dump_config( + class_name='my_module.MyFeatureExtractor' + ) + +You can then use the feature extractor for generating bags for MIL training, as described in :ref:`mil`. + +.. code-block:: python + + import slideflow + + # Build the feature extractor. + myfeatures = MyFeatureExtractor() + + # Load a dataset. + project = slideflow.load_project(...) + dataset = project.dataset(...) + + # Generate bags. + project.generate_feature_bags(myfeatures, dataset) + +You can also generate features across whole-slide images, returning a grid of features for each slide. The size of the returned grid reflects the slide's tile grid. For example, for a slide with 24 columns and 33 rows of tiles, the returned grid will have shape ``(24, 33, n_features)``. + +.. 
code-block:: python + + >>> myfeatures = MyFeatureExtractor() + >>> wsi = sf.WSI('path/to/wsi', tile_px=256, tile_um=302) + >>> features = myfeatures(wsi) + >>> features.shape + (24, 33, 1024) + +Finally, the feature extractor can also be used to perform latent space analysis and generate mosaic maps, as described in :ref:`activations`. + +Slideflow includes a registration system for keeping track of all available feature extractors. To register your feature extractor, use the :func:`slideflow.model.extractors.register_torch` decorator. + +.. code-block:: python + + from slideflow.model.extractors import register_torch + + @register_torch + def my_feature_extractor(**kwargs): + return MyFeatureExtractor(**kwargs) + +Once registered, a feature extractor can be built by name: + +.. code-block:: python + + import slideflow as sf + extractor = sf.build_feature_extractor('my_feature_extractor') + + +Tensorflow +********** + +Tensorflow feature extractors are implemented very similarly to PyTorch feature extractors, extended from :class:`slideflow.model.extractors._tensorflow_base.TensorflowFeatureExtractor`. + +The initializer should create the model and set the expected number of features. + +.. code-block:: python + + from my_module import MyModel + from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor + + class MyFeatureExtractor(TensorflowFeatureExtractor): + + tag = 'my_feature_extractor' # Unique identifier + + def __init__(self): + super().__init__() + + # Create the model. + self.model = MyModel() + self.num_features = 1024 + +.. |per_image_standardization| replace:: ``tf.image.per_image_standardization`` +.. _per_image_standardization: https://www.tensorflow.org/api_docs/python/tf/image/per_image_standardization + + +The initializer is also responsible for registering image preprocessing and transformations. 
Preprocessing steps are stored in the ``.preprocess_kwargs`` dictionary, which should have the keys ``standardize`` and ``transform``. If ``standardize=True``, images will be standardized using |per_image_standardization|_. If ``transform`` is not ``None``, it should be a callable that accepts a single image tensor and returns a transformed image tensor. + +For example, to only perform standardization and no further preprocessing: + +.. code-block:: python + + ... + + def __init__(self): + ... + + # Image preprocessing. + self.preprocess_kwargs = { + 'standardize': True, + 'transform': None + } + +To perform standardization and resize images to 256x256: + +.. code-block:: python + + import tensorflow as tf + + @tf.function + def resize_256(x): + # Resize the image tensor to 256 x 256 pixels. + return tf.image.resize(x, (256, 256)) + + ... + + def __init__(self): + ... + + # Image preprocessing. + self.preprocess_kwargs = { + 'standardize': True, + 'transform': resize_256 + } + +The ``.dump_config()`` method should then be defined. It is expected to return a dictionary of the configuration parameters needed to regenerate this class, with the keys ``"class"`` and ``"kwargs"``. This configuration is saved to a JSON configuration file when generating bags for MIL training. + +.. code-block:: python + + ... + + def dump_config(self): + return { + 'class': 'MyFeatureExtractor', + 'kwargs': {} + } + +The final class should look like this: + +.. code-block:: python + + from my_module import MyModel + from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor + + class MyFeatureExtractor(TensorflowFeatureExtractor): + + tag = 'my_feature_extractor' # Unique identifier + + def __init__(self): + super().__init__() + + # Create the model. + self.model = MyModel() + self.num_features = 1024 + + # Image preprocessing.
+ self.preprocess_kwargs = { + 'standardize': True, + 'transform': None + } + + def dump_config(self): + return { + 'class': 'MyFeatureExtractor', + 'kwargs': {} + } + +As described above, this feature extractor can then be used to create bags for MIL training, generate features across whole-slide images, or perform feature space analysis across a dataset. + +To register your feature extractor, use the :func:`slideflow.model.extractors.register_tf` decorator. + +.. code-block:: python + + from slideflow.model.extractors import register_tf + + @register_tf + def my_feature_extractor(**kwargs): + return MyFeatureExtractor(**kwargs) + +...which will allow the feature extractor to be built by name: + +.. code-block:: python + + import slideflow as sf + extractor = sf.build_feature_extractor('my_feature_extractor') \ No newline at end of file diff --git a/docs/_sources/custom_loops.rst.txt b/docs/_sources/custom_loops.rst.txt index 83d143645..da3b1bf5b 100644 --- a/docs/_sources/custom_loops.rst.txt +++ b/docs/_sources/custom_loops.rst.txt @@ -1,4 +1,4 @@ -Custom training loops +Custom Training Loops ===================== To use ``*.tfrecords`` from extracted tiles in a custom training loop or entirely separate architecture (such as `StyleGAN2 `_ or `YoloV5 `_), Tensorflow ``tf.data.Dataset`` or PyTorch ``torch.utils.data.DataLoader`` objects can be created for easily serving processed images to your custom trainer. @@ -15,7 +15,7 @@ The :class:`slideflow.Dataset` class includes functions to prepare a Tensorflow P = Project('/project/path', ...) dts = P.dataset(tile_px=299, tile_um=302) -If you want to perform any balancing, use the ``.balance()`` method: +If you want to perform any mini-batch balancing, use the ``.balance()`` method: .. 
code-block:: python @@ -53,4 +53,4 @@ or the :meth:`slideflow.Dataset.tensorflow` method to create a ``tf.data.Dataset standardize = True, # Standardize images ) -The returned dataloaders can then be used directly with your external applications. \ No newline at end of file +The returned dataloaders can then be used directly with your external applications. Read more about :ref:`creating and using dataloaders `. \ No newline at end of file diff --git a/docs/_sources/dataloaders.rst.txt b/docs/_sources/dataloaders.rst.txt new file mode 100644 index 000000000..f09de8855 --- /dev/null +++ b/docs/_sources/dataloaders.rst.txt @@ -0,0 +1,441 @@ +.. _dataloaders: + +Dataloaders: Sampling and Augmentation +====================================== + +With support for both Tensorflow and PyTorch, Slideflow provides several options for dataset sampling, processing, and augmentation. Here, we'll review the options for creating dataloaders - objects that read and process TFRecord data and return images and labels - in each framework. In all cases, data are read from TFRecords generated through :ref:`filtering`. The TFRecord data format is discussed in more detail in the :ref:`tfrecords` note. + +Tensorflow +********** + +.. |TFRecordDataset| replace:: ``tf.data.TFRecordDataset`` +.. _TFRecordDataset: https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset + +The :meth:`slideflow.Dataset.tensorflow()` method provides an easy interface for creating a ``tf.data.Dataset`` that reads and interleaves from tfrecords in a Slideflow dataset. Behind the scenes, this method uses the |TFRecordDataset|_ class for reading and parsing each TFRecord. + +The returned ``tf.data.Dataset`` object is an iterable-only dataset whose returned values depend on the arguments provided to the ``.tensorflow()`` function. 
+ +If no arguments are provided, the returned dataset will yield a tuple of ``(image, None)``, where the image is a ``tf.Tensor`` of shape ``[tile_height, tile_width, num_channels]`` and type ``tf.uint8``. + +If the ``labels`` argument is provided (dictionary mapping slide names to a numeric label), the returned dataset will yield a tuple of ``(image, label)``, where the label is a ``tf.Tensor`` with a shape and type that matches the provided labels. + +.. code-block:: python + + import slideflow as sf + + # Create a dataset object + project = sf.load_project(...) + dataset = project.dataset(...) + + # Get the labels + labels, unique_labels = dataset.labels('HPV_status') + + # Create a tensorflow dataset + # that yields (image, label) tuples + tf_dataset = dataset.tensorflow(labels=labels) + + for image, label in tf_dataset: + # Do something with the image and label... + ... + +Slide names and tile locations +------------------------------ + +Dataloaders can be configured to return slide names and tile locations in addition to the image and label. This is done by providing the ``incl_slidenames`` and ``incl_loc`` arguments to the ``.tensorflow()`` method. Both arguments are boolean values and default to ``False``. + +Setting ``incl_slidenames=True`` will return the slidename as a Tensor (dtype=string) after the label. Setting ``incl_loc=True`` will return the x and y locations, both as Tensors (dtype=int64), as the last two values of the tuple. + +.. code-block:: python + + tf_dataset = dataset.tensorflow(incl_slidenames=True, incl_loc=True) + + for image, label, slide, loc_x, loc_y in tf_dataset: + ... + +Image preprocessing +------------------- + +.. |per_image_standardization| replace:: ``tf.image.per_image_standardization()`` +.. _per_image_standardization: https://www.tensorflow.org/api_docs/python/tf/image/per_image_standardization + +Dataloaders created with ``.tensorflow()`` include several image preprocessing options. 
These options are provided as keyword arguments to the ``.tensorflow()`` method and are executed in the order listed below: + +- **crop_left** (int): Crop images to this top-left x/y coordinate. Default is ``None``. +- **crop_width** (int): Crop images to this width. Default is ``None``. +- **resize_target** (int): Resize images to this width/height. Default is ``None``. +- **resize_method** (str): Resize method. Default is ``"lanczos3"``. +- **resize_aa** (bool): Enable antialiasing if resizing. Defaults to ``True``. +- **normalizer** (``StainNormalizer``): Perform stain normalization. +- **augment** (str): Perform augmentations based on the provided string. Combine characters to perform multiple augmentations (e.g. ``'xyrj'``). Options include: + - ``'n'``: Perform :ref:`stain_augmentation` (done concurrently with stain normalization) + - ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + - ``'r'``: Random 90-degree rotation + - ``'x'``: Random horizontal flip + - ``'y'``: Random vertical flip + - ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) +- **transform** (Any): Arbitrary function to apply to each image. The function must accept a single argument (the image) and return a single value (the transformed image). +- **standardize** (bool): Standardize images with |per_image_standardization|_, returning a ``tf.float32`` image. Default is ``False``, returning a ``tf.uint8`` image. + +Dataset sharding +---------------- + +Tensorflow dataloaders can be sharded into multiple partitions, ensuring that data is not duplicated when performing distributed training across multiple processes or nodes. This is done by providing the ``shard_idx`` and ``num_shards`` arguments to the ``.tensorflow()`` method. The ``shard_idx`` argument is an integer specifying the shard number, and ``num_shards`` is an integer specifying the total number of shards. + +.. 
code-block:: python + + # Shard the dataset for GPU 1 of 4 + tf_dataset = dataset.tensorflow( + ..., + shard_idx=0, + num_shards=4 + ) + +PyTorch +******* + +.. |DataLoader| replace:: ``torch.utils.data.DataLoader`` +.. _DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader + +As with Tensorflow, the :meth:`slideflow.Dataset.torch()` method creates a |DataLoader|_ that reads images from TFRecords. In the backend, TFRecords are read using :func:`slideflow.tfrecord.torch.MultiTFRecordDataset` and processed as described in :ref:`tfrecords`. + +The returned |DataLoader|_ is an iterable-only dataloader whose returned values depend on the arguments provided to the ``.torch()`` function. An indexable, map-style dataset is also available when using PyTorch, as described in :ref:`indexable_dataloader`. + +If no arguments are provided, the returned dataloader will yield a tuple of ``(image, None)``, where the image is a ``torch.Tensor`` of shape ``[num_channels, tile_height, tile_width]`` and type ``torch.uint8``. Labels are assigned as described above. Slide names and tile locations can also be returned, using the same arguments as `described above `_. + + +.. code-block:: python + + import slideflow as sf + + # Create a dataset object + project = sf.load_project(...) + dataset = project.dataset(...) + + # Create a PyTorch dataloader + torch_dl = dataset.torch() + + for image, label in torch_dl: + # Do something with the image... + ... + +Image preprocessing +------------------- + +Dataloaders created with ``.torch()`` include several image preprocessing options, provided as keyword arguments to the ``.torch()`` method. These preprocessing steps are executed in the order listed below: + +- **normalizer** (``StainNormalizer``): Perform stain normalization. +- **augment** (str): Perform augmentations based on the provided string. Combine characters to perform multiple augmentations (e.g. ``'xyrj'``).
Augmentations are executed in the order characters appear in the string. Options include: + - ``'n'``: Perform :ref:`stain_augmentation` (done concurrently with stain normalization) + - ``'j'``: Random JPEG compression (50% chance to compress with quality between 50-100) + - ``'r'``: Random 90-degree rotation + - ``'x'``: Random horizontal flip + - ``'y'``: Random vertical flip + - ``'b'``: Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0) +- **transform** (Any): Arbitrary function to apply to each image, including `torchvision transforms `_. The function must accept a single argument (the image, in ``(num_channels, height, width)`` format) and return a single value (the transformed image). +- **standardize** (bool): Standardize images with ``image / 127.5 - 1``, returning a ``torch.float32`` image. Default is ``False``, returning a ``torch.uint8`` image. + +Below is an example of using the ``transform`` argument to apply a torchvision transform to each image: + +.. code-block:: python + + import torchvision.transforms as T + + # Create a torch dataloader + torch_dataloader = dataset.torch( + transform=T.Compose([ + T.RandomResizedCrop(size=(224, 224), antialias=True), + T.Normalize(mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]), + ]) + ) + + for image, label in torch_dataloader: + # Do something with the image and label... + ... + +Dataset sharding +---------------- + +PyTorch dataloaders can similarly be sharded into multiple partitions, ensuring that data is not duplicated when performing distributed training across multiple processes or nodes. + +Sharding is done in two stages. First, dataloaders can be split into partitions using the ``rank`` and ``num_replicas`` arguments to the ``.torch()`` method. The ``rank`` argument is an integer specifying the rank of the current process, and ``num_replicas`` is an integer specifying the total number of processes. + +.. 
code-block:: python + + # Shard the dataset for GPU 1 of 4 + torch_dataloader = dataset.torch( + ..., + rank=0, + num_replicas=4 + ) + +The second stage of sharding happens in the background: if a dataloader is built with multiple worker processes (``Dataset.torch(num_workers=...)``), partitions will be automatically further subdivided into smaller chunks, ensuring that each worker process reads a unique subset of the data. + +Labeling +******** + +The ``labels`` argument to the ``.tensorflow()`` and ``.torch()`` methods accepts a dictionary mapping slide names to a numeric label. During TFRecord reading, the slide name is used to look up the label from the provided dictionary. + +.. warning:: + + Labels are assigned to image tiles based on the slide names inside a :ref:`tfrecord ` file, not by the filename of the tfrecord. This means that renaming a TFRecord file will not change the label of the tiles inside the file. If you need to change the slide names associated with tiles inside a TFRecord, the TFRecord file must be regenerated. + +The most common way to generate labels is to use the :meth:`slideflow.Dataset.labels()` method, which returns a dictionary mapping slide names to numeric labels. For categorical labels, the numeric labels correspond to the index of the label in the ``unique_labels`` list. For example, if the ``unique_labels`` list is ``['HPV-', 'HPV+']``, then the mapping of numeric labels would be ``{ 'HPV-': 0, 'HPV+': 1 }``. + +.. code-block:: python + + >>> labels, unique_labels = dataset.labels('HPV_status') + >>> unique_labels + ['HPV-', 'HPV+'] + >>> labels + {'slide1': 0, + 'slide2': 1, + ... + } + >>> tf_dataset = dataset.tensorflow(labels=labels) + +.. _sampling: + +Sampling +******** + +Dataloaders created with ``.tensorflow()`` and ``.torch()`` are iterable-only dataloaders, meaning that they cannot be indexed directly.
This is because the underlying TFRecords are sampled in a streaming fashion, and the dataloader does not know what the next record will be until it has been read. This is in contrast to the :ref:`indexable_dataloader` method described below, which creates an indexable, map-style dataset. + +Dataloaders created with ``.tensorflow()`` and ``.torch()`` can be configured to sample from TFRecords in several ways, with options for infinite vs. finite sampling, oversampling, and undersampling. These sampling methods are described below. + +Infinite dataloaders +-------------------- + +By default, dataloaders created with ``.tensorflow()`` and ``.torch()`` will sample from TFRecords in an infinite loop. This is useful for training, where the dataloader should continue to yield images until the training process is complete. By default, images are sampled from TFRecords with uniform sampling, meaning that each TFRecord has an equal chance of yielding an image. This sampling strategy can be configured, as described below. + +.. note:: + + When training :ref:`tile-based models `, a dataloader is considered to have yielded one "epoch" of data when it has yielded the number of images equal to the number of tiles in the dataset. Due to the random sampling from TFRecords, this means that some images will be overrepresented (images from TFRecords with fewer tiles) and some will be underrepresented (images from TFRecords with many tiles). + +Finite dataloaders +------------------ + +Dataloaders can also be configured with finite sampling, yielding tiles from TFRecords exactly once. This is accomplished by passing the argument ``infinite=False`` to the ``.tensorflow()`` or ``.torch()`` methods. + +.. _balancing: + +Oversampling with balancing +--------------------------- + +Oversampling methods control the probability that tiles are read from each TFRecord, affecting the balance of data across slides, patients, and outcome categories. 
Oversampling is configured at the Dataset level, using the :meth:`slideflow.Dataset.balance` method. This method returns a copy of the dataset with the specified oversampling strategy. + +**Slide-level balancing**: By default, images are sampled from TFRecords with uniform probability, meaning that each TFRecord has an equal chance of yielding an image. This is equivalent to both ``.balance(strategy='slide')`` and ``.balance(strategy=None)``. This strategy will oversample images from slides with fewer tiles, and undersample images from slides with more tiles. + +.. code-block:: python + + # Sample from TFRecords with equal probability + dataset = dataset.balance(strategy='slide') + +**Patient-level balancing**: To sample so that each patient has an equal chance of yielding an image, regardless of how many tiles that patient has, use ``.balance(strategy='patient')``. This strategy will oversample images from patients with fewer tiles, and undersample images from patients with more tiles. + +.. code-block:: python + + # Sample so that each patient is equally likely + # to yield an image. + dataset = dataset.balance(strategy='patient') + +**Tile-level balancing**: To sample from TFRecords with uniform probability across image tiles, use ``.balance(strategy='tile')``. This strategy will sample from TFRecords with probability proportional to the number of tiles in the TFRecord, resulting in higher representation of slides with more tiles. + +.. code-block:: python + + # Sample from TFRecords with probability proportional + # to the number of tiles in each TFRecord. + dataset = dataset.balance(strategy='tile') + +**Category-level balancing**: To sample so that each outcome category has an equal chance of yielding an image, use ``.balance(strategy='category')``. This strategy will oversample images from outcome categories with fewer tiles, and undersample images from outcome categories with more tiles.
This strategy will also perform slide-level balancing within each category. Category-level balancing is only available when using categorical labels. + +.. code-block:: python + + # Sample so that each category ("HPV-" and "HPV+") + # is equally likely to yield an image. + dataset = dataset.balance("HPV_status", strategy='category') + +**Custom balancing**: The ``.balance()`` method saves sampling probability weights to ``Dataset.prob_weights``, a dictionary mapping TFRecord paths to sampling weights. Custom balancing can be performed by overriding this dictionary with custom weights. + +.. code-block:: python + + >>> dataset = dataset.balance(strategy='slide') + >>> dataset.prob_weights + {'/path/to/tfrecord1': 0.002, + '/path/to/tfrecord2': 0.003, + ... + } + >>> dataset.prob_weights = {...} + +Balancing is automatically applied to dataloaders created with the ``.tensorflow()`` and ``.torch()`` methods. + +Undersampling with clipping +--------------------------- + +Datasets can also be configured to undersample TFRecords using :meth:`slideflow.Dataset.clip`. Several undersampling strategies are available. + +**Slide-level clipping**: TFRecords can be clipped to a maximum number of tiles per slide using ``.clip(max_tiles)``. This strategy will clip TFRecords with more tiles than the specified ``max_tiles`` value, resulting in a maximum of ``max_tiles`` tiles per slide. + +**Patient-level clipping**: TFRecords can be clipped to a maximum number of tiles per patient using ``.clip(max_tiles, strategy='patient')``. For patients with more than one slide/TFRecord, TFRecords will be clipped proportionally. + +**Outcome-level clipping**: TFRecords can also be clipped to a maximum number of tiles per outcome category using ``.clip(max_tiles, strategy='category', headers=...)``. The outcome category is specified by the ``headers`` argument, which can be a single header name or a list of header names.
Within each category, TFRecords will be clipped proportionally. + +**Custom clipping**: The ``.clip()`` method saves clipping values to ``Dataset._clip``, a dictionary mapping TFRecord paths to counts of how many tiles should be sampled from the TFRecord. Custom clipping can be performed by overriding this dictionary with custom weights. + +.. code-block:: python + + >>> dataset = dataset.clip(100) + >>> dataset._clip + {'/path/to/tfrecord1': 76, + '/path/to/tfrecord2': 100, + ... + } + >>> dataset._clip = {...} + +Undersampling via dataset clipping is automatically applied to dataloaders created with ``.tensorflow()`` and ``.torch()``. + +During training +--------------- + +If you are training a Slideflow model by directly providing a training and validation dataset to the :meth:`slideflow.Project.train` method, you can configure the datasets to perform oversampling and undersampling as described above. For example: + +.. code-block:: python + + import slideflow as sf + + # Load a project + project = sf.load_project(...) + + # Configure a training dataset with tile-level balancing + # and clipping to max 100 tiles per TFRecord + train = project.dataset(...).balance(strategy='tile').clip(100) + + # Get a validation dataset + val = project.dataset(...) + + # Train a model + project.train( + ..., + dataset=train, + val_dataset=val, + ) + +Alternatively, you can configure oversampling during training through the ``training_balance`` and ``validation_balance`` hyperparameters, as described in the :ref:`ModelParams ` documentation. Undersampling with dataset clipping can be performed with the ``max_tiles`` argument. Configuring oversampling/undersampling with this method propagates the configuration to all datasets generated during cross-validation. + +.. code-block:: python + + import slideflow as sf + + # Load a project + project = sf.load_project(...) 
+ + # Configure hyperparameters with tile-level + # balancing/oversampling for the training data + hp = sf.ModelParams( + ..., + training_balance='tile', + validation_balance=None, + ) + + # Train a model. + # Undersample/clip data to max 100 tiles per TFRecord. + project.train( + ..., + params=hp, + max_tiles=100 + ) + + +.. _indexable_dataloader: + +Direct indexing +*************** + +An indexable, map-style dataset can be created for PyTorch using :class:`slideflow.io.torch.IndexedInterleaver`, which returns a ``torch.utils.data.Dataset``. Indexable datasets are only available for the PyTorch backend. + +This indexable dataset is created from a list of TFRecords and accepts many arguments for controlling labels, augmentation, and image transformations. + +.. code-block:: python + + import slideflow as sf + import torchvision.transforms as T + from slideflow.io.torch import IndexedInterleaver + + # Create a dataset object + project = sf.load_project(...) + dataset = project.dataset(...) + + # Get the TFRecords + tfrecords = dataset.tfrecords() + + # Assemble labels + labels, _ = dataset.labels("HPV_status") + + # Create an indexable dataset + dts = IndexedInterleaver( + tfrecords, + labels=labels, + augment="xyrj", + transform=T.Compose([ + T.RandomResizedCrop(size=(224, 224), + antialias=True), + ]), + normalizer=None, + standardize=True, + shuffle=True, + seed=42, + ) + +The returned dataset is indexable, meaning that it can be indexed directly to retrieve a single image and label. + +.. code-block:: python + + >>> len(dts) + 284114 + >>> image, label = dts[0] + >>> image.shape + torch.Size([3, 224, 224]) + >>> image.dtype + torch.float32 + +The dataset can be configured to return slide names and tile locations by setting the ``incl_slidenames`` and ``incl_loc`` arguments to ``True``, as described above. + +Dataset sharding is supported with the same ``rank`` and ``num_replicas`` arguments as described above. + +.. 
code-block:: python + + # Shard for GPU 1 of 4 + dts = IndexedInterleaver( + ..., + rank=0, + num_replicas=4 + ) + +:class:`slideflow.io.torch.IndexedInterleaver` supports undersampling via the ``clip`` argument (array of clipping values for each TFRecord), but does not support oversampling or balancing. + +.. code-block:: python + + # Specify TFRecord clipping values + dts = IndexedInterleaver( + tfrecords=..., + clip=[100, 75, ...], # Same length as tfrecords + ... + ) + +A |DataLoader|_ can then be created from the indexable dataset using the ``torch.utils.data.DataLoader`` class, as described in the PyTorch documentation. + +.. code-block:: python + + from torch.utils.data import DataLoader + + # Create a dataloader + dl = DataLoader( + dts, + batch_size=32, + num_workers=4, + pin_memory=True, + drop_last=True, + ) + + for image, label in dl: + # Do something with the image and label... + ... diff --git a/docs/_sources/dataset.rst.txt b/docs/_sources/dataset.rst.txt index 0d60b5504..eee70bbd2 100644 --- a/docs/_sources/dataset.rst.txt +++ b/docs/_sources/dataset.rst.txt @@ -1,132 +1,80 @@ -.. currentmodule:: slideflow.dataset +.. currentmodule:: slideflow .. _dataset: -slideflow.dataset -===================== - -The :class:`Dataset` class in this module is used to organize dataset sources, ROI annotations, -clinical annotations, and dataset processing. - -Dataset Organization ---------------------- - -A *source* is a set of slides, corresponding Regions of Interest (ROI) annotations (if available), and any tiles -extracted from these slides, either as loose tiles or in the binary TFRecord format. Sources are defined in the -project dataset configuration JSON file, with the following format: - -.. code-block:: json - - { - "SOURCE": - { - "slides": "/directory", - "roi": "/directory", - "tiles": "/directory", - "tfrecords": "/directory", - } - } - -A single *dataset* can have multiple sources.
One example of this might be if you were performing a pan-cancer analysis; -you would likely have a unique source for each cancer subtype, in order to keep each set of slides and tiles distinct. -Another example might be if you are analyzing slides from multiple institutions, and you want to ensure that you are -not mixing your training and evaluation datasets. - -The :class:`Dataset` class is initialized from a dataset configuration file, a list of source names -to include from the configuration file, and tile size parameters (``tile_px`` and ``tile_um``). Clinical annotations can be -provided to this object, which can then be used to filter slides according to outcomes and perform a variety of other -class-aware functions. - -Filtering ---------- - -Datasets can be filtered with several different filtering mechanisms: - -- **filters**: A dictionary can be passed via the ``filters`` argument to a Dataset to perform filtering. The keys of this dictionary should be annotation headers, and the values of this dictionary indicate the categorical outcomes which should be included. Any slides with an outcome other than what is provided by this dict will be excluded. -- **filter_blank**: A list of headers can be provided to the ``filter_blank`` argument; any slide with a blank annotation in one of these columns will be excluded. -- **min_tiles**: An int can be provided to ``min_tiles``; any tfrecords with fewer than this number of tiles will be excluded. - -Filters can be provided at the time of Dataset instantiation by passing to the initializer: - -.. code-block:: python - - dataset = Dataset(..., filters={'HPV_status': ['negative', 'positive']}) - -... or with the :meth:`Dataset.filter` method: - -.. code-block:: python - - dataset = dataset.filter(min_tiles=50) - -Once applied, all dataset functions and parameters will reflect this filtering criteria, including the :attr:`Dataset.num_tiles` parameter. 
- -Dataset Manipulation --------------------- - -A number of different functions can be applied to Datasets in order to manipulate filters (:meth:`Dataset.filter`, :meth:`Dataset.remove_filter`, :meth:`Dataset.clear_filters`), balance datasets (:meth:`Dataset.balance`), or clip tfrecords to a maximum number of tiles (:meth:`Dataset.clip`). The full documentation of these functions is given below. Note: these functions return a Dataset copy with the functions applied, not to the original dataset. Thus, for proper use, assign the result of the function to the original dataset variable: - -.. code-block:: python - - dataset = dataset.clip(50) - -This also means that these functions can be chained for simplicity: - -.. code-block:: python - - dataset = dataset.balance('HPV_status').clip(50) - - -Manifest --------- - -The Dataset manifest is a dictionary mapping tfrecords to both the total number of slides, as well as the number of slides after any clipping or balancing. For example, after clipping: - -.. code-block:: python - - dataset = dataset.clip(500) - -... the :meth:`Dataset.manifest` function would return something like: - -.. code-block:: json - - { - "/path/tfrecord1.tfrecords": - { - "total": 1526, - "clipped": 500 - }, - "/path/tfrecord2.tfrecords": - { - "total": 455, - "clipped": 455 - } - } - -Training/Validation Splitting ------------------------------ - -Datasets can be split into training and validation datasets with :meth:`Dataset.train_val_split`, with full documentation given below. The result of this function is two datasets - the first training, the second validation - each a separate instance of :class:`Dataset`. - -Tile and TFRecord Processing ----------------------------- - -Datasets can also be used to process and extract tiles. Some example methods support tile and tfrecord processing include: - -- :meth:`Dataset.extract_tiles`: Performs tile extraction for all slides in the dataset. 
-- :meth:`Dataset.extract_tiles_from_tfrecords`: Extract tiles from saved TFRecords, saving in loose .jpg or .png format to a folder. -- :meth:`Dataset.resize_tfrecords`: Resizes all images in TFRecords to a new size. -- :meth:`Dataset.split_tfrecords_by_roi`: Splits a set of extracted tfrecords according to whether tiles are inside or outside the slide's ROI. -- :meth:`Dataset.tfrecord_report`: Generates a PDF report of the tiles inside a collection of TFRecords. - -Tensorflow & PyTorch Datasets ------------------------------ - -Finally, Datasets can also return either a ``tf.data.Datasets`` or ``torch.utils.data.Dataloader`` object to quickly and easily create a deep learning dataset ready to be used as model input, with the :meth:`Dataset.tensorflow` and :meth:`Dataset.torch` methods, respectively. - -.. automodule: slideflow.dataset - -Dataset --------- - -.. autoclass:: slideflow.Dataset - :inherited-members: \ No newline at end of file +slideflow.Dataset +================= + +.. autoclass:: Dataset + +Attributes +---------- + +.. autosummary:: + + Dataset.annotations + + Dataset.filters + Dataset.filter_blank + Dataset.filtered_annotations + Dataset.img_format + Dataset.min_tiles + Dataset.num_tiles + +Methods +------- + +.. autofunction:: slideflow.Dataset.balance +.. autofunction:: slideflow.Dataset.build_index +.. autofunction:: slideflow.Dataset.cell_segmentation +.. autofunction:: slideflow.Dataset.check_duplicates +.. autofunction:: slideflow.Dataset.clear_filters +.. autofunction:: slideflow.Dataset.clip +.. autofunction:: slideflow.Dataset.convert_xml_rois +.. autofunction:: slideflow.Dataset.extract_cells +.. autofunction:: slideflow.Dataset.extract_tiles +.. autofunction:: slideflow.Dataset.extract_tiles_from_tfrecords +.. autofunction:: slideflow.Dataset.filter +.. autofunction:: slideflow.Dataset.find_slide +.. autofunction:: slideflow.Dataset.find_tfrecord +.. autofunction:: slideflow.Dataset.generate_feature_bags +.. 
autofunction:: slideflow.Dataset.get_tfrecord_locations +.. autofunction:: slideflow.Dataset.get_tile_dataframe +.. autofunction:: slideflow.Dataset.harmonize_labels +.. autofunction:: slideflow.Dataset.is_float +.. autofunction:: slideflow.Dataset.kfold_split +.. autofunction:: slideflow.Dataset.labels +.. autofunction:: slideflow.Dataset.load_annotations +.. autofunction:: slideflow.Dataset.load_indices +.. autofunction:: slideflow.Dataset.manifest +.. autofunction:: slideflow.Dataset.manifest_histogram +.. autofunction:: slideflow.Dataset.patients +.. autofunction:: slideflow.Dataset.get_bags +.. autofunction:: slideflow.Dataset.read_tfrecord_by_location +.. autofunction:: slideflow.Dataset.remove_filter +.. autofunction:: slideflow.Dataset.rebuild_index +.. autofunction:: slideflow.Dataset.resize_tfrecords +.. autofunction:: slideflow.Dataset.rois +.. autofunction:: slideflow.Dataset.slide_manifest +.. autofunction:: slideflow.Dataset.slide_paths +.. autofunction:: slideflow.Dataset.slides +.. autofunction:: slideflow.Dataset.split +.. autofunction:: slideflow.Dataset.split_tfrecords_by_roi +.. autofunction:: slideflow.Dataset.summary +.. autofunction:: slideflow.Dataset.tensorflow +.. autofunction:: slideflow.Dataset.tfrecord_report +.. autofunction:: slideflow.Dataset.tfrecord_heatmap +.. autofunction:: slideflow.Dataset.tfrecords +.. autofunction:: slideflow.Dataset.tfrecords_by_subfolder +.. autofunction:: slideflow.Dataset.tfrecords_folders +.. autofunction:: slideflow.Dataset.tfrecords_from_tiles +.. autofunction:: slideflow.Dataset.tfrecords_have_locations +.. autofunction:: slideflow.Dataset.transform_tfrecords +.. autofunction:: slideflow.Dataset.thumbnails +.. autofunction:: slideflow.Dataset.torch +.. autofunction:: slideflow.Dataset.unclip +.. autofunction:: slideflow.Dataset.update_manifest +.. autofunction:: slideflow.Dataset.update_annotations_with_slidenames +.. autofunction:: slideflow.Dataset.verify_annotations_slides +.. 
autofunction:: slideflow.Dataset.verify_img_format +.. autofunction:: slideflow.Dataset.verify_slide_names diff --git a/docs/_sources/dataset_features.rst.txt b/docs/_sources/dataset_features.rst.txt new file mode 100644 index 000000000..8d9a79bdd --- /dev/null +++ b/docs/_sources/dataset_features.rst.txt @@ -0,0 +1,28 @@ +.. currentmodule:: slideflow + +slideflow.DatasetFeatures +========================= + +.. autoclass:: DatasetFeatures + +Methods +------- + +.. autofunction:: slideflow.DatasetFeatures.activations_by_category +.. autofunction:: slideflow.DatasetFeatures.box_plots +.. autofunction:: slideflow.DatasetFeatures.concat +.. autofunction:: slideflow.DatasetFeatures.from_df +.. autofunction:: slideflow.DatasetFeatures.load_cache +.. autofunction:: slideflow.DatasetFeatures.map_activations +.. autofunction:: slideflow.DatasetFeatures.map_predictions +.. autofunction:: slideflow.DatasetFeatures.merge +.. autofunction:: slideflow.DatasetFeatures.remove_slide +.. autofunction:: slideflow.DatasetFeatures.save_cache +.. autofunction:: slideflow.DatasetFeatures.save_example_tiles +.. autofunction:: slideflow.DatasetFeatures.softmax_mean +.. autofunction:: slideflow.DatasetFeatures.softmax_percent +.. autofunction:: slideflow.DatasetFeatures.softmax_predict +.. autofunction:: slideflow.DatasetFeatures.stats +.. autofunction:: slideflow.DatasetFeatures.to_csv +.. autofunction:: slideflow.DatasetFeatures.to_df +.. autofunction:: slideflow.DatasetFeatures.to_torch \ No newline at end of file diff --git a/docs/_sources/datasets_and_val.rst.txt b/docs/_sources/datasets_and_val.rst.txt new file mode 100644 index 000000000..b2644f498 --- /dev/null +++ b/docs/_sources/datasets_and_val.rst.txt @@ -0,0 +1,290 @@ +.. currentmodule:: slideflow.dataset + +.. 
_datasets_and_validation: + +Datasets +======== + +Working with large-scale imaging data can be both challenging and messy, so Slideflow provides the :class:`Dataset` class to assist with managing, splitting, filtering, and transforming your data for easy downstream use. :class:`Dataset` organizes a set of image tiles extracted at a specific size, along with their associated slides and clinical annotations. Datasets are used for many Slideflow functions, and can quickly generate ``torch.utils.data.DataLoader`` and ``tf.data.Dataset`` objects that provide preprocessed slide images for external applications. + +Dataset Sources +*************** + +Datasets are composed of one or more *sources*, each of which is a set of slides, Regions of Interest (if available), and any tiles extracted from these slides. You might choose to organize your data into separate sources if slides are organized into distinct locations on disk - for example, if you are using multiple sets of slides from different institutions, with data from each institution stored separately. + +Loading a Dataset +***************** + +Datasets can be created either from a :ref:`Project ` - using the project's dataset configuration file - or directly by providing paths to slides, annotations, and image tile destinations. In the next sections, we'll take a look at how to create a :class:`Dataset` with each method. + +From a project +-------------- + +If you are working in the context of a :ref:`Project `, a dataset can be quickly created using :meth:`Project.dataset`. A dataset can be loaded from a given ``Project`` with the following parameters: + +- ``tile_px`` is the tile size, in pixels +- ``tile_um`` is the tile size, in microns (``int``) or magnification (``'40x'``) +- ``sources`` is an optional list of dataset sources to use + +.. 
code-block:: python + + import slideflow as sf + + P = sf.load_project('/project/path') + dataset = P.dataset(tile_px=299, tile_um='10x', sources=['Source1']) + +If ``sources`` is not provided, all available sources will be used. + +Alternatively, you can accomplish the same by creating a :class:`Dataset` object directly, passing in the project :ref:`dataset configuration file ` to the ``config`` argument, and a path to the annotations file to ``annotations``: + +.. code-block:: python + + dataset = sf.Dataset( + config='config.json', + sources=['Source1'], + annotations='annotations.csv', + tile_px=299, + tile_um='10x' + ) + +Manually from paths +------------------- + +You can also create a dataset by manually supplying paths to slides, destination for image tiles, and clinical annotations. A single dataset source will be created from the provided arguments, which include: + +- ``tile_px`` is the tile size, in pixels +- ``tile_um`` is the size in microns or magnification +- ``slides`` is the directory containing whole-slide images +- ``roi`` is the directory containing Regions of Interest \*.csv files +- ``tfrecords`` is the path to where image tiles should be stored in TFRecords +- ``tiles`` is the path to where image tiles should be stored as \*.jpg images +- ``annotations`` is either an annotations file (CSV) or Pandas DataFrame. + +For example, to create a dataset from a set of slides, with a configured TFRecord directory and annotations provided via Pandas DataFrame: + +.. code-block:: python + + import pandas as pd + + # Create some clinical annotations + df = pd.DataFrame(...) + + # Create a dataset + dataset = sf.Dataset( + slides='/slides', + tfrecords='/tfrecords', + annotations=df, + tile_px=299, + tile_um='10x' + ) + +When creating a :class:`Dataset` manually from paths, tfrecords should be organized into subdirectories named according to tile size. Using the above example, the tfrecords directory should look like: + +.. 
code-block:: none + + /tfrecords + └── 299px_10x + ├── slide1.tfrecords + ├── slide2.tfrecords + ├── slide3.tfrecords + └── ... + + +Filtering +********* + +Datasets can be filtered through several mechanisms: + +- **filters**: A dictionary, where keys are clinical annotation headers and values are the variable states which should be included. All remaining slides are removed from the dataset. +- **filter_blank**: A list of headers; any slide with a blank value in the clinical annotations in one of these columns will be excluded. +- **min_tiles**: An ``int``; any tfrecords with fewer than this number of tiles will be excluded. + +Filters can be provided at the time of Dataset creation by passing them to the initializer: + +.. code-block:: python + + dataset = Dataset(..., filters={'HPV_status': ['negative', 'positive']}) + +or by using the :meth:`Dataset.filter` method: + +.. code-block:: python + + dataset = dataset.filter(min_tiles=50) + +Dataset Manipulation +******************** + +A number of functions can be applied to Datasets to manipulate patient filters (:meth:`Dataset.filter`, :meth:`Dataset.remove_filter`, :meth:`Dataset.clear_filters`), clip tfrecords to a maximum number of tiles (:meth:`Dataset.clip`), or prepare mini-batch balancing (:meth:`Dataset.balance`). The full documentation for these functions is given :ref:`in the API `. Each of these manipulations returns an altered copy of the dataset for easy chaining: + +.. code-block:: python + + dataset = dataset.balance('HPV_status').clip(50) + +Each of these manipulations is performed in memory and will not affect data stored on disk. + + +Dataset Inspection +****************** + +The fastest way to inspect a :class:`Dataset`, including the dataset sources loaded, number of slides found, clinical annotation columns available, and number of tiles extracted into TFRecords, is the :meth:`Dataset.summary` method. + +.. code-block:: python + + dataset.summary() + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + Overview: + ╒===============================================╕ + │ Configuration file: │ /mnt/data/datasets.json │ + │ Tile size (px): │ 299 │ + │ Tile size (um): │ 10x │ + │ Slides: │ 941 │ + │ Patients: │ 941 │ + │ Slides with ROIs: │ 941 │ + │ Patients with ROIs: │ 941 │ + ╘===============================================╛ + + Filters: + ╒====================╕ + │ Filters: │ {} │ + ├--------------------┤ + │ Filter Blank: │ [] │ + ├--------------------┤ + │ Min Tiles: │ 0 │ + ╘====================╛ + + Sources: + + TCGA_LUNG + ╒==============================================╕ + │ slides │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ roi │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ tiles │ /mnt/rocket/tiles/TCGA_LUNG │ + │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │ + │ label │ 299px_10x │ + ╘==============================================╛ + + Number of tiles in TFRecords: 18354 + Annotation columns: + Index(['patient', 'subtype', 'site', 'slide'], + dtype='object') + +Manifest +******** + +:meth:`Dataset.manifest` provides a dictionary mapping tfrecords to the total number of image tiles and the number of tiles after clipping or mini-batch balancing. For example, after clipping: + +.. code-block:: python + + dataset = dataset.clip(500) + +the manifest may look something like: + +.. code-block:: json + + { + "/path/tfrecord1.tfrecords": + { + "total": 1526, + "clipped": 500 + }, + "/path/tfrecord2.tfrecords": + { + "total": 455, + "clipped": 455 + } + } + +Inspecting a dataset's manifest may be useful to better understand the effects of dataset manipulations. + +.. _validation_planning: + +Training/Validation Splitting +***************************** + +An important step when planning an experiment is to determine your validation and testing data. 
In total, deep learning experiments should have three groups of data: + +1) **Training** - data used for learning during training +2) **Validation** - data used for validating training parameters and early stopping (if applicable) +3) **Evaluation** - held-out data used for final testing once all training and parameter tuning has completed. Preferably an external cohort. + +| + +Slideflow includes tools for flexible training, validation, and evaluation data planning as discussed in the next sections. + +Creating a split +---------------- + +Datasets can be split into training and validation or test datasets with :meth:`Dataset.split`. The result of this function is two datasets - the first training, the second validation - each a separate instance of :class:`Dataset`. + +Slideflow provides several options for preparing a validation plan, including: + +- **strategy**: ``'bootstrap'``, ``'k-fold'``, ``'k-fold-manual'``, ``'k-fold-preserved-site'``, ``'fixed'``, and ``'none'`` +- **fraction**: (float between 0-1) [not used for k-fold validation] +- **k_fold**: int + +The default validation strategy is three-fold cross-validation (``strategy='k-fold'`` and ``k=3``). + +.. code-block:: python + + # Split a dataset into training and validation + # using 5-fold cross-validation, with this being + # the first cross-fold. + train_dataset, test_dataset = dataset.split( + model_type='classification', # Categorical labels + labels='subtype', # Label to balance between datasets + k_fold=5, # Total number of crossfolds + k_fold_iter=1, # Cross-fold iteration + splits='splits.json' # Where to save/load crossfold splits + ) + +You can also use :meth:`Dataset.kfold_split` to iterate through cross-fold splits: + +.. code-block:: python + + # Split a dataset into training and validation + # using 5-fold cross-validation + for train, test in dataset.kfold_split(k=5, labels='subtype'): + ... + + +.. _validation_strategies: + +Validation strategies +--------------------- + +.. 
figure:: validation.png + :width: 100% + :align: center + +The ``strategy`` option determines how the validation data is selected. + +If **fixed**, a certain percentage of your training data is set aside for testing (determined by ``fraction``). + +If **bootstrap**, validation data will be selected at random (percentage determined by ``fraction``), and all training iterations will be repeated a number of times equal to ``k_fold``. When used during training, the reported model training metrics will be an average of all bootstrap iterations. + +If **k-fold**, training data will be automatically separated into *k* number of groups (where *k* is equal to ``k_fold``), and all training iterations will be repeated *k* number of times using k-fold cross validation. The saved and reported model training metrics will be an average of all k-fold iterations. + +Datasets can be separated into manually-curated k-folds using the **k-fold-manual** strategy. Assign each slide to a k-fold cohort in the annotations file, and designate the appropriate column header with ``k_fold_header``. + +The **k-fold-preserved-site** strategy is a cross-validation strategy that ensures site is preserved across the training/validation sets, in order to reduce bias from batch effect as described by `Howard, et al `_. This strategy is recommended when using data from The Cancer Genome Atlas (`TCGA `_). + +.. note:: + Preserved-site cross-validation requires either `CPLEX `_ or `Pyomo/Bonmin `_. The original implementation of the preserved-site cross-validation algorithm described by Howard et al. can be found `on GitHub `_. + +If **none**, no validation testing will be performed. + +Re-using splits +--------------- + +For all validation strategies, training/validation splits can be logged to a JSON file automatically if a splits configuration file is provided to the argument ``splits``. 
When provided, :meth:`Dataset.split` will prioritize using previously-generated training/validation splits rather than generating a new split. This aids with experiment reproducibility and hyperparameter tuning. If training/validation splits are being prepared by a :ref:`Project-level function `, splits will be automatically logged to a ``splits.json`` file in the project root directory. + +Creating Dataloaders +******************** + +Finally, Datasets can also return either a ``tf.data.Dataset`` or ``torch.utils.data.DataLoader`` object to quickly and easily create a deep learning dataset ready to be used as model input, with the :meth:`Dataset.tensorflow` and :meth:`Dataset.torch` methods, respectively. See :ref:`dataloaders` for more detailed information and examples. + +Datasets have many other utility functions for working with and processing data. Read more in the :ref:`Dataset API documentation `. \ No newline at end of file diff --git a/docs/_sources/evaluation.rst.txt b/docs/_sources/evaluation.rst.txt index c0b4dc2f2..ea5d06af4 100644 --- a/docs/_sources/evaluation.rst.txt +++ b/docs/_sources/evaluation.rst.txt @@ -1,48 +1,160 @@ +.. _evaluation: + Evaluation ========== -In addition to examining cross-validation training performance, model performance can be assessed with external dataset evaluation, and visualization of predictions across evaluation slides in the form of a heatmap. +Slideflow includes several tools for evaluating trained models. In the next sections, we'll review how to evaluate a model on a held-out test set, generate predictions without ground-truth labels, and visualize predictions with heatmaps. -Model evaluation -**************** +Evaluating a test set +********************* -Once training and hyperparameter tuning is complete, you can test model performance on your held-out evaluation set using the ``evaluate`` function. Specify the path to the saved with the ``model`` argument. 
For example: +:meth:`slideflow.Project.evaluate` provides an easy interface for evaluating model performance on a held-out test set. Locate the saved model to evaluate (which will be in the project ``models/`` folder). :ref:`As with training `, the dataset to evaluate can be specified using either the ``filters`` or ``dataset`` arguments. If neither is provided, all slides in the project will be evaluated. .. code-block:: python + # Method 1: specify filters P.evaluate( model="/path/to/trained_model_epoch1", - outcomes="category", - filters={"dataset": ["eval"]} + outcomes="tumor_type", + filters={"dataset": ["test"]} ) -.. autofunction:: slideflow.Project.evaluate - :noindex: + # Method 2: specify a dataset + dataset = P.dataset(tile_px=299, tile_um='10x') + test_dataset = dataset.filter({"dataset": ["test"]}) + P.evaluate( + model="/path/to/trained_model_epoch1", + outcomes="tumor_type", + dataset=test_dataset + ) -Heatmaps -******** +Results are returned from the ``Project.evaluate()`` function as a dictionary and saved in the project evaluation directory. Tile-, slide-, and patient-level predictions are also saved in the corresponding project evaluation folder, ``eval/``. + +Generating predictions +********************** -To generate a predictive heatmap for a set of slides, use the ``generate_heatmaps()`` function as below, which will automatically save heatmap images in your project directory: +For a dataset +------------- + +:meth:`slideflow.Project.predict` provides an interface for generating model predictions on an entire dataset. As above, locate the saved model from which to generate predictions, and specify the dataset with either ``filters`` or ``dataset`` arguments. .. code-block:: python - P.generate_heatmaps( + dfs = P.predict( model="/path/to/trained_model_epoch1", - filters={"dataset": ["eval"]} + filters={"dataset": ["test"]} ) + print(dfs['patient']) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + patient ... 
cohort-y_pred1 + 0 TCGA-05-4244-01Z-00-DX1... ... 0.032608 + 1 TCGA-05-4245-01Z-00-DX1... ... 0.216634 + 2 TCGA-05-4249-01Z-00-DX1... ... 0.000858 + 3 TCGA-05-4250-01Z-00-DX1... ... 0.015915 + 4 TCGA-05-4382-01Z-00-DX1... ... 0.020700 + .. ... ... ... + 936 TCGA-O2-A52S-01Z-00-DX1... ... 0.983500 + 937 TCGA-O2-A52V-01Z-00-DX1... ... 0.773328 + 938 TCGA-O2-A52W-01Z-00-DX1... ... 0.858558 + 939 TCGA-S2-AA1A-01Z-00-DX1... ... 0.000212 + 940 TCGA-XC-AA0X-01Z-00-DX1... ... 0.632612 + +Results are returned as a dictionary of pandas DataFrames (with the keys ``'tile'``, ``'slide'``, and ``'patient'`` for each level of prediction) and saved in the project evaluation directory, ``eval/``. + +For a single slide +------------------ + +You can also generate predictions for a single slide with either :func:`slideflow.slide.predict` or :meth:`slideflow.WSI.predict`. + +.. code-block:: python + + import slideflow as sf + + slide = '/path/to/slide.svs' + model = '/path/to/model_epoch1' + sf.slide.predict(slide, model) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + array([0.84378019, 0.15622007]) + +The returned array has the shape ``(num_classes,)``, indicating the whole-slide prediction for each outcome category. If the model was trained with uncertainty quantification, this function will return two arrays; the first with predictions, the second with estimated uncertainty. + +.. _generate_heatmaps: + +Heatmaps +******** -.. autofunction:: slideflow.Project.generate_heatmaps - :noindex: +For a dataset +------------- -If you would like to directly interact with the calculated heatmap data, create a :class:`slideflow.Heatmap` object by providing a path to a slide, a path to a model, and tile size information: +Predictive heatmaps can be created for an entire dataset using :meth:`slideflow.Project.generate_heatmaps`. Heatmaps will be saved and exported in the project directory. See the linked API documentation for arguments and customization. .. 
code-block:: python - from slideflow import Heatmap + P.generate_heatmaps(model="/path/to/trained_model_epoch1") - heatmap = Heatmap( +For a single slide +------------------ + +:class:`slideflow.Heatmap` provides more granular control for calculating and displaying a heatmap for a given slide. The required arguments are: + +- ``slide``: Either a path to a slide, or a :class:`slideflow.WSI` object. +- ``model``: Path to a saved Slideflow model. + +Additional keyword arguments can be used to customize and optimize the heatmap. In this example, we'll increase the batch size to 64 and allow multiprocessing by setting ``num_processes`` equal to our CPU core count, 16. + +.. code-block:: python + + heatmap = sf.Heatmap( slide='/path/to/slide.svs', model='/path/to/model' + batch_size=64, + num_processes=16 ) -The spatial map of logits, as calculated across the input slide, can be accessed through ``heatmap.logits``. The spatial map of post-convolution, penultimate activations can be accessed through ``heatmap.postconv``. The heatmap can be saved with ``heatmap.save('/path/')``. \ No newline at end of file +If ``slide`` is a :class:`slideflow.WSI`, the heatmap will be calculated only within non-masked areas and ROIs, if applicable. + +.. code-block:: python + + from slideflow.slide import qc + + # Prepare the slide + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302, rois='/path') + wsi.qc([qc.Otsu(), qc.Gaussian()]) + + # Generate a heatmap + heatmap = sf.Heatmap( + slide=wsi, + model='/path/to/model' + batch_size=64, + num_processes=16 + ) + +If ``slide`` is a path to a slide, Regions of Interest can be provided through the optional ``roi_dir`` or ``rois`` arguments. + +Once generated, heatmaps can be rendered and displayed (ie. in a Jupyter notebook) with :meth:`slideflow.Heatmap.plot`. + +.. code-block:: python + + heatmap.plot(class_idx=0, cmap='inferno') + +Insets showing zoomed-in portions of the heatmap can be added with :meth:`slideflow.Heatmap.add_inset`: + +.. 
code-block:: python + + heatmap.add_inset(zoom=20, x=(10000, 10500), y=(2500, 3000), loc=1, axes=False) + heatmap.add_inset(zoom=20, x=(12000, 12500), y=(7500, 8000), loc=3, axes=False) + heatmap.plot(class_idx=0, mpp=1) + +.. image:: heatmap_inset.jpg + +| + +Save rendered heatmaps for each outcome category with :meth:`slideflow.Heatmap.save`. The spatial map of predictions, as calculated across the input slide, can be accessed through ``Heatmap.predictions``. You can save the numpy array with calculated predictions (and uncertainty, if applicable) as an \*.npz file using :meth:`slideflow.Heatmap.save_npz`. \ No newline at end of file diff --git a/docs/_sources/extract_tiles.rst.txt b/docs/_sources/extract_tiles.rst.txt deleted file mode 100644 index b421288b3..000000000 --- a/docs/_sources/extract_tiles.rst.txt +++ /dev/null @@ -1,124 +0,0 @@ -.. _filtering: - -Tile extraction -=============== - -The next step is tile extraction, which is accomplished using the ``extract_tiles()`` function. The only arguments required are ``tile_px`` and ``tile_um``, which determine the size of the extracted tiles in pixels and microns, respectively: - -.. code-block:: python - - P.extract_tiles(tile_px=299, tile_um=302) - -To filter according to a columns in your annotations file, pass a dictionary to ``filters``, with keys equal to column names and values equal to a list of all acceptable values you want to include. If this argument is not supplied, all valid slides will be extracted. - -For example, to extract tiles only for slides that are labeled as "train" in the "dataset" column header in your annotations file, do: - -.. code-block:: python - - P.extract_tiles( - tile_px=299, - tile_um=302, - filters={"dataset": ["train"]} - ) - -To further filter by the annotation header "mutation_status", including only slides with the category "braf" or "ras", do: - -.. 
code-block:: python - - P.extract_tiles( - tile_px=299, - tile_um=302, - filters={ - "dataset": ["train"], - "mutation_status": ["braf", "ras"] - } - ) - -.. note:: - The ``filters`` argument can be also used for filtering input slides in many slideflow functions, including ``train()``, ``evaluate()``, ``generate_heatmaps()``, and ``generate_mosaic()``. - -Tiles will be extracted at the specified pixel and micron size. Tiles will be automatically stored in TFRecord format, although loose tiles can also be saved by passing ``save_tiles=True``. - -The full documentation for the ``extract_tiles`` function is given below: - -.. autofunction:: slideflow.Project.extract_tiles - :noindex: - -ROIs -**** - -By default, slides with valid ROIs will only have tiles extracted from within ROIs, and slides without ROIs will have tiles extracted across the whole-slide image. To skip slides that are missing ROIs, use ``skip_missing_roi=True``. To ignore ROIs entirely and extract tiles from whole-slide images, pass ``roi_method='ignore'``. You can alternatively extract *outside* the annotated ROIs by passing ``roi_method='outside'``. - -Stain Normalization -******************* - -Tiles can be normalized to account for differing strengths of H&E staining, which has been shown to improve machine learning accuracy on some datasets. Several normalization algorithms exist, and none have shown clear superiority over the other. However, while tile normalization may improve training performance, some tiles and slides may be prone to artifacts as a result of normalization algorithms. - -If you choose to use normalization, you may either normalize images to an internal H&E-stained control image contained within the pipeline, or you may explicitly provide a reference image for normalization. - -Normalization can be performed at the time of tile extraction or in real-time during training. 
Real-time normalization adds CPU overhead and may increase training or inference times for some models, although it allows greater flexibility, as normalization strategies can be changed without re-extracting tiles from your entire dataset. - -To normalize tiles during tile extraction, use the ``normalizer`` and ``normalizer_source`` arguments; ``normalizer`` is the name of the algorithm to use. A path to a normalization reference image may optionally be provided through ``normalizer_source``. Available stain normalization algorithms include: - -- **macenko**: M. Macenko et al., ‘A method for normalizing histology slides for quantitative analysis’, *IEEE International Symposium on Biomedical Imaging: From Nano to Macro*, 2009, pp. 1107–1110. -- **vahadane**: A. Vahadane et al., ‘Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images’, *IEEE Transactions on Medical Imaging*, vol. 35, no. 8, pp. 1962–1971, Aug. 2016. -- **reinhard**: E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, ‘Color transfer between images’, *IEEE Computer Graphics and Applications*, vol. 21, no. 5, pp. 34–41, Sep. 2001. -- **reinhard_fast**: A modification of the Reinhard algorithm with the brightness standardization step removed for computational efficiency. - -.. code-block:: python - - P.extract_tiles( - tile_px=299, - tile_um=302, - normalizer='reinhard' - ) - -Alternatively, real-time normalization can be performed with all pipeline functions that process TFRecords. For example, real-time normalization during training is enabled by setting the appropriate hyperparameter: - -.. code-block:: python - - from slideflow.model import ModelParams - hp = ModelParams(..., normalizer='reinhard') - -If a normalizer was used during model training, the appropriate information will be stored in the model metadata file, `params.json`, located in the saved model folder. 
Any function within `slideflow` that uses this model will then process images using the same normalization strategy. - -Background filtering -******************** - -Slide background can be detected and filtered by two types of methods - tile-based methods and slide-based methods. - -Whitespace and grayspace filtering are two tile-based methods that detect the amount of whitespace or grayspace in a given tile, discarding the tile if the content exceeds a set threshold. Whitespace is calculated using overall brightness for each pixel, then counting the fraction of pixels with a brightness above some threshold. Grayspace is calculated by converting RGB images to the HSV spectrum, then counting the fraction of pixels with a saturation below some threshold. This filtering is performed separately for each tile as it is being extracted. Grayspace filtering is the default background filtering behavior. The arguments ``whitespace_fraction``, ``whitespace_threshold``, ``grayspace_fraction``, and ``grayspace_threshold`` are used for these methods, as described in the documentation for the tile extraction function (:func:`slideflow.Dataset.extract_tiles`). - -Alternatively, Otsu's thresholding can be performed on the lowest downsample level for a whole slide. This method generates a mask that identifies areas of foreground and marks areas of background to be discarded. Otsu's thresholding is performed in the HSV colorspace, and generally yields identical results to grayspace filtering. Otsu's thresholding is ~30% faster than grayspace filtering for slides with accessible downsample layers, but if downsample layers are not stored in a given slide or are inaccessible (e.g. ``enable_downsample=False``, which should be set for any system that does not have a patched pixman library), grayspace filtering will be significantly faster. 
To use Otsu's thresholding, set the argument ``qc='otsu'`` (and disable grayspace filtering by setting ``grayspace_threshold=1``) - -If you have pixman>0.38 and use slides with accessible downsample layers, Otsu's thresholding should be used. Otherwise, grayspace filtering will be faster. - -Quality control -*************** - -In addition to background filtering, additional blur-detection quality control can be used to identify out-of-focus areas, or areas with artifact. If annotated Regions of Interest (ROIs) are not available for your dataset, blur detection quality control should be enabled in order to ensure that high quality image tiles are extracted. If ROIs *are* available, it may be unnecessary. Blur detection may increase tile extraction time by 50% or more. - -To use blur detection QC, set ``qc='blur'`` (or ``qc='both'`` if also using Otsu's thresholding). - -If both Otsu's thresholding and blur detection are being used, Slideflow will automatically calculate Blur Burden, a metric used to assess the degree to which non-background tiles are either out-of-focus or contain artifact. In the tile extraction PDF report that is generated, the distribution of blur burden for slides in the dataset will be plotted on the first page. The report will contain the number of slides meeting criteria for warning, when the blur burden exceeds 5% for a given slide. A text file containing names of slides with high blur burden will be saved in the exported TFRecords directory. These slides should be manually reviewed to ensure they are of high enough quality to include in the dataset. - -Performance optimization -************************ - -The ``libvips`` library is used for all slide reading and tile extraction. As tile extraction is heavily reliant on random access reading, significant performance gains can be experienced by either 1) moving all slides to an SSD, or 2) utilizing an SSD or ramdisk buffer (to which slides will be copied prior to extraction). 
The use of a ramdisk buffer can improve tile extraction speed by 10-fold or greater! To maximize performance, pass the buffer path to the argument ``buffer``. - -Multiprocessing and multithreading is used during tile extraction to maximize performance efficiency. The number of process workers and threads per worker can be manually specified with ``num_workers`` and ``num_threads``, respectively. Optimal results are generally seen by setting ``num_workers=2`` and ``num_threads`` equal to the number of CPU cores available. Tile extraction speed scales linearly with CPU core availability. - -Extraction reports -****************** - -Once tiles have been extracted, a PDF report will be generated with a summary and sample of tiles extracted from their corresponding slides. An example of such a report is given below. It is generally good practice to review this report, as you may catch slides with data corruption, artifacts with stain normalization, or suboptimal whitespace/grayspace filtering. The report is saved in the project root directory. - -In addition to viewing reports after tile extraction, you may generate new reports on existing tfrecords with :func:`slideflow.Dataset.tfrecord_report`, by calling this function on a given dataset (see :ref:`dataset` for more information on datasets). For example: - -.. code-block:: python - - dataset = P.dataset(tile_px=299, tile_um=302) - dataset.tfrecord_report("/path/to/dest") - -You can also generate reports for slides that have not yet been extracted by passing ``dry_run=True`` to :meth:`slideflow.Dataset.extract_tiles`. \ No newline at end of file diff --git a/docs/_sources/features.rst.txt b/docs/_sources/features.rst.txt new file mode 100644 index 000000000..3c3072330 --- /dev/null +++ b/docs/_sources/features.rst.txt @@ -0,0 +1,485 @@ +.. 
_features: + +Generating Features +=================== + +Converting images into feature vectors is a common step for many machine learning tasks, including `feature space analysis `_ and `multiple-instance learning (MIL) `_. Slideflow provides a simple API for generating features from image tiles and includes several pretrained feature extractors. You can see a list of all available feature extractors with :func:`slideflow.list_extractors`. + +Generating Features +******************* + +The first step in generating features from a dataset of images is creating a feature extractor. Many types of feature extractors can be used, including imagenet-pretrained models, models finetuned in Slideflow, histology-specific pretrained feature extractors (ie. "foundation models"), or fine-tuned SSL models. In all cases, feature extractors are built with :func:`slideflow.build_feature_extractor`, and features are generated for a `Dataset `_ using :meth:`slideflow.Dataset.generate_feature_bags`, as described :ref:`below `. + +.. code-block:: python + + # Build a feature extractor + ctranspath = sf.build_feature_extractor('ctranspath') + + # Generate features for a dataset + dataset.generate_feature_bags(ctranspath, outdir='/path/to/features') + + +Pretrained Extractors +********************* + +Slideflow includes several pathology-specific feature extractors, also referred to as foundation models, pretrained on large-scale histology datasets. + +.. list-table:: **Pretrained feature extractors.** Note: "histossl" was renamed to "phikon" in Slideflow 3.0. 
+ :header-rows: 1 + :widths: 14 10 8 8 8 14 28 10 + + * - Model + - Type + - WSIs + - Input size + - Dim + - Source + - Package + - Link + * - **Virchow** + - DINOv2 + - 1.5M + - 224 + - 2560 + - Paige + - ``slideflow`` + - `Paper `__ + * - **CTransPath** + - SRCL + - 32K + - 224 + - 768 + - Tencent AI Lab + - ``slideflow-gpl`` + - `Paper `__ + * - **RetCCL** + - CCL + - 32K + - 256 + - 2048 + - Tencent AI Lab + - ``slideflow-gpl`` + - `Paper `__ + * - **Phikon** + - iBOT + - 6.1K + - 224 + - 768 + - Owkin + - ``slideflow-noncommercial`` + - `Paper `__ + * - **PLIP** + - CLIP + - N/A + - 224 + - 512 + - Zhao Lab + - ``slideflow-noncommercial`` + - `Paper `__ + * - **UNI** + - DINOv2 + - 100K + - 224 + - 1024 + - Mahmood Lab + - ``slideflow-noncommercial`` + - `Paper `__ + * - **GigaPath** + - DINOv2 + - 170K + - 256 + - 1536 + - Microsoft + - ``slideflow-noncommercial`` + - `Paper `__ + + +In order to respect the original licensing agreements, pretrained models are distributed in separate packages. The core ``slideflow`` package provides access to models under the **Apache-2.0** license, while models under **GPL-3.0** are available in the ``slideflow-gpl`` package. Models restricted to non-commercial use are available under the **CC BY-NC 4.0** license through the ``slideflow-noncommercial`` package. + +Loading weights +--------------- + +Pretrained feature extractors will automatically download their weights from Hugging Face upon creation. Some models, such as PLIP, GigaPath, UNI, and Phikon, require approval for access. Request approval on Hugging Face and ensure your local machine has been `authenticated `_. + +All pretrained models can also be loaded using local weights. Use the ``weights`` argument when creating a feature extractor. + +.. 
code-block:: python + + # Load UNI with local weights + uni = sf.build_feature_extractor('uni', weights='../pytorch_model.bin') + +Image preprocessing +------------------- + +Each feature extractor includes a default image preprocessing pipeline that matches the original implementation. However, preprocessing can also be manually adjusted using various keyword arguments when creating a feature extractor. + +- **resize**: ``int`` or ``bool``. If an ``int``, resizes images to this size. If ``True``, resizes images to the input size of the feature extractor. Default is ``False``. +- **center_crop**: ``int`` or ``bool``. If an ``int``, crops images to this size. If ``True``, crops images to the input size of the feature extractor. Center-cropping happens before resizing, if both are used. Default is ``False``. +- **interpolation**: ``str``. Interpolation method for resizing images. Default is ``bilinear`` for most models, but is ``bicubic`` for GigaPath and Virchow. +- **antialias**: ``bool``. Whether to apply antialiasing to resized images. Default is ``False`` (matching the default behavior of torchvision < 0.17). +- **norm_mean**: ``list``. Mean values for image normalization. Default is ``[0.485, 0.456, 0.406]`` for all models except PLIP. +- **norm_std**: ``list``. Standard deviation values for image normalization. Default is ``[0.229, 0.224, 0.225]`` for all models except PLIP. + + +Example: + +.. code-block:: python + + # Load a feature extractor with custom preprocessing + extractor = sf.build_feature_extractor( + 'ctranspath', + resize=224, + interpolation='bicubic', + antialias=True + ) + +Default values for these preprocessing arguments are determined by the feature extractor. One notable exception to the standard preprocessing order is GigaPath, for which images are resized first (to 256x256 by default) and then center cropped (to 224x224 by default), which mirrors the official implementation. 
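The effect of the two possible orderings of resizing and center-cropping can be worked through with simple arithmetic. The sketch below is illustrative only: ``crop_then_resize`` and ``resize_then_crop`` are hypothetical helpers that are not part of Slideflow, and they assume square tiles. Each returns the final image width and the fraction of the original tile width that survives preprocessing.

```python
# Illustrative arithmetic only; these helpers are hypothetical
# and are not part of the Slideflow API. Square tiles are assumed.

def crop_then_resize(tile_px, center_crop, resize):
    """Center-crop first, then resize the cropped region."""
    # A crop larger than the tile cannot remove any content.
    kept = min(center_crop, tile_px) / tile_px
    return resize, kept


def resize_then_crop(tile_px, resize, center_crop):
    """Resize the whole tile first, then center-crop (the GigaPath order)."""
    # After resizing, the crop keeps `center_crop` out of `resize` pixels,
    # so the retained fraction does not depend on `tile_px`.
    kept = min(center_crop, resize) / resize
    return center_crop, kept


# A 299 px tile with a 224 px crop and a 256 px resize, under each ordering.
print(crop_then_resize(299, center_crop=224, resize=256))
print(resize_then_crop(299, resize=256, center_crop=224))
```

With these example values, cropping first yields a 256 px image covering roughly 75% of the tile width, while resizing first yields a 224 px image covering 87.5% of the tile width. The same pair of arguments can therefore produce different inputs depending on the order in which they are applied.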
+ +For transparency, you can see the current preprocessing pipeline with ``extractor.transform``: + +.. code-block:: python + + >>> import slideflow as sf + >>> ctranspath = sf.build_feature_extractor( + ... 'ctranspath', + ... resize=256, + ... interpolation='bicubic', + ... center_crop=224 + ... ) + >>> ctranspath.transform + Compose( + CenterCrop(size=(224, 224)) + Resize(size=256, interpolation=bicubic, max_size=None, antialias=False) + Lambda() + Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)) + ) + + +GigaPath +-------- + +GigaPath is a DINOv2-based model from Microsoft/Providence trained on 170k whole-slide images and is bundled with ``slideflow-noncommercial``. The GigaPath model has additional dependencies that are not broadly compatible with all OS distributions and are thus not installed by default. To install the GigaPath dependencies: + +.. code-block:: bash + + pip install slideflow-noncommercial[gigapath] git+ssh://git@github.com/prov-gigapath/prov-gigapath + + +GigaPath has two stages: a tile encoder and a slide-level encoder. The tile encoder (``"gigapath.tile"``) works the same as all other feature extractors in Slideflow. You can build this encoder directly: + +.. code-block:: python + + # Build the tile encoder + gigapath_tile = sf.build_feature_extractor("gigapath.tile") + + # Use the tile encoder + project.generate_feature_bags(gigapath_tile, ...) + + +or you can build the combined tile+slide model, and then use ``gigapath.tile``: + +.. code-block:: python + + # Build the combined tile + slide model + gigapath = sf.build_feature_extractor("gigapath") + + # Use the tile encoder + project.generate_feature_bags(gigapath.tile, ...) + +As there are two stages to GigaPath, there are also separate model weights. As with other pretrained feature extractors, the weights will be auto-downloaded from Hugging Face upon first use if you are logged into Hugging Face and have been granted access to the repository. 
If you have manually downloaded the weights, these can be used with the following: + +.. code-block:: python + + # Example of how to supply tile + slide weights + # For the full GigaPath model + gigapath = sf.build_feature_extractor( + 'gigapath', + tile_encoder_weights='../pytorch_model.bin', + slide_encoder_weights='../slide_encoder.pth' + ) + + # Or, just supply the tile weights + gigapath_tile = sf.build_feature_extractor( + 'gigapath.tile', + weights='pytorch_model.bin' + ) + + +Once feature bags have been generated and saved with the GigaPath tile encoder, you can then generate slide-level embeddings with ``gigapath.slide``: + +.. code-block:: python + + # Load GigaPath + gigapath = sf.build_feature_extractor('gigapath') + + # Generate tile-level features + project.generate_feature_bags(gigapath.tile, ..., outdir='/gigapath_bags') + + # Generate slide-level embeddings + gigapath.slide.generate_and_save('/gigapath_bags', outdir='/gigapath_embeddings') + +In addition to running the tile and slide encoder steps separately, you can also run the combined pipeline all at once on a whole-slide image, generating a final slide-level embedding. + +.. code-block:: python + + # Load GigaPath + gigapath = sf.build_feature_extractor('gigapath') + + # Load slide + wsi = sf.WSI('slide.svs', tile_px=256, tile_um=128) + + # Generate slide embedding + embedding = gigapath(wsi) + + +ImageNet Features +***************** + +To calculate features from an ImageNet-pretrained network, first build an imagenet feature extractor with :func:`slideflow.build_feature_extractor`. The first argument should be the name of an architecture followed by ``_imagenet``, and the expected tile size should be passed to the keyword argument ``tile_px``. You can optionally specify the layer from which to generate features with the ``layers`` argument; if not provided, it will default to calculating features from post-convolutional layer activations. 
For example, to build a ResNet50 feature extractor for images at 299 x 299 pixels: + +.. code-block:: python + + resnet50 = sf.build_feature_extractor( + 'resnet50_imagenet', + tile_px=299 + ) + +This will calculate features using activations from the post-convolutional layer. You can also concatenate activations from multiple neural network layers and apply pooling for layers with 2D output shapes. + +.. code-block:: python + + resnet50 = sf.build_feature_extractor( + 'resnet50_imagenet', + layers=['conv1_relu', 'conv3_block1_2_relu'], + pooling='avg', + tile_px=299 + ) + +If a model architecture is available in both the Tensorflow and PyTorch backends, Slideflow will default to using the active backend. You can manually set the feature extractor backend using ``backend``. + +.. code-block:: python + + # Create a PyTorch feature extractor + extractor = sf.build_feature_extractor( + 'resnet50_imagenet', + layers=['layer2.0.conv1', 'layer3.1.conv2'], + pooling='avg', + tile_px=299, + backend='torch' + ) + +You can view all available feature extractors with :func:`slideflow.model.list_extractors`. + +Layer Activations +***************** + +You can also calculate features from any model trained in Slideflow. The first argument to ``build_feature_extractor()`` should be the path of the trained model. You can optionally specify the layer at which to calculate activations using the ``layers`` keyword argument. If not specified, activations are calculated at the post-convolutional layer. + +.. code-block:: python + + # Build a feature extractor from a trained model. + extractor = sf.build_feature_extractor( + '/path/to/model', + layers='sepconv3_bn' + ) + +Self-Supervised Learning +************************ + +Finally, you can also generate features from a trained :ref:`self-supervised learning ` model (either `SimCLR `_ or `DinoV2 `_). 
+ +For SimCLR models, use ``'simclr'`` as the first argument to ``build_feature_extractor()``, and pass the path to a saved model (or saved checkpoint file) via the keyword argument ``ckpt``. + +.. code-block:: python + + simclr = sf.build_feature_extractor( + 'simclr', + ckpt='/path/to/simclr.ckpt' + ) + +For DinoV2 models, use ``'dinov2'`` as the first argument, and pass the model configuration YAML file to ``cfg`` and the teacher checkpoint weights to ``weights``. + +.. code-block:: python + + dinov2 = sf.build_feature_extractor( + 'dinov2', + weights='/path/to/teacher_checkpoint.pth', + cfg='/path/to/config.yaml' + ) + + + +Custom Extractors +***************** + +Slideflow also provides an API for integrating your own custom, pretrained feature extractor. See :ref:`custom_extractors` for additional information. + +.. _bags: + +Exporting Features +****************** + +Feature bags +------------ + +Once you have prepared a feature extractor, features can be generated for a dataset and exported to disk for later use. Pass a feature extractor to the first argument of :meth:`slideflow.Project.generate_feature_bags`, with a :class:`slideflow.Dataset` as the second argument. + +.. code-block:: python + + # Load a project and dataset. + P = sf.Project(...) + dataset = P.dataset(tile_px=299, tile_um=302) + + # Create a feature extractor. + ctranspath = sf.build_feature_extractor('ctranspath', resize=True) + + # Calculate & export feature bags. + P.generate_feature_bags(ctranspath, dataset) + +.. note:: + + If you are generating features from a SimCLR model trained with stain normalization, + you should specify the stain normalizer using the ``normalizer`` argument to :meth:`slideflow.Project.generate_feature_bags` or :class:`slideflow.DatasetFeatures`. + +Features are calculated for slides in batches, keeping memory usage low. 
By default, features are saved to disk in a directory named ``pt_files`` within the project directory, but you can override the destination directory using the ``outdir`` argument. + +Alternatively, you can calculate features for a dataset using :class:`slideflow.DatasetFeatures` and the ``.to_torch()`` method. This will calculate features for your entire dataset at once, which may require a large amount of memory. The first argument should be the feature extractor, and the second argument should be a :class:`slideflow.Dataset`. + +.. code-block:: python + + # Calculate features for the entire dataset. + features = sf.DatasetFeatures(ctranspath, dataset) + + # Export feature bags. + features.to_torch('/path/to/bag_directory/') + + +.. warning:: + + Using :class:`slideflow.DatasetFeatures` directly may result in a large amount of memory usage, particularly for sizable datasets. When generating feature bags for training MIL models, it is recommended to use :meth:`slideflow.Project.generate_feature_bags` instead. + +Feature "bags" are PyTorch tensors of features for all images in a slide, saved to disk as ``.pt`` files. These bags are used to train MIL models. Bags can be manually loaded and inspected using :func:`torch.load`. + +.. code-block:: python + + >>> import torch + >>> bag = torch.load('/path/to/bag.pt') + >>> bag.shape + torch.Size([2310, 768]) + >>> bag.dtype + torch.float32 + +When image features are exported for a dataset, the feature extractor configuration is saved to ``bags_config.json`` in the same directory as the exported features. This configuration file can be used to rebuild the feature extractor. An example file is shown below. + +.. 
code-block:: json + + { + "extractor": { + "class": "slideflow.model.extractors.ctranspath.CTransPathFeatures", + "kwargs": { + "center_crop": true + } + }, + "normalizer": { + "method": "macenko", + "fit": { + "stain_matrix_target": [ + [ + 0.5062568187713623, + 0.22186939418315887 + ], + [ + 0.7532230615615845, + 0.8652154803276062 + ], + [ + 0.4069173336029053, + 0.42241501808166504 + ] + ], + "target_concentrations": [ + 1.7656903266906738, + 1.2797492742538452 + ] + } + }, + "num_features": 768, + "tile_px": 299, + "tile_um": 302 + } + +The feature extractor can be manually rebuilt using :func:`slideflow.model.rebuild_extractor()`: + +.. code-block:: python + + from slideflow.model import rebuild_extractor + + # Recreate the feature extractor + # and stain normalizer, if applicable + extractor, normalizer = rebuild_extractor('/path/to/bags_config.json') + + +From a TFRecord +--------------- + +In addition to generating and exporting feature bags for a dataset, features can also be generated from a single TFRecord file. This may be useful for debugging or testing purposes. + +.. code-block:: python + + import slideflow as sf + + # Create a feature extractor + ctranspath = sf.build_feature_extractor('ctranspath') + + # Bags is a tensor of shape (n_tiles, n_features) + # Coords is a tensor of shape (n_tiles, 2), containing x/y tile coordinates. + bags, coords = ctranspath('file.tfrecords') + + +From a whole-slide image +------------------------ + +Feature extractors can also create features from a whole-slide image. This is useful for single-slide analysis, MIL inference, and other tasks where features are needed for the entire slide. Features are returned as a 3D tensor, with shape ``(width, height, n_features)``, reflecting the spatial arrangement of features for tiles across the image. + +.. code-block:: python + + # Load a feature extractor. + ctranspath = sf.build_feature_extractor('ctranspath') + + # Load a whole-slide image. 
+ wsi = sf.WSI('slide.svs', tile_px=256, tile_um=128) + + # Generate features for the whole slide. + # Shape: (width, height, n_features) + features = ctranspath(wsi) + + +Mixed precision +--------------- + +All feature extractors will use mixed precision by default. This can be disabled by setting the ``mixed_precision`` argument to ``False`` when creating the feature extractor. + +.. code-block:: python + + # Load a feature extractor without mixed precision + extractor = sf.build_feature_extractor('ctranspath', mixed_precision=False) + + +License & Citation +------------------ + +Licensing and citation information for the pretrained feature extractors is accessible with the ``.license`` and ``.citation`` attributes. + +.. code-block:: python + + >>> ctranspath.license + 'GNU General Public License v3.0' + >>> print(ctranspath.citation) + + @{wang2022, + title={Transformer-based Unsupervised Contrastive Learning for Histopathological Image Classification}, + author={Wang, Xiyue and Yang, Sen and Zhang, Jun and Wang, Minghui and Zhang, Jing and Yang, Wei and Huang, Junzhou and Han, Xiao}, + journal={Medical Image Analysis}, + year={2022}, + publisher={Elsevier} + } diff --git a/docs/_sources/gan.rst.txt b/docs/_sources/gan.rst.txt new file mode 100644 index 000000000..05dabf5f7 --- /dev/null +++ b/docs/_sources/gan.rst.txt @@ -0,0 +1,21 @@ +.. currentmodule:: slideflow.gan + +slideflow.gan +============= + +.. automodule:: slideflow.gan + :members: + +See :ref:`stylegan` for more information on working with GANs. + +StyleGAN2 Interpolator +---------------------- + +.. autoclass:: StyleGAN2Interpolator + :inherited-members: + +Utility functions +----------------- + +.. automodule:: slideflow.gan.utils + :members: \ No newline at end of file diff --git a/docs/_sources/grad.rst.txt b/docs/_sources/grad.rst.txt new file mode 100644 index 000000000..f6cbb7170 --- /dev/null +++ b/docs/_sources/grad.rst.txt @@ -0,0 +1,25 @@ +.. 
currentmodule:: slideflow.grad + +slideflow.grad +============== + +This submodule contains tools for calculating and displaying pixel attribution, or +saliency, maps. See :ref:`saliency` for more information. + +.. autoclass:: SaliencyMap + :inherited-members: + +.. automodule:: slideflow.grad + :members: + +.. autofunction:: comparison_plot + +.. autofunction:: inferno + +.. autofunction:: multi_plot + +.. autofunction:: oranges + +.. autofunction:: overlay + +.. autofunction:: saliency_map_comparison \ No newline at end of file diff --git a/docs/_sources/heatmap.rst.txt b/docs/_sources/heatmap.rst.txt index 2c6a29c25..a4a6804f5 100644 --- a/docs/_sources/heatmap.rst.txt +++ b/docs/_sources/heatmap.rst.txt @@ -1,14 +1,22 @@ .. currentmodule:: slideflow -slideflow.heatmap -===================== +slideflow.Heatmap +================= -:class:`slideflow.Heatmap` uses a model to generate predictions across a whole-slide image through -progressive convolution. These prediction heatmaps can be interactively displayed or saved for later use. - -.. automodule: slideflow.heatmap +.. autoclass:: Heatmap -Heatmap +Methods ------- -.. autoclass:: Heatmap - :inherited-members: \ No newline at end of file + +.. autofunction:: slideflow.Heatmap.add_inset +.. autofunction:: slideflow.Heatmap.clear_insets +.. autofunction:: slideflow.Heatmap.generate +.. autofunction:: slideflow.Heatmap.load +.. autofunction:: slideflow.Heatmap.load_npz +.. autofunction:: slideflow.Heatmap.plot +.. autofunction:: slideflow.Heatmap.plot_thumbnail +.. autofunction:: slideflow.Heatmap.plot_with_logit_cmap +.. autofunction:: slideflow.Heatmap.plot_uncertainty +.. autofunction:: slideflow.Heatmap.save +.. autofunction:: slideflow.Heatmap.save_npz +.. 
autofunction:: slideflow.Heatmap.view \ No newline at end of file diff --git a/docs/_sources/index.rst.txt b/docs/_sources/index.rst.txt index 8be9286f0..cd172de66 100644 --- a/docs/_sources/index.rst.txt +++ b/docs/_sources/index.rst.txt @@ -7,44 +7,77 @@ Slideflow Documentation ======================= -``slideflow`` is a Python package that provides a unified API for building and testing deep learning models for histopathology, supporting both Tensorflow/Keras and PyTorch. +Slideflow is a Python package that provides a unified API for building and testing deep learning models for histopathology, supporting both Tensorflow/Keras and PyTorch. -Slideflow includes tools for efficient whole-slide image processing, easy and highly customizable model training with uncertainty quantification (UQ), and a number of functional tools to assist with analysis and interpretability, including predictive heatmaps, mosaic maps, and more. It is built with both `Tensorflow/Keras `_ and `PyTorch `_ backends, with fully cross-compatible TFRecord data storage. +Slideflow includes tools for efficient whole-slide image processing, easy and highly customizable model training with uncertainty quantification (UQ), and a number of functional tools to assist with analysis and interpretability, including predictive heatmaps, mosaic maps, GANs, saliency maps, and more. It is built with both `Tensorflow/Keras `_ and `PyTorch `_ backends, with fully cross-compatible TFRecord data storage. -The ``slideflow`` package includes a ``Project`` class to help coordinate project organization and supervise execution of the pipeline. This documentation starts with a high-level overview of the pipeline, and will include examples of how to execute functions using the ``Project`` class. We also provide several tutorials with examples of how Slideflow can be used on your own data. 
+This documentation starts with a high-level overview of the pipeline and includes examples of how to perform common tasks using the ``Project`` helper class. We also provide several tutorials with examples of how Slideflow can be used and extended for additional functionality. .. toctree:: :maxdepth: 1 - :caption: Overview + :caption: Introduction installation - pipeline + overview + quickstart project_setup - validation - extract_tiles + datasets_and_val + slide_processing training evaluation - layer_activations - custom_loops + posthoc uq - clam + features + mil + ssl + stylegan + saliency + segmentation + cellseg + custom_loops + studio troubleshooting - appendix .. toctree:: :maxdepth: 1 - :caption: Source + :caption: Developer Notes + + tfrecords + dataloaders + custom_extractors + tile_labels + plugins + +.. toctree:: + :maxdepth: 1 + :caption: API + slideflow project dataset + dataset_features heatmap + model_params + mosaic + slidemap + biscuit + slideflow_cellseg + io io_tensorflow io_torch + gan + grad + mil_module model - mosaic + model_tensorflow + model_torch + norm + simclr slide + slide_qc stats util + studio_module .. toctree:: :maxdepth: 1 @@ -54,4 +87,7 @@ The ``slideflow`` package includes a ``Project`` class to help coordinate projec tutorial2 tutorial3 tutorial4 - tutorial5 \ No newline at end of file + tutorial5 + tutorial6 + tutorial7 + tutorial8 \ No newline at end of file diff --git a/docs/_sources/installation.rst.txt b/docs/_sources/installation.rst.txt index 9070fb5d3..8ff81844f 100644 --- a/docs/_sources/installation.rst.txt +++ b/docs/_sources/installation.rst.txt @@ -1,79 +1,125 @@ Installation ============ -Slideflow has been tested and is supported on the following systems: +.. 
figure:: https://github.com/user-attachments/assets/53d5c1f8-8fbc-4e0f-bd62-db16797492b0 -- Ubuntu 18.04 -- Ubuntu 20.04 -- Centos 7 -- Centos 8 -- Centos 8 Stream +Slideflow is tested on **Linux-based systems** (Ubuntu, CentOS, Red Hat, and Raspberry Pi OS) and **macOS** (Intel and Apple). Windows support is experimental. -Software Requirements -********************* +Requirements +************ + +- Python >= 3.7 (<3.10 if using `cuCIM `_) +- `PyTorch `_ (1.9+) *or* `Tensorflow `_ (2.5-2.11) + - Core functionality, including tile extraction, data processing, and tile-based model training, is supported for both PyTorch and Tensorflow. Additional advanced tools, such as Multiple-Instance Learning (MIL), GANs, and pretrained foundation models, require PyTorch. + +Optional +-------- + +- `Libvips >= 8.9 `_ (alternative slide reader, adds support for \*.scn, \*.mrxs, \*.ndpi, \*.vms, and \*.vmu files) +- Linear solver (for site-preserved cross-validation): + + - `CPLEX 20.1.0 `_ with `Python API `_ + - *or* `Pyomo `_ with `Bonmin `_ solver -- Python 3.7 - 3.10 -- `OpenSlide `_ -- `Libvips 8.9+ `_ -- `CPLEX 20.1.0 `_ with `Python API `_ [*optional*] - used for preserved-site cross-validation -- `QuPath `_ [*optional*] - used for ROI annotations -- `Tensorflow 2.5-2.8 `_ or `PyTorch 1.9-1.11 `_ Download with pip ***************** +Slideflow can be installed either with PyPI or as a Docker container. To install via pip: + .. code-block:: bash # Update to latest pip - $ pip install --upgrade pip + pip install --upgrade pip wheel + + # Current stable release, Tensorflow backend + pip install slideflow[tf] cucim cupy-cuda11x + + # Alternatively, install with PyTorch backend + pip install slideflow[torch] cucim cupy-cuda11x + +The ``cupy`` package name depends on the installed CUDA version; `see here `_ for installation instructions. ``cucim`` and ``cupy`` are not required if using Libvips. 
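As a rough illustration of how the ``cupy`` package name tracks the installed CUDA version, the sketch below maps a CUDA version string to a wheel name. ``cupy_package_name`` is a hypothetical helper that is not part of Slideflow or CuPy, and it assumes only the ``cupy-cuda11x`` (CUDA 11.2 and later) and ``cupy-cuda12x`` wheels; the CuPy installation guide remains the authoritative source for other versions.

```python
def cupy_package_name(cuda_version: str) -> str:
    """Return the CuPy wheel name for a CUDA version such as '11.8'.

    Hypothetical helper for illustration; only the cupy-cuda11x and
    cupy-cuda12x wheels are covered here.
    """
    parts = cuda_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0

    # cupy-cuda11x is assumed to require CUDA 11.2 or later.
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(f"No wheel name known to this sketch for CUDA {cuda_version}")


print(cupy_package_name("11.8"))
```

The resulting name can then be substituted for ``cupy-cuda11x`` in the ``pip install`` commands above.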
- # Current stable release - $ pip install slideflow Run a Docker container ********************** -The `Slideflow docker images `_ have been pre-configured with OpenSlide, Libvips, and either PyTorch 1.11 or Tensorflow 2.8. Using a preconfigured `Docker `_ container is the easiest way to get started with compatible dependencies and GPU support. +Alternatively, pre-configured `docker images `_ are available with cuCIM, Libvips, and either PyTorch 1.11 or Tensorflow 2.9 pre-installed. Using a preconfigured `Docker `_ container is the easiest way to get started with compatible dependencies and GPU support. -To install with the Tensorflow 2.8 backend: +To run a Docker container with the Tensorflow backend: .. code-block:: bash - $ docker pull jamesdolezal/slideflow:latest-tf - $ docker run -it --gpus all jamesdolezal/slideflow:latest-tf + docker pull jamesdolezal/slideflow:latest-tf + docker run -it --gpus all jamesdolezal/slideflow:latest-tf -To install with the PyTorch 1.11 backend: +To run a Docker container with the PyTorch backend: .. code-block:: bash - $ docker pull jamesdolezal/slideflow:latest-torch - $ docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch + docker pull jamesdolezal/slideflow:latest-torch + docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch Build from source ***************** -To build Slideflow from source, clone the repository from the project `Github page `_: +To build Slideflow from source, clone the repository from the project `Github page `_: .. code-block:: bash - $ git clone https://github.com/jamesdolezal/slideflow - $ cd slideflow - $ pip install -r requirements.txt - $ python setup.py bdist_wheel - $ pip install dist/slideflow-1.X.X-py3-any.whl + git clone https://github.com/slideflow/slideflow + cd slideflow + conda env create -f environment.yml + conda activate slideflow + python setup.py bdist_wheel + pip install dist/slideflow* cupy-cuda11x -.. 
warning:: - A bug in the pixman library (version=0.38) will corrupt downsampled slide images, resulting in large black boxes across the slide. We have provided a patch for version 0.38 that has been tested for Ubuntu, which is provided in the project `Github page `_ (``pixman_repair.sh``), although it may not be suitable for all environments and we make no guarantees regarding its use. The `Slideflow docker images `_ already have this applied. If you are installing from source, have pixman version 0.38, and are unable to apply this patch, the use of downsampled image layers must be disabled to avoid corruption (pass ``enable_downsample=False`` to tile extraction functions). -Changing backends +Extensions +********** + +The core Slideflow package is licensed under the **Apache-2.0** license. Additional functionality, such as pretrained foundation models, is distributed in separate packages according to its licensing terms. Available extensions include: + +- **Slideflow-GPL**: GPL-3.0 licensed extensions (`GitHub `__) + - Includes: `RetCCL `__, `CTransPath `__, and `CLAM `__. +- **Slideflow-NonCommercial**: CC BY-NC 4.0 licensed extensions for non-commercial use (`GitHub `__) + - Includes: `HistoSSL `__, `PLIP `__, `GigaPath `__, `UNI `__, `BISCUIT `__, and `StyleGAN3 `__. + +These extensions can be installed via pip. The GigaPath feature extractor has additional, more restrictive dependencies that must be installed separately. + +.. code-block:: bash + + # Install Slideflow-GPL and Slideflow-NonCommercial + pip install slideflow-gpl slideflow-noncommercial + + # Install GigaPath dependencies, if desired + pip install slideflow-noncommercial[gigapath] git+ssh://git@github.com/prov-gigapath/prov-gigapath + + +.. note:: + The Slideflow-GPL and Slideflow-NonCommercial extensions are not included in the default Slideflow package due to their licensing terms. Please review the licensing terms of each extension before use. + + +PyTorch vs. 
Tensorflow +********************** + +Slideflow supports both PyTorch and Tensorflow, with cross-compatible TFRecord storage. Slideflow will default to using PyTorch if both are available, but the backend can be manually specified using the environmental variable ``SF_BACKEND``. For example: + +.. code-block:: bash + + export SF_BACKEND=tensorflow + +.. _slide_backend: + +cuCIM vs. Libvips ***************** -The default backend for this package is Tensorflow/Keras, but a full PyTorch backend is also included, with a dedicated TFRecord reader/writer that ensures saved image tiles can be served to both Tensorflow and PyTorch models in cross-compatible fashion. +By default, Slideflow reads whole-slide images using `cuCIM `_. Although much faster than other openslide-based frameworks, it supports fewer slide scanner formats. Slideflow also includes a `Libvips `_ backend, which adds support for \*.scn, \*.mrxs, \*.ndpi, \*.vms, and \*.vmu files. You can set the active slide backend with the environmental variable ``SF_SLIDE_BACKEND``: -If using the Tensorflow backend, PyTorch does not need to be installed; the reverse is true as well. +.. code-block:: bash -To switch backends, simply set the environmental variable ``SF_BACKEND`` equal to either ``torch`` or ``tensorflow``: + export SF_SLIDE_BACKEND=libvips -.. code-block:: console - export SF_BACKEND=torch \ No newline at end of file +.. warning:: + A bug in the pixman library (version=0.38) will corrupt downsampled slide images, resulting in large black boxes across the slide. We have provided a patch for version 0.38 that has been tested for Ubuntu, which is provided in the project `Github page `_ (``pixman_repair.sh``), although it may not be suitable for all environments and we make no guarantees regarding its use. The `Slideflow docker images `_ already have this applied. 
If you are installing from source, have pixman version 0.38, and are unable to apply this patch, the use of downsampled image layers must be disabled to avoid corruption (pass ``enable_downsample=False`` to tile extraction functions). diff --git a/docs/_sources/io.rst.txt b/docs/_sources/io.rst.txt new file mode 100644 index 000000000..7ce32cf68 --- /dev/null +++ b/docs/_sources/io.rst.txt @@ -0,0 +1,34 @@ +.. currentmodule:: slideflow.io + +slideflow.io +============ + +This module contains utility functions for working with TFRecords, cross-compatible +with both Tensorflow and PyTorch. + +Functions included in this module assist with processing TFRecords, detecting image and data format, +extracting tiles, splitting and merging TFrecords, and a variety of other manipulations. + +Additional Tensorflow-specific TFRecord reading/writing utility functions are +available in :py:mod:`slideflow.io.tensorflow`, and additional PyTorch-specific +functions are in :py:mod:`slideflow.io.torch`. + +.. autofunction:: convert_dtype +.. autofunction:: detect_tfrecord_format +.. autofunction:: extract_tiles +.. autofunction:: get_locations_from_tfrecord +.. autofunction:: get_tfrecord_by_index +.. autofunction:: get_tfrecord_by_location +.. autofunction:: get_tfrecord_parser +.. autofunction:: get_tfrecord_length +.. autofunction:: read_and_return_record +.. autofunction:: serialized_record +.. autofunction:: tfrecord_has_locations +.. autofunction:: update_manifest_at_dir +.. autofunction:: write_tfrecords_multi +.. autofunction:: write_tfrecords_single +.. autofunction:: write_tfrecords_merge + +slideflow.io.preservedsite +************************** +.. 
autofunction:: slideflow.io.preservedsite.generate_crossfolds \ No newline at end of file diff --git a/docs/_sources/io_tensorflow.rst.txt b/docs/_sources/io_tensorflow.rst.txt index 9fcd4ed6f..92dd7c2b8 100644 --- a/docs/_sources/io_tensorflow.rst.txt +++ b/docs/_sources/io_tensorflow.rst.txt @@ -3,17 +3,10 @@ slideflow.io.tensorflow ======================= -This module contains functions for processing TFRecords, including detecting contents and image format of saved -TFRecords, extracting tiles from TFRecords, splitting and merging TFRecrds, and a variety of other manipulations. - -The more important compontent of this module, however, is the :func:`slideflow.io.tensorflow.interleave` function, -which interleaves a set of tfrecords together into a :class:`tf.data.Datasets` object that can be used for training. -This interleaving can include patient or category-level balancing for returned batches (see :ref:`balancing`). +TFRecord interleaving in the Tensorflow backend is accomplished with :func:`slideflow.io.tensorflow.interleave`, which interleaves a set of tfrecords together into a :class:`tf.data.Datasets` object that can be used for training. This interleaving can include patient or category-level balancing for returned batches (see :ref:`balancing`). .. note:: - The TFRecord reading and interleaving implemented in this module is only compatible with Tensorflow models. - The :mod:`slideflow.io.torch` module includes an optimized, PyTorch-specific TFRecord reader based on a modified - version of the tfrecord reader/writer at: https://github.com/vahidk/tfrecord. + The TFRecord reading and interleaving implemented in this module is only compatible with Tensorflow models. The :mod:`slideflow.io.torch` module includes a PyTorch-specific TFRecord reader. .. 
automodule:: slideflow.io.tensorflow :members: \ No newline at end of file diff --git a/docs/_sources/io_torch.rst.txt b/docs/_sources/io_torch.rst.txt index 76ef6e8d1..3a77a0270 100644 --- a/docs/_sources/io_torch.rst.txt +++ b/docs/_sources/io_torch.rst.txt @@ -10,4 +10,9 @@ interleaving is supervised by :func:`slideflow.io.torch.interleave`, while the :func:`slideflow.io.torch.interleave_dataloader` function provides a PyTorch DataLoader object which can be directly used. .. automodule:: slideflow.io.torch - :members: \ No newline at end of file + :members: + :exclude-members: StyleGAN2Interleaver, TileLabelInterleaver, InterleaveIterator, IndexedInterleaver + +.. autoclass:: slideflow.io.torch.InterleaveIterator + +.. autoclass:: slideflow.io.torch.IndexedInterleaver \ No newline at end of file diff --git a/docs/_sources/layer_activations.rst.txt b/docs/_sources/layer_activations.rst.txt deleted file mode 100644 index 091df113d..000000000 --- a/docs/_sources/layer_activations.rst.txt +++ /dev/null @@ -1,136 +0,0 @@ -Features / layer activations -============================ - -Once a model has been fully trained and evaluated, you may use the model to generate features from layer activations to gain better insight into the kinds of image features the model has learned. - -Working with Layer Features -*************************** - -To work with features / intermediate layer activations calculated from a model, the :class:`slideflow.model.Features` class will generate features on a tile or slide level, and the :class:`slideflow.model.DatasetFeatures` class will generate features for an entire dataset. - -DatasetFeatures ---------------- - -The easiest way to get started with intermediate layer activations is the :class:`slideflow.model.DatasetFeatures` class, which is used to calculate and examine activations across an entire dataset. 
Instancing the class supervises the calculation and caching of layer activations, which can then be exported, viewed (as a mosaic map), or analyzed with various statistical methods. The project function :func:`slideflow.Project.generate_features` creates and returns an instance of this class. - -.. code-block:: python - - features = P.generate_features('/path/to/trained_model') - -Alternatively, you can create an instance of this class directly: - -.. code-block:: python - - from slideflow.model import DatasetFeatures - - dataset = P.dataset(299, 302) - labels, unique_outcomes = dataset.labels('HPV') - - features = DatasetFeatures( - model='/path/to/trained_model', - dataset=dataset, - annotations=labels - ) - -Tile-level feature activations for each slide can be accessed directly from ``slideflow.model.DatasetFeatures.activations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_features)``. Logits are stored in ``slideflow.model.DatasetFeatures.logits``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_logits)``. Tile-level location data (coordinates from which the tiles were taken from their respective source slides) is stored in ``slideflow.model.DatasetFeatures.locations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, 2)`` (``x``, ``y``). - -To return the average logits value for each slide (averaged across constituent tiles), use :func:`slideflow.model.DatasetFeatures.logits_mean`. Similarly, :func:`slideflow.model.DatasetFeatures.logits_predict` can be used to generate final slide-level logit predictions. - -Features across categories can be statistically compared using :func:`slideflow.model.DatasetFeatures.stats`, which will calculate and save statistics to a specified directory. - -.. 
code-block:: python - - features.stats('/outdir', method='mean') - -To compare layer features across outcome categories and find features which differ significantly across categories, use the :func:`slideflow.model.DatasetFeatures.box_plots` function. For example, to generate boxplots for the first 100 features: - -.. code-block:: python - - features.box_plots(range(100), '/outdir') - -.. image:: boxplot_example.png - -Many other functions are available, as described in the documentation, :class:`slideflow.model.DatasetFeatures`. - -Features --------- - -The :class:`slideflow.model.Features` class can be used to generate layer activations / features for a single batch of images. For example, to calculate features for a batch of images while looping through a dataset: - -.. code-block:: python - - from slideflow.model import Features - - features = Features(layer='postconv') - for img_batch in dataset: - postconv_features = features(img_batch) - -You can choose to return features from any combination of intermediate layers by passing layer name(s) to the argument ``layer``. The interface can also return logits, by passing ``include_logits=True``. - -To calculate layer features across an entire slide, the same interface can be called on a :class:`slideflow.WSI` object, generating a grid of activations of size ``(slide.grid.shape[0], slide.grid.shape[1], num_features)``: - -.. code-block:: python - - from slideflow import WSI - from slideflow.model import Features - - slide = WSI(...) - interface = Features('/model/path', layers='postconv') - feature_grid = interface(slide) - - -Mosaic maps -*********** - -To visualize the distribution of features across a dataset, a mosaic map can be created from a :class:`slideflow.model.DatasetFeatures` instance. 
Mosaic maps are generated by using features (layer activations) from a dataset, performing dimensionality reduction (UMAP) on the activations (via :class:`slideflow.SlideMap`), and overlaying tile images onto the UMAP (via :class:`slideflow.Mosaic`). By default, the post-convolutional ('postconv') layer is used when calculating features, but any combination of other layers can be also be used. The ``Project`` class has a function which can supervise these steps automatically and save the final figure to the project directory. - -.. code-block:: python - - features = P.generate_features('/path/to/trained_model') - mosaic = project.generate_mosaic(features) - mosaic.save('mosaic.png') - -.. autofunction:: slideflow.Project.generate_mosaic - :noindex: - -.. image:: mosaic_example.png - -To plot the underlying UMAP without overlaid images, the :class:`slideflow.SlideMap` used to create the mosaic map can be accesssed via ``slideflow.Mosaic.slide_map``. You can then use the :func:`slideflow.SlideMap.save` function to save the plot: - -.. code-block:: python - - mosaic = project.generate_mosaic(...) - mosiac.slide_map.save('umap.png') - -Tiles on the plot can be labeled using slide labels from the project annotations file, using the function :func:`slideflow.SlideMap.label_by_slide`. For example, the following will label the slide map according to the categorical outcome "HPV_status" in the project annotations file: - -.. code-block:: python - - # Get slide labels - dataset = project.dataset(tile_px=299, tile_um=302) - labels, unique_lables = dataset.labels('HPV_status') - - # Create the mosaic map and access the underlying SlideMap - mosaic = project.generate_mosaic(...) - - # Label the slide map with our outcome - mosiac.slide_map.label_by_slide(labels) - - # Save - mosiac.slide_map.save('umap_labeled.png') - -By default, all tiles in a dataset (which may be hundreds of thousands or millions of images) will be mapped onto the mosaic map. 
Instead of mapping all tiles within a slide, you can alternatively choose to map only a single tile per slide with the argument ``map_slide='centroid'``. This will calculate the tile nearest to centroid for each slide and display only this tile: - -.. code-block:: python - - # Create the mosaic map and access the underlying SlideMap - mosaic = project.generate_mosaic(..., map_slide='centroid') - -There are many additional arguments that can be provided to the :meth:`slideflow.Project.generate_mosaic()` function to customize the mosaic and UMAP plots, and many additional functions that can be applied to :class:`slideflow.Mosaic` and :class:`slideflow.SlideMap`. For example, it may be interesting to view a UMAP of tiles with an added third dimension, such as the activation value of a particular penultimate layer node. With this kind of plot, one can visualize how the activation of a particular node varies across the UMAP. To make such a plot, use the ``save_3d_plot`` function of the ``SlideMap``: - -.. code-block:: python - - mosaic = project.generate_mosaic(...) - mosiac.slide_map.save_3d_plot('3d_plot.png', feature=497) - -.. image:: 3d_umap.png diff --git a/docs/_sources/mil.rst.txt b/docs/_sources/mil.rst.txt new file mode 100644 index 000000000..87e58e6f9 --- /dev/null +++ b/docs/_sources/mil.rst.txt @@ -0,0 +1,332 @@ +.. _mil: + +Multiple-Instance Learning (MIL) +================================ + +In addition to standard tile-based neural networks, Slideflow also supports training multiple-instance learning (MIL) models. Several architectures are available, including `attention-based MIL `_ (``"Attention_MIL"``), `CLAM `_ (``"CLAM_SB",`` ``"CLAM_MB"``, ``"MIL_fc"``, ``"MIL_fc_mc"``), `TransMIL `_ (``"TransMIL"``), and `HistoBistro Transformer `_ (``"bistro.transformer"``). Custom architectures can also be trained. MIL training requires PyTorch. + +Skip to :ref:`tutorial8` for a complete example of MIL training. 
+ +See :ref:`mil_api` for more information on the MIL API. + +Generating Features +******************* + +The first step in MIL model development is generating features from image tiles, as discussed in the :ref:`features` section. Features from whole-slide images are exported as "bags" of features, where each bag contains a set of features from a single slide. Each bag is a PyTorch tensor saved in ``*.pt`` format. Bags are saved in a directory, and the directory path is passed to the MIL model during training and evaluation. + +Training +******** + +Model Configuration +------------------- + +To train an MIL model using exported features, first prepare an MIL configuration using :func:`slideflow.mil.mil_config`. + +The first argument to this function is the model architecture (which can be a name or a custom ``torch.nn.Module`` model), and the remaining arguments are used to configure the training process, such as learning rate and number of epochs. Training is executed using `FastAI `_ with `1cycle learning rate scheduling `_. + +.. code-block:: python + + import slideflow as sf + from slideflow.mil import mil_config + + config = mil_config('attention_mil', lr=1e-3) + +Available models out-of-the-box include `attention-based MIL `_ (``"Attention_MIL"``), `transformer MIL `_ (``"TransMIL"``), and `HistoBistro Transformer `_ (``"bistro.transformer"``). `CLAM `_ (``"CLAM_SB"``, ``"CLAM_MB"``, ``"MIL_fc"``, ``"MIL_fc_mc"``) models are available through ``slideflow-gpl``: + +.. code-block:: bash + + pip install slideflow-gpl + +Custom MIL models can also be trained with this API, as discussed `below `_. + + +Classification & Regression +--------------------------- + +MIL models can be trained for both classification and regression tasks. The type of outcome is determined through the loss function, which defaults to ``"cross_entropy"``. 
To train a model for regression, set the loss function to one of the following regression losses, and ensure that your outcome labels are continuous. You can also train to multiple outcomes by passing a list of outcome names. + +- **"mse"** (``nn.MSELoss``): Mean squared error. +- **"mae"** (``nn.L1Loss``): Mean absolute error. +- **"huber"** (``nn.SmoothL1Loss``): Huber loss. + +.. code-block:: python + + # Prepare a regression-compatible MIL configuration + config = mil_config('attention_mil', lr=1e-3, loss='mse') + + # Train the model + project.train_mil( + config=config, + ..., + outcomes=['age', 'grade'] + ) + + +Training an MIL Model +--------------------- + +Next, prepare a :ref:`training and validation dataset ` and use :func:`slideflow.Project.train_mil` to start training. For example, to train a model using three-fold cross-validation to the outcome "HPV_status": + +.. code-block:: python + + ... + + # Prepare a project and dataset + P = sf.Project(...) + full_dataset = P.dataset(tile_px=299, tile_um=302) + + # Split the dataset using three-fold, site-preserved cross-validation + splits = full_dataset.kfold_split( + k=3, + labels='HPV_status', + preserved_site=True + ) + + # Train on each cross-fold + for train, val in splits: + P.train_mil( + config=config, + outcomes='HPV_status', + train_dataset=train, + val_dataset=val, + bags='/path/to/bag_directory' + ) + +Model training statistics, including validation performance (AUROC, AP) and predictions on the validation dataset, will be saved in an ``mil`` subfolder within the main project directory. + +If you are training an attention-based MIL model (``attention_mil``, ``clam_sb``, ``clam_mb``), heatmaps of attention can be generated for each slide in the validation dataset by using the argument ``attention_heatmaps=True``. You can customize these heatmaps with ``interpolation`` and ``cmap`` arguments to control the heatmap interpolation and colormap, respectively. + +.. 
code-block:: python + + # Generate attention heatmaps, + # using the 'magma' colormap and no interpolation. + P.train_mil( + ..., + attention_heatmaps=True, + cmap='magma', + interpolation=None + ) + +Hyperparameters, model configuration, and feature extractor information are logged to ``mil_params.json`` in the model directory. This file also contains information about the input and output shapes of the MIL network and outcome labels. An example file is shown below. + +.. code-block:: json + + { + "trainer": "fastai", + "params": { + + }, + "outcomes": "histology", + "outcome_labels": { + "0": "Adenocarcinoma", + "1": "Squamous" + }, + "bags": "/mnt/data/projects/example_project/bags/simclr-263510/", + "input_shape": 1024, + "output_shape": 2, + "bags_encoder": { + "extractor": { + "class": "slideflow.model.extractors.simclr.SimCLR_Features", + "kwargs": { + "center_crop": false, + "ckpt": "/mnt/data/projects/example_project/simclr/00001-EXAMPLE/ckpt-263510.ckpt" + } + }, + "normalizer": null, + "num_features": 1024, + "tile_px": 299, + "tile_um": 302 + } + } + +.. _multimag: + +Multi-Magnification MIL +----------------------- + +Slideflow 2.2 introduced a multi-magnification, multi-modal MIL model, ``MultiModal_Attention_MIL`` (``"mm_attention_mil"``). This late-fusion multimodal model is based on standard attention-based MIL, but accepts multiple input modalities (e.g., multiple magnifications) simultaneously. Each input modality is processed by a separate encoder network and a separate attention module. The attention-weighted features from each modality are then concatenated and passed to a fully-connected layer. + +Multimodal models are trained using the same API as standard MIL models. Modalities are specified using the ``bags`` argument to :func:`slideflow.Project.train_mil`, where the number of modes is determined by the number of bag directories provided. 
Within each bag directory, bags should be generated using the same feature extractor and at the same magnification, but feature extractors and magnifications can vary between bag directories. + +For example, to train a multimodal model using two magnifications, you would pass two bag paths to the model. In this case, the ``/path/to/bags_10x`` directory contains bags generated from a 10x feature extractor, and the ``/path/to/bags_40x`` directory contains bags generated from a 40x feature extractor. + +.. code-block:: python + + # Configure a multimodal MIL model. + config = mil_config('mm_attention_mil', lr=1e-4) + + # Set the bags paths for each modality. + bags_10x = '/path/to/bags_10x' + bags_40x = '/path/to/bags_40x' + + P.train_mil( + config=config, + outcomes='HPV_status', + train_dataset=train, + val_dataset=val, + bags=[bags_10x, bags_40x] + ) + +You can use any number of modalities, and the feature extractors for each modality can be different. For example, you could train a multimodal model using features from a custom SimCLR model at 5x and features from a pretrained CTransPath model at 20x. + +The feature extractors used for each modality, as specified in the ``bags_config.json`` files in the bag directories, will be logged in the final ``mil_params.json`` file. Multimodal MIL models can be interactively viewed in :ref:`Slideflow Studio `, allowing you to visualize the attention weights for each modality separately. + +.. _custom_mil: + +Custom Architectures +-------------------- + +Training custom MIL models is straightforward with Slideflow, particularly if your model can adhere to a few simple guidelines: + +- Initialized with ``(num_feats, num_outputs)`` (e.g., ``Attention_MIL(768, 2)``) +- Input is feature bags with shape ``(batch, num_tiles, num_feats)``. If the model needs a "lens" input, then the model attribute ``use_lens`` should be True. 
+- Has a ``relocate()`` function that moves the model to detected device/GPU +- Ability to get attention through one of two methods: + - ``forward()`` function includes an optional ``return_attention`` argument, which if True returns attention scores after model output + - Has a ``calculate_attention()`` function that returns attention scores + +If the above applies to your model, you can train it simply by passing it as the first argument to :func:`slideflow.mil.mil_config`. + +.. code-block:: python + + import slideflow as sf + from slideflow.mil import mil_config + from my_module import CustomMIL + + config = mil_config(CustomMIL, lr=1e-3) + + +For larger projects, or if you are designing a plugin/extension for Slideflow, custom models can be registered to facilitate easy creation. If your model adheres to the above guidelines, you can register it for use with the following: + +.. code-block:: python + + from slideflow.mil import register_model + + @register_model + def my_model(): + return MyModelClass + + +You can then use your model when creating an MIL configuration: + +.. code-block:: python + + config = sf.mil.mil_config('my_model', ...) + + +If the above guidelines do *not* apply to your model, or if you want to customize model logic or functionality, you can supply a custom MIL configuration class that will supervise model building and dataset preparation. Your custom configuration class should inherit ``slideflow.mil.MILModelConfig``, and methods in this class can be overloaded to provide additional functionality. For example, to create an MIL configuration that uses a custom loss and custom metrics: + +.. code-block:: python + + from slideflow.mil import MILModelConfig + + class MyModelConfig(MILModelConfig): + + @property + def loss_fn(self): + return my_custom_loss + + def get_metrics(self): + return [my_metric1, my_metric2] + + +When registering your model, you should specify that it should use your custom configuration: + +.. 
code-block:: python + + @register_model(config=MyModelConfig) + def my_model(): + return MyModelClass + + +For an example of how to utilize model registration and configuration customization, see our `CLAM implementation `__ available through ``slideflow-gpl``. + + +Evaluation +********** + +To evaluate a saved MIL model on an external dataset, first extract features from a dataset, then use :func:`slideflow.Project.evaluate_mil`, which displays evaluation metrics and returns predictions as a DataFrame. + +.. code-block:: python + + import slideflow as sf + + # Prepare a project and dataset + P = sf.Project(...) + dataset = P.dataset(tile_px=299, tile_um=302) + + # Generate features using CTransPath + ctranspath = sf.build_feature_extractor('ctranspath', resize=True) + features = sf.DatasetFeatures(ctranspath, dataset=dataset) + features.to_torch('/path/to/bag_directory') + + # Evaluate a saved MIL model + df = P.evaluate_mil( + '/path/to/saved_model', + outcomes='HPV_status', + dataset=dataset, + bags='/path/to/bag_directory', + ) + +As with training, attention heatmaps can be generated for attention-based MIL models with the argument ``attention_heatmaps=True``, and these can be customized using ``cmap`` and ``interpolation`` arguments. + +.. image:: att_heatmap.jpg + +Generating Predictions +********************** + +In addition to generating slide-level predictions during training and evaluation, you can also generate tile-level predictions and attention scores for a dataset using :func:`slideflow.mil.get_mil_tile_predictions`. This function returns a DataFrame containing tile-level predictions and attention. + +.. code-block:: python + + >>> from slideflow.mil import get_mil_tile_predictions + >>> df = get_mil_tile_predictions(model, dataset, bags) + >>> df + slide loc_x loc_y ... y_pred3 y_pred4 y_pred5 + 0 TCGA-4V-A9QI-01Z-0... 2210 7349 ... 0.181155 0.468446 0.070175 + 1 TCGA-4V-A9QI-01Z-0... 5795 1971 ... 0.243721 0.131991 0.009169 + 2 TCGA-4V-A9QI-01Z-0... 
6273 5437 ... 0.096196 0.583367 0.090258 + 3 TCGA-4V-A9QI-01Z-0... 2330 3047 ... 0.056426 0.264386 0.300199 + 4 TCGA-4V-A9QI-01Z-0... 3644 3525 ... 0.134535 0.534353 0.013619 + ... ... ... ... ... ... ... ... + 391809 TCGA-4X-A9FA-01Z-0... 6034 3352 ... 0.004119 0.003636 0.005673 + 391810 TCGA-4X-A9FA-01Z-0... 6643 1401 ... 0.012790 0.010269 0.011726 + 391811 TCGA-4X-A9FA-01Z-0... 5546 2011 ... 0.009777 0.013556 0.025255 + 391812 TCGA-4X-A9FA-01Z-0... 6277 2864 ... 0.026638 0.018499 0.031061 + 391813 TCGA-4X-A9FA-01Z-0... 4083 4205 ... 0.009875 0.009582 0.022125 + + [391814 rows x 15 columns] + + +Single-Slide Inference +********************** + +Predictions can also be generated for individual slides, without requiring the user to manually generate feature bags. Use :func:`slideflow.mil.predict_slide` to generate predictions for a single slide. The first argument is the path to the saved MIL model (a directory containing ``mil_params.json``), and the second argument can either be a path to a slide or a loaded :class:`sf.WSI` object. + +.. code-block:: python + + import slideflow as sf + from slideflow.mil import predict_slide + from slideflow.slide import qc + + # Load a slide and apply Otsu thresholding + slide = '/path/to/slide.svs' + wsi = sf.WSI(slide, tile_px=299, tile_um=302) + wsi.qc(qc.Otsu()) + + # Calculate predictions and attention heatmap + model = '/path/to/mil_model' + y_pred, y_att = predict_slide(model, wsi) + + +The function will return a tuple of predictions and attention heatmaps. If the model is not attention-based, the attention heatmap will be ``None``. To calculate attention for a model, set ``attention=True``: + +.. code-block:: python + + y_pred, y_att = predict_slide(model, slide, attention=True) + +The returned attention values will be a masked ``numpy.ndarray`` with the same shape as the slide tile extraction grid. Unused tiles will have masked attention values. 
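Because the attention grid is a masked array, reductions and index lookups skip the unused tiles. The following is a minimal sketch of that behavior using only NumPy. The small ``y_att`` array built here is a hypothetical stand-in for the value returned by ``predict_slide``; its numbers and mask are illustrative, not real model output.

```python
import numpy as np

# Hypothetical stand-in for the attention grid returned by predict_slide():
# a 3x4 tile grid in which three tiles were never extracted (masked).
values = np.array([
    [0.10, 0.80, 0.00, 0.30],
    [0.00, 0.55, 0.20, 0.00],
    [0.05, 0.40, 0.95, 0.15],
])
unused = np.array([
    [False, False, True, False],
    [True, False, False, True],
    [False, False, False, False],
])
y_att = np.ma.masked_array(values, mask=unused)

# Reductions ignore masked (unused) tiles.
n_used = y_att.count()
mean_attention = y_att.mean()

# Grid indices of the tile with the highest attention.
row, col = np.unravel_index(y_att.argmax(), y_att.shape)

# Convert to a plain array, marking unused tiles as NaN, for code
# that does not understand masked arrays.
dense = y_att.filled(np.nan)
```

Converting with ``filled(np.nan)`` is one option for passing the grid to plotting or export code; keeping the masked array is usually simpler when only summary statistics are needed.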
+ + +Visualizing Predictions +*********************** + +Heatmaps of attention and tile-level predictions can be interactively visualized in Slideflow Studio by enabling the Multiple-Instance Learning extension (new in Slideflow 2.1.0). This extension is discussed in more detail in the :ref:`extensions` section. \ No newline at end of file diff --git a/docs/_sources/mil_module.rst.txt b/docs/_sources/mil_module.rst.txt new file mode 100644 index 000000000..a42d054d0 --- /dev/null +++ b/docs/_sources/mil_module.rst.txt @@ -0,0 +1,102 @@ +.. _mil_api: + +.. currentmodule:: slideflow.mil + +slideflow.mil +============== + +This submodule contains tools for multiple-instance learning (MIL) model training and evaluation. See :ref:`mil` for more information. A summary of the API is given below. + +**Training:** + - :func:`train_mil()`: Train an MIL model, using an MIL configuration, Datasets, and a directory of bags. + - :func:`build_fastai_learner()`: Build and return the FastAI Learner, but do not execute training. Useful for customizing training. + - :func:`build_multimodal_learner()`: Build and return a FastAI Learner designed for multi-modal/multi-magnification input. + +**Evaluation/Inference:** + - :func:`eval_mil()`: Evaluate an MIL model using a path to a saved model, a Dataset, and path to bags. Generates metrics. + - :func:`predict_mil()`: Generate predictions from an MIL model and saved bags. Returns a pandas dataframe. + - :func:`predict_multimodal_mil()`: Generate predictions from a multimodal MIL model. Returns a dataframe. + - :func:`predict_slide()`: Generate MIL predictions for a single slide. Returns a 2D array of predictions and attention. + - :func:`predict_from_bags()`: Low-level interface for generating predictions from a loaded MIL model and pre-loaded bag Tensors. + - :func:`predict_from_multimodal_bags()`: Low-level interface for generating multimodal predictions from a loaded MIL model and bag Tensors. 
+ - :func:`get_mil_tile_predictions()`: Get tile-level predictions and attention from a saved MIL model for a given Dataset and saved bags. + - :func:`generate_attention_heatmaps()`: Generate and save attention heatmaps. + - :func:`generate_mil_features()`: Get last-layer activations from an MIL model. Returns an MILFeatures object. + + +Main functions +************** + +.. autofunction:: mil_config +.. autofunction:: train_mil +.. autofunction:: build_fastai_learner +.. autofunction:: build_multimodal_learner +.. autofunction:: eval_mil +.. autofunction:: predict_mil +.. autofunction:: predict_multimodal_mil +.. autofunction:: predict_from_bags +.. autofunction:: predict_from_multimodal_bags +.. autofunction:: predict_slide +.. autofunction:: get_mil_tile_predictions +.. autofunction:: generate_attention_heatmaps +.. autofunction:: generate_mil_features + +TrainerConfig +************* + +.. autoclass:: slideflow.mil.TrainerConfig +.. autosummary:: + + TrainerConfig.model_fn + TrainerConfig.loss_fn + TrainerConfig.is_multimodal + TrainerConfig.model_type + +.. autofunction:: slideflow.mil.TrainerConfig.to_dict +.. autofunction:: slideflow.mil.TrainerConfig.json_dump +.. autofunction:: slideflow.mil.TrainerConfig.is_classification +.. autofunction:: slideflow.mil.TrainerConfig.get_metrics +.. autofunction:: slideflow.mil.TrainerConfig.prepare_training +.. autofunction:: slideflow.mil.TrainerConfig.build_model +.. autofunction:: slideflow.mil.TrainerConfig.predict +.. autofunction:: slideflow.mil.TrainerConfig.batched_predict +.. autofunction:: slideflow.mil.TrainerConfig.train +.. autofunction:: slideflow.mil.TrainerConfig.eval +.. autofunction:: slideflow.mil.TrainerConfig.build_train_dataloader +.. autofunction:: slideflow.mil.TrainerConfig.build_val_dataloader +.. autofunction:: slideflow.mil.TrainerConfig.inspect_batch +.. autofunction:: slideflow.mil.TrainerConfig.run_metrics + +MILModelConfig +************** + +.. autoclass:: MILModelConfig +.. 
autosummary:: + + MILModelConfig.apply_softmax + MILModelConfig.loss_fn + MILModelConfig.model_fn + MILModelConfig.model_type + MILModelConfig.is_multimodal + +.. autofunction:: slideflow.mil.MILModelConfig.is_classification +.. autofunction:: slideflow.mil.MILModelConfig.to_dict +.. autofunction:: slideflow.mil.MILModelConfig.inspect_batch +.. autofunction:: slideflow.mil.MILModelConfig.build_model +.. autofunction:: slideflow.mil.MILModelConfig.predict +.. autofunction:: slideflow.mil.MILModelConfig.batched_predict +.. autofunction:: slideflow.mil.MILModelConfig.run_metrics + +CLAMModelConfig +*************** + +The CLAM model configuration class requires ``slideflow-gpl``, which can be installed with: + +.. code-block:: bash + + pip install slideflow-gpl + +Once installed, the class is available at ``slideflow.clam.CLAMModelConfig``. + +.. autoclass:: slideflow.clam.CLAMModelConfig + diff --git a/docs/_sources/model.rst.txt b/docs/_sources/model.rst.txt index 5d9796c6c..ce1302ac8 100644 --- a/docs/_sources/model.rst.txt +++ b/docs/_sources/model.rst.txt @@ -5,9 +5,9 @@ slideflow.model This module provides the :class:`ModelParams` class to organize model and training parameters/hyperparameters and assist with model building, as well as the :class:`Trainer` class that -executes model training and evaluation. :class:`LinearTrainer` and :class:`CPHTrainer` -are extensions of this class, supporting linear and Cox Proportional Hazards outcomes, respectively. The function -:func:`trainer_from_hp` can choose and return the correct model instance based on the provided +executes model training and evaluation. :class:`RegressionTrainer` and :class:`SurvivalTrainer` +are extensions of this class, supporting regression and Cox Proportional Hazards outcomes, respectively. The function +:func:`build_trainer` can choose and return the correct model instance based on the provided hyperparameters. .. note:: @@ -15,64 +15,39 @@ hyperparameters. 
:mod:`slideflow.model.tensorflow` or :mod:`slideflow.model.torch` according to the currently active backend, indicated by the environmental variable ``SF_BACKEND``. -Configuring and training models -******************************* - -:class:`slideflow.model.ModelParams` will build models according to a set of model parameters and a given set of -outcome labels. To change the core image convolutional model to another architecture, set the ``model`` parameter -to the custom model class. - -.. code-block:: python - - import CustomModel - from slideflow.model import ModelParams - - mp = ModelParams(model=CustomModel, ...) - -Working with layer activations -****************************** - -:class:`slideflow.model.Features` creates an interface to efficiently generate features/layer activations and logits -from either a batch of images (returning a batch of activations/logits) or a whole-slide image (returning a grid of -activations/logits). - -:class:`slideflow.model.DatasetFeatures` calculates features and logits for an entire dataset, storing -result arrays into a dictionary mapping slide names to the generated activations. This buffer of whole-dataset -activations can then be used for functions requiring analysis of whole-dataset activations, including -:class:`slideflow.SlideMap` and :class:`slideflow.mosiac.Mosaic`. - -.. automodule: slideflow.model - -ModelParams -*********** -.. autoclass:: ModelParams - :inherited-members: +See :ref:`training` for a detailed look at how to train models. Trainer -*********** +******* .. autoclass:: Trainer - :inherited-members: +.. autofunction:: slideflow.model.Trainer.load +.. autofunction:: slideflow.model.Trainer.evaluate +.. autofunction:: slideflow.model.Trainer.predict +.. autofunction:: slideflow.model.Trainer.train -LinearTrainer -************* -.. autoclass:: LinearTrainer - :inherited-members: +RegressionTrainer +***************** +.. autoclass:: RegressionTrainer -CPHTrainer -*********** -.. 
autoclass:: CPHTrainer - :inherited-members: - -trainer_from_hp +SurvivalTrainer *************** -.. autofunction:: trainer_from_hp +.. autoclass:: SurvivalTrainer Features -*********** +******** .. autoclass:: Features - :inherited-members: +.. autofunction:: slideflow.model.Features.from_model +.. autofunction:: slideflow.model.Features.__call__ -DatasetFeatures -**************** -.. autoclass:: DatasetFeatures - :inherited-members: \ No newline at end of file +Other functions +*************** +.. autofunction:: build_trainer +.. autofunction:: build_feature_extractor +.. autofunction:: list_extractors +.. autofunction:: load +.. autofunction:: is_tensorflow_model +.. autofunction:: is_tensorflow_tensor +.. autofunction:: is_torch_model +.. autofunction:: is_torch_tensor +.. autofunction:: read_hp_sweep +.. autofunction:: rebuild_extractor \ No newline at end of file diff --git a/docs/_sources/model_params.rst.txt b/docs/_sources/model_params.rst.txt new file mode 100644 index 000000000..041374bdd --- /dev/null +++ b/docs/_sources/model_params.rst.txt @@ -0,0 +1,39 @@ +.. currentmodule:: slideflow + +.. _model_params: + +slideflow.ModelParams +===================== + +The :class:`ModelParams` class organizes model and training parameters/hyperparameters and assists with model building. + +See :ref:`training` for a detailed look at how to train models. + +ModelParams +*********** +.. autoclass:: ModelParams +.. autofunction:: slideflow.ModelParams.to_dict +.. autofunction:: slideflow.ModelParams.get_normalizer +.. autofunction:: slideflow.ModelParams.validate +.. autofunction:: slideflow.ModelParams.model_type + +Mini-batch balancing +******************** + +During training, mini-batch balancing can be customized to assist with increasing representation of sparse outcomes or small slides. Five mini-batch balancing methods are available when configuring :class:`slideflow.ModelParams`, set through the parameters ``training_balance`` and ``validation_balance``. 
These are ``'tile'``, ``'category'``, ``'patient'``, ``'slide'``, and ``'none'``. + +If **tile-level balancing** ("tile") is used, tiles will be selected randomly from the population of all extracted tiles. + +If **slide-based balancing** ("slide") is used, batches will contain equal representation of images from each slide. + +If **patient-based balancing** ("patient") is used, batches will balance image tiles across patients. The balancing is similar to slide-based balancing, except across patients (as each patient may have more than one slide). + +If **category-based balancing** ("category") is used, batches will contain equal representation from each outcome category. + +If **no balancing** is performed, batches will be assembled by randomly selecting from TFRecords. This is equivalent to slide-based balancing if each slide has its own TFRecord (default behavior). + +See :ref:`balancing` for more discussion on sampling and mini-batch balancing. + +.. note:: + + If you are :ref:`using a Trainer ` to train your models, you can further customize the mini-batch balancing strategy by using :meth:`slideflow.Dataset.balance` on your training and/or validation datasets. \ No newline at end of file diff --git a/docs/_sources/model_tensorflow.rst.txt b/docs/_sources/model_tensorflow.rst.txt new file mode 100644 index 000000000..cd53d4e62 --- /dev/null +++ b/docs/_sources/model_tensorflow.rst.txt @@ -0,0 +1,11 @@ +.. currentmodule:: slideflow.model.tensorflow + +slideflow.model.tensorflow +========================== + +This submodule contains Tensorflow-specific utility functions when working in the Tensorflow backend. + +.. autofunction:: slideflow.model.tensorflow.flatten +.. autofunction:: slideflow.model.tensorflow.load +.. autofunction:: slideflow.model.tensorflow.log_manifest +.. 
autofunction:: slideflow.model.tensorflow.unwrap diff --git a/docs/_sources/model_torch.rst.txt b/docs/_sources/model_torch.rst.txt new file mode 100644 index 000000000..9b374a8d1 --- /dev/null +++ b/docs/_sources/model_torch.rst.txt @@ -0,0 +1,10 @@ +.. currentmodule:: slideflow.model.torch + +slideflow.model.torch +========================== + +This submodule contains PyTorch-specific utility functions when working in the PyTorch backend. + +.. autofunction:: slideflow.model.torch.lazy_load_pretrained +.. autofunction:: slideflow.model.torch.load +.. autofunction:: slideflow.model.torch.log_manifest diff --git a/docs/_sources/mosaic.rst.txt b/docs/_sources/mosaic.rst.txt index a3095844f..1cd293503 100644 --- a/docs/_sources/mosaic.rst.txt +++ b/docs/_sources/mosaic.rst.txt @@ -1,15 +1,16 @@ -.. currentmodule:: slideflow.mosaic +.. currentmodule:: slideflow -slideflow.mosaic +.. _mosaic: + +slideflow.Mosaic ================ -This module provides the :class:`slideflow.Mosaic` class, which plots tile images onto a map of slides, -generating mosaic maps. +:class:`slideflow.Mosaic` plots tile images onto a map of slides, generating a mosaic map. The idea of a mosaic map is to visualize image feature variation across slides and among categories, in an attempt to better understand the kinds of image features discriminative models might be using to generate class predictions. They are created by first generating whole-dataset layer features (using -:class:`slideflow.model.DatasetFeatures`), which are then mapped into two-dimensional space using UMAP +:class:`slideflow.DatasetFeatures`), which are then mapped into two-dimensional space using UMAP dimensionality reduction (:class:`slideflow.SlideMap`). This resulting SlideMap is then passed to :class:`slideflow.Mosaic`, which overlays tile images onto the dimensionality-reduced slide map. @@ -17,10 +18,16 @@ An example of a mosaic map can be found in Figure 4 of `this paper `_, without the use of feature inversion. -.. 
automodule: slideflow.mosaic +See :ref:`mosaic_map` for an example of how a mosaic map can be used in the context of a project. + +.. autoclass:: Mosaic -Mosaic ------- +Methods +------- -.. autoclass:: slideflow.Mosaic - :inherited-members: \ No newline at end of file +.. autofunction:: slideflow.Mosaic.generate_grid +.. autofunction:: slideflow.Mosaic.plot +.. autofunction:: slideflow.Mosaic.points_at_grid_index +.. autofunction:: slideflow.Mosaic.save +.. autofunction:: slideflow.Mosaic.save_report +.. autofunction:: slideflow.Mosaic.view \ No newline at end of file diff --git a/docs/_sources/norm.rst.txt b/docs/_sources/norm.rst.txt new file mode 100644 index 000000000..cba2f1c2f --- /dev/null +++ b/docs/_sources/norm.rst.txt @@ -0,0 +1,358 @@ +.. currentmodule:: slideflow.norm + +slideflow.norm +=============== + +The ``slideflow.norm`` submodule includes tools for H&E stain normalization and augmentation. + +Available stain normalization algorithms include: + +- **macenko**: `Original Macenko paper `_. +- **macenko_fast**: Modified Macenko algorithm with the brightness standardization step removed. +- **reinhard**: `Original Reinhard paper `_. +- **reinhard_fast**: Modified Reinhard algorithm with the brightness standardization step removed. +- **reinhard_mask**: Modified Reinhard algorithm, with background/whitespace removed. +- **reinhard_fast_mask**: Modified Reinhard-Fast algorithm, with background/whitespace removed. +- **vahadane**: `Original Vahadane paper `_. +- **augment**: HSV colorspace augmentation. +- **cyclegan**: CycleGAN-based stain normalization, as implemented by `Zingman et al `_ (PyTorch only) + +Overview +******** + +The main normalizer interface, :class:`slideflow.norm.StainNormalizer`, offers +efficient numpy implementations for the Macenko, Reinhard, and Vahadane H&E stain normalization algorithms, as well +as an HSV colorspace stain augmentation method. 
This normalizer can convert +images to and from Tensors, numpy arrays, and raw JPEG/PNG images. + +In addition to these numpy implementations, Tensorflow-native and PyTorch-native +implementations are also provided, which offer performance improvements, GPU acceleration, +and/or vectorized application. The native normalizers are found in +``slideflow.norm.tensorflow`` and ``slideflow.norm.torch``, respectively. + +The Vahadane normalizer has two numpy implementations available: SPAMS +(``vahadane_spams``) and sklearn (``vahadane_sklearn``). By default, +the SPAMS implementation will be used if unspecified (``method='vahadane'``). + +Use :func:`slideflow.norm.autoselect` to get the fastest available normalizer +for a given method and active backend (Tensorflow/PyTorch). + +How to use +********** + +There are four ways you can use stain normalizers: 1) on individual images, 2) during dataset iteration, 3) during tile extraction, or 4) on-the-fly during training. + +Individual images +----------------- + +Stain normalizers can be used directly on individual images or batches of images. The Tensorflow and PyTorch-native stain normalizers perform operations on Tensors, allowing you to incorporate stain normalization into an external preprocessing pipeline. + +Load a backend-native stain normalizer with ``autoselect``, then transform an image with ``StainNormalizer.transform()``. This function will auto-detect the source image type, perform the most efficient transformation possible, and return normalized images of the same type. + +.. code-block:: python + + import slideflow as sf + + macenko = sf.norm.autoselect('macenko') + image = macenko.transform(image) + +You can use :meth:`slideflow.norm.StainNormalizer.fit` to fit the normalizer to a custom reference image, or use one of our preset fits. 
+ +Dataloader pre-processing +------------------------- + +You can apply stain normalization during dataloader preprocessing by passing the ``StainNormalizer`` object to the ``normalizer`` argument of either ``Dataset.tensorflow()`` or ``Dataset.torch()``. + +.. code-block:: python + + import slideflow as sf + + # Get a PyTorch-native Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Create a PyTorch dataloader that applies stain normalization + dataset = sf.Dataset(...) + dataloader = dataset.torch(..., normalizer=macenko) + +.. note:: + + GPU acceleration cannot be performed within a PyTorch dataloader. Stain normalizers have a ``.preprocess()`` function that stain-normalizes and standardizes a batch of images, so the workflow to normalize on GPU in a custom PyTorch training loop would be: + + - Get a Dataloader with ``dataset.torch(standardize=False, normalize=False)`` + - On an image batch, preprocess with ``normalizer.preprocess()``: + + .. code-block:: python + + # Slideflow dataset + dataset = Project.dataset(tile_px=..., tile_um=...) + + # Create PyTorch dataloader + dataloader = dataset.torch(..., standardize=False) + + # Get a stain normalizer + normalizer = sf.norm.autoselect('reinhard') + + # Iterate through the dataloader + for img_batch, labels in dataloader: + + # Stain normalize using GPU + img_batch = img_batch.to('cuda') + with torch.no_grad(): + proc_batch = normalizer.preprocess(img_batch) + + ... + + +During tile extraction +---------------------- + +Image tiles can be normalized during tile extraction by using the ``normalizer`` and ``normalizer_source`` arguments. ``normalizer`` is the name of the algorithm. The normalizer source - either a path to a reference image, or a ``str`` indicating one of our presets (e.g. ``'v1'``, ``'v2'``, ``'v3'``) - can also be set with ``normalizer_source``. + +.. 
code-block:: python + + P.extract_tiles( + tile_px=299, + tile_um=302, + normalizer='reinhard' + ) + +On-the-fly +---------- + +Performing stain normalization on-the-fly provides greater flexibility, as it allows you to change normalization strategies without re-extracting all of your image tiles. + +Real-time normalization can be performed for most pipeline functions - such as model training or feature generation - by setting the ``normalizer`` and/or ``normalizer_source`` hyperparameters. + +.. code-block:: python + + from slideflow.model import ModelParams + hp = ModelParams(..., normalizer='reinhard') + +If a model was trained using a normalizer, the normalizer algorithm and fit information will be stored in the model metadata file, ``params.json``, in the saved model folder. Any Slideflow function that uses this model will automatically process images using the same normalization strategy. + +.. _normalizer_performance: + +Performance +*********** + +Slideflow has Tensorflow, PyTorch, and Numpy/OpenCV implementations of stain normalization algorithms. Performance benchmarks for these implementations +are given below: + +.. list-table:: **Performance Benchmarks** (299 x 299 images, Slideflow 2.0.0, benchmarked on 3960X and A100 40GB) + :header-rows: 1 + + * - + - Tensorflow backend + - PyTorch backend + * - macenko + - 929 img/s (**native**) + - 881 img/s (**native**) + * - macenko_fast + - 1,404 img/s (**native**) + - 1,088 img/s (**native**) + * - reinhard + - 1,136 img/s (**native**) + - 3,329 img/s (**native**) + * - reinhard_fast + - 4,226 img/s (**native**) + - 4,187 img/s (**native**) + * - reinhard_mask + - 1,136 img/s (**native**) + - 3,941 img/s (**native**) + * - reinhard_fast_mask + - 4,496 img/s (**native**) + - 4,058 img/s (**native**) + * - vahadane_spams + - 0.7 img/s + - 2.2 img/s + * - vahadane_sklearn + - 0.9 img/s + - 1.0 img/s + +.. 
_contextual_normalization: + +Contextual Normalization +************************ + +Contextual stain normalization allows you to stain normalize an image using the staining context of a separate image. When the context image is a thumbnail of the whole slide, this may provide slight improvements in normalization quality for areas of a slide that are predominantly eosin (e.g. necrosis or low cellularity). For the Macenko normalizer, this works by determining the maximum H&E concentrations from the context image rather than the image being transformed. For the Reinhard normalizer, channel means and standard deviations are calculated from the context image instead of the image being transformed. This normalization approach can result in poor quality images if the context image has pen marks or other artifacts, so we do not recommend using this approach without ROIs or effective slide-level filtering. + +Contextual normalization can be enabled during tile extraction by passing the argument ``context_normalize=True`` to :meth:`slideflow.Dataset.extract_tiles()`. + +You can use contextual normalization when manually using a ``StainNormalizer`` object by using the ``.context()`` function. The context can either be a slide (path or ``sf.WSI``) or an image (Tensor or np.ndarray). + +.. code-block:: python + + import slideflow as sf + + # Get a Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Use a given slide as context + slide = sf.WSI('slide.svs', ...) + + # Context normalize an image + with macenko.context(slide): + img = macenko.transform(img) + +You can also manually set or clear the normalizer context with ``.set_context()`` and ``.clear_context()``: + +.. 
code-block:: python + + # Set the normalizer context + macenko.set_context(slide) + + # Context normalize an image + img = macenko.transform(img) + + # Remove the normalizer context + macenko.clear_context() + +Contextual normalization is not supported with on-the-fly normalization during training or dataset iteration. + +.. _stain_augmentation: + +Stain Augmentation +****************** + +One of the benefits of on-the-fly stain normalization is the ability to perform dynamic stain augmentation with normalization. For Reinhard normalizers, this is performed by randomizing the channel means and channel standard deviations. For Macenko normalizers, stain augmentation is performed by randomizing the stain matrix target and the target concentrations. In all cases, randomization is performed by sampling from a normal distribution whose mean is the reference fit and whose standard deviation is a predefined value (in ``sf.norm.utils.augment_presets``). Of note, this strategy differs from the more commonly used strategy `described by Tellez `_, where augmentation is performed by randomly perturbing images in the stain matrix space without normalization. + +To enable stain augmentation, add the letter 'n' to the ``augment`` parameter when training a model. + +.. code-block:: python + + import slideflow as sf + + # Open a project + project = sf.Project(...) + + # Add stain augmentation to augmentation pipeline + params = sf.ModelParams(..., augment='xryjn') + + # Train a model + project.train(..., params=params) + +When using a StainNormalizer object, you can perform a combination of normalization and augmentation for an image by using the argument ``augment=True`` when calling :meth:`StainNormalizer.transform`: + +.. 
code-block:: python + + import slideflow as sf + + # Get a Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Perform combination of stain normalization and augmentation + img = macenko.transform(img, augment=True) + +To stain augment an image without normalization, use the method :meth:`StainNormalizer.augment`: + +.. code-block:: python + + import slideflow as sf + + # Get a Macenko normalizer + macenko = sf.norm.autoselect('macenko') + + # Perform stain augmentation + img = macenko.augment(img) + + +StainNormalizer +*************** + +.. autoclass:: StainNormalizer +.. autofunction:: slideflow.norm.StainNormalizer.fit +.. autofunction:: slideflow.norm.StainNormalizer.get_fit +.. autofunction:: slideflow.norm.StainNormalizer.set_fit +.. autofunction:: slideflow.norm.StainNormalizer.augment +.. autofunction:: slideflow.norm.StainNormalizer.transform +.. autofunction:: slideflow.norm.StainNormalizer.jpeg_to_jpeg +.. autofunction:: slideflow.norm.StainNormalizer.jpeg_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.png_to_png +.. autofunction:: slideflow.norm.StainNormalizer.png_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.rgb_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.tf_to_rgb +.. autofunction:: slideflow.norm.StainNormalizer.tf_to_tf +.. autofunction:: slideflow.norm.StainNormalizer.torch_to_torch + +Example images +************** + +.. figure:: norm_compare/wsi_norm_compare.jpg + + Comparison of normalizers applied to a whole-slide image. + +.. figure:: norm_compare/tile_norm_compare.jpg + + Comparison of normalizers applied to an image tile. + +.. figure:: norm_compare/wsi_unnormalized.jpg + + Unnormalized whole-slide images. + +.. figure:: norm_compare/wsi_reinhard_v1.jpg + + Whole-slide images normalized with **Reinhard**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_reinhard_v2.jpg + + Whole-slide images normalized with **Reinhard**, fit to preset "v2" + +.. 
figure:: norm_compare/wsi_macenko_v1.jpg + + Whole-slide images normalized with **Macenko**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_macenko_v2.jpg + + Whole-slide images normalized with **Macenko**, fit to preset "v2" + +.. figure:: norm_compare/wsi_vahadane_v1.jpg + + Whole-slide images normalized with **Vahadane**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_vahadane_v2.jpg + + Whole-slide images normalized with **Vahadane**, fit to preset "v2" + +.. figure:: norm_compare/wsi_vahadane_spams_v1.jpg + + Whole-slide images normalized with **Vahadane (SPAMS)**, fit to preset "v1" (default) + +.. figure:: norm_compare/wsi_vahadane_spams_v2.jpg + + Whole-slide images normalized with **Vahadane (SPAMS)**, fit to preset "v2" + +.. figure:: norm_compare/tile_unnormalized.jpg + + Unnormalized image tiles. + +.. figure:: norm_compare/tile_reinhard_v1.jpg + + Image tiles normalized with **Reinhard Mask**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_reinhard_v2.jpg + + Image tiles normalized with **Reinhard Mask**, fit to preset "v2" + +.. figure:: norm_compare/tile_macenko_v1.jpg + + Image tiles normalized with **Macenko**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_macenko_v2.jpg + + Image tiles normalized with **Macenko**, fit to preset "v2" + +.. figure:: norm_compare/tile_vahadane_v1.jpg + + Image tiles normalized with **Vahadane**, fit to preset "v1" (default) + +.. figure:: norm_compare/tile_vahadane_v2.jpg + + Image tiles normalized with **Vahadane**, fit to preset "v2" + +.. figure:: norm_compare/tile_vahadane_spams_v1.jpg + + Image tiles normalized with **Vahadane (SPAMS)**, fit to preset "v1" (default) + +.. 
figure:: norm_compare/tile_vahadane_spams_v2.jpg + + Image tiles normalized with **Vahadane (SPAMS)**, fit to preset "v2" \ No newline at end of file diff --git a/docs/_sources/overview.rst.txt b/docs/_sources/overview.rst.txt new file mode 100644 index 000000000..49b84f3aa --- /dev/null +++ b/docs/_sources/overview.rst.txt @@ -0,0 +1,65 @@ +Overview +======== + +Slideflow provides tools for easily building and testing a variety of deep learning models for digital pathology. + +This section provides a high-level overview of the most common application: building and testing a weakly supervised predictive model. Slideflow supports many other tasks, including :ref:`multiple-instance learning (MIL) `, :ref:`self-supervised learning (SSL) `, :ref:`generative adversarial networks (GANs) `, :ref:`tissue ` and :ref:`cell ` segmentation, and :ref:`deployment & visualization `, which are discussed in subsequent sections. + +.. figure:: overview.png + + *High-level overview of model building.* + +The pipeline for a deep learning classification experiment is separated into three phases. + +1) **Tile extraction** - annotate slides with regions of interest (ROIs) [*optional*] and extract image tiles from whole-slide images. + +2) **Model training** - determine model parameters, train a model, and evaluate the model on a held-out test set. + +3) **Explainability** - generate predictive heatmaps and analyze learned image features. + +| + +A brief introduction to the steps needed to execute a basic experiment is provided below. Each process will be described in more detail in the following sections. + +Step 1: Prepare a dataset +************************* + +- **Extract tiles**. :ref:`Tiles are extracted ` from slides at a given magnification size in microns (or a magnification layer, such as "10x"), and saved at a given resolution in pixels. The optimal extraction size in both microns and pixels will depend on your dataset and model architecture. 
Poor quality tiles - including background tiles or tiles with high whitespace content - can be discarded with quality control methods. Tiles will be stored as TFRecords, a binary file format used to improve dataset reading performance during training. Each slide will have its own TFRecord file containing its extracted tiles. + +- **Set aside final evaluation set**. :ref:`Split the dataset ` into a training/validation set and a held-out test set. + +- **Determine validation plan**. By default, three-fold cross-validation will be performed during training. Many other validation strategies are also supported (:ref:`validation_planning`). + +Step 2: Train a model +********************* + +- **Choose model type**. Choose the endpoint (e.g. classification, regression, time-to-event) and type of model (tile-based or multiple-instance learning). + +- **Set hyperparameters**. Choose a model architecture (e.g. InceptionV3, VGG16, ResNet, etc.) and a set of hyperparameters (e.g. batch size, learning rate, etc.). This can be done manually, or :ref:`hyperparameters can be optimized ` via grid search or Bayesian optimization. + +- **Initiate training**. :ref:`Train your model `, taking note of training and validation performance (e.g. accuracy, AUROC, AP, R-squared, C-index). + +Step 3: Evaluate the model +************************** + +- **Evaluate on held-out set**: :ref:`Evaluate your final model ` on the held-out dataset. + +Step 4: Generate heatmaps +************************* + +- **Generate heatmaps**: :ref:`Generate heatmaps ` of predictions across slides in the held-out dataset to assist with interpretability. For MIL models, heatmaps of both predictions and attention can be generated. + +.. image:: heatmap_example.png + +Step 5: Make a Mosaic map +************************* + +- **Generate a mosaic map**: :ref:`Create a mosaic map `, which visually illustrates the latent space of your trained model and held-out dataset, to assist with interpretability. + +.. 
image:: mosaic_example.png + +Step 6: Live visualization +************************** +- **Deploy the model**: Finally, use a trained model to visualize predictions for whole-slide images with the interactive tool :ref:`Slideflow Studio `. This whole-slide image viewer includes deep learning tools enabling you to visualize model predictions on whole-slide images, standard JPG/PNG files, real-time camera feeds, and even Generative Adversarial Network (GAN)-generated images. + +.. image:: workbench_preview.png diff --git a/docs/_sources/pipeline.rst.txt b/docs/_sources/pipeline.rst.txt deleted file mode 100644 index 0c23996a5..000000000 --- a/docs/_sources/pipeline.rst.txt +++ /dev/null @@ -1,65 +0,0 @@ -Pipeline Overview -================= - -.. figure:: overview.png - - *High-level overview of main functions.* - -The overall pipeline for a deep learning experiment is separated into three phases. - -1) **Tile extraction** - involves annotating slides with regions of interest (ROIs) (*optional*), setting up a project, and extracting image tiles from whole-slide images. - -2) **Model training** - includes performing a hyperparameter sweep [*optional*], training a model, and evaluating the trained model on a held-out test set. - -3) **Explainability** - involves generating predictive heatmaps and analyzing learned image features. - -| - -A high-level overview of each of these phases is provided below. We will examine execution of each step in more detail in the following sections. - -Step 1: ROI Annotation -********************** - -1) **Label ROIs** (optional). Using `QuPath `_, annotate whole-slide images with the Polygon tool. Then, click **Automate** -> **Show script editor**. In the box that comes up, click **File** -> **Open** and load the ``qupath_roi.groovy`` script (QuPath 0.2 or greater) or ``qupath_roi_legacy.groovy`` (QuPath 0.1.x). Click **Run** -> **Run** if using QuPath 0.2 or greater, or **Run** -> **Run for Project** if using QuPath 0.1.x. 
ROIs will be exported in CSV format in the QuPath project directory, in the subdirectory "ROI". - -.. note:: - This step may be skipped if you are performing analysis on whole-slide images, rather than annotated tumor regions. - -Step 2: Dataset preparation -*************************** - -2) **Extract tiles**. Once ROIs have been created, tiles will need to be extracted from the ROIs across all of your slides. Tiles will be extracted at a given magnification size in microns, and saved at a given resolution in pixels. The optimal extraction size in both microns and pixels will depend on your dataset and model architecture. Poor quality tiles - including background tiles or tiles with high whitespace content - will be automatically discarded. Tiles will be stored as TFRecords, a binary file format used to improve dataset reading performance during training. Each slide will have its own TFRecord file containing its extracted tiles. - -3) **Set aside final evaluation set**. Using the project annotations CSV file, designate which slides should be saved for final evaluation. - -4) **Establish training and validation dataset**. By default, three-fold cross-validation will be performed during training. Many other validation strategies are also supported (:ref:`validation_planning`). - -Step 3: Model training -********************** - -5) **Choose hyperparameters**. Before training can begin, you must choose both a model architecture (e.g. InceptionV3, VGG16, ResNet, etc.) and a set of hyperparameters (e.g. batch size, learning rate, etc.). This can be done explicitly one at a time, or an automatic hyperparameter sweep can be configured. - -6) **Initiate training**. Train your model across all desired hyperparameters and select the best-performing hyperparameter combination for final evaluation testing. - -Step 4: Model evaluation -************************ -Validation testing is performed both during training - at specified epochs - and after training has completed. 
Various metrics are recorded in the project directory at these intervals to assist with model performance assessment, including: - -- **Training and validation loss** -- **Training and validation accuracy** (for categorical outcomes) -- **Tile-level, slide-level, and patient-level AUROC and AP** (for categorical outcomes) -- **Tile-level, slide-level, and patient-level scatter plots with R-squared** (for continuous outcomes) -- **Tile-level, slide-level, and patient-level C-index** (for Cox Proportional Hazards models) -- **Histograms of predictions** (for continuous outcomes) - -Step 5: Heatmaps -**************** -In addition to the above metrics, performance of a trained model can be assessed by visualizing predictions for a set slides as heatmaps. - -.. image:: heatmap_example.png - -Step 6: Mosaic maps -******************* -Finally, learned image features can be visualized using dimensionality reduction on model layer activations. A set of image tiles is first provided to your trained model, which calculates activations at a specified intermediate layer. Tile-level activations are then plotted with dimensionality reduction (UMAP), and points on the plot are replaced with image tiles, generating a mosaic map. - -.. image:: mosaic_example.png diff --git a/docs/_sources/plugins.rst.txt b/docs/_sources/plugins.rst.txt new file mode 100644 index 000000000..d4b88d1c0 --- /dev/null +++ b/docs/_sources/plugins.rst.txt @@ -0,0 +1,95 @@ +.. _plugins: + +Creating a Slideflow Plugin +=========================== + +Slideflow has been designed to be extensible, and we encourage users to contribute their own plugins to the Slideflow ecosystem. Plugins can be used to add new functionality to Slideflow, such as new feature extractors or new model architectures. This page provides an overview of how to create and use plugins with Slideflow. 
+ + +MIL Model Registration +---------------------- + +As discussed in :ref:`custom_mil`, Slideflow supports the registration of custom MIL models. This is done by using the ``register_model`` decorator to register a custom MIL model. + +For example, suppose you have a custom MIL model called ``MyMILModel`` that you want to register with Slideflow. You've already designed the model such that it meets Slideflow's MIL `requirements `__. Now you want to make it available for use directly within Slideflow. You can accomplish this by using the ``register_model`` decorator: + +.. code-block:: python + + from slideflow.model.mil import register_model + + @register_model + def my_mil_model(**kwargs): + from . import MyMILModel + return MyMILModel(**kwargs) + +Once this code is run, the custom MIL model will be available for use with Slideflow: + +.. code-block:: python + + import slideflow as sf + + model = sf.build_mil_model("my_mil_model") + + +Feature Extractors +------------------ + +Similarly, Slideflow supports the integration of custom feature extractors via the ``register_torch`` and ``register_tf`` decorators. Please see our detailed `developer note `__ for more information on how to create and register custom extractors. Briefly, you can register a custom feature extractor with Slideflow as follows: + +.. code-block:: python + + from slideflow.model.extractors import register_torch + + @register_torch + def my_foundation_model(**kwargs): + from . import MyFoundationModel + return MyFoundationModel(**kwargs) + + +Creating a Plugin +----------------- + +Once you have a custom MIL model or feature extractor that you want to integrate with Slideflow, you can create a plugin to make it available to other users. + +Slideflow supports external plugins via standard Python entry points, allowing you to publish your own package that integrates with Slideflow. 
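As a rough illustration of how entry-point discovery works in general, the sketch below uses only the standard library's ``importlib.metadata``. It is not Slideflow's actual loading code: the ``load_plugins`` helper is hypothetical, and the only name taken from this page is the ``slideflow.plugins`` group.

```python
from importlib.metadata import entry_points


def load_plugins(group="slideflow.plugins"):
    """Hypothetical helper: call every function registered under `group`."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict of lists.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    loaded = []
    for ep in selected:
        register = ep.load()  # imports the target, e.g. "my_package:register_extras"
        register()            # runs the plugin's registration hook
        loaded.append(ep.name)
    return loaded
```

Calling ``load_plugins()`` returns the names of the entry points that were run, or an empty list when no installed package declares the group. A host application would typically do this once at import time, which is why installing a plugin package is enough to make its models available.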
+ +In your package's ``setup.py`` file, use the "entry_points" key to connect with the Slideflow plugin interface: + +.. code-block:: python + + ..., + entry_points={ + 'slideflow.plugins': [ + 'extras = my_package:register_extras', + ], + }, + +Then, in your package's root ``__init__.py`` file, write a ``register_extras()`` function that does any preparation needed to initialize or import your model. + +(in ``my_package/__init__.py``) + +.. code-block:: python + + def register_extras(): + # Import the model, and do any other necessary preparation. + # If my_module contains the @register_model decorator, + # the model will be registered with Slideflow automatically. + from . import my_module + + print("Registered MyFoundationModel") + +You can then build and distribute your plugin, and once installed, the registration with Slideflow will happen automatically: + +.. code-block:: bash + + pip install my_package + + +.. code-block:: python + + import slideflow as sf + + model = sf.build_feature_extractor("my_foundation_model") + + +For a complete example, head over to our `Slideflow-GPL `_ and `Slideflow-NonCommercial `_ repositories, which have been built using the plugin system described above. \ No newline at end of file diff --git a/docs/_sources/posthoc.rst.txt b/docs/_sources/posthoc.rst.txt new file mode 100644 index 000000000..11f46acc7 --- /dev/null +++ b/docs/_sources/posthoc.rst.txt @@ -0,0 +1,259 @@ +.. currentmodule:: slideflow.model + +.. _activations: + +Layer Activations +================= + +Investigating the latent space of a neural network can provide useful insights into the structure of your data and what models have learned during training. Slideflow provides several tools for post-hoc latent space analysis of trained neural networks, primarily by calculating activations at one or more neural network layers for all images in a dataset. 
In the next sections, we will take a look at how these layer activations can be calculated for downstream analysis and provide examples of analyses that can be performed. + +Calculating Layer Activations +***************************** + +Activations at one or more layers of a trained network can be calculated with :class:`slideflow.model.Features` and :class:`slideflow.DatasetFeatures`. The former provides an interface for calculating layer activations for a batch of images, and the latter supervises calculations across an entire dataset. + +Batch of images +--------------- + +:class:`Features` provides an interface for calculating layer activations and predictions on a batch of images. The following arguments are available: + +- ``path``: Path to model, from which layer activations are calculated. Required. +- ``layers``: Layer(s) at which to calculate activations. +- ``include_preds``: Also return the final network output (predictions). +- ``pooling``: Apply pooling to layer activations, to reduce dimensionality to one dimension. + +If ``layers`` is not supplied, activations at the post-convolutional layer will be calculated by default. + +Once initialized, the resulting object can be called on a batch of images and will return the layer activations for all images in the batch. For example, to calculate activations at the ``sep_conv_3`` layer of a model while looping through a dataset: + +.. code-block:: python + + import slideflow as sf + + sepconv3 = sf.model.Features('model/path', layers='sep_conv_3') + for img_batch in dataset: + sepconv3_activations = sepconv3(img_batch) + +If ``layers`` is a list of layer names, activations at each layer will be calculated and concatenated. If ``include_preds`` is ``True``, the interface will also return the final predictions: + +.. code-block:: python + + sepconv3_and_preds = sf.model.Features(..., include_preds=True) + layer_activations, preds = sepconv3_and_preds(img_batch) + +.. 
note:: + + :class:`Features` assumes that image batches already have any necessary preprocessing applied, including standardization and stain normalization. + +See the API documentation for :class:`Features` for more information. + +Single slide +------------ + +Layer activations can also be calculated across an entire slide using the same :class:`Features` interface. Calling the object on a :class:`slideflow.WSI` object will generate a grid of activations of size ``(slide.grid.shape[0], slide.grid.shape[1], num_features)``: + +.. code-block:: python + + import slideflow as sf + + slide = sf.WSI(...) + postconv = sf.model.Features('/model/path', layers='postconv') + feature_grid = postconv(slide) + print(feature_grid.shape) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + (50, 45, 2048) + +.. _dataset_features: + +Entire dataset +-------------- + +Finally, layer activations can also be calculated for an entire dataset using :class:`slideflow.DatasetFeatures`. Instantiating the class supervises the calculation and caching of layer activations, which can then be used for downstream analysis. The project function :func:`slideflow.Project.generate_features` creates and returns an instance of this class. + +.. code-block:: python + + dts_ftrs = P.generate_features('/path/to/trained_model') + +Alternatively, you can create an instance of this class directly: + +.. code-block:: python + + import slideflow as sf + + dataset = P.dataset(tile_px=299, tile_um=302) + dts_ftrs = sf.DatasetFeatures( + model='/path/to/trained_model', + dataset=dataset, + ) + +Tile-level feature activations for each slide can be accessed directly from ``DatasetFeatures.activations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_features)``. Predictions are stored in ``DatasetFeatures.predictions``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_classes)``. 
Tile-level location data (coordinates from which the tiles were taken from their respective source slides) is stored in ``DatasetFeatures.locations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, 2)`` (``x``, ``y``). + +Activations can be exported to a Pandas DataFrame with :meth:`slideflow.DatasetFeatures.to_df` or exported into PyTorch format with :meth:`slideflow.DatasetFeatures.to_torch`. See :ref:`features` for more information about generating and exporting features for MIL models. + +Read the API documentation for :class:`slideflow.DatasetFeatures` for more information. + +.. _slidemap: + +Mapping Activations +******************* + +Layer activations across a dataset can be dimensionality reduced with UMAP and plotted for visualization using :meth:`slideflow.DatasetFeatures.map_activations`. This function returns an instance of :class:`slideflow.SlideMap`, a class that provides easy access to labeling and plotting. + +The below example calculates layer activations at the neural network layer ``sep_conv_3`` for an entire dataset, and then reduces the activations into two dimensions for easy visualization using UMAP. Any valid `UMAP parameters `_ can be passed via keyword argument. + +.. code-block:: python + + dts_ftrs = P.generate_features( + model='/path/to/trained_model', + layers='sep_conv_3' + ) + slide_map = dts_ftrs.map_activations( + n_neighbors=10, # UMAP parameter + min_dist=0.2 # UMAP parameter + ) + +We can then plot the activations with :meth:`slideflow.SlideMap.plot`. All keyword arguments are passed to the `matplotlib scatter `_ function. + +.. code-block:: python + + import matplotlib.pyplot as plt + + slide_map.plot(s=10) + plt.show() + +We can add labels to our plot by first passing a dictionary with slide labels to the function :meth:`slideflow.SlideMap.label_by_slide`. + +.. 
code-block:: python + + # Get a dictionary mapping slide names to category labels + dataset = P.dataset(tile_px=299, tile_um='10x') + labels, unique_labels = dataset.labels('subtype', format='name') + + # Assign the labels to the slide map, then plot + slide_map.label_by_slide(labels) + slide_map.plot() + +.. image:: umap_example.png + +| + +Finally, we can use :meth:`SlideMap.umap_transform` to project new data into two dimensions using the previously fit UMAP. + +.. code-block:: python + + import slideflow as sf + import numpy as np + + # Create a SlideMap using layer activations reduced with UMAP + dts_ftrs = P.generate_features( + model='/path/to/trained_model', + layers='sep_conv_3' + ) + slide_map = dts_ftrs.map_activations() + + # Load some dummy data. + # Second dimension must match size of activation vector. + dummy = np.random.random((100, 1024)) + + # Transform the data using the already-fit UMAP. + transformed = slide_map.umap_transform(dummy) + print(transformed.shape) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + (100, 2) + +Read more about additional :class:`slideflow.SlideMap` functions, including saving, loading, and clustering, in the linked API documentation. + +.. _mosaic_map: + +Mosaic Maps +*********** + +Mosaic maps provide a tool for visualizing the distribution of histologic image features in a dataset through analysis of neural network layer activations. Similar to `activation atlases `_, a mosaic map is generated by first calculating layer activations for a dataset, dimensionality reducing these activations with `UMAP `_, and then overlaying corresponding images in a grid-wise fashion. + +.. image:: mosaic_example.png + +| + +In the previous sections, we reviewed how to calculate layer activations across a dataset, and then dimensionality reduce these activations into two dimensions using UMAP. 
:class:`slideflow.Mosaic` provides a tool for converting these activation maps into a grid of image tiles plotted according to their associated activation vectors. + +Quickstart +---------- + +The fastest way to build a mosaic map is using :meth:`slideflow.Project.generate_mosaic`, which requires a ``DatasetFeatures`` object as its only mandatory argument and returns an instance of :class:`slideflow.Mosaic`. + +.. code-block:: python + + dts_ftrs = P.generate_features('/path/to/trained_model', layers='postconv') + mosaic = P.generate_mosaic(dts_ftrs) + mosaic.save('mosaic.png') + +When created with this interface, the underlying :class:`slideflow.SlideMap` object used to create the mosaic map is accessible via ``slideflow.Mosaic.slide_map``. You could, for example, use :func:`slideflow.SlideMap.save` to save the UMAP plot: + +.. code-block:: python + + mosaic.slide_map.save('umap.png') + +From a SlideMap +--------------- + +Any ``SlideMap`` can be converted to a mosaic map with :meth:`slideflow.SlideMap.generate_mosaic()`. + +.. code-block:: python + + ftrs = P.generate_features('/path/to/model') + slide_map = ftrs.map_activations() + mosaic = slide_map.generate_mosaic() + mosaic.save('mosaic.png') + +Manual creation +--------------- + +Mosaic maps can be flexibly created with :class:`slideflow.Mosaic`, requiring two components: a set of images and corresponding coordinates. Images and coordinates can either be manually provided, or the mosaic can dynamically read images from TFRecords (as is done with :meth:`Project.generate_mosaic()`). + +The first argument of :class:`slideflow.Mosaic` provides the images, and may be either of the following: + +- A list or array of images (np.ndarray, HxWxC) +- A list of tuples, containing ``(slide_name, tfrecord_index)`` + + +The second argument provides the coordinates: + +- A list or array of (x, y) coordinates for each image + + +For example, to create a mosaic map from a list of images and coordinates: + +.. 
code-block:: python + + import numpy as np + import slideflow as sf + + # Example data (images are HxWxC, np.ndarray) + images = [np.ndarray(...), ...] + coords = [(0.2, 0.9), ...] + + # Generate the mosaic + mosaic = sf.Mosaic(images, coords) + mosaic.plot() + +You can also generate a mosaic map where the images are tuples of ``(tfrecord, tfrecord_index)``. In this case, the mosaic map will dynamically read images from TFRecords during plotting. + +.. code-block:: python + + # Example data + tfrecords = ['/path/to/tfrecord.tfrecords', ...] + idx = [253, 112, ...] + coords = [(0.2, 0.9), ...] + + # Generate mosaic map + mosaic = sf.Mosaic( + images=[(tfr, i) for tfr, i in zip(tfrecords, idx)], + coords=coords + ) + +There are several additional arguments that can be used to customize the mosaic map plotting. Read the linked API documentation for :class:`slideflow.Mosaic` for more information. \ No newline at end of file diff --git a/docs/_sources/project.rst.txt b/docs/_sources/project.rst.txt index de0821f16..b423dad7a 100644 --- a/docs/_sources/project.rst.txt +++ b/docs/_sources/project.rst.txt @@ -1,11 +1,79 @@ .. currentmodule:: slideflow +.. _project: + slideflow.Project ================= -This class provides a high-level interface that simplifies execution of pipeline functions. Nearly all pipeline tasks -can be accomplished with the methods in this class, although directly interacting with the various objects in this -package will enable more granular control. - .. autoclass:: Project - :members: + +Attributes +---------- + +.. autosummary:: + + Project.annotations + Project.dataset_config + Project.eval_dir + Project.models_dir + Project.name + Project.neptune_api + Project.neptune_workspace + Project.sources + +Methods +------- + +.. autofunction:: slideflow.Project.add_source + +.. autofunction:: slideflow.Project.associate_slide_names + +.. autofunction:: slideflow.Project.cell_segmentation + +.. autofunction:: slideflow.Project.create_blank_annotations + +.. 
autofunction:: slideflow.Project.create_hp_sweep + +.. autofunction:: slideflow.Project.evaluate + +.. autofunction:: slideflow.Project.evaluate_mil + +.. autofunction:: slideflow.Project.extract_cells + +.. autofunction:: slideflow.Project.extract_tiles + +.. autofunction:: slideflow.Project.gan_train + +.. autofunction:: slideflow.Project.gan_generate + +.. autofunction:: slideflow.Project.generate_features + +.. autofunction:: slideflow.Project.generate_feature_bags + +.. autofunction:: slideflow.Project.generate_heatmaps + +.. autofunction:: slideflow.Project.generate_mosaic + +.. autofunction:: slideflow.Project.generate_mosaic_from_annotations + +.. autofunction:: slideflow.Project.generate_tfrecord_heatmap + +.. autofunction:: slideflow.Project.dataset + +.. autofunction:: slideflow.Project.predict + +.. autofunction:: slideflow.Project.predict_ensemble + +.. autofunction:: slideflow.Project.predict_wsi + +.. autofunction:: slideflow.Project.save + +.. autofunction:: slideflow.Project.smac_search + +.. autofunction:: slideflow.Project.train + +.. autofunction:: slideflow.Project.train_ensemble + +.. autofunction:: slideflow.Project.train_mil + +.. autofunction:: slideflow.Project.train_simclr diff --git a/docs/_sources/project_setup.rst.txt b/docs/_sources/project_setup.rst.txt index 0027830bd..54d8dfd7d 100644 --- a/docs/_sources/project_setup.rst.txt +++ b/docs/_sources/project_setup.rst.txt @@ -1,22 +1,23 @@ +.. _project_setup: + Setting up a Project ==================== -The easiest way to use ``slideflow`` is through the bundled project management class, :class:`slideflow.Project`, which supports unified datasets, annotations, and project directory structure for all pipeline functions. +Slideflow :ref:`Projects ` organize datasets, annotations, and results into a unified directory and provide a high-level API for common tasks. 
+ -To initialize a new project, pass keyword arguments to :class:`slideflow.Project` with project settings: +Use :func:`slideflow.create_project` to create a new project, supplying an annotations file (with patient labels) and a path to slides. A new dataset source (collection of slides and tfrecords) will be configured. Additional keyword arguments can be used to specify the location of tfrecords and saved models. .. code-block:: python import slideflow as sf - P = sf.Project( - '/path/to/project/directory', - name="MyProject", + P = sf.create_project( + root='project_path', annotations="./annotations.csv", - ... + slides='/path/to/slides/' ) -A project will then be initialized at the given directory, with settings saved in a ``settings.json`` file. Any project settings not provided via keyword arguments will use defaults. Each project will have the following settings: +Project settings are saved in a ``settings.json`` file in the root project directory. Each project will have the following settings: +-------------------------------+-------------------------------------------------------+ | **name** | Project name. | @@ -29,7 +30,7 @@ A project will then be initialized at the given directory, with settings saved i | **dataset_config** | Path to JSON file containing dataset configuration. | | | Defaults to "./datasets.json" | +-------------------------------+-------------------------------------------------------+ -| **sources** | Names of dataset(s) to include in the project. | +| **sources** | Names of dataset source(s) to include in the project. | | | Defaults to an empty list. | +-------------------------------+-------------------------------------------------------+ | **models_dir** | Path, where model files and results are saved. | @@ -44,29 +45,17 @@ Once a project has been initialized at a directory, you may then load the projec .. 
code-block:: python import slideflow as sf - P = sf.Project('/path/to/project/directory') + P = sf.load_project('/path/to/project/directory') -Pipeline functions are then called on the project object ``P``. +.. _dataset_sources: -Alternatively, you can use the bundled ``run_project.py`` script to execute project functions stored in ``actions.py`` files in project directories. When ``run_project.py`` is run, it initializes a ``Project`` object at a given directory, then looks for and loads an ``actions.py`` file in this directory, executing functions contained therein. +Dataset Sources +*************** -To create a new project with this script, or execute functions on an existing project, use the following syntax: +A :ref:`dataset source ` is a collection of slides, Regions of Interest (ROI) annotations (if available), and extracted tiles. Sources are defined in the project dataset configuration file, which can be shared and used across multiple projects or saved locally within a project directory. These configuration files have the following format: .. code-block:: bash - $ python3 run_project.py -p /path/to/project/directory - -where the -p flag is used to designate the path to your project directory. Other available flags can be seen by running ``python3 run_project.py --help``. - -Configuring Datasets -******************** - -Once initial project settings are established, you will need to either create or load a dataset configuration, which will specify directory locations for slides, ROIs, tiles, and TFRecords for each group of slides. - -Dataset configurations are saved in a JSON file with the below syntax. Dataset configuration files can be shared and used across multiple projects, or saved locally within a project directory. - -.. code-block:: json - { "SOURCE": { @@ -77,20 +66,24 @@ Dataset configurations are saved in a JSON file with the below syntax. 
Dataset c } } -Add a new dataset source to a project with ``Project.add_dataset()``, which will save the dataset in JSON format to the project dataset configuration file. +When a project is created with :func:`slideflow.create_project`, a dataset source is automatically created. You can change where slides and extracted tiles are stored by editing the project's dataset configuration file. + +It is possible for a project to have multiple dataset sources - for example, you may choose to organize data from multiple institutions into separate sources. You can add a new dataset source to a project with :meth:`Project.add_source`, which will update the project dataset configuration file accordingly. .. code-block:: python P.add_source( - name="NAME", + name="SOURCE_NAME", slides="/slides/directory", roi="/roi/directory", tiles="/tiles/directory", tfrecords="/tfrecords/directory" ) -Setting up annotations -********************** +Read more about :ref:`working with datasets `. + +Annotations +*********** Your annotations file is used to label patients and slides with clinical data and/or other outcome variables that will be used for training. Each line in the annotations file should correspond to a unique slide. Patients may have more than one slide. @@ -119,37 +112,4 @@ An example annotations file is generated each time a new project is initialized. P.create_blank_annotations() -The ``slide`` column may not need to be explicitly set in the annotations file by the user. Rather, once a dataset has been set up, slideflow will search through the linked slide directories and attempt to match slides to entries in the annotations file using **patient**. Entries that are blank in the **slide** column will be auto-populated with any detected and matching slides, if available. - -.. _execute: - -Executing commands -****************** - -If you plan to use the ``run_project.py`` script for your projects, open the ``actions.py`` file located in the project directory. 
It should look something like this: - -.. code-block:: python - - def main(P): - #P.extract_tiles(tile_px=299, tile_um=302) - - #P.train( - # "category", - # filters = { - # 'category': ['NEG', 'POS'], - # 'dataset': 'train' - # }, - #) - - #model = '/path_to_model/' - #P.evaluate(model, outcomes="category", filters={'dataset': 'eval'}) - #P.generate_heatmaps(model_to_evaluate) - pass - -The ``main()`` function contains several example functions. These serve as examples to help remind you of functions and arguments you can use on projects. - -To execute the commands you have prepared in this file, execute the ``run_project.py`` script pointing to your project directory. - -.. code-block:: bash - - $ python3 run_project.py -p /path/to/project/directory \ No newline at end of file +The ``slide`` column may not need to be explicitly set in the annotations file by the user. Rather, once a dataset has been set up, slideflow will search through the linked slide directories and attempt to match slides to entries in the annotations file using **patient**. Entries that are blank in the **slide** column will be auto-populated with any detected and matching slides, if available. \ No newline at end of file diff --git a/docs/_sources/quickstart.rst.txt b/docs/_sources/quickstart.rst.txt new file mode 100644 index 000000000..880eb0fb9 --- /dev/null +++ b/docs/_sources/quickstart.rst.txt @@ -0,0 +1,168 @@ +Quickstart +========== + +This section provides an example of using Slideflow to build a deep learning classifier from digital pathology slides. Follow the links in each section for more information. + +Preparing a project +******************* + +Slideflow experiments are organized using :class:`slideflow.Project`, which supervises storage of data, saved models, and results. The ``slideflow.project`` module has three preconfigured projects with associated slides and clinical annotations: ``LungAdenoSquam``, ``ThyroidBRS``, and ``BreastER``. 
+ +For this example, we will use the ``LungAdenoSquam`` project to train a classifier to predict lung adenocarcinoma (Adeno) vs. squamous cell carcinoma (Squam). + +.. code-block:: python + + import slideflow as sf + + # Download preconfigured project, with slides and annotations. + project = sf.create_project( + root='data', + cfg=sf.project.LungAdenoSquam(), + download=True + ) + +Read more about :ref:`setting up a project on your own data `. + +Data preparation +**************** + +The core imaging data used in Slideflow are image tiles :ref:`extracted from slides ` at a specific magnification and pixel resolution. Tile extraction and downstream image processing are handled through the primitive :ref:`slideflow.Dataset `. We can request a ``Dataset`` at a given tile size from our project using :meth:`slideflow.Project.dataset`. Tile magnification can be specified in microns (as an ``int``) or as optical magnification (e.g. ``'40x'``). + +.. code-block:: python + + # Prepare a dataset of image tiles. + dataset = project.dataset( + tile_px=299, # Tile size, in pixels. + tile_um='10x' # Tile size, in microns or magnification. + ) + dataset.summary() + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + Overview: + ╒===============================================╕ + │ Configuration file: │ /mnt/data/datasets.json │ + │ Tile size (px): │ 299 │ + │ Tile size (um): │ 10x │ + │ Slides: │ 941 │ + │ Patients: │ 941 │ + │ Slides with ROIs: │ 941 │ + │ Patients with ROIs: │ 941 │ + ╘===============================================╛ + + Filters: + ╒====================╕ + │ Filters: │ {} │ + ├--------------------┤ + │ Filter Blank: │ [] │ + ├--------------------┤ + │ Min Tiles: │ 0 │ + ╘====================╛ + + Sources: + + TCGA_LUNG + ╒==============================================╕ + │ slides │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ roi │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ tiles │ /mnt/rocket/tiles/TCGA_LUNG │ + │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │ + │ label │ 299px_10x │ + ╘==============================================╛ + + Number of tiles in TFRecords: 0 + Annotation columns: + Index(['patient', 'subtype', 'site', 'slide'], + dtype='object') + +Tile extraction +--------------- + +We prepare imaging data for training by extracting tiles from slides. Background areas of slides will be filtered out with Otsu's thresholding. + +.. code-block:: python + + # Extract tiles from all slides in the dataset. + dataset.extract_tiles(qc='otsu') + +Read more about tile extraction and :ref:`slide processing in Slideflow `. + +Held-out test sets +------------------ + +Now that we have our dataset and we've completed the initial tile image processing, we'll split the dataset into a training cohort and a held-out test cohort with :meth:`slideflow.Dataset.split`. We'll split while balancing the outcome ``'subtype'`` equally in the training and test dataset, with 30% of the data retained in the held-out set. + +.. code-block:: python + + # Split our dataset into a training and held-out test set. + train_dataset, test_dataset = dataset.split( + model_type='classification', + labels='subtype', + val_fraction=0.3 + ) + +Read more about :ref:`Dataset management `. 
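To make the idea of a label-balanced held-out split concrete, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not how :meth:`slideflow.Dataset.split` is implemented: the ``stratified_split`` helper and the toy patient names are hypothetical, and only the ``'subtype'`` labels and the 30% fraction come from the text above.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Hypothetical helper: hold out `test_fraction` of patients per label."""
    by_label = defaultdict(list)
    for patient, label in labels.items():
        by_label[label].append(patient)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        patients = sorted(by_label[label])
        rng.shuffle(patients)
        # Take the same fraction from every label so outcomes stay balanced.
        n_test = round(len(patients) * test_fraction)
        test.extend(patients[:n_test])
        train.extend(patients[n_test:])
    return train, test


# Toy data: 10 "Adeno" and 10 "Squam" patients.
labels = {f"patient{i}": ("Adeno" if i % 2 else "Squam") for i in range(20)}
train, test = stratified_split(labels)
print(len(train), len(test))  # prints "14 6"
```

Splitting by patient rather than by tile keeps every tile from a given patient on one side of the split, which avoids leaking information from the held-out set into training.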
+ +Configuring models +****************** + +Neural network models are prepared for training with :class:`slideflow.ModelParams`, through which we define the model architecture, loss, and hyperparameters. Dozens of architectures are available in both the Tensorflow and PyTorch backends, and both neural network :ref:`architectures ` and :ref:`loss ` functions can be customized. In this example, we will use the included Xception network. + +.. code-block:: python + + # Prepare a model and hyperparameters. + params = sf.ModelParams( + tile_px=299, + tile_um='10x', + model='xception', + batch_size=64, + learning_rate=0.0001 + ) + +Read more about :ref:`hyperparameter optimization in Slideflow `. + +Training a model +**************** + +Models can be trained from these hyperparameter configurations using :meth:`Project.train`. Models can be trained to categorical, multi-categorical, continuous, or time-series outcomes, and the training process is :ref:`highly configurable `. In this case, we are training a binary categorization model to predict the outcome ``'subtype'``, and we will distribute training across multiple GPUs. + +By default, Slideflow will train/validate on the full dataset using k-fold cross-validation, but validation settings :ref:`can be customized `. If you would like to restrict training to only a subset of your data - for example, to leave a held-out test set untouched - you can manually specify a dataset for training. In this case, we will train on ``train_dataset``, and allow Slideflow to further split this into training and validation using three-fold cross-validation. + +.. code-block:: python + + # Train a model from a set of hyperparameters. + results = project.train( + 'subtype', + dataset=train_dataset, + params=params, + val_strategy='k-fold', + val_k_fold=3, + multi_gpu=True, + ) + +Models and training results will be saved in the project ``models/`` folder. + +Read more about :ref:`training a model `. 
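For readers unfamiliar with k-fold cross-validation, the sketch below shows the general idea behind ``val_strategy='k-fold'`` with ``val_k_fold=3``. It is a generic illustration and not Slideflow's own fold assignment; the ``k_fold_indices`` helper is hypothetical, and real pipelines usually also shuffle and balance folds by outcome.

```python
def k_fold_indices(n_items, k=3):
    """Hypothetical helper: yield (train_idx, val_idx) for each of k folds."""
    indices = list(range(n_items))
    # Deal items round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        # Training data is everything outside the current validation fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


for train_idx, val_idx in k_fold_indices(6, k=3):
    print("train:", train_idx, "val:", val_idx)
```

Each item appears in exactly one validation fold, so three models are trained and every slide in the training cohort is used for validation once.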
+ +Evaluating a trained model +************************** + +After training, you can test model performance on a held-out test dataset with :meth:`Project.evaluate`, or generate predictions without evaluation (when ground-truth labels are not available) with :meth:`Project.predict`. As with :meth:`Project.train`, we can specify a :class:`slideflow.Dataset` to evaluate. + +.. code-block:: python + + # Evaluate a trained model on the held-out test set. + test_results = project.evaluate( + model='/path/to/trained_model_epoch1', + outcomes='subtype', + dataset=test_dataset + ) + +Read more about :ref:`model evaluation `. + +Post-hoc analysis +***************** + +Slideflow includes a number of analytical tools for working with trained models. Read more about :ref:`heatmaps `, :ref:`model explainability `, :ref:`analysis of layer activations `, and real-time inference in an interactive :ref:`whole-slide image reader `. \ No newline at end of file diff --git a/docs/_sources/saliency.rst.txt b/docs/_sources/saliency.rst.txt new file mode 100644 index 000000000..114f26ba8 --- /dev/null +++ b/docs/_sources/saliency.rst.txt @@ -0,0 +1,137 @@ +.. _saliency: + +Saliency Maps +============= + +Slideflow provides an API for calculating gradient-based pixel attribution (saliency maps), as implemented by `PAIR `_. Saliency maps can be calculated manually (as described below), or interactively in :ref:`Slideflow Studio `. + +:class:`slideflow.grad.SaliencyMap` provides an interface for preparing a saliency map generator from a loaded model (Tensorflow or PyTorch) and calculating maps from preprocessed images. Supported methods include: + +- Vanilla gradients +- Integrated gradients +- Guided integrated gradients +- Blur integrated gradients +- XRAI +- Grad-CAM + +Generating a Saliency Map +------------------------- + +Creating a saliency map with :class:`slideflow.grad.SaliencyMap` requires two components: a loaded model and a preprocessed image. 
Trained models can be loaded from disk with :func:`slideflow.model.load`, and the model's preprocessing function can be prepared with :func:`slideflow.util.get_preprocess_fn`. + +.. code-block:: python + + import slideflow as sf + from slideflow.grad import SaliencyMap + + # Load a trained model and preprocessing function. + model = sf.model.load('../saved_model') + preprocess = sf.util.get_preprocess_fn('../saved_model') + + # Prepare a SaliencyMap + sal_map = SaliencyMap(model, class_idx=0) + + +There are several ways you might acquire an image to use for a saliency map. To load an image tile from a whole-slide image, you can index a :class:`slideflow.WSI` object: + +.. code-block:: python + + import slideflow as sf + + # Load a whole-slide image. + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + + # Extract a tile using grid indexing. + image = wsi[10, 25] + +.. image:: saliency_source.jpg + :width: 299px + +| + +Alternatively, if you know the coordinates for an image tile and want to extract it from TFRecords, you can use :meth:`slideflow.Dataset.read_tfrecord_by_location`: + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset. + P = sf.Project(...) + dataset = P.dataset(tile_px=299, tile_um=302) + + # Get the tile from slide "12345" at location (2000, 2000) + slide, image = dataset.read_tfrecord_by_location( + slide='12345', + loc=(2000, 2000) + ) + +Once you have an image and a loaded ``SaliencyMap`` object, you can calculate a saliency map from the preprocessed image: + +.. code-block:: python + + mask = sal_map.integrated_gradients(preprocess(image)) + + +Plotting a Saliency Map +----------------------- + +Once a saliency map has been created, you can plot the image as a heatmap or as an overlay. The ``slideflow.grad`` submodule includes several utility functions to assist with plotting. For example, to plot a basic heatmap using the ``inferno`` matplotlib colormap, use :func:`slideflow.grad.plot_utils.inferno`: + +.. 
code-block:: python + + from PIL import Image + from slideflow.grad.plot_utils import inferno + + pil_image = Image.fromarray(inferno(mask)) + pil_image.show() + +.. image:: saliency_heatmap.jpg + :width: 299px + +| + +To plot this saliency map as an overlay, use :func:`slideflow.grad.plot_utils.overlay`, passing in both the unprocessed image and the saliency map: + +.. code-block:: python + + from PIL import Image + from slideflow.grad.plot_utils import overlay + + overlay_img = overlay(image.numpy(), mask) + pil_image = Image.fromarray(overlay_img) + pil_image.show() + +.. image:: saliency_overlay.jpg + :width: 299px + +| + +Complete Example +---------------- + +The following is a complete example of how to calculate and plot a saliency map for an image tile taken from a whole-slide image. + + +.. code-block:: python + + import slideflow as sf + from slideflow.grad import SaliencyMap + from slideflow.grad.plot_utils import overlay + from PIL import Image + + # Load a slide and find the desired image tile. + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + image = wsi[20, 20] + + # Load a model and preprocessing function. + model = sf.model.load('../saved_model') + preprocess = sf.util.get_preprocess_fn('../saved_model') + + # Prepare the saliency map + sal_map = SaliencyMap(model, class_idx=0) + + # Calculate saliency map using integrated gradients. + ig_map = sal_map.integrated_gradients(preprocess(image)) + + # Display the saliency map as an overlay. + overlay_img = overlay(image, ig_map) + Image.fromarray(overlay_img).show() diff --git a/docs/_sources/segmentation.rst.txt b/docs/_sources/segmentation.rst.txt new file mode 100644 index 000000000..11aac6600 --- /dev/null +++ b/docs/_sources/segmentation.rst.txt @@ -0,0 +1,292 @@ +.. currentmodule:: slideflow.segmentation + +.. 
_segmentation: + +Tissue Segmentation +=================== + +In addition to classification tasks, Slideflow also supports training and deploying whole-slide tissue segmentation models. Segmentation models identify and label regions of interest in a slide, and can be used for tasks such as tumor identification, tissue labeling, or quality control. Once trained, these models can be used for :ref:`slide QC `, generating :ref:`regions of interest `, or live deployment in :ref:`Slideflow Studio `. + +.. note:: + + Tissue segmentation requires PyTorch. Dependencies can be installed with ``pip install slideflow[torch]``. + +Segmentation Modes +------------------ + +Tissue segmentation is performed at the whole-slide level, trained on randomly cropped sections of the slide thumbnail at a specified resolution. Slideflow supports three segmentation modes: + +- ``'binary'``: For binary segmentation, the goal is to differentiate a single tissue type from background. +- ``'multiclass'``: For multiclass segmentation, the goal is twofold: differentiate tissue from background, and assign a class label to each identified region. This is useful in instances where regions have non-overlapping labels. +- ``'multilabel'``: For multilabel segmentation, the goal is to assign each tissue type to a class, but regions may have overlapping labels. + +Generating Data +--------------- + +.. note:: + Segmentation thumbnails and masks do not need to be explicitly exported prior to training. They will be generated automatically during training if they do not exist. However, exporting them beforehand can be useful for data visualization, troubleshooting, and computational efficiency. + + +Segmentation models in Slideflow are trained on regions of interest, which can be generated as discussed in :ref:`regions_of_interest` and :ref:`studio_roi`. Once ROIs have been generated and (optionally) labeled, whole-slide thumbnails and ROI masks can be exported using ``segment.export_thumbs_and_masks()``. 
The ``mpp`` argument specifies the resolution of the exported images in microns-per-pixel. We recommend ``mpp=20`` for a good balance between image size and memory requirements, or ``mpp=10`` for tasks needing higher resolution. + +.. code-block:: python + + import slideflow as sf + from slideflow import segment + + # Load a project and dataset + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Export thumbnails and masks + segment.export_thumbs_and_masks( + dataset, + mpp=20, # Microns-per-pixel resolution + dest='path/to/output' + ) + +By default, ROIs are exported as binary masks. To export multidimensional masks for multiclass or multilabel applications, use the ``mode`` and ``labels`` arguments. When ``mode`` is ``'multiclass'`` or ``'multilabel'``, masks will be exported in (N, W, H) format, where N is the number of unique ROI labels. The ``labels`` argument should be a list of strings corresponding to the ROI labels in the dataset that should be included. + +.. code-block:: python + + ... + + # Export thumbnails and masks + segment.export_thumbs_and_masks( + dataset, + mpp=20, # Microns-per-pixel resolution + dest='path/to/output', + mode='multiclass', + labels=['tumor', 'stroma', 'necrosis'] + ) + + +Training a Model +---------------- + +Segmentation models are configured using a :class:`segment.SegmentConfig` object. This object specifies the model architecture, image resolution (MPP), training parameters, and other settings. For example, to configure a model for multiclass segmentation with a resolution of 20 MPP, use: + +.. 
code-block:: python + + from slideflow import segment + + # Create a config object + config = segment.SegmentConfig( + mpp=20, # Microns-per-pixel resolution + size=1024, # Size of cropped/rotated images during training + mode='multiclass', + labels=['tumor', 'stroma', 'necrosis'], + arch='Unet', + encoder_name='resnet34', + train_batch_size=16, + epochs=10, + lr=1e-4, + ) + +Slideflow uses the `segmentation_models_pytorch `_ library to implement segmentation models. The ``arch`` argument specifies the model architecture, and the ``encoder_name`` argument specifies the encoder backbone. See available models and encoders in the `segmentation_models_pytorch documentation `_. + +The segmentation model can then be trained using the :func:`segment.train` function. This function takes a :class:`segment.SegmentConfig` object and a :class:`slideflow.Dataset` object as arguments. During training, segmentation thumbnails and masks are randomly cropped to the specified ``size``, and images/masks then undergo augmentation with random flipping/rotating. + +For example, to train a model for binary segmentation with a resolution of 20 MPP, use: + +.. code-block:: python + + from slideflow import segment + + # Create a config object + config = segment.SegmentConfig(mpp=20, mode='binary', arch='FPN') + + # Train the model + segment.train(config, dataset, dest='path/to/output') + +To use thumbnails and masks previously exported with :func:`segment.export_thumbs_and_masks`, specify the path to the exported data using the ``data_source`` argument. This is more computationally efficient than generating data on-the-fly during training. For example: + +.. 
code-block:: python + + from slideflow import segment + + # Export thumbnails and masks + segment.export_thumbs_and_masks(dataset, mpp=20, dest='masks/') + + # Create a config object + config = segment.SegmentConfig(mpp=20, mode='binary', arch='FPN') + + # Train the model + segment.train(config, dataset, data_source='masks/', dest='path/to/output') + +After training, the model will be saved as a ``model.pth`` file in the destination directory specified by ``dest``, and the model configuration will be saved as a ``segment_config.json`` file. + +Model Inference +--------------- + +After training, models can be loaded using :func:`segment.load_model_and_config`. This function takes a path to a model file as an argument, and returns a tuple containing the model and configuration object. For example: + +.. code-block:: python + + from slideflow import segment + + # Load the model and config + model, config = segment.load_model_and_config('path/to/model.pth') + +To run inference on a slide, use the :meth:`segment.SegmentModel.run_slide_inference` method. This method takes a :class:`slideflow.WSI` object or str (path to slide) as an argument, and returns an array of pixel-level predictions. For binary models, the output shape will be ``(H, W)``. For multiclass models, the output shape will be ``(N+1, H, W)`` (the first channel is predicted background), and for multilabel models, the output shape will be ``(N, H, W)``, where ``N`` is the number of labels. + +.. code-block:: python + + from slideflow import segment + + # Load the model and config + model, config = segment.load_model_and_config('path/to/model.pth') + + # Run inference, returning an np.ndarray + pred = model.run_slide_inference('/path/to/slide') + +You can also run inference directly on an arbitrary image using the :meth:`segment.SegmentModel.run_tiled_inference` method. This method takes an image array (np.ndarray, in W, H, C format) as an argument, and returns an array of pixel-level predictions. 
Predictions are generated in tiles and merged. The output shape will be ``(H, W)`` for binary models, ``(N+1, H, W)`` for multiclass models, and ``(N, H, W)`` for multilabel models. + +Generating QC Masks +------------------- + +The :class:`slideflow.slide.qc.Segment` class provides an easy interface for generating QC masks from a segmentation model. This class takes a path to a trained segmentation model as an argument, and can be used for QC :ref:`as previously described `. For example: + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Load a project and dataset + project = sf.load_project('path/to/project') + dataset = project.dataset(299, 302) + + # Create a QC mask + segmenter = qc.Segment('/path/to/model.pth') + + # Extract tiles with this QC + dataset.extract_tiles(..., qc=segmenter) + +You can also use this interface for applying QC to a single slide: + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Load the slide + wsi = sf.WSI('/path/to/slide', ...) + + # Create the QC algorithm + segmenter = qc.Segment('/path/to/model.pth') + + # Apply QC + applied_mask = wsi.qc(segmenter) + +For binary models, the QC mask will filter out tiles that are predicted to be background. + +For multiclass models, the QC mask will filter out tiles predicted to be background (class index 0). This can be customized by setting ``class_idx`` to another value. For example, to create a QC algorithm that filters out tiles predicted to be tumor (class index 1), use: + +.. code-block:: python + + segmenter = qc.Segment('/path/to/model.pth', class_idx=1) + +For multilabel models, the QC mask will filter out tiles predicted to be background for all class labels. This can be customized to filter out tiles based only on a specific class label by setting ``class_idx``. 
For example, to create a QC algorithm that filters out tiles that are not predicted to be tumor (class index 1) while ignoring predictions for necrosis (class index 2), use: + +.. code-block:: python + + segmenter = qc.Segment('/path/to/model.pth', class_idx=1) + +In all cases, the thresholding direction can be reversed by setting ``threshold_direction='greater'``. This might be useful, for example, if the segmentation model was trained to identify pen marks or artifacts, and you want to filter out areas predicted to be artifacts. + +.. code-block:: python + + segmenter = qc.Segment('/path/to/model.pth', threshold_direction='greater') + +Generating ROIs +--------------- + +The :class:`slideflow.slide.qc.Segment` class also provides an easy interface for generating regions of interest (ROIs). Use the :meth:`slideflow.slide.qc.Segment.generate_rois` method to generate and apply ROIs to a slide. If the segmentation model is multiclass or multilabel, generated ROIs will be labeled. For example: + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Load a slide + wsi = sf.WSI('/path/to/slide', ...) + + # Load the segmentation model + segmenter = qc.Segment('/path/to/model.pth') + + # Generate and apply ROIs to a slide + roi_outlines = segmenter.generate_rois(wsi) + +By default, this will apply generated ROIs directly to the :class:`slideflow.WSI` object. If you wish to calculate ROI outlines without applying them to the slide, use the argument ``apply=False``. + +In addition to generating ROIs for a single slide, you can also generate ROIs for an entire dataset using :meth:`slideflow.Dataset.generate_rois`. For example: + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset. + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Generate ROIs for all slides in the dataset. 
+ dataset.generate_rois('path/to/model.pth') + +ROIs will be saved in the ROIs directory as configured in the dataset settings. Alternatively, ROIs can be exported to a user-defined directory using the ``dest`` argument. + +By default, ROIs will be generated for all slides in the dataset, skipping slides with existing ROIs. To overwrite any existing ROIs, use the ``overwrite=True`` argument. + + +Deployment in Studio +-------------------- + +.. video:: tissue_seg.mp4 + :autoplay: + +| + +Segmentation models can be deployed in :ref:`Slideflow Studio ` for live segmentation and QC. To do this, start by training a segmentation model as described above. Then, see the :ref:`studio_segmentation` documentation for instructions on how to deploy the model for live QC and/or ROI generation. + + +Complete Example +---------------- + +1. Label ROIs +************* + +Create labeled ROIs as described in :ref:`studio_roi`. + +2. Train a model +**************** + +.. code-block:: python + + import slideflow as sf + from slideflow import segment + + # Load a project and dataset + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Train a binary segmentation model + config = segment.SegmentConfig(mpp=20, mode='binary', arch='FPN') + segment.train(config, dataset, dest='path/to/output') + +3. Generate ROIs (optional) +*************************** + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset. + project = sf.load_project('path/to/project') + dataset = project.dataset() + + # Generate ROIs for all slides in the dataset. + dataset.generate_rois('path/to/model.pth') + +4. Deploy in Studio +******************* + +Use the model for either QC or ROI generation in Slideflow Studio, as described in :ref:`studio_segmentation`. + diff --git a/docs/_sources/simclr.rst.txt b/docs/_sources/simclr.rst.txt new file mode 100644 index 000000000..20fd6d6d1 --- /dev/null +++ b/docs/_sources/simclr.rst.txt @@ -0,0 +1,16 @@ +.. 
currentmodule:: slideflow.simclr + +slideflow.simclr +================ + +This module contains utility functions for training a SimCLR model. Please see +:ref:`simclr_ssl` for more information on the high-level API and recommended use. + +.. autofunction:: slideflow.simclr.get_args +.. autofunction:: slideflow.simclr.load +.. autofunction:: slideflow.simclr.load_model_args +.. autofunction:: slideflow.simclr.run_simclr + +.. autoclass:: slideflow.simclr.SimCLR +.. autoclass:: slideflow.simclr.SimCLR_Args +.. autoclass:: slideflow.simclr.DatasetBuilder diff --git a/docs/_sources/slide.rst.txt b/docs/_sources/slide.rst.txt index 12978469d..a1e97548e 100644 --- a/docs/_sources/slide.rst.txt +++ b/docs/_sources/slide.rst.txt @@ -1,19 +1,68 @@ .. currentmodule:: slideflow.slide slideflow.slide -===================== +=============== This module contains classes to load slides and extract tiles. For optimal performance, tile extraction should generally not be performed by instancing these classes directly, but by calling either :func:`slideflow.Project.extract_tiles` or :func:`slideflow.Dataset.extract_tiles`, which include performance optimizations and additional functionality. -WSI -*** +slideflow.WSI +************* + .. autoclass:: WSI - :inherited-members: -TMA -*** -.. autoclass:: TMA - :inherited-members: \ No newline at end of file +Attributes +---------- + +.. autosummary:: + + WSI.dimensions + WSI.qc_mask + WSI.levels + WSI.level_dimensions + WSI.level_downsamples + WSI.level_mpp + WSI.properties + WSI.slide + WSI.vendor + +Methods +------- + +.. autofunction:: slideflow.WSI.align_to +.. autofunction:: slideflow.WSI.align_tiles_to +.. autofunction:: slideflow.WSI.apply_qc_mask +.. autofunction:: slideflow.WSI.apply_segmentation +.. autofunction:: slideflow.WSI.area +.. autofunction:: slideflow.WSI.build_generator +.. autofunction:: slideflow.WSI.dim_to_mpp +.. autofunction:: slideflow.WSI.get_tile_mask +.. autofunction:: slideflow.WSI.get_tile_dataframe +.. 
autofunction:: slideflow.WSI.extract_cells +.. autofunction:: slideflow.WSI.extract_tiles +.. autofunction:: slideflow.WSI.export_rois +.. autofunction:: slideflow.WSI.has_rois +.. autofunction:: slideflow.WSI.load_csv_roi +.. autofunction:: slideflow.WSI.load_json_roi +.. autofunction:: slideflow.WSI.load_roi_array +.. autofunction:: slideflow.WSI.mpp_to_dim +.. autofunction:: slideflow.WSI.predict +.. autofunction:: slideflow.WSI.preview +.. autofunction:: slideflow.WSI.process_rois +.. autofunction:: slideflow.WSI.show_alignment +.. autofunction:: slideflow.WSI.square_thumb +.. autofunction:: slideflow.WSI.qc +.. autofunction:: slideflow.WSI.remove_qc +.. autofunction:: slideflow.WSI.remove_roi +.. autofunction:: slideflow.WSI.tensorflow +.. autofunction:: slideflow.WSI.torch +.. autofunction:: slideflow.WSI.thumb +.. autofunction:: slideflow.WSI.verify_alignment +.. autofunction:: slideflow.WSI.view + +Other functions +*************** + +.. autofunction:: slideflow.slide.predict \ No newline at end of file diff --git a/docs/_sources/slide_processing.rst.txt b/docs/_sources/slide_processing.rst.txt new file mode 100644 index 000000000..6b15b6275 --- /dev/null +++ b/docs/_sources/slide_processing.rst.txt @@ -0,0 +1,324 @@ +.. _filtering: + +Slide Processing +================ + +.. image:: tile_extraction_overview.png + +| + +Whole-slide histopathological images present many challenges for machine learning researchers, as these large gigapixel images may contain out-of-focus regions, pen marks, uneven staining, or varying optical resolutions. Slideflow provides tools for both flexible and computationally efficient slide processing in order to build datasets ready for machine learning applications. + +Most tools in Slideflow work with image tiles - extracted sub-regions of a whole-slide image - as the primary data source. For efficiency, image tiles are first buffered into :ref:`TFRecords ` , a binary file format that greatly improves IO throughput. 
Although training can be performed without using TFRecords (see :ref:`from_wsi`), we recommend tile extraction as the first step for most projects. + +Tile extraction +*************** + +Image tiles are extracted from whole-slide images using either :meth:`slideflow.Project.extract_tiles` or :meth:`slideflow.Dataset.extract_tiles`. When using the Project interface, the only arguments required are ``tile_px`` and ``tile_um``, which determine the size of the extracted image tiles in pixels and microns: + +.. code-block:: python + + P.extract_tiles(tile_px=299, tile_um=302) + +and when using a :class:`slideflow.Dataset`, no arguments are required. + +.. code-block:: python + + dataset.extract_tiles() + +Tiles will be extracted at the specified pixel and micron size and stored in TFRecord format. Loose image tiles (\*.jpg or \*.png format) can also be saved with the argument ``save_tiles=True``. + +See the :meth:`slideflow.Dataset.extract_tiles` API documentation for customization options. + +.. note:: + + Slide scanners may have differing microns-per-pixel (MPP) resolutions, so "10X" magnification from one scanner may be slightly different than "10X" on another scanner. Specifying a fixed ``tile_um`` ensures all image tiles have both the same pixel size and micron size. This MPP-harmonization step uses the `Libvips resize `_ function on extracted images. To disable this step and instead extract tiles at a given `downsample layer `_, set ``tile_um`` equal to a magnification level rather than micron size: + + .. code-block:: python + + P.extract_tiles(tile_px=299, tile_um="10x") + +Cell segmentation +***************** + +An alternative to extracting tiles in a grid across whole-slide images is extracting tiles at detected cell centroids. This is discussed separately in :ref:`cellseg`. + +.. 
_regions_of_interest: + +Regions of Interest +******************* + +Tile extraction can be optionally restricted based on pathologist-annotated Regions of Interest (ROI), allowing you to enrich your dataset by only using relevant sections of a slide. + +We offer two methods for annotating ROIs - :ref:`Slideflow Studio ` and `QuPath `_. Please see the Slideflow Studio section for instructions on generating ROI annotations using the Slideflow interface. + +If you are using QuPath, annotate whole-slide images using the Polygon tool. Then, click **Automate** -> **Show script editor**. In the box that comes up, click **File** -> **Open** and load the ``qupath_roi.groovy`` script (QuPath 0.2 or greater) or ``qupath_roi_legacy.groovy`` (QuPath 0.1.x), scripts `available on GitHub `_. Click **Run** -> **Run** if using QuPath 0.2 or greater, or **Run** -> **Run for Project** if using QuPath 0.1.x. ROIs will be exported in CSV format in the QuPath project directory, in the subdirectory "ROI". + +Once ROI CSV files are generated, ensure they are placed in the folder expected by your :ref:`Project ` or :ref:`Dataset ` based on their respective configurations. + +The ``roi_method`` argument to the ``extract_tiles()`` functions allows you to control how ROIs are used. Options include: + +- ``'auto'``: Default behavior. For slides with a valid ROI, extract tiles from within ROIs only. For slides without ROIs, extract from the whole-slide image. +- ``'inside'``: Extract from within ROIs, and skip any slides missing ROIs. +- ``'outside'``: Extract from outside ROIs, and skip any slides missing ROIs. +- ``'ignore'``: Ignore all ROIs, extracting from whole-slide images. + +.. note:: + + Nested ROIs will be rendered as holes. + +By default, ROIs filter tiles based on the center point of the tile. Alternatively, you can filter tiles based on the proportion of the tile inside an ROI by using the argument ``roi_filter_method``. 
If ``roi_filter_method`` is set to a float (0-1), this value will be interpreted as a proportion threshold. If the proportion of a tile inside an ROI is greater than this number, the tile is included. For example, if ``roi_filter_method=0.7``, a tile that is 80% inside of an ROI will be included, but a tile that is only 60% inside of an ROI will be excluded. + +.. image:: roi_filter.jpg + +| + +.. _roi_labels: + +ROIs can optionally be assigned a label. Labels can be added or changed using :ref:`Slideflow Studio `, or by adding a "label" column in the ROI CSV file. Labels can be used to train strongly supervised models, where each tile is assigned a label based on the ROI it is extracted from, rather than inheriting the label of the whole-slide image. See the developer note :ref:`tile_labels` for more information. + +To retrieve the ROI name (and label, if present) for all tiles in a slide, use :meth:`slideflow.WSI.get_tile_dataframe`. This will return a Pandas DataFrame with the following columns: + + - **loc_x**: X-coordinate of tile center + - **loc_y**: Y-coordinate of tile center + - **grid_x**: X grid index of the tile + - **grid_y**: Y grid index of the tile + - **roi_name**: Name of the ROI if tile is in an ROI, else None + - **roi_desc**: Description of the ROI if tile is in an ROI, else None + - **label**: ROI label, if present. + +The **loc_x** and **loc_y** columns contain the same tile location information :ref:`stored in TFRecords `. + +You can also retrieve this information for all slides in a dataset by using :meth:`slideflow.Dataset.get_tile_dataframe`, which will return a DataFrame with the same columns as above, plus a ``slide`` column. + + +Masking & Filtering +******************* + +Slideflow provides two approaches for refining where image tiles should be extracted from whole-slide images: **slide-level masking** and **tile-level filtering**. In these next sections, we'll review options for both approaches. 
+ +Otsu's thresholding +------------------- + +.. image:: otsu.png + +| + +Otsu's thresholding is a **slide-based method** that distinguishes foreground (tissue) from background (empty slide). Otsu's thresholding is performed in the HSV colorspace and yields similar results to grayspace filtering, a tile-level filtering method described below. + +To apply Otsu's thresholding to slides before tile extraction, use the ``qc`` argument of the ``.extract_tiles()`` functions. + +.. code-block:: python + + from slideflow.slide import qc + + # Use this QC during tile extraction + P.extract_tiles(qc=qc.Otsu()) + + +You can also apply Otsu's thresholding to a single slide with the :meth:`slideflow.WSI.qc` method. See :class:`the WSI API documentation ` for more information on working with individual slides. + +.. code-block:: python + + # Apply Otsu's thresholding to a WSI object + wsi = sf.WSI(...) + wsi.qc(qc.Otsu()).show() + + +Gaussian blur filtering +----------------------- + +.. image:: blur.png + +| + +Gaussian blur masking is another **slide-based method** that can detect pen marks and out-of-focus areas, and is particularly useful for datasets lacking annotated Regions of Interest (ROIs). Gaussian blur masking is applied similarly, using the ``qc`` argument. + +Two versions of Gaussian blur masking are available: ``qc.Gaussian`` and ``qc.GaussianV2`` (new in Slideflow 2.1.0). The latter is the default and recommended version, as it is more computationally efficient. The former is provided for backwards compatibility. + +.. code-block:: python + + from slideflow.slide import qc + + # Use this QC during tile extraction + P.extract_tiles(qc=qc.GaussianV2()) + +By default, Gaussian blur masking is calculated at 4 times lower magnification than the tile extraction MPP (e.g., when extracting tiles at 10X effective magnification, Gaussian filtering would be calculated at 2.5X). This is to reduce computation time. 
You can change this behavior by manually setting the ``mpp`` argument to a specific microns-per-pixel value. + +Gaussian blur masking is performed on gray images. The ``sigma`` argument controls the standard deviation of the Gaussian blur kernel. The default value of 3 is recommended, but you may need to adjust this value for your dataset. A higher value will result in more areas being masked, while a lower value will result in fewer areas being masked. + +.. code-block:: python + + from slideflow.slide import qc + + # Customize the Gaussian filter, + # using a sigma of 2 and a mpp of 1 (10X magnification) + gaussian = qc.GaussianV2(mpp=1, sigma=2) + +You can also use multiple slide-level masking methods by providing a list to ``qc``. + +.. code-block:: python + + from slideflow.slide import qc + + # Use a separate name for the list to avoid shadowing the qc module + qc_methods = [ + qc.Otsu(), + qc.Gaussian() + ] + P.extract_tiles(qc=qc_methods) + +If both Otsu's thresholding and blur detection are being used, Slideflow will calculate Blur Burden, a metric used to assess the degree to which non-background tiles are either out-of-focus or contain artifact. In the tile extraction PDF report that is generated (see next section), the distribution of blur burden for slides in the dataset will be plotted on the first page. The report will contain the number of slides meeting criteria for warning, when the blur burden exceeds 5% for a given slide. A text file containing names of slides with high blur burden will be saved in the exported TFRecords directory. These slides should be manually reviewed to ensure they are of high enough quality to include in the dataset. + +DeepFocus +--------- + +Slideflow also provides an interface for using `DeepFocus `_ to identify in-focus regions. DeepFocus is a lightweight neural network that predicts whether a section of a slide is in- or out-of-focus. 
When used as a slide-level masking method, DeepFocus will filter out-of-focus tiles from a slide. 
By default, DeepFocus is applied to slides at 40X magnification, although this can be customized with the ``tile_um`` argument. + +.. code-block:: python + + from slideflow.slide import qc + + deepfocus = qc.DeepFocus(tile_um='20x') + slide.qc(deepfocus) + +Alternatively, you can also retrieve raw predictions from the DeepFocus model for a slide by calling the deepfocus object on a :class:`slideflow.WSI` object, passing the argument threshold=False: + +.. code-block:: python + + preds = deepfocus(slide, threshold=False) + +Custom deep learning QC +----------------------- + +You can also create your own deep learning slide filters. To create a custom deep learning QC method like DeepFocus, create a custom slide filter that inherits :class:`slideflow.slide.qc.StridedDL`. For example, to manually recreate the above DeepFocus model, first clone the `TF2 fork on GitHub `_, which contains the DeepFocus architecture and model weights, and create a custom class as below: + +.. code-block:: python + + from slideflow.slide.qc import strided_dl + from deepfocus.keras_model import load_checkpoint, deepfocus_v3 + + class CustomDeepFocus(strided_dl.StridedDL): + + def __init__(self): + model = deepfocus_v3() + checkpoint = '/path/to/deepfocus/checkpoints/ver5' + load_checkpoint(model, checkpoint) + super().__init__( + model=model, + pred_idx=1, + tile_px=64, + tile_um='40x' + ) + +Then, supply this class to the ``qc`` argument as above. + +.. code-block:: python + + P.extract_tiles(qc=CustomDeepFocus()) + + +See :ref:`qc` for more information on the API for further QC customization. + +Segmentation Models (U-Net) +--------------------------- + +Slideflow also provides an interface for both training and using segmentation models (e.g. U-Net, FPN, DeepLabV3) for slide-level masking. This is discussed separately in :ref:`segmentation`. 
+ +Grayspace filtering +-------------------- + +Grayspace filtering is a **tile-based method** that detects the amount of grayspace in a given image tile and discards the tile if the grayspace content exceeds a set threshold. RGB image tiles are converted to the HSV colorspace, and the fraction of pixels with saturation below a certain threshold is calculated. This filtering is performed separately for each tile as it is being extracted. Relevant arguments for grayspace filtering include: + + +- ``grayspace_threshold``: Saturation value, below which a pixel is considered gray. Range 0-1. Defaults to 0.05. +- ``grayspace_fraction``: Image tiles with grayspace above this fraction will be discarded. Defaults to 0.6. + +Grayspace filtering is enabled by default, and can be disabled by passing ``grayspace_fraction=1`` to the ``.extract_tiles()`` functions. + +Grayspace filtering is similar to Otsu's thresholding, with both operating in the HSV colorspace. Otsu's thresholding is ~30% faster than grayspace filtering for slides with accessible downsample layers, but if downsample layers are not stored in a given slide or are inaccessible (e.g. ``enable_downsample=False``), grayspace filtering may be faster. Grayspace filtering is more reliable than Otsu's thresholding for slides with abundant pen marks or other artifacts, which can present issues for Otsu's thresholding algorithm. + +Whitespace filtering +-------------------- + +Whitespace filtering is performed similarly to grayspace filtering. Whitespace is calculated using overall brightness for each pixel, then counting the fraction of pixels with a brightness above some threshold. As with grayspace filtering, there are two relevant arguments: + + +- ``whitespace_threshold``: Brightness value, above which a pixel is considered white. Range 0-255. Defaults to 230. +- ``whitespace_fraction``: Image tiles with whitespace above this fraction will be discarded. Defaults to 1.0 (disabled).
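The two tile-level criteria above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Slideflow's implementation: the function names (``gray_pixel_fraction``, ``white_pixel_fraction``, ``keep_tile``) are hypothetical, a tile is represented as a non-empty flat list of RGB tuples, and brightness is assumed to be the mean of the three channels. Only the thresholds and defaults are taken from the argument descriptions above.

```python
import colorsys


def gray_pixel_fraction(pixels, grayspace_threshold=0.05):
    """Fraction of pixels whose HSV saturation is below the threshold.

    `pixels` is a non-empty flat list of (r, g, b) tuples, channels 0-255.
    """
    gray = 0
    for r, g, b in pixels:
        # colorsys expects channel values scaled to the 0-1 range.
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if saturation < grayspace_threshold:
            gray += 1
    return gray / len(pixels)


def white_pixel_fraction(pixels, whitespace_threshold=230):
    """Fraction of pixels whose brightness is above the threshold.

    Brightness is taken as the channel mean (an assumption of this sketch).
    """
    white = sum(1 for r, g, b in pixels if (r + g + b) / 3 > whitespace_threshold)
    return white / len(pixels)


def keep_tile(pixels, grayspace_fraction=0.6, whitespace_fraction=1.0):
    """Return True if the tile passes both filters.

    A tile is discarded when its gray (or white) fraction is *above* the
    limit, so a limit of 1.0 can never be exceeded and disables that filter.
    """
    return (
        gray_pixel_fraction(pixels) <= grayspace_fraction
        and white_pixel_fraction(pixels) <= whitespace_fraction
    )


# A toy tile: 75% near-white background, 25% saturated pink tissue.
tile = [(240, 240, 240)] * 75 + [(200, 100, 150)] * 25
```

With the defaults, this toy tile is discarded because 75% of its pixels count as gray, which is above the 0.6 limit. Passing ``grayspace_fraction=1.0`` to ``keep_tile`` keeps it, mirroring how a fraction of 1 disables the filter.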
+ +Whitespace filtering is disabled by default. + +Stain normalization +******************* + +.. image:: norm_compare/wsi_norm_compare.jpg + +Image tiles can undergo digital Hematoxylin and Eosin (H&E) stain normalization either during tile extraction or in real-time during training. Real-time normalization adds CPU overhead during training and inference but offers greater flexibility, allowing you to test different normalization strategies without re-extracting tiles from your entire dataset. + +Available stain normalization algorithms include: + +- **macenko**: `Original Macenko paper `_. +- **macenko_fast**: Modified Macenko algorithm with the brightness standardization step removed. +- **reinhard**: `Original Reinhard paper `_. +- **reinhard_fast**: Modified Reinhard algorithm with the brightness standardization step removed. +- **reinhard_mask**: Modified Reinhard algorithm, with background/whitespace removed. +- **reinhard_fast_mask**: Modified Reinhard-Fast algorithm, with background/whitespace removed. +- **vahadane**: `Original Vahadane paper `_. +- **augment**: HSV colorspace augmentation. +- **cyclegan**: CycleGAN-based stain normalization, as implemented by `Zingman et al `_ (PyTorch only) + +The Macenko and Reinhard stain normalizers are highly efficient, with native Tensorflow, PyTorch, and Numpy/OpenCV implementations, and support GPU acceleration (see :ref:`performance benchmarks `). + +During tile extraction +---------------------- + +Image tiles can be normalized during tile extraction by using the ``normalizer`` and ``normalizer_source`` arguments. ``normalizer`` is the name of the algorithm. The normalizer source - either a path to a reference image, or a ``str`` indicating one of our presets (e.g. ``'v1'``, ``'v2'``, ``'v3'``) - can also be set with ``normalizer_source``. + +.. 
code-block:: python + + P.extract_tiles( + tile_px=299, + tile_um=302, + normalizer='reinhard' + ) + +:ref:`Contextual stain normalization ` is supported when normalizing during tile extraction. + +On-the-fly +---------- + +The stain normalization implementations in Slideflow are fast and efficient, with separate Tensorflow-native, PyTorch-native, and Numpy/OpenCV implementations. In most instances, we recommend performing stain normalization on-the-fly as a part of image pre-processing, as this provides flexibility for changing normalization strategies without re-extracting all of your image tiles. + +Real-time normalization can be performed by setting the ``normalizer`` and/or ``normalizer_source`` hyperparameters. + +.. code-block:: python + + from slideflow.model import ModelParams + hp = ModelParams(..., normalizer='reinhard') + +If a model was trained using a normalizer, the normalizer algorithm and fit information will be stored in the model metadata file, ``params.json``, in the saved model folder. Any Slideflow function that uses this model will automatically process images using the same normalization strategy. + +When stain normalizing on-the-fly, stain augmentation becomes available as a training augmentation technique. Read more about :ref:`stain augmentation `. + +The normalizer interfaces can also be accessed directly through :class:`slideflow.norm.StainNormalizer`. See :py:mod:`slideflow.norm` for examples and more information. + +Performance optimization +************************ + +As tile extraction is heavily reliant on random access reading, significant performance gains can be achieved by either 1) moving all slides to an SSD, or 2) utilizing an SSD or ramdisk buffer (to which slides will be copied prior to extraction). The use of a ramdisk buffer can improve tile extraction speed by 10-fold or greater! To maximize performance, pass the buffer path to the argument ``buffer``.
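The buffering idea described above (copy a slide to fast storage, read from the copy, then discard it) can be illustrated with the standard library alone. This is only a conceptual sketch: ``read_through_buffer`` is a hypothetical helper, the "slide" is a dummy file, and nothing here reflects how Slideflow implements its ``buffer`` argument internally.

```python
import shutil
import tempfile
from pathlib import Path


def read_through_buffer(slide_path, buffer_dir, process):
    """Copy a slide to fast storage, process the copy, then remove it."""
    slide_path = Path(slide_path)
    buffered = Path(buffer_dir) / slide_path.name
    shutil.copy2(slide_path, buffered)  # one sequential read from slow storage
    try:
        # All random-access reads now hit the fast buffer.
        return process(buffered)
    finally:
        buffered.unlink()  # free buffer space for the next slide


# Demonstration with temporary directories standing in for slow/fast storage.
with tempfile.TemporaryDirectory() as slow, tempfile.TemporaryDirectory() as fast:
    slide = Path(slow) / "example.svs"
    slide.write_bytes(b"not a real slide")
    size = read_through_buffer(slide, fast, lambda p: p.stat().st_size)
    leftover = list(Path(fast).iterdir())
```

The copy costs one sequential read per slide, which is typically cheap compared with the many small random reads performed during tile extraction. In practice, the buffer directory would be an SSD or ramdisk path.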
+ +Extraction reports +****************** + +Once tiles have been extracted, a PDF report will be generated with a summary and sample of tiles extracted from their corresponding slides. An example of such a report is given below. Reviewing this report may enable you to identify data corruption, artifacts with stain normalization, or suboptimal background filtering. The report is saved in the TFRecords directory. + +.. image:: example_report_small.jpg + +In addition to viewing reports after tile extraction, you may generate new reports on existing tfrecords with :func:`slideflow.Dataset.tfrecord_report`, by calling this function on a given dataset. For example: + +.. code-block:: python + + dataset = P.dataset(tile_px=299, tile_um=302) + dataset.tfrecord_report("/path/to/dest") + +You can also generate reports for slides that have not yet been extracted by passing ``dry_run=True`` to :meth:`slideflow.Dataset.extract_tiles`. diff --git a/docs/_sources/slide_qc.rst.txt b/docs/_sources/slide_qc.rst.txt new file mode 100644 index 000000000..d38e0be04 --- /dev/null +++ b/docs/_sources/slide_qc.rst.txt @@ -0,0 +1,36 @@ +.. currentmodule:: slideflow.slide.qc + +.. _qc: + +slideflow.slide.qc +================== + +This module contains functions for slide-level quality control, including Otsu's thresholding and Gaussian blur filtering. Quality control methods are used by passing a list of callables to the ``qc`` argument of ``.extract_tiles()``. They can also be directly applied to a slide with :meth:`slideflow.WSI.qc`. + +.. code-block:: python + + import slideflow as sf + from slideflow.slide import qc + + # Define custom QC options + qc = [ + qc.Otsu(), + qc.Gaussian(sigma=2) + ] + + # Use this QC during tile extraction + P.extract_tiles(qc=qc) + + # Alternatively, you can use the same QC directly on a WSI object + wsi = sf.WSI(...) + wsi.qc(qc).show() + +.. autoclass:: Otsu + +.. autoclass:: Gaussian + +.. autoclass:: Save + +.. autoclass:: Load + +.. 
autoclass:: StridedDL \ No newline at end of file diff --git a/docs/_sources/slideflow.rst.txt b/docs/_sources/slideflow.rst.txt new file mode 100644 index 000000000..edec27e5d --- /dev/null +++ b/docs/_sources/slideflow.rst.txt @@ -0,0 +1,11 @@ +.. currentmodule:: slideflow + +slideflow +========= + +.. autofunction:: slideflow.about +.. autofunction:: slideflow.build_feature_extractor +.. autofunction:: slideflow.create_project +.. autofunction:: slideflow.load_project +.. autofunction:: slideflow.getLoggingLevel +.. autofunction:: slideflow.setLoggingLevel diff --git a/docs/_sources/slideflow_cellseg.rst.txt b/docs/_sources/slideflow_cellseg.rst.txt new file mode 100644 index 000000000..b30e22dbe --- /dev/null +++ b/docs/_sources/slideflow_cellseg.rst.txt @@ -0,0 +1,23 @@ +.. currentmodule:: slideflow.cellseg + +slideflow.cellseg +================= + +This module contains utility functions for performing whole-slide image cell segmentation with Cellpose. + +See :ref:`cellseg` for more information. + +.. autofunction:: segment_slide + +Segmentation +************ +.. autoclass:: Segmentation +.. autofunction:: slideflow.cellseg.Segmentation.apply_rois +.. autofunction:: slideflow.cellseg.Segmentation.calculate_centroids +.. autofunction:: slideflow.cellseg.Segmentation.calculate_outlines +.. autofunction:: slideflow.cellseg.Segmentation.centroids +.. autofunction:: slideflow.cellseg.Segmentation.centroid_to_image +.. autofunction:: slideflow.cellseg.Segmentation.extract_centroids +.. autofunction:: slideflow.cellseg.Segmentation.mask_to_image +.. autofunction:: slideflow.cellseg.Segmentation.outline_to_image +.. autofunction:: slideflow.cellseg.Segmentation.save \ No newline at end of file diff --git a/docs/_sources/slidemap.rst.txt b/docs/_sources/slidemap.rst.txt new file mode 100644 index 000000000..7dc67546b --- /dev/null +++ b/docs/_sources/slidemap.rst.txt @@ -0,0 +1,61 @@ +.. 
currentmodule:: slideflow + +slideflow.SlideMap +================== + +:class:`slideflow.SlideMap` assists with visualizing tiles and slides in two-dimensional space. + +Once a model has been trained, tile-level predictions and intermediate layer activations can be calculated +across an entire dataset with :class:`slideflow.DatasetFeatures`. +The :class:`slideflow.SlideMap` class can then perform dimensionality reduction on these dataset-wide +activations, plotting tiles and slides in two-dimensional space. Visualizing the distribution and clustering +of tile-level and slide-level layer activations can help reveal underlying structures in the dataset and shared +visual features among classes. + +The primary method of use is first generating a :class:`slideflow.DatasetFeatures` from a trained +model, then using :meth:`slideflow.DatasetFeatures.map_activations`, which returns an instance of +:class:`slideflow.SlideMap`. + +.. code-block:: python + + ftrs = sf.DatasetFeatures(model='/path/', ...) + slide_map = ftrs.map_activations() + +Alternatively, if you would like to map slides from a dataset in two-dimensional space using pre-calculated *x* and *y* +coordinates, you can use the :meth:`slideflow.SlideMap.from_xy` class method. In addition to X and Y, this method +requires supplying tile-level metadata in the form of a list of dicts. Each dict must contain the name of the origin +slide and the tile index in the slide TFRecord. + +.. code-block:: python + + x = np.array(...) + y = np.array(...) + slides = ['slide1', 'slide1', 'slide5', ...] + slide_map = sf.SlideMap.from_xy(x=x, y=y, slides=slides) + +.. autoclass:: SlideMap + +Methods +------- + +.. autofunction:: slideflow.SlideMap.activations +.. autofunction:: slideflow.SlideMap.build_mosaic +.. autofunction:: slideflow.SlideMap.cluster +.. autofunction:: slideflow.SlideMap.neighbors +.. autofunction:: slideflow.SlideMap.filter +.. autofunction:: slideflow.SlideMap.umap_transform +.. 
autofunction:: slideflow.SlideMap.label +.. autofunction:: slideflow.SlideMap.label_by_preds +.. autofunction:: slideflow.SlideMap.label_by_slide +.. autofunction:: slideflow.SlideMap.label_by_uncertainty +.. autofunction:: slideflow.SlideMap.load +.. autofunction:: slideflow.SlideMap.load_coordinates +.. autofunction:: slideflow.SlideMap.load_umap +.. autofunction:: slideflow.SlideMap.plot +.. autofunction:: slideflow.SlideMap.plot_3d +.. autofunction:: slideflow.SlideMap.save +.. autofunction:: slideflow.SlideMap.save_3d +.. autofunction:: slideflow.SlideMap.save_plot +.. autofunction:: slideflow.SlideMap.save_coordinates +.. autofunction:: slideflow.SlideMap.save_umap +.. autofunction:: slideflow.SlideMap.save_encoder diff --git a/docs/_sources/ssl.rst.txt b/docs/_sources/ssl.rst.txt new file mode 100644 index 000000000..7b7d0c661 --- /dev/null +++ b/docs/_sources/ssl.rst.txt @@ -0,0 +1,152 @@ +.. currentmodule:: slideflow.simclr + +.. _simclr_ssl: + +Self-Supervised Learning (SSL) +============================== + +Slideflow provides easy access to training the self-supervised, contrastive learning framework `SimCLR `_. Self-supervised learning provides an avenue for learning useful visual representations in your dataset without requiring ground-truth labels. These visual representations can be exported as feature vectors and used for downstream analyses such as :ref:`dimensionality reduction ` or :ref:`multi-instance learning `. + +The ``slideflow.simclr`` module contains a `forked Tensorflow implementation `_ minimally modified to interface with Slideflow. SimCLR models can be trained with :meth:`slideflow.Project.train_simclr`, and SimCLR features can be calculated as with other models using :meth:`slideflow.Project.generate_features`. + +Training SimCLR +*************** + +First, determine the SimCLR training parameters with :func:`slideflow.simclr.get_args`. 
This function accepts parameters via keyword arguments, such as ``learning_rate`` and ``temperature``, and returns a configured :class:`slideflow.simclr.SimCLR_Args`. + +.. code-block:: python + + from slideflow import simclr + + args = simclr.get_args( + temperature=0.1, + learning_rate=0.3, + train_epochs=100, + image_size=299 + ) + +Next, assemble a training and (optionally) a validation dataset. The validation dataset is used to assess contrastive loss during training, but is not required. + +.. code-block:: python + + import slideflow as sf + + # Load a project and dataset + P = sf.load_project('path') + dataset = P.dataset(tile_px=299, tile_um=302) + + # Split dataset into training/validation + train_dts, val_dts = dataset.split( + val_fraction=0.3, + model_type='classification', + labels='subtype') + +Finally, SimCLR can be trained with :meth:`slideflow.Project.train_simclr`. You can train with a single dataset: + +.. code-block:: python + + P.train_simclr(args, dataset) + +You can train with an optional validation dataset: + +.. code-block:: python + + P.train_simclr( + args, + train_dataset=train_dts, + val_dataset=val_dts + ) + +And you can also optionally provide labels for training the supervised head. To train a supervised head, you'll also need to set the SimCLR argument ``lineareval_while_pretraining=True``. + +.. code-block:: python + + # SimCLR args + args = simclr.get_args( + ..., + lineareval_while_pretraining=True + ) + + # Train with validation & supervised head + P.train_simclr( + args, + train_dataset=train_dts, + val_dataset=val_dts, + outcomes='subtype' + ) + +The SimCLR model checkpoints and final saved model will be saved in the ``simclr/`` folder within the project root directory. + +.. _dinov2: + +Training DINOv2 +*************** + +A lightly modified version of `DINOv2 `__ with Slideflow integration is available on `GitHub `_. 
This version facilitates training DINOv2 with Slideflow datasets and adds stain augmentation to the training pipeline. + +To train DINOv2, first install the package: + +.. code-block:: bash + + pip install git+https://github.com/jamesdolezal/dinov2.git + +Next, configure the training parameters and datasets by providing a configuration YAML file. This configuration file should contain a ``slideflow`` section, which specifies the Slideflow project and dataset to use for training. An example YAML file is shown below: + +.. code-block:: yaml + + train: + dataset_path: slideflow + batch_size_per_gpu: 32 + slideflow: + project: "/mnt/data/projects/TCGA_THCA_BRAF" + dataset: + tile_px: 299 + tile_um: 302 + filters: + brs_class: + - "Braf-like" + - "Ras-like" + seed: 42 + outcome_labels: "brs_class" + normalizer: "reinhard_mask" + interleave_kwargs: null + +See the `DINOv2 README `_ for more details on the configuration file format. + +Finally, train DINOv2 using the same command-line interface as the original DINOv2 implementation. For example, to train DINOv2 on 4 GPUs on a single node: + +.. code-block:: bash + + torchrun --nproc_per_node=4 -m "dinov2.train.train" \ + --config-file /path/to/config.yaml \ + --output-dir /path/to/output_dir + +The teacher weights will be saved in ``outdir/eval/.../teacher_checkpoint.pth``, and the final configuration YAML will be saved in ``outdir/config.yaml``. + +Generating features +******************* + +Generating features from a trained SSL model is straightforward - use the same :meth:`slideflow.Project.generate_features` and :class:`slideflow.DatasetFeatures` interfaces as :ref:`previously described `, providing a path to a saved SimCLR model or checkpoint. + +.. code-block:: python + + import slideflow as sf + + # Create the SimCLR feature extractor + simclr = sf.build_feature_extractor( + 'simclr', + ckpt='/path/to/simclr.ckpt' + ) + + # Calculate SimCLR features for a dataset + features = P.generate_features(simclr, ...)
+ +For DINOv2 models, use ``'dinov2'`` as the first argument, and pass the model configuration YAML file to ``cfg`` and the teacher checkpoint weights to ``weights``. + +.. code-block:: python + + dinov2 = build_feature_extractor( + 'dinov2', + weights='/path/to/teacher_checkpoint.pth', + cfg='/path/to/config.yaml' + ) \ No newline at end of file diff --git a/docs/_sources/stats.rst.txt b/docs/_sources/stats.rst.txt index f15c1ee31..fbc9dfd28 100644 --- a/docs/_sources/stats.rst.txt +++ b/docs/_sources/stats.rst.txt @@ -3,139 +3,20 @@ slideflow.stats =============== -In addition to containing functions used during model training and evaluation, this module provides -the :class:`slideflow.SlideMap` class designed to assist with visualizing tiles and slides -in two-dimensional space. +This module contains internal utility functions for generating and evaluating model predictions and metrics. -Once a model has been trained, tile-level predictions and intermediate layer activations can be calculated -across an entire dataset with :class:`slideflow.model.DatasetFeatures`. -The :class:`slideflow.SlideMap` class can then perform dimensionality reduction on these dataset-wide -activations, plotting tiles and slides in two-dimensional space. Visualizing the distribution and clustering -of tile-level and slide-level layer activations can help reveal underlying structures in the dataset and shared -visual features among classes. +.. autofunction:: df_from_pred -The primary method of use is first generating an :class:`slideflow.model.DatasetFeatures` from a trained -model, then creating an instance of a :class:`slideflow.SlideMap` by using the ``from_features`` class -method: +.. autofunction:: eval_dataset -.. code-block:: python +.. autofunction:: group_reduce - df = sf.DatasetFeatures(model='/path/', ...) 
- slide_map = sf.SlideMap.from_features(df) - -Alternatively, if you would like to map slides from a dataset in two-dimensional space using pre-calculated *x* and *y* -coordinates, you can use the ``from_precalculated`` class method. In addition to X and Y, this method requires supplying -tile-level metadata in the form of a list of dicts. Each dict must contain the name of the origin slide and the tile -index in the slide TFRecord. - -.. code-block:: python - - dataset = project.dataset(tile_px=299, tile_um=302) - slides = dataset.slides() - x = np.array(...) - y = np.array(...) - meta = [{'slide': ..., 'index': ...} for i in range(len(x))] - slide_map = sf.SlideMap.from_precalculated(slides, x, y, meta) - -.. automodule: slideflow.stats - :imported-members: - -SlideMap --------- - -.. autoclass:: slideflow.SlideMap - :inherited-members: - -basic_metrics ----------------------- -.. autofunction:: basic_metrics - -calculate_centroid ------------------- -.. autofunction:: calculate_centroid - -concordance_index ----------------------- -.. autofunction:: concordance_index - -filtered_prediction -------------------- -.. autofunction:: filtered_prediction - -generate_combined_roc ----------------------- -.. autofunction:: generate_combined_roc - -generate_roc ----------------------- -.. autofunction:: generate_roc - -generate_scatter ----------------------- -.. autofunction:: generate_scatter - -gen_umap --------- -.. autofunction:: gen_umap - -get_centroid_index ------------------- -.. autofunction:: get_centroid_index - -metrics_from_dataset ---------------------- .. autofunction:: metrics_from_dataset -metrics_from_pred ------------------ -.. autofunction:: metrics_from_pred - -normalize_layout ----------------- -.. autofunction:: normalize_layout - -read_predictions ----------------- -.. autofunction:: read_predictions - -permute_importance ------------------- -.. autofunction:: permute_importance - -predict_from_layer ----------------------- -.. 
autofunction:: predict_from_layer - -predict_from_tensorflow ------------------------- -.. autofunction:: predict_from_tensorflow - -predict_from_torch ----------------------- -.. autofunction:: predict_from_torch - -save_histogram ----------------------- -.. autofunction:: save_histogram - -pred_to_df -------------------------- -.. autofunction:: pred_to_df - -to_onehot ----------------------- -.. autofunction:: to_onehot - - - - - - - - - - - +.. autofunction:: name_columns +.. autofunction:: predict_dataset +.. autofunction:: calculate_centroid +.. autofunction:: get_centroid_index \ No newline at end of file diff --git a/docs/_sources/studio.rst.txt b/docs/_sources/studio.rst.txt new file mode 100644 index 000000000..1507a25ef --- /dev/null +++ b/docs/_sources/studio.rst.txt @@ -0,0 +1,350 @@ +.. _studio: + +Slideflow Studio: Live Visualization +==================================== + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/studio_preview.webm + :autoplay: + +| + +Slideflow Studio provides powerful tools for interactive visualization of whole-slide images, model predictions, and GAN-generated images. It's also fast - with an OpenGL renderer and highly optimized whole-slide image viewer, you'll get a smooth experience that can even run on a Raspberry Pi. + +If you have installed slideflow via PIP, you can run Studio from a terminal with: + +.. code-block:: bash + + slideflow-studio + +If you are running from source, you can start Studio using the following script in the GitHub repository: + +.. code-block:: bash + + python slideflow-studio.py + +If you encounter any issues with the initialization scripts, you can also start Studio by executing the submodule: + +.. code-block:: bash + + python -m slideflow.studio + +If you are using a Docker image, additional arguments are required to launch Studio. Start your docker container using the arguments ``-e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix``. 
For example: + +.. code-block:: bash + + docker run -it --rm \ + -e DISPLAY=$DISPLAY \ + -v /tmp/.X11-unix:/tmp/.X11-unix \ + slideflow/slideflow:latest-tf + +A path to a whole-slide image can optionally be provided as the first argument. Use the ``--help`` flag to see a list of available arguments. + +You can also launch Studio by using the ``.view()`` method of :class:`slideflow.WSI`, :class:`slideflow.Heatmap`, and :class:`slideflow.Mosaic` objects. + +.. code-block:: python + + import slideflow as sf + + wsi = sf.WSI('/path/to/slide.svs', tile_px=299, tile_um=302) + wsi.view() + + +Layout & design +*************** + +.. image:: studio_section_labels.jpg + +| + +The Slideflow Studio window has three primary areas: the main view, a tile preview, and the control panel. Fullscreen mode can be toggled with View -> Fullscreen or by pressing Alt+Enter. + +Main view +----------- +The main view is an interactive display for whole-slide images. Zoom in on a slide using the mouse wheel, and navigate around the slide by clicking and dragging. When a model is loaded, right clicking on the main view sets the prediction location, drawing a red box outlining the location from which the tile was extracted and displaying the prediction underneath. + +Tile preview +------------ +When a model is loaded, right clicking on the main view will establish the location for a focal tile prediction. A tile will be extracted from this location of the whole-slide image at the pixel & micron size appropriate for the loaded model. The tile preview window shows the extracted image tile taken from this location. If the loaded model uses stain normalization, a post-normalization image is also shown on the right. The tile preview window can be hidden by clicking the X in the top right corner, or toggled via the menu item View -> Show -> Tile Preview.
+ +Control panel +------------- +The control panel shows relevant active widgets which contain information and controls for whole-slide images, loaded models, heatmaps, and loaded GANs. :ref:`Enabling an extension ` will add an additional icon and associated functionality. + +Projects +******** + + +A Slideflow :ref:`Project ` can be loaded to make it easier to find and load both slides and models. Load a project with either File -> Open Project, or click and drag a project folder onto the main view. Click the Project icon to view project information and browse both slides and models. + +.. video:: https://github.com/user-attachments/assets/e55339a9-69ce-4fa6-a3de-66a4a5244704 + :autoplay: + +| + +All slides associated with the project will be listed under the "Slides" subheader. Clicking a slide name will open the slide. Similarly, all trained models associated with the project are listed under the "Models" subheader and can be loaded by clicking a model name. Both Tensorflow and PyTorch models can be loaded, regardless of the active backend. + +.. _studio_wsi: + +Whole-slide images +****************** + +.. image:: studio_slide.jpg + +| + +Whole-slide images can be loaded directly with File -> Open Slide. You can also load a slide by dragging and dropping a file onto the main view or by using the Project interface. Use the mouse wheel to zoom, and click-and-drag to move. Slides can be closed with File -> Close Slide. + +The Slide section of the control panel shows slide properties, including dimensions, highest scanned magnification, slide scanner vendor, and how many annotated regions-of-interest (ROIs) are loaded for the slide. ROIs are loaded automatically if a Project is loaded and ROIs are available for the slide. + +A thumbnail of the loaded slide is shown in the upper right corner of the main view, and can be hidden with View -> Show -> Thumbnail. 
A magnification scale is shown in the bottom-left corner of the main view, and can be hidden with View -> Show -> Scale. + +.. _studio_roi: + +ROI Annotations +--------------- + +.. image:: studio_rois.jpg + +| + +Regions-of-Interest (ROIs) can be used to guide tile extraction. If a Slideflow project has been loaded (File -> Open Project), ROIs will be automatically loaded. You can use Studio to add, label, or remove ROIs with the annotation tool, under the subheader "ROIs". + +Click the plus (Add) icon to draw new ROIs with a lasso tool; right click and drag to create a new ROI. The pencil (Edit) icon allows you to edit any existing ROIs; right click an ROI while editing to delete the ROI or change its label. Once finished, ROIs can be exported in CSV format by clicking the floppy disk icon (Save). You can manually load an existing ROI file by clicking the folder icon (Load). + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/roi_label.mp4 + :autoplay: + +| + +Labels can be optionally supplied for each ROI. Labels can be set after creating an ROI and changed by right clicking an ROI while editing. Hover over an existing ROI to see its name and label. Labels are exported when saving ROIs. + +Slideflow 3.0 added a new polygon tool for drawing ROIs. Click the polygon icon to draw a polygon ROI. Right click to add points, and press Enter to close the polygon. The polygon tool can be used to draw complex shapes, and can be used in conjunction with the lasso tool. + +.. video:: https://github.com/user-attachments/assets/edf7c377-af40-4f8e-a4cb-f84024988e91 + :autoplay: + +When in Edit mode, click on an ROI to select it. Holding down the Control key will show the ROI vertices, which can then be selected and moved. Hold Shift and drag the mouse to select multiple vertices. Vertices can be moved by dragging them and deleted by pressing the Delete key. Click outside the ROI or press Esc to deselect. 
+ +Slideflow can also import ROIs generated from external applications such as QuPath and ImageScope; see :ref:`regions_of_interest` for more information. + +Tile filtering +-------------- + +.. image:: tile_filter.jpg + +| + +A tile filtering strategy can be applied by checking "Tile filter" in the "Slide Processing" subsection. Click the ellipsis button to change grayspace fraction/threshold and whitespace fraction/threshold, to see how tuning these parameters alters tile-level filtering. If enabled, tile filtering will be performed when generating predictions from the slide. Once enabled, the tile filter can be previewed by checking the box "Show tile-level filter" in the "Display" subsection. + +Slide filtering +--------------- + +.. image:: slide_filter.jpg + +| + +Similarly, slide filtering can be enabled by checking "Slide filter". Available slide filtering / QC options include blur filtering, Otsu's thresholding, or both. If "Tile filter" and "Slide filter" are both selected, tiles will be filtered with both. The QC mask can be previewed by checking the box "QC Mask" in the "Display" subsection. + +.. _studio_segmentation: + +Tissue segmentation +------------------- + +.. video:: https://github.com/user-attachments/assets/6f0da6be-da47-443e-b08e-1bab978fb345 + :autoplay: + +| + +New in version 3.0, :ref:`segmentation models ` can be both trained and deployed directly within Studio using the new Segmentation widget. + +The Segmentation widget can be accessed by clicking the "Segmentation" icon in the left-hand toolbar. The widget allows you to load a segmentation model and apply it to the loaded slide, generating labeled ROIs. Trained models can also be loaded by dragging and dropping a model folder onto the main view. + +The Segmentation widget also contains a section for training models. In order to train models, a project must be loaded (File -> Open Project).
The "Data Source" dropdown is used to select which slides in the project will be used for training. The "Data Processing" section is used to customize the model, including the tile size, magnification, stride, and margin. The "filter" option - which can be either "roi" or "otsu" - determines which tiles are used for training (either all tiles or only those within ROIs). The "Arch & Params" section is used to select the model architecture, hyperparameters, segmentation model type (binary, multiclass, or multilabel), and ROI classes that will be included in training. The "Train" button will begin training the model. Once training is complete, the "Export" button can be used to save the trained model to disk. "Generate ROIs" can then be used to apply the trained model to any loaded slide. + +Preview slide normalization +--------------------------- + +Stain normalization strategies can be quickly previewed by checking "Normalize", which will apply the associated normalization strategy to the main view. If a model is loaded, the model's normalizer will be used by default. The normalizer can be changed with the corresponding dropdown menu, allowing you to preview any normalization method. All normalizer methods shown except for the model normalizer will use the "v3" fit (see :py:mod:`slideflow.norm` for more information). Regardless of what is being previewed, the appropriate model normalizer will be used when generating predictions from the slide. + +Preview tile extraction +----------------------- + +.. image:: https://github-production-user-asset-6210df.s3.amazonaws.com/48372806/257349240-a4911b16-9b5a-4289-9d46-41c95f31acda.png + +| + +The "Display" subsection of the slide widget allows users to preview tile extraction, displaying outlines around tiles. Model predictions generated from the slide will only utilize the shown tiles. 
+ +Models & predictions +******************** + +Slideflow models can be loaded with File -> Open Model, by clicking and dragging a model onto the main view, or by clicking the "Load a Model" button of the model widget. Both Tensorflow and PyTorch models are supported. Multiple-instance learning (MIL) models require the MIL extension, :ref:`as discussed below `. Models can be closed with File -> Close Model. + +A summary of the loaded model is shown on the left side of the model widget, containing information about the model outcomes, tile size, image format (PNG/JPG), backend (Tensorflow/PyTorch), and the version of Slideflow used to train the model. Click the "HP" button to show a list of all hyperparameters used during model training. + +A model will be enabled by default once loaded, but can be disabled by clicking the gear icon in the Model section of the control panel, and then clicking "Close model". Similarly, to disable uncertainty quantification (UQ) for models trained with UQ, open the same gear menu and deselect "Enable UQ". + +Tile predictions +---------------- + +.. image:: studio_tile_preds.jpg + +| + +Once a model is loaded, right-click anywhere on the main view to set the tile extraction location for the tile preview. A tile will be extracted at this location matching the pixel and micron size of the loaded model. The extracted tile will be shown before and after stain normalization (if applicable) in the tile preview window. Right-click and drag to slide the preview window. The model prediction at this location will be shown underneath the red box in the main view, and in histogram format in the control panel, along with the class label for classification models. + +Saliency +-------- + +.. image:: studio_saliency.jpg + +| + +Saliency maps for the given model and image tile can be previewed in real-time by selecting the checkbox under the "Saliency" subheader. The saliency map will replace the extracted image tile in the tile preview window.
Alternatively, saliency can be viewed as an overlay on top of the extracted image tile by checking the box "Overlay". The dropdown menu in this section can be used to change the saliency method. + + +Slide predictions +----------------- + +.. image:: studio_slide_preds.jpg + +| + +Click the "Predict Slide" button to generate a prediction for the whole-slide image. By default, this will show predictions across the slide as a heatmap in the main display, and the final prediction for the slide will be shown under the "Slide Prediction" subheader of the control panel. Histograms of predictions for each model outcome, as well as uncertainty (if applicable), will be shown in this same section of the control panel. Click the + and - buttons in this section to cycle through histograms for each outcome category. + + +.. _studio_mil: + +Multiple-Instance Learning +************************** + +Slideflow Studio includes support for multiple-instance learning (MIL) models with the MIL extension. In addition to generating predictions from MIL models, Studio can also be used to visualize associated attention heatmaps. Please see :ref:`mil` for more information. + +Start by opening the MIL widget in the sidebar. Models are loaded by either clicking the "Load MIL model" button, selecting "File -> Load MIL Model...", or by dragging-and-dropping an MIL model folder onto the window. + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/mil_attention.mp4 + :autoplay: + +| + +Information about the feature extractor and MIL model will be shown in the left-hand toolbar. MIL model architecture and hyperparameters can be viewed by clicking the "HP" button. Click "Predict Slide" to generate a whole-slide prediction. If applicable, attention will be displayed as a heatmap. The heatmap color and display can be customized in the Heatmap widget.
+ +Right-clicking for a focal prediction when an MIL model is loaded will display the tile-level attention along with the tile prediction. Tile-level attention can be displayed as a scaled colorbar, as shown in the video above, by specifying an attention range and thresholds in the MIL ``mil_params.json`` file. + +.. code-block:: python + + { + ... + "thresholds": { + "attention": { + "low": 0.3, + "high": 0.5, + "range": [0, 1] + } + }, + ... + } + + +Heatmaps +******** + +.. image:: studio_heatmap.jpg + +| + +The heatmap section of the control panel can be used to generate and customize whole-slide heatmaps. Heatmaps are generated using the settings configured in the Slide section of the control panel (including stride, tile filter, and slide filter). Click "Generate" in the heatmap widget to create the heatmap. The color scheme can be changed with the dropdown menu of the "Display" subheader, as can the alpha and gain. You can switch which outcome is being displayed as a heatmap by cycling through the available predictions. If the model was trained with uncertainty quantification (UQ), click the radio button next to UQ to show uncertainty as a heatmap. Press the left ALT key while hovering over the heatmap to show the raw heatmap values. + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/heatmap.mp4 + :autoplay: + +| + +By default, heatmaps are calculated with multiprocessing pools, which may increase memory utilization. To decrease memory utilization at the cost of slower heatmap calculation, switch to low memory mode in the Settings section (described below), or by using the launch flag ``--low_memory``. + +Heatmaps can be saved in PNG format with File -> Export -> Heatmap (PNG). Heatmaps can also be exported in numpy format (NPZ) with File -> Export -> Heatmap (NPZ). The heatmap of predictions will be saved in the exported NPZ file under the key ``'logit'``, with the shape ``(y_dim, x_dim, num_classes)``. 
If the model was trained with uncertainty, the uncertainty heatmap will be saved under the key ``'uncertainty'``. + +Performance & Capture +********************* + +.. image:: studio_performance.jpg + +| + +Performance can be monitored in the Performance section of the control panel (lightning icon). This section shows frametimes for GUI display, image rendering, normalization, and model prediction. + +Export contents of the main view to a PNG file with File -> Export -> Main view. Similarly, the extracted image tile shown in the tile preview window can be exported with File -> Export -> Tile view. A screenshot of the entire window interface can be saved with File -> Export -> GUI view. + +Settings +******** + +Studio can be customized in the Settings section, which provides the ability to set an FPS limit (defaults to 60), enable vertical sync (enabled by default), and customize the theme. This section also includes an option to enter "Low memory mode". In low memory mode, heatmaps are calculated with threadpools rather than multiprocessing pools, decreasing memory utilization at the cost of slower heatmap generation. + +.. _extensions: + +Extensions +********** + +.. image:: studio_extensions.jpg + +| + +Slideflow Studio includes an Extensions section for expanding functionality and adding additional features. Extensions may require additional software dependencies or have different licenses. The Extensions section can be accessed by clicking the puzzle icon in the bottom-left section of the control panel. + +Four official extensions are included and described below, adding support for cell segmentation with Cellpose, generative adversarial networks (StyleGAN), mosaic maps, and multiple-instance learning. Development is underway to add support for community extensions that can be shared and downloaded. Please reach out to us `on GitHub `_ if you are interested in building and deploying an extension based on your research.
+ +Cell segmentation +----------------- + +The Cell Segmentation extension adds support for interactive cell segmentation with Cellpose. Please see :ref:`cellseg` for more information. + +StyleGAN +-------- + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/stylegan.webm + :autoplay: + +| + +The StyleGAN extension adds support for visualizing trained StyleGAN2 or StyleGAN3 networks. Once enabled, GAN ``*.pkl`` files can be loaded with File -> Load GAN, or with drag-and-drop. Generated images are shown in the tile preview window. Model predictions on GAN images operate similarly to predictions on whole-slide images. Predictions on GAN images are generated in real-time, and you can watch the predictions change in the control panel. + +By default, Studio will generate predictions on the full GAN image (after resizing to match the model's ``tile_px`` value). If a ``training_options.json`` file is found in the same directory as the GAN .pkl, the tile size used to train the GAN will be read from this file (slideflow_kwargs/tile_px and ../tile_um). If the GAN was trained on images with a different ``tile_um`` value, the GAN image will be cropped to match the model's ``tile_um`` before resizing. The cropped/resized (and stain normalized) image will be shown to the right of the raw GAN image in the tile preview window. + +The StyleGAN widget can be used to travel the GAN latent space, similar to the implementation in the official `NVIDIA StyleGAN3 repository `_. Set a specific seed in the input field next to "Seed", or click and drag the "Drag" button. If the model was trained with class conditioning, manually set the class with the "Class" field (the default value of -1 selects a random class). Press left or right on your keyboard to quickly move through seeds. + +.. 
video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/gan_seeds.mp4 + :autoplay: + +| + +The style mixing section can be used to mix styles between seeds, styles between classes, or both. You can control the degree of mixing with the mixing slider. You can finetune which GAN layers are used during the mixing by clicking the ellipsis button and selecting which layers should be traversed during style mixing. + +Save the current seed by clicking the "Save" button; all saved seeds will be listed in the "Saved Seeds" subsection. Click any seed to load it. Once any seed has been saved, options will appear to export a list of saved seeds in CSV format. Previously exported seeds can be loaded by clicking "Load Seeds". + +StyleGAN requires the ``slideflow-noncommercial`` package: + +.. code-block:: bash + + pip install slideflow-noncommercial + +Mosaic maps +----------- + +The Mosaic Maps extension, which is enabled by default, adds support for interactively viewing mosaic maps. You can use the :meth:`slideflow.Mosaic.view` function to launch Studio and load the mosaic. + +.. code-block:: python + + import slideflow as sf + + mosaic = sf.Mosaic(...) + mosaic.view() + +Alternatively, a mosaic map can be saved to disk with :meth:`slideflow.Mosaic.export`, and then loaded into Studio with File -> Load Mosaic. + +.. image:: studio_mosaic.jpg + +| + +Once loaded, the mosaic map can be navigated using the same controls as WSI navigation - click and drag to pan, and use the mouse wheel to zoom. The UMAP used to generate the mosaic map will be shown in a window in the bottom-right corner, with a red box indicating the section of the UMAP currently in view. If a Project is loaded, hovering over an image tile will reveal a popup containing a larger corresponding section from the associated whole-slide image. This popup also contains the name of the slide and tile location coordinates.
+ +Use the control panel to increase or decrease the mosaic grid size, or to change the background color. diff --git a/docs/_sources/studio_module.rst.txt b/docs/_sources/studio_module.rst.txt new file mode 100644 index 000000000..e2daaa635 --- /dev/null +++ b/docs/_sources/studio_module.rst.txt @@ -0,0 +1,9 @@ +.. currentmodule:: slideflow.studio + +slideflow.studio +================ + +This module contains the Slideflow Studio visualization tool. See :ref:`studio` for more information. + +.. automodule:: slideflow.studio + :members: diff --git a/docs/_sources/stylegan.rst.txt b/docs/_sources/stylegan.rst.txt new file mode 100644 index 000000000..8fe9cb2ef --- /dev/null +++ b/docs/_sources/stylegan.rst.txt @@ -0,0 +1,179 @@ +.. currentmodule:: slideflow.gan + +.. _stylegan: + +Generative Networks (GANs) +========================== + +.. video:: https://media.githubusercontent.com/media/slideflow/slideflow/master/docs/stylegan.webm + :autoplay: + +| + +Slideflow includes tools to easily interface with the PyTorch implementations of `StyleGAN2 `_ and `StyleGAN3 `_, allowing you to train these Generative Adversarial Networks (GANs). Slideflow additionally includes tools to assist with image generation, interpolation between class labels, and interactively visualize GAN-generated images and their predictions. See our manuscript on the use of GANs to `generate synthetic histology `_ for an example of how these networks might be used. + + +.. note:: + + StyleGAN requires PyTorch <0.13 and Slideflow-NonCommercial, which can be installed with: + + .. code-block:: bash + + pip install slideflow-noncommercial + + +Training StyleGAN +***************** + +The easiest way to train StyleGAN2/StyleGAN3 is with :meth:`slideflow.Project.gan_train`. Both standard and class-conditional GANs are +supported. To train a GAN, pass a :class:`slideflow.Dataset`, experiment label, +and StyleGAN keyword arguments to this function: + +.. 
code-block:: python + + import slideflow as sf + + P = sf.Project('/project/path') + dataset = P.dataset(tile_px=512, tile_um=400) + + P.gan_train( + dataset=dataset, + model='stylegan3', + cfg='stylegan3-r', + exp_label="ExperimentLabel", + gpus=4, + batch=32, + ... + ) + +The trained networks will be saved in the ``gan/`` subfolder in the project directory. + +StyleGAN2/3 can only be trained on images with sizes that are powers of 2. You can crop and/or resize images from a Dataset to match this requirement by using the ``crop`` and/or ``resize`` arguments: + +.. code-block:: python + + dataset = P.dataset(tile_px=299, ...) + + # Train a GAN on images resized to 256x256 + P.gan_train( + ..., + resize=256, + ) + +See the :meth:`slideflow.Project.gan_train` documentation for additional +keyword arguments to customize training. + +Class conditioning +------------------ + +GANs can also be trained with class conditioning. To train a class-conditional GAN, simply provide a list of categorical +outcome labels to the ``outcomes`` argument of :meth:`slideflow.Project.gan_train`. For example, to train a GAN with class conditioning on ER status: + +.. code-block:: python + + P.gan_train( + ..., + outcomes='er_status' + ) + +Tile-level labels +----------------- + +In addition to class conditioning with slide-level labels, StyleGAN2/StyleGAN3 can be trained with tile-level class conditioning. Tile-level labels can be generated through ROI annotations, as described in :ref:`tile_labels`. + +Prepare a pandas dataframe, indexed with the format ``{slide}-{x}-{y}``, where ``slide`` is the name of the slide (without extension), ``x`` is the corresponding tile x-coordinate, and ``y`` is the tile y-coordinate. The dataframe should have a single column, ``label``, containing onehot-encoded category labels. For example: + +.. code-block:: python + + import pandas as pd + + df = pd.DataFrame( + index=[ + 'slide1-251-425', + 'slide1-560-241', + 'slide1-321-502', + ... 
+ ], + data={ + 'label': [ + [1, 0, 0], + [1, 0, 0], + [0, 1, 0], + ... + ] + } + ) + +This dataframe can be generated, as described in :ref:`tile_labels`, through the :meth:`slideflow.Dataset.get_tile_dataframe` function. For GAN conditioning, the ``label`` column should be onehot-encoded. + +Once the dataframe is complete, save it in parquet format: + +.. code-block:: python + + df.to_parquet('tile_labels.parquet') + +And supply this file to the ``tile_labels`` argument of :meth:`slideflow.Project.gan_train`: + +.. code-block:: python + + P.gan_train( + ..., + tile_labels='tile_labels.parquet' + ) + +Generating images +***************** + +Images can be generated from a trained GAN and exported either as loose images +in PNG or JPG format, or alternatively stored in TFRecords. Images are generated from a list +of seeds (list of int). Use the :meth:`slideflow.Project.gan_generate` function +to generate images, with ``out`` set to a directory path if exporting loose images, +or ``out`` set to a filename ending in ``.tfrecords`` if saving images in +TFRecord format: + +.. code-block:: python + + network_pkl = '/path/to/trained/gan.pkl' + P.gan_generate( + network_pkl, + out='target.tfrecords', + seeds=range(100), + ... + ) + +The image format is set with the ``format`` argument: + +.. code-block:: python + + P.gan_generate( + ..., + format='jpg', + ) + +Class index (for class-conditional GANs) is set with ``class_idx``: + +.. code-block:: python + + P.gan_generate( + ..., + class_idx=1, + ) + +Finally, images can be resized after generation to match a target tile size: + +.. code-block:: python + + P.gan_generate( + ..., + gan_px=512, + gan_um=400, + target_px=299, + target_um=302, + ) + +Interactive visualization +------------------------- + +Slideflow Studio can be used to interactively visualize GAN-generated images (see :ref:`studio`). Images can be directly exported from this interface. 
This tool also enables you to visualize real-time predictions for GAN-generated images when used as inputs to a trained classifier. + +For more examples of using Slideflow to work with GAN-generated images, see `our GitHub repository `_ for code accompanying the previously referenced manuscript. \ No newline at end of file diff --git a/docs/_sources/tfrecords.rst.txt b/docs/_sources/tfrecords.rst.txt new file mode 100644 index 000000000..5f16f02e6 --- /dev/null +++ b/docs/_sources/tfrecords.rst.txt @@ -0,0 +1,296 @@ +.. _tfrecords: + +TFRecords: Reading and Writing +============================== + +TFRecords are binary files designed for storing large amounts of data. In Slideflow, TFRecords are used to store compressed image tiles extracted from whole-slide images. TFRecords are used instead of loose image files (such as ``*.jpg`` or ``*.png``) because they are compact, more easily distributed, and significantly improve data reading efficiency during model training. TFRecords were originally designed for Tensorflow, but they can also be used with PyTorch. + +The following sections describe the TFRecord data format and provide examples of how to create, read, and manipulate TFRecords using Slideflow. + +TFRecord Format +*************** + +TFRecords are binary files that contain a sequence of records, where each record represents an individual image tile. Each record contains a serialized `protocol buffer `_ with a list of named features. Each feature can be a list of bytes, floats, or integers. TFRecords are expected to have the following features: + +- **"image_raw"**: Bytes containing the image data (either JPG or PNG). +- **"slide"**: Bytes containing the slide name (in UTF-8 format). +- **"loc_x"**: Integer containing the x-coordinate of the tile (optional). +- **"loc_y"**: Integer containing the y-coordinate of the tile (optional). + +Slideflow expects each TFRecord to contain images from only a single slide, with the TFRecord name matching the slide name.
The ``loc_x`` and ``loc_y`` features are optional, but are required for some operations (such as generating TFRecord heatmaps). + +.. note:: + + When reading TFRecords with Tensorflow, records are internally decoded using ``tf.train.Example``. When Tensorflow is not being used (such as when using the PyTorch backend), tfrecords are decoded using ``sf.util.example_pb2.Example``, providing an alternative decoder that does not require Tensorflow. Tensorflow's ``tf.train.Example`` and Slideflow's ``sf.util.example_pb2.Example`` are identical, except that ``sf.util.example_pb2.Example`` does not require Tensorflow and supports ``protobuf`` version 4. + + +TFRecord Indices +**************** + +Slideflow uses TFRecord index files to keep track of the internal structure of each TFRecord, improving efficiency of data reading. These index files are automatically built and stored in the same directory as the TFRecords upon first use. A TFRecord index is an ``*.npz`` file with the same name as the TFRecord, but with the ``*.index.npz`` extension. A TFRecord index contains the following fields: + +- **"arr_0"**: An array of shape ``(n_tiles, 2)`` containing the starting bytes and length of each record. +- **"locations"**: An array of shape ``(n_tiles, 2)`` containing the x- and y-coordinates of each tile. + +Index files for an entire dataset can be rebuilt using :meth:`slideflow.Dataset.rebuild_index()`. You can manually create an index file for a single TFRecord using :func:`sf.util.tfrecord2idx.create_index()`. + +Creating TFRecords +****************** + +From a Dataset +-------------- + +The typical way to create TFRecords is to use the :meth:`slideflow.Dataset.extract_tiles` function, as described in :ref:`filtering`. TFRecords will be exported to the destination configured in the :class:`slideflow.Dataset` object (see: :ref:`datasets_and_validation`). 
+ +From a slide +------------ + +A TFRecord file for a single slide can be manually created using the :meth:`slideflow.WSI.extract_tiles()` function. The first argument of this function is the TFRecord destination folder. + +From a directory of images +-------------------------- + +A directory of loose image files can be assembled into a TFRecord using :func:`slideflow.io.write_tfrecords_single()`: + +.. code-block:: python + + sf.io.write_tfrecords_single( + '/path/to/images', + '/path/to/destination', + filename='filename', + slide='slide', + ) + +A nested directory of loose image tiles, organized into subdirectories by slide name, can be simultaneously assembled into multiple TFRecords (one for each slide) using :func:`slideflow.io.write_tfrecords_multi()`. Slide names are determined from the subdirectory names: + +.. code-block:: python + + sf.io.write_tfrecords_multi( + '/path/to/nested_images', + '/path/to/destination' + ) + +Inspecting TFRecords +******************** + +Individual TFRecords +-------------------- + +The quickest way to inspect a TFRecord is to use :class:`slideflow.TFRecord`: + +.. code-block:: python + + >>> import slideflow as sf + >>> tfr = sf.TFRecord('/path/to/tfrecord') + +An index file will be automatically created if one is not found. To disable automatic index creation, set ``create_index=False``. + +The TFRecord object has several useful attributes: + + >>> tfr.fields + ['image_raw', 'slide', 'loc_x', 'loc_y'] + >>> tfr.img_format + 'jpeg' + >>> tfr.length + 1000 + >>> tfr.locations + [(768, 256), (768, 512), ...] + +The ``fields`` attribute is a list of the fields in the TFRecord. + +The ``img_format`` attribute is the image format of the TFRecord (either ``"jpeg"`` or ``"png"``). + +The ``length`` attribute is the number of tiles in the TFRecord. + +The ``locations`` attribute is a list of the x- and y- center coordinates of each tile, if available, otherwise None.
+ +Inspecting Datasets +------------------- + +The :class:`slideflow.Dataset` object provides several methods for inspecting the TFRecords in a dataset generated through :meth:`slideflow.Dataset.extract_tiles`. + +The :meth:`slideflow.Dataset.summary()` method provides a summary of the dataset, including where TFRecords are stored and the total number of tiles across all TFRecords in the dataset. + +.. code-block:: python + + # Prepare a dataset of image tiles. + dataset = project.dataset( + tile_px=299, # Tile size, in pixels. + tile_um='10x' # Tile size, in microns or magnification. + ) + dataset.summary() + + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + Overview: + ╒===============================================╕ + │ Configuration file: │ /mnt/data/datasets.json │ + │ Tile size (px): │ 299 │ + │ Tile size (um): │ 10x │ + │ Slides: │ 941 │ + │ Patients: │ 941 │ + │ Slides with ROIs: │ 941 │ + │ Patients with ROIs: │ 941 │ + ╘===============================================╛ + + Filters: + ╒====================╕ + │ Filters: │ {} │ + ├--------------------┤ + │ Filter Blank: │ [] │ + ├--------------------┤ + │ Min Tiles: │ 0 │ + ╘====================╛ + + Sources: + + TCGA_LUNG + ╒==============================================╕ + │ slides │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ roi │ /mnt/raid/SLIDES/TCGA_LUNG │ + │ tiles │ /mnt/rocket/tiles/TCGA_LUNG │ + │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │ + │ label │ 299px_10x │ + ╘==============================================╛ + + Number of tiles in TFRecords: 284114 + Annotation columns: + Index(['patient', 'subtype', 'site', 'slide'], + dtype='object') + +The :meth:`slideflow.Dataset.tfrecords()` method returns a list of paths to tfrecords. + +.. 
code-block:: python + + >>> tfrecords = dataset.tfrecords() + >>> len(tfrecords) + 941 + >>> tfrecords[0] + '/path/to/tfrecords1' + +The ``slideflow.Dataset.num_tiles`` attribute returns the total number of tiles across all TFRecords in the dataset. + +.. code-block:: python + + >>> dataset.num_tiles + 284114 + +Finally, the :meth:`slideflow.Dataset.manifest()` method returns a dictionary mapping TFRecord paths to the number of tiles in each TFRecord. Each value in the dictionary is a nested dictionary with two keys: ``"total"``, which is the total number of tiles in the TFRecord, and ``"clipped"``, which is the number of tiles that will be taken from the TFRecord as a result of :ref:`clipping/undersampling `. + +.. code-block:: python + + >>> dataset.manifest() + {'/path/to/tfrecords1': {'total': 1000, 'clipped': 512}, + '/path/to/tfrecords2': {'total': 2000, 'clipped': 512}, + ...} + +Reading TFRecords +***************** + +Slideflow provides several tools for reading and parsing TFRecords. These tools are intended for debugging and development, and are not recommended for model training. Higher-level dataloaders, which supervise sampling, shuffling, sharding, batching, labeling, and augmenting, are discussed in :ref:`dataloaders`. + +Reading a single image tile +--------------------------- + +To get a single parsed record according to its index, use :meth:`slideflow.TFRecord.__getitem__()`, which returns a dictionary of the parsed record: + +.. code-block:: python + + >>> import slideflow as sf + >>> tfr = sf.TFRecord('/path/to/tfrecord') + >>> tfr[0] + {'image_raw': b'...', 'slide': 'SLIDE_NAME', 'loc_x': 0, 'loc_y': 0} + +The ``'image_raw'`` field contains raw image bytes, in either JPG or PNG format. + +To get a single parsed record according to its location, use :meth:`slideflow.TFRecord.get_record_by_xy()`, which returns the slide name and image bytes: + +.. 
code-block:: python + + >>> tfr.get_record_by_xy(768, 256) + ('SLIDE_NAME', b'...') + +Image bytes can be decoded into Tensors (according to the active backend) using :func:`slideflow.io.decode_image()`: + +.. code-block:: python + + >>> import slideflow as sf + >>> slide, image = tfr.get_record_by_xy(768, 256) + >>> print(type(image)) + + >>> sf.io.decode_image(image) + >> import slideflow as sf + >>> tfr = '/path/to/tfrecords' + >>> sf.io.tfrecord2idx.create_index(tfr) + >>> index = sf.io.tfrecord2idx.load_index(tfr) + +Then, use :func:`slideflow.tfrecord_loader()` to create a generator that yields parsed records from the TFRecord: + +.. code-block:: python + + >>> loader = sf.tfrecord.tfrecord_loader(tfr, index) + >>> record = next(iter(loader)) + {'image_raw': , 'slide': , 'loc_x': [0], 'loc_y': [0]} + +Both ``"image_raw"`` and ``"slide"`` fields are returned as bytes in numpy arrays. The ``"loc_x"`` and ``"loc_y"`` fields are returned as integers. The image and slide name can be decoded using :func:`slideflow.io.decode_image()` and ``.decode('utf-8')``, respectively: + +.. code-block:: python + + >>> image = sf.io.decode_image(bytes(record['image_raw'])) + >>> slide = bytes(record['slide']).decode('utf-8') + +This iterator can be used to read all images from a TFRecord in sequence: + +.. code-block:: python + + >>> for record in loader: + ... image = sf.io.decode_image(bytes(record['image_raw'])) + ... slide = bytes(record['slide']).decode('utf-8') + +The iterator can be split into separate shards (data partitions) with the ``shard`` argument, a tuple of ``(shard_id, n_shards)``. This is useful for parallelizing data reading across multiple processes, threads, or compute nodes: + +.. code-block:: python + + >>> loader = sf.tfrecord.tfrecord_loader(tfr, index, shard=(0, 2)) + +Data sharding ensures that each shard reads a unique subset of the data, and that each record is read exactly once. 
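The sharding guarantee described above (disjoint shards that together cover every record exactly once) can be pictured with a small standalone sketch. This is only an illustration under stated assumptions, not Slideflow's implementation: the helper names ``shard_bounds`` and ``shard_records`` are hypothetical, and the contiguous-range partitioning scheme is an assumption chosen for clarity.

```python
# Hypothetical illustration of a (shard_id, n_shards) partition.
# Assumption: records are split into contiguous ranges; the real
# loader may partition differently.

def shard_bounds(n_records, shard_id, n_shards):
    """Return the [start, stop) record range assigned to one shard."""
    if not 0 <= shard_id < n_shards:
        raise ValueError("shard_id must be in [0, n_shards)")
    start = (n_records * shard_id) // n_shards
    stop = (n_records * (shard_id + 1)) // n_shards
    return start, stop


def shard_records(records, shard):
    """Yield only the records belonging to shard=(shard_id, n_shards)."""
    shard_id, n_shards = shard
    start, stop = shard_bounds(len(records), shard_id, n_shards)
    for record in records[start:stop]:
        yield record


# Ten placeholder records split across three shards.
records = [f"tile-{i}" for i in range(10)]
shards = [list(shard_records(records, (i, 3))) for i in range(3)]

# Concatenating the shards reproduces the records, each exactly once.
merged = [r for shard in shards for r in shard]
```

Because each boundary is computed from the same integer formula, adjacent shards share an edge without overlapping, which is what makes the partition safe to read from several workers at once.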
+ +An index file is recommended for improving efficiency of data reading, and required if using data sharding. + +Interleaving multiple TFRecords +------------------------------- + +You can also interleave multiple TFRecords using :func:`slideflow.multi_tfrecord_loader()`. This function takes a list of TFRecord paths and a list of corresponding TFRecord indices, and returns a generator that randomly samples from TFRecords and parses the records: + +.. code-block:: python + + >>> import slideflow as sf + >>> tfrs = ['/path/to/tfrecord1', '/path/to/tfrecord2'] + >>> indices = [sf.io.tfrecord2idx.load_index(tfr) for tfr in tfrs] + >>> loader = sf.tfrecord.multi_tfrecord_loader(tfrs, indices) + >>> record = next(iter(loader)) + {'image_raw': , 'slide': , 'loc_x': [0], 'loc_y': [0]} + +By default, records are sampled from TFRecords with equal probability (i.e. uniform sampling). You can also specify a list of weights to sample from TFRecords with different probabilities (i.e. weighted sampling) via the ``weights`` argument. The weights should be a list of floats, one for each TFRecord, that sum to 1.0: + +.. code-block:: python + + >>> loader = sf.tfrecord.multi_tfrecord_loader(tfrs, indices, weights=[0.5, 0.5]) + +Records will be sampled infinitely by default. To disable infinite sampling, set ``infinite=False``. + +TFRecord sharding is also supported for ``multi_tfrecord_loader()`` via the ``shard`` argument. + diff --git a/docs/_sources/tile_labels.rst.txt b/docs/_sources/tile_labels.rst.txt new file mode 100644 index 000000000..d3aff16c0 --- /dev/null +++ b/docs/_sources/tile_labels.rst.txt @@ -0,0 +1,129 @@ +.. _tile_labels: + +Strong Supervision with Tile Labels +==================================== + +Pathology deep learning models are commonly trained with weak supervision, where the labels for individual image tiles are inherited from the parent slide. The end goal for such models is to predict the label for the entire slide, rather than individual tiles. 
+ +However, it is also possible to train models with strong supervision, where the labels for individual +image tiles are determined through :ref:`Region of Interest (ROI) ` labels. This note describes the process by which such labels are generated, and how they can be used to train a model. Training models with strong supervision requires PyTorch and is not supported in TensorFlow. + +Labeling ROIs +************* + +The first step is to create regions of interest (ROIs). The fastest way to create labeled ROIs is with :ref:`Slideflow Studio `, which includes integrated tools for quickly assigning labels to both new and existing ROIs. However, it is also possible to create ROIs with other tools, such as QuPath or ImageScope (as described :ref:`here `), and modify the generated ROI CSV file to add labels. + +ROI CSV files are formatted with three required columns: "roi_name", "x_base", and "y_base". Each row is a single point in an ROI, with the "x_base" and "y_base" columns specifying the X/Y coordinates in the slide's lowest (base) dimension. Individual ROIs are grouped by the "roi_name" column, with each ROI having a unique name. An optional fourth column, "label", can be used to assign a label to each ROI. For example: + +.. code-block:: csv + + roi_name,x_base,y_base,label + 1,100,100,tumor + 1,104,165,tumor + 1,532,133,tumor + 1,101,101,tumor + 2,200,200,stroma + 2,200,235,stroma + 2,222,267,stroma + 2,202,201,stroma + +When ROIs are saved in Slideflow Studio, they are exported in this file format and saved in either the current working directory or, if a project is loaded, in the configured project directory. + +Building tile labels +******************** + +Once ROIs have been generated, labeled, and saved in CSV format, the next step is to build a dataframe of tile labels. If not already done, start by :ref:`configuring a project ` and ensuring that ROIs are in the correct directory.
You can verify that the ROIs are in the right place by confirming that :meth:`slideflow.Dataset.rois` returns the number of slides with ROIs: + +.. code-block:: python + + >>> import slideflow as sf + >>> P = sf.load_project('/path/to/project') + >>> dataset = P.dataset(tile_px=256, tile_um=256) + >>> len(dataset.rois()) + 941 + +Next, build a dataframe of tile labels with :meth:`slideflow.Dataset.get_tile_dataframe`. This will return a dataframe with tile coordinates (X/Y of tile center, in base dimension), slide grid index, and associated ROI name/label if the tile is in an ROI. For example: + +.. code-block:: python + + >>> df = dataset.get_tile_dataframe() + >>> df.head() + loc_x loc_y grid_x grid_y roi_name roi_desc label slide + slide1-608-608 608 608 0 0 ROI_0 None tumor slide1 + slide1-608-864 608 864 0 1 ROI_0 None tumor slide1 + slide1-608-1120 608 1120 0 2 ROI_0 None tumor slide1 + ... + +The index for this dataframe is the tile ID, a unique identifier built from a combination of the slide name and tile coordinates. + +When training with supervised labels, we'll want to exclude tiles that are either not in an ROI or are in an unlabeled ROI. This can be done by filtering the dataframe to only include rows where the "label" column is not None: + +.. code-block:: python + + >>> df = df.loc[df.label.notnull()] + +Finally, we'll only need the "label" column and tile ID for training, so all other columns can be dropped. This step is optional but may reduce memory usage. + +.. code-block:: python + + >>> df = df[['label']] + >>> df.head() + label + slide1-608-608 tumor + slide1-608-864 tumor + slide1-608-1120 tumor + ... + +This dataframe can now be used to train a model with strong supervision. + +Training a model +**************** + +Training a model with strong supervision requires using a :class:`slideflow.model.Trainer`, as described in :ref:`tutorial2`. 
The only difference when training with strong supervision is that the trainer should be initialized with the tile dataframe for the labels: + +.. code-block:: python + + >>> trainer = sf.model.build_trainer(..., labels=df) + >>> trainer.train(...) + +Once training has finished, the saved model can be used interchangeably with models trained with weak supervision for evaluation, inference, feature generation, etc. + +Complete example +**************** + +Below is a complete example of training a model with strong supervision. This example assumes that a project has already been configured, tiles have been extracted, and ROIs have been generated and labeled. + +.. code-block:: python + + import slideflow as sf + + # Load project and dataset + P = sf.load_project('/path/to/project') + dataset = P.dataset(tile_px=256, tile_um=256) + + # Build tile label dataframe, and filter + # to only include tiles in an ROI. + df = dataset.get_tile_dataframe() + df = df.loc[df.label.notnull()] + + # Subsample our dataset to only include slides with ROI labels. + dataset = dataset.filter({'slide': list(df.slide.unique())}) + + # Split the dataset into training and validation. + train, val = dataset.split(val_fraction=0.3) + + # Build model hyperparameters + hp = sf.ModelParams( + tile_px=256, + tile_um=256, + model='xception', + batch_size=32 + ) + + # Train model + trainer = sf.model.build_trainer( + hp=hp, + outdir='/path/to/outdir', + labels=df + ) + trainer.train(train, val) diff --git a/docs/_sources/training.rst.txt b/docs/_sources/training.rst.txt index 25854b7ad..6e954d400 100644 --- a/docs/_sources/training.rst.txt +++ b/docs/_sources/training.rst.txt @@ -1,10 +1,23 @@ +.. _training: + Training ======== +Slideflow offers tools for training many types of neural networks, including: + +- **Weakly supervised, tile-based models**: Models trained on image tiles, with labels inherited from the parent slide. 
+- **Weakly supervised, multi-instance learning**: Models trained on feature vectors, with labels inherited from the parent slide. +- **Strongly supervised models**: Models trained on image tiles, with labels assigned by ROI. +- **Self-supervised pretraining**: Contrastive pretraining with or without labels (e.g. `SimCLR `_). +- **Generative adversarial networks**: Models trained to generate synthetic images (e.g. `StyleGAN2/3 `_). +- **Segmentation models**: Models trained to identify and classify tissue regions (e.g. `U-Net `_). + +In this section, we will walk through the process of training a weakly supervised tile-based model. :ref:`Strong supervision `, :ref:`Multi-instance learning (MIL) `, :ref:`self-supervised pretraining (SSL) `, :ref:`generative adversarial networks (GAN) `, and :ref:`segmentation` are described in other sections. + Prepare hyperparameters *********************** -The first step of model training is configuring a set of model parameters / training hyperparameters. There are two methods for configuring model parameters. If you intend to train a model using a single combination of hyperparameters, use the ``ModelParams`` class: +The first step of training a weakly-supervised model is configuring model parameters and hyperparameters with :class:`slideflow.ModelParams`. ``ModelParams`` determines the model architecture, loss, preprocessing augmentations, and training hyperparameters. .. code-block:: python @@ -18,116 +31,352 @@ The first step of model training is configuring a set of model parameters / trai ... ) -Alternatively, if you intend to perform a sweep across multiple hyperparameter combinations, use the ``Project.create_hp_sweep()`` function to automatically save a sweep to a JSON file. 
For example, the following would set up a batch_train file with two combinations; the first with a learning rate of 0.01, and the second with a learning rate of 0.001: +See the :class:`slideflow.ModelParams` API documentation for a list of available hyperparameters. + +.. note:: + + If you are using a continuous variable as an outcome measure, be sure to use a regression loss function. Regression loss functions can be viewed in ``slideflow.ModelParams.RegressionLossDict``, and all available loss functions are in ``slideflow.ModelParams.AllLossDict``. + +Training a model +**************** + +Slideflow provides two methods for training models: with the high-level :meth:`slideflow.Project.train` function or with the lower-level :class:`slideflow.model.Trainer`. The former provides an easier interface for executing complex training tasks with a single function call, while the latter provides lower-level access for greater customizability. + +.. _training_with_project: + +Training with a Project +----------------------- + +:meth:`slideflow.Project.train` provides an easy API for executing complex training plans and organizing results in the project directory. This is the recommended way to train models in Slideflow. There are two required arguments for this function: + +- ``outcomes``: Name (or list of names) of annotation header columns, from which to determine slide labels. +- ``params``: Model parameters. + +The default validation plan is three-fold cross-validation, but the validation strategy can be customized via keyword arguments (``val_strategy``, ``val_k_fold``, etc) as described in the API documentation. If crossfold validation is used, each model in the crossfold will be trained sequentially. Read more about :ref:`validation strategies `. + +By default, all slides in the project will be used for training. 
You can restrict your training/validation data to only a subset of slides in the project with one of two methods: either by providing ``filters`` or a filtered :class:`slideflow.Dataset`. + +For example, you can use the ``filters`` argument to train/validate only using slides labeled as "train_and_val" in the "dataset" column with the following syntax: + +.. code-block:: python + + results = P.train( + outcomes="tumor_type", + params=sf.ModelParams(...), + filters={"dataset": ["train_and_val"]} + ) + +Alternatively, you can restrict the training/validation dataset by providing a :class:`slideflow.Dataset` to the ``dataset`` argument: + +.. code-block:: python + + dataset = P.dataset(tile_px=299, tile_um=302) + dataset = dataset.filter({"dataset": ["train_and_val"]}) + + results = P.train( + outcomes="tumor_type", + params=sf.ModelParams(...), + dataset=dataset + ) + +In both cases, slides will be further split into training and validation sets using the specified validation settings (defaulting to three-fold cross-validation). + +For more granular control over the validation dataset used, you can supply a :class:`slideflow.Dataset` to the ``val_dataset`` argument. Doing so will cause the rest of the validation keyword arguments to be ignored. + +.. code-block:: python + + dataset = P.dataset(tile_px=299, tile_um=302) + train_dataset = dataset.filter({"dataset": ["train"]}) + val_dataset = dataset.filter({"dataset": ["val"]}) + + results = P.train( + outcomes="tumor_type", + params=sf.ModelParams(...), + dataset=train_dataset, + val_dataset=val_dataset + ) + +Performance metrics - including accuracy, loss, etc. - are returned as a dictionary and saved in ``results_log.csv`` in both the project directory and model directory. Additional data, including ROCs and scatter plots, are saved in the model directories. Pandas DataFrames containing tile-, slide-, and patient-level predictions are also saved in the model directory.
+ +At each designated epoch, models are saved in their own folders. Each model directory will include a copy of its hyperparameters in a ``params.json`` file, and a copy of its training/validation slide manifest in ``slide.log``. + +.. _training_with_trainer: + +Using a Trainer +--------------- + +You can also train models outside the context of a project by using :class:`slideflow.model.Trainer`. This lower-level interface provides greater flexibility for customization and allows models to be trained without requiring a Project to be set up. It lacks several convenience features afforded by using :meth:`slideflow.Project.train`, however, such as cross-validation, logging, and label preparation for easy multi-outcome support. + +For this training approach, start by building a trainer with :func:`slideflow.model.build_trainer`, which requires: + +- ``hp``: :class:`slideflow.ModelParams` object. +- ``outdir``: Directory in which to save models and checkpoints. +- ``labels``: Dictionary mapping slide names to outcome labels. + +:class:`slideflow.Dataset` provides a ``.labels()`` function that can generate this required labels dictionary. + +.. code-block:: python + + # Prepare dataset and labels + dataset = P.dataset(tile_px=299, tile_um=302) + labels, unique_labels = dataset.labels('tumor_type') + + # Split into training/validation + train_dataset = dataset.filter({"dataset": ["train"]}) + val_dataset = dataset.filter({"dataset": ["val"]}) + + # Determine model parameters + hp = sf.ModelParams( + tile_px=299, + tile_um=302, + batch_size=32, + ... + ) + + # Prepare a Trainer + trainer = sf.model.build_trainer( + hp=hp, + outdir='path', + labels=labels + ) + +Use :meth:`slideflow.model.Trainer.train` to train a model using your specified training and validation datasets. + +.. code-block:: python + + # Train a model + trainer.train(train_dataset, val_dataset) + +.. rst-class:: sphx-glr-script-out + + .. 
code-block:: none + + { + "epochs": { + "epoch3": { + "train_metrics": { + "loss": 0.497 + "accuracy": 0.806 + "val_loss": 0.719 + "val_accuracy": 0.778 + }, + "val_metrics": { + "loss": 0.727 + "accuracy": 0.770 + }, + "tile": { + "Outcome 0": [ + 0.580 + 0.580 + ] + }, + "slide": { + "Outcome 0": [ + 0.658 + 0.658 + ] + }, + "patient": { + "Outcome 0": [ + 0.657 + 0.657 + ] + } + } + } + } + +Read more about the ``Trainer`` class and available keyword arguments in the :class:`API documentation `. + +Multiple outcomes +***************** + +Slideflow supports both classification and regression, as well as training to single or multiple outcomes at once. To train with multiple outcomes simultaneously, simply pass multiple annotation headers to the ``outcomes`` argument of :meth:`slideflow.Project.train`. + +Time-to-event / survival outcomes +********************************* + +Models can also be trained to a time series outcome using Cox Proportional Hazards (CPH) and negative log likelihood loss. For time-to-event / survival models, use ``'negative_log_likelihood'`` loss and set ``outcomes`` equal to the annotation column indicating event *time*. Specify the event *type* (0 or 1) by passing the event type annotation column to the argument ``input_header``. If you are using multiple clinical inputs, the first header passed to ``input_header`` must be event type. Survival models are not compatible with multiple outcomes. + +.. note:: + Survival models are currently only available with the Tensorflow backend. PyTorch support for survival outcomes is in development. + +Multimodal models +***************** + +In addition to training using image data, clinical data can also be provided as model input by passing annotation column headers to the variable ``input_header``. This input is concatenated at the post-convolutional layer, prior to any configured hidden layers. 
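The concatenation step described above can be pictured with a small, library-free sketch. This is not Slideflow's implementation: the function name ``concatenate_inputs``, the vector sizes, and the example values are all hypothetical, chosen only to show how a per-tile image feature vector and a clinical input vector might be joined before reaching the hidden layers.

```python
# Illustrative sketch only; NOT Slideflow's implementation.
# All names, sizes, and values below are hypothetical.

def concatenate_inputs(image_features, clinical_inputs):
    """Join each image feature vector with its matching clinical input vector."""
    if len(image_features) != len(clinical_inputs):
        raise ValueError("Image and clinical batches must have the same length.")
    return [
        list(img) + list(clin)
        for img, clin in zip(image_features, clinical_inputs)
    ]

# Two tiles, each with a 4-element post-convolutional feature vector.
image_features = [[0.1, 0.5, 0.0, 0.9], [0.3, 0.2, 0.7, 0.4]]

# Two clinical variables per tile (for example, a scaled value and a binary flag).
clinical_inputs = [[0.62, 1.0], [0.48, 0.0]]

# Each merged vector has 4 + 2 = 6 elements and would feed the hidden layers.
merged = concatenate_inputs(image_features, clinical_inputs)
print(len(merged), len(merged[0]))
```

In a real model the same idea is usually expressed with the deep learning framework's own concatenation operation along the feature axis; the list-based version here is only meant to make the shapes easy to follow.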
+ +If desired, models can also be trained with clinical input data alone, without images, by using the hyperparameter argument ``drop_images=True``. + +.. _hyperparameter_optimization: + +Hyperparameter optimization +*************************** + +Slideflow includes several tools for assisting with hyperparameter optimization, as described in the next sections. + +Testing multiple combinations +----------------------------- + +You can easily test a series of hyperparameter combinations by passing a list of ``ModelParams`` objects to the ``params`` argument of :meth:`slideflow.Project.train`. + +.. code-block:: python + + hp1 = sf.ModelParams(..., batch_size=32) + hp2 = sf.ModelParams(..., batch_size=64) + + P.train( + ..., + params=[hp1, hp2] + ) + +Grid-search sweep +----------------- + +You can also prepare a grid-search sweep, testing every permutation across a series of hyperparameter ranges. Use :meth:`slideflow.Project.create_hp_sweep`, which will calculate and save the sweep configuration to a JSON file. For example, the following would configure a sweep with only two combinations; the first with a learning rate of 0.001, and the second with a learning rate of 0.0001: .. code-block:: python P.create_hp_sweep( - epochs=[5], - toplayer_epochs=0, + filename='sweep.json', model=['xception'], - pooling=['avg'], loss='sparse_categorical_crossentropy', learning_rate=[0.001, 0.0001], batch_size=64, - hidden_layers=[1], - optimizer='Adam', - augment='xyrj' ) -Available hyperparameters include: - -- **augment** - Image augmentations to perform, including flipping/rotating and random JPEG compression. Please see :class:`slideflow.model.ModelParams` for more details. -- **batch_size** - Batch size for training. -- **dropout** - Adds dropout layers after each fully-connected layer. -- **early_stop** - Stop training early if validation loss/accuracy is not decreasing. -- **early_stop_patience** - Number of epochs to wait before allowing early stopping.
-- **early_stop_method** - mMtric to use for early stopping. Includes 'loss', 'accuracy', or 'manual'. -- **epochs** - Number of epochs to spend training the full model. -- **include_top** - Include the default, preconfigured, fully connected top layers of the specified model. -- **hidden_layers** - Number of fully-connected final hidden layers before softmax prediction. -- **hidden_layer_width** - Width of hidden layers. -- **l1** - Adds L1 regularization to all convolutional layers with this weight. -- **l1_dense** - Adds L1 regularization to all fully-conected Dense layers with this weight. -- **l2** - Adds L2 regularization to all convolutional layers with this weight. -- **l2_dense** - Adds L2 regularization to all fully-conected Dense layers with this weight. -- **learning_rate** - Learning rate for training. -- **learning_rate_decay** - lLarning rate decay during training. -- **learning_rate_decay_steps** - Number of steps after which to decay learning rate -- **loss** - loss function; please see `Keras loss documentation `_ for all options. -- **manual_early_stop_epoch** - Manually trigger early stopping at this epoch/batch. -- **manual_early_stop_batch** - Manually trigger early stopping at this epoch/batch. -- **model** - Model architecture; please see `Keras application documentation `_ for all options. -- **normalizer** - Normalization method to use on images. -- **normalizer_source** - Optional path to normalization image to use as the source. -- **optimizer** - Training optimizer; please see `Keras opt documentation `_ for all options. -- **pooling** - Pooling strategy to use before final fully-connected layers; either 'max', 'avg', or 'none'. -- **tile_px** - Size of extracted tiles in pixels. -- **tile_um** - Size of extracted tiles in microns. -- **toplayer_epochs** - Number of epochs to spend training just the final layer, with all convolutional layers "locked" (sometimes used for transfer learning). 
-- **trainable_layers** - Number of layers available for training, other layers will be frozen. If 0, all layers are trained. -- **training_balance** - Training input balancing strategy; please see :ref:`balancing` for more details. -- **uq** - Enable uncertainty quantification (UQ) during inference. Requires dropout to be non-zero. -- **validation_balance** - Validation input balancing strategy; please see :ref:`balancing` for more details. - -If you are using a continuous variable as an outcome measure, be sure to use a linear loss function. Linear loss functions can be viewed in ``slideflow.model.ModelParams.LinearLossDict``, and all available loss functions are in ``slideflow.model.ModelParams.AllLossDict``. - -Begin training -************** - -Once your hyperparameter settings have been chosen you may begin training using the ``train`` function. Documentation of the function is given below: - -.. autofunction:: slideflow.Project.train - :noindex: - -If you used the ``ModelParams`` class to configure a single combination of parameters, pass this object via the ``params`` argument. If you configured a hyperparameter sweep, set this argument to the name of your hyperparameter sweep file (saved by default to 'sweep.json'). - -Your outcome variable(s) are specified with the ``outcomes`` argument. You may filter slides for training using the ``filter`` argument, as previously described. - -For example, to train using only slides labeled as "train" in the "dataset" column, with the outcome variable defined by the column "category", use the following syntax: - -.. code-block:: python - - P.train( - outcomes="category", - filters={"dataset": ["train"]}, - params='sweep.json' +The sweep is then executed by passing the JSON path to the ``params`` argument of :meth:`slideflow.Project.train()`: + +.. code-block:: python + + P.train(params='sweep.json', ...) + +.. 
_bayesian_optimization: + +Bayesian optimization +--------------------- + +You can also perform Bayesian hyperparameter optimization using `SMAC3 `_, which uses a `configuration space `_ to determine the types and ranges of hyperparameters to search. + +Slideflow provides several functions to assist with building these configuration spaces. :func:`slideflow.util.create_search_space` allows you to define a range to search for each hyperparameter via keyword arguments: + +.. code-block:: python + + import slideflow as sf + + config_space = sf.util.create_search_space( + normalizer=['macenko', 'reinhard', 'none'], + dropout=(0.1, 0.5), + learning_rate=(1e-4, 1e-5) ) -If you would like to use a different validation plan than the default, pass the relevant keyword arguments to the training function. +:func:`slideflow.util.broad_search_space` and :func:`slideflow.util.shallow_search_space` provide preconfigured search spaces that will search a broad and narrow range of hyperparameters, respectively. You can also customize a preconfigured search space using keyword arguments. For example, to do a broad search but disable L1 searching: -Once training has finished, performance metrics - including accuracy, loss, etc. - can be found in the ``results_log.csv`` file in the project directory. Additional data, including ROCs and scatter plots, are saved in the model directories. +.. code-block:: python -At each designated epoch, models are saved in their own folders. Each model directory will include a copy of its hyperparameters in a ``params.json`` file, and a copy of its training/validation slide manifest in ``slide.log``. + import slideflow as sf -Multiple outcomes -***************** + config_space = sf.util.broad_search_space(l1=None) + +See the linked API documentation for each function for more details about the respective search spaces. -Slideflow supports both categorical and continuous outcomes, as well as training to single or multiple outcomes at once. 
To use multiple outcomes simultaneously, simply pass multiple annotation headers to the ``outcomes`` argument. +Once the search space is determined, you can perform the hyperparameter optimization by simply replacing :meth:`slideflow.Project.train` with :meth:`slideflow.Project.smac_search`, providing the configuration space to the argument ``smac_configspace``. By default, SMAC3 will optimize the tile-level AUROC, but the optimization metric can be customized with the keyword argument ``smac_metric``. -Multiple input variables -************************ +.. code-block:: python -In addition to training using image data, clinical data can also be provided as model input by passing annotation column headers to the variable ''input_header''. This input is merged at the post-convolutional layer, prior to any configured hidden layers. + # Base hyperparameters + hp = sf.ModelParams(tile_px=299, ...) -If desired, models can also be trained with clinical input data alone, without images, by using the hyperparameter argument ``drop_images=True``. + # Configuration space to optimize + config_space = sf.util.shallow_search_space() -Cox Proportional Hazards (CPH) models -************************************* + # Run the Bayesian optimization + best_config, history = P.smac_search( + outcomes='tumor_type', + params=hp, + smac_configspace=config_space, + smac_metric='tile_auc', + ... + ) + print(history) -Models can also be trained to a time series outcome using CPH and negative log likelihood loss. For CPH models, use `'negative_log_likelihood'` loss and set ``outcomes`` equal to the annotation column indicating event *time*. Specify the event *type* (0 or 1) by passing the event type annotation column to the argument ``input_header``. If you are using multiple clinical inputs, the first header passed to ``input_header`` must be event type. CPH models are not compatible with multiple outcomes. +.. rst-class:: sphx-glr-script-out -..
note:: - CPH models are currently unavailable with the PyTorch backend. PyTorch support for CPH outcomes is in development. + .. code-block:: none + + dropout l1 l2 metric + 0 0.126269 0.306857 0.183902 0.271778 + 1 0.315987 0.014661 0.413443 0.283289 + 2 0.123149 0.311893 0.184439 0.250339 + 3 0.250000 0.250000 0.250000 0.247641 + 4 0.208070 0.018481 0.121243 0.257633 + +:meth:`slideflow.Project.smac_search` returns the best configuration and a history of models trained during the search. This history is a Pandas DataFrame with hyperparameters for columns, and a "metric" column with the optimization metric result for each trained model. The run history is also saved in CSV format in the associated model folder. -Distributed training across GPUs -******************************** +See the API documentation for available customization via keyword arguments. -If multiple GPUs are available, training can be distributed by passing the argument ``multi_gpu=True``. If provided, slideflow will use all available (and visible) GPUs for training. +.. _custom_loss: + +Customizing model or loss +************************* + +Slideflow supports dozens of model architectures, but you can also train with a custom architecture, as demonstrated in :ref:`tutorial3`. + +Similarly, you can also train with a custom loss function by supplying a dictionary to the ``loss`` argument in ``ModelParams``, with the keys ``type`` (which must be either ``'classification'``, ``'regression'``, or ``'survival'``) and ``fn`` (a callable loss function). + +For Tensorflow/Keras, the loss function must accept arguments ``y_true, y_pred``. For regression losses, ``y_true`` may need to be cast to ``tf.float32``. An example custom regression loss is given below: + +.. 
code-block:: python + + # Custom Tensorflow loss + def custom_regression_loss(y_true, y_pred): + y_true = tf.cast(y_true, tf.float32) + squared_difference = tf.square(y_true - y_pred) + return tf.reduce_mean(squared_difference, axis=-1) + + +For PyTorch, the loss function must return a nested loss function with arguments ``output, target``. An example regression loss is given below: + +.. code-block:: python + + # Custom PyTorch loss + def custom_regression_loss(): + def loss_fn(output, target): + return torch.mean((target - output) ** 2) + return loss_fn + + +In both cases, the loss function is applied as follows: + +.. code-block:: python + + hp = sf.ModelParams(..., loss={'type': 'regression', 'fn': custom_regression_loss}) + + +Using multiple GPUs +******************* + +Slideflow can perform distributed training if multiple GPUs are available. Enable distributed training by passing the argument ``multi_gpu=True``, which will allow Slideflow to use all available (and visible) GPUs. + +.. _from_wsi: + +Training without TFRecords +************************** + +It is also possible to train deep learning models directly from slides, without first generating TFRecords. This may be advantageous for rapidly prototyping models on a large dataset, or when tuning the tile size for a dataset. + +Use the argument ``from_wsi=True`` in either the :meth:`slideflow.Project.train` or :meth:`slideflow.model.Trainer.train` functions. Image tiles will be dynamically extracted from slides during training, and background will be automatically removed via Otsu's thresholding. + +.. note:: + + Using the :ref:`cuCIM backend ` will greatly improve performance when training without TFRecords. Monitoring performance ********************** +Tensorboard +----------- + During training, progress can be monitored using Tensorflow's bundled ``Tensorboard`` package by passing the argument ``use_tensorboard=True``. This functionality was disabled by default due to a recent bug in Tensorflow. 
To use tensorboard to monitor training, execute: .. code-block:: bash @@ -135,3 +384,14 @@ During training, progress can be monitored using Tensorflow's bundled ``Tensorbo $ tensorboard --logdir=/path/to/model/directory ... and open http://localhost:6006 in your web browser. + +Neptune.ai +---------- + +Experiments can be automatically logged with `Neptune.ai `_. To enable logging, first locate your Neptune API token and workspace ID, and configure the environmental variables ``NEPTUNE_API_TOKEN`` and ``NEPTUNE_WORKSPACE``. + +With the environmental variables set, Neptune logs are enabled by passing ``use_neptune=True`` to ``sf.load_project``. + +.. code-block:: python + + P = sf.load_project('/project/path', use_neptune=True) \ No newline at end of file diff --git a/docs/_sources/troubleshooting.rst.txt b/docs/_sources/troubleshooting.rst.txt index 08dd04de5..26429f222 100644 --- a/docs/_sources/troubleshooting.rst.txt +++ b/docs/_sources/troubleshooting.rst.txt @@ -8,7 +8,15 @@ To check for errors in your environment or installation, you can also use the te Testing ******* -To test all pipeline functions, use the ``test.py`` script, providing a path to a directory containing slides to use for testing: +To troubleshoot environment or installation issues, start by running unit tests, +which do not require any sample slides. Use the ``test.py`` script without any +arguments: + +.. code-block:: bash + + $ python3 test.py + +For a more comprehensive test of all pipeline functions, provide a path to a directory containing sample slides via ``--slides``, setting ``--all=True`` to run all tests: .. code-block:: bash @@ -25,4 +33,21 @@ To view a list of all tests that will be run (and thus can be skipped), pass the Issue Reporting *************** -If the issue is still unclear, please submit an Issue on the `project Github page `_. \ No newline at end of file +If the issue is still unclear, please submit an Issue on the `project Github page `_. 
Be sure to include the following information: + +* The version of Slideflow you are using, which can be displayed with ``sf.about()``: + +.. code-block:: bash + + $ python3 -c "import slideflow; slideflow.about()" + ╭=======================╮ + │ Slideflow │ + │ Version: 2.1.0 │ + │ Backend: tensorflow │ + │ Slide Backend: cucim │ + │ https://slideflow.dev │ + ╰=======================╯ + +* The active deep learning backend (``sf.backend()``) and slide backend (``sf.slide_backend()``) +* The version of Python you are using (``python3 --version``) +* The operating system you are using (``uname -a``) diff --git a/docs/_sources/tutorial1.rst.txt b/docs/_sources/tutorial1.rst.txt index b01f590d1..25d8f2b5a 100644 --- a/docs/_sources/tutorial1.rst.txt +++ b/docs/_sources/tutorial1.rst.txt @@ -3,15 +3,14 @@ Tutorial 1: Model training (simple) ===================================== -In this first tutorial, we will walk through the steps needed to take an example project from start to finish, using -the bundled ``run_project.py`` script to execute pipeline functions. As with all of these tutorials, we will use +In this first tutorial, we will walk through the steps needed to take an example project from start to finish. As with all of these tutorials, we will use publicly available data from `The Cancer Genome Atlas (TCGA) `_. In this first tutorial, we will train a model to predict ER status from breast cancer slides. Examples will be given assuming project files are in the directory ``/home/er_project`` and slides are in ``/home/brca_slides``, although you will need to customize these paths according to your needs. -Project Planning +Create a Project **************** First, download slides and annotations for the TCGA-BRCA project using the `legacy GDC portal @@ -19,45 +18,22 @@ First, download slides and annotations for the TCGA-BRCA project using the `lega patients. 
Our outcome of interest is "er_status_by_ihc", of which 1011 have a documented result (either "Positive" or "Negative"), giving us our final patient count of 1011. -To create a new project, use the ``run_project.py`` script: +Create a new project, and pass the path to the downloaded slides to the argument ``slides``. -.. code-block:: bash +.. code-block:: python + + import slideflow as sf + + P = sf.create_project( + root='/home/er_project', + slides='/path/to/slides' + ) + +After the project is created, we can load the project with: + +.. code-block:: python - $ python3 run_project.py -p /home/er_project - -We will then be taken through an interactive prompt asking for project settings. When prompted, use the -following settings (mostly defaults): - -+-------------------------------+-------------------------------------------------------+ -| **name** | Breast_ER | -+-------------------------------+-------------------------------------------------------+ -| **annotations** | ./annotations.csv (default) | -+-------------------------------+-------------------------------------------------------+ -| **dataset_config** | ./datasets.json (default) | -+-------------------------------+-------------------------------------------------------+ -| **sources** | BRCA | -+-------------------------------+-------------------------------------------------------+ -| **models_dir** | ./models (default) | -+-------------------------------+-------------------------------------------------------+ -| **eval_dir** | ./eval | -+-------------------------------+-------------------------------------------------------+ - -After a blank datasets.json file is created, we will be prompted to add a new dataset source. 
Use the following -configuration for the added dataset source: - -+-------------------------------+-------------------------------------------------------+ -| **source** | BRCA | -+-------------------------------+-------------------------------------------------------+ -| **slides** | /home/brca_slides | -+-------------------------------+-------------------------------------------------------+ -| **roi** | /home/brca_slides | -+-------------------------------+-------------------------------------------------------+ -| **tiles** | /home/er_project/tiles | -+-------------------------------+-------------------------------------------------------+ -| **tfrecords** | /home/er_project/tfrecords | -+-------------------------------+-------------------------------------------------------+ - -For simplicity, we will not be using annotated tumor regions of interest (ROI), instead training on whole-slide images. + P = sf.load_project('/home/er_project') Setting up annotations ********************** @@ -66,9 +42,9 @@ With our project initialized, we can set up our annotations file. Use the downlo CSV file, with a column "patient" indicating patient name (in the case of TCGA, these are in the format TCGA-SS-XXXX, where SS indicates site of origin and XXXX is the patient identifier), and a column "er_status_by_ihc" containing our outcome of interest. Add a third column "slide" containing the name of the slide associated with the -patient. If there are multiple slides per patient, list each slide on a separate row. Finally, add a column "dataset" -to indicate whether the slide should be used for training or evaluation. Set aside somewhere around 10-30% of the -dataset for evaluation. +patient (without the file extension). If there are multiple slides per patient, list each slide on a separate row. +Finally, add a column "dataset" to indicate whether the slide should be used for training or evaluation. Set aside +somewhere around 10-30% of the dataset for evaluation. .. 
note:: @@ -91,21 +67,18 @@ Your annotations file should look something like: | ... | ... | ... | ... | +-----------------------+--------------------+-----------+-----------------------------------+ +Save this CSV file in your project folder with the name ``annotations.csv``. Tile extraction *************** -The next step is to extract tiles from our slides. Find the sample ``actions.py`` file in the project folder, which we -will modify and use to execute our pipeline functions. Delete the commented-out examples in this file. - -For this example, we will use a 256px x 256px tile size, at 0.5 µm/pixel (128 um). Add the following -to the project ``actions.py`` file: +The next step is to extract tiles from our slides. For this example, we will use a 256px x 256px tile size, +at 0.5 µm/pixel (128 um). .. code-block:: python - def main(P): - # Extract tiles at 256 pixels, 0.5 um/px - P.extract_tiles(tile_px=256, tile_um=128) + # Extract tiles at 256 pixels, 0.5 um/px + P.extract_tiles(tile_px=256, tile_um=128) .. hint:: Tile extraction speed is greatly improved when slides are on an SSD or ramdisk; slides can be automatically @@ -119,22 +92,18 @@ Training ******** After tiles are extracted, the dataset will be ready for training. We will train with a single set of manually defined -hyperparameters, which we can configure with :class:`slideflow.model.ModelParams`. We will use the +hyperparameters, which we can configure with :class:`slideflow.ModelParams`. We will use the `Xception `_ model with a batch size of 32, otherwise keeping defaults. .. code-block:: python - def main(P): - from slideflow.model import ModelParams - ... - - hp = ModelParams( - tile_px=256, - tile_um=128, - model='xception', - batch_size=32, - epochs=[3] - ) + hp = sf.ModelParams( + tile_px=256, + tile_um=128, + model='xception', + batch_size=32, + epochs=[3] + ) For training, we will use 5-fold cross-validation on the training dataset. 
To set up training, invoke the :meth:`slideflow.Project.train` function with the outcome of interest, our hyperparameters, and our validation plan. @@ -143,17 +112,14 @@ to only include patients with documented ER status (otherwise a blank "" would b .. code-block:: python - def main(P): - ... - - # Train with 5-fold cross-validation - P.train( - 'ER_status', - params=hp, - val_k_fold=5, - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) + # Train with 5-fold cross-validation + P.train( + 'er_status_by_ihc', + params=hp, + val_k_fold=5, + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) After cross validation is complete, we will want to have a model trained across the entire dataset, so we can assess performance on our held-out evaluation set. To train a model across the entire training dataset without validation, @@ -161,59 +127,56 @@ we will set ``val_strategy`` to ``None``: .. code-block:: python - def main(P): - ... - - # Train across the entire training dataset - P.train( - 'ER_status', - params=hp, - val_strategy='none', - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) + # Train across the entire training dataset + P.train( + 'er_status_by_ihc', + params=hp, + val_strategy='none', + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) -Now, it's time to start our pipeline. To review, our ``actions.py`` file at this point should look like: +Now, it's time to start our pipeline. To review, our complete script should look like: .. 
code-block:: python - def main(P): - from slideflow.model import ModelParams - - # Extract tiles at 256 pixels, 0.5 um/px - P.extract_tiles(tile_px=256, tile_um=128) - - hp = ModelParams( - tile_px=256, - tile_um=128, - model='xception', - batch_size=32, - epochs=[3, 5, 10] - ) - - # Train with 5-fold cross-validation - P.train( - 'ER_status', - params=hp, - val_k_fold=5, - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) - - # Train across the entire training dataset - P.train( - 'ER_status', - params=hp, - val_strategy='none', - filters={'dataset': ['train'], - 'er_status_by_ihc': ['Positive', 'Negative']} - ) - -To execute these functions, use the ``run_project.py`` script, passing the project directory with the ``-p`` flag. + import slideflow as sf + + # Create a new project + P = sf.create_project( + root='/home/er_project', + slides='/path/to/slides' + ) + + # Extract tiles at 256 pixels, 0.5 um/px + P.extract_tiles(tile_px=256, tile_um=128) + + hp = sf.ModelParams( + tile_px=256, + tile_um=128, + model='xception', + batch_size=32, + epochs=[3, 5, 10] + ) + + # Train with 5-fold cross-validation + P.train( + 'er_status_by_ihc', + params=hp, + val_k_fold=5, + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) + + # Train across the entire training dataset + P.train( + 'er_status_by_ihc', + params=hp, + val_strategy='none', + filters={'dataset': ['train'], + 'er_status_by_ihc': ['Positive', 'Negative']} + ) -.. code-block:: bash - - $ python3 run_project.py -p /home/er_project The final training results should show an average AUROC of around 0.87, with average AP around 0.83.
Tile, slide, and patient-level receiver operating characteristic (ROC) curves are saved in the model folder, along with precision-recall curves (not shown): @@ -239,20 +202,3 @@ Tensorboard-formatted training and validation logs are saved the model directory $ tensorboard --logdir=/project_path/models/00001-outcome-HP0 Tensorboard can then be accessed by navigating to ``http://localhost:6006`` in a browser. - -Monitoring with Neptune -*********************** - -Experiments can be automatically logged with `Neptune.ai `_. To enable logging, first locate your Neptune API token and workspace ID, and configure the environmental variables ``NEPTUNE_API_TOKEN`` and ``NEPTUNE_WORKSPACE``. - -With the environmental variables set, Neptune logs are enabled either by passing a ``-n`` flag to the ``run_project.py`` script: - -.. code-block:: bash - - $ python3 run_project.py -n -p /project_path/ - -or by passing ``use_neptune=True`` to the ``slideflow.Project`` class: - -.. code-block:: python - - P = sf.Project('/project/path', use_neptune=True) \ No newline at end of file diff --git a/docs/_sources/tutorial2.rst.txt b/docs/_sources/tutorial2.rst.txt index 799087b97..17d6b051b 100644 --- a/docs/_sources/tutorial2.rst.txt +++ b/docs/_sources/tutorial2.rst.txt @@ -1,3 +1,5 @@ +.. _tutorial2: + Tutorial 2: Model training (advanced) ======================================= @@ -76,12 +78,12 @@ We can use the dataset to get our ER status labels. The :meth:`slideflow.Dataset We can see the slideflow logs showing us that 234 slides with the outcome label "Negative" were assigned to the numerical outcome "0", and 842 "Positive" slides were assigned "1". -Next, we'll need to split this dataset into a training and validation set. We'll start by training on the first of 3 k-folds for cross-validated training. To split a dataset, use the :meth:`slideflow.Dataset.train_val_split` method. We'll need to provide our labels to ensure that the outcome categories are balanced in the training and validation sets.
+Next, we'll need to split this dataset into a training and validation set. We'll start by training on the first of 3 k-folds for cross-validated training. To split a dataset, use the :meth:`slideflow.Dataset.split` method. We'll need to provide our labels to ensure that the outcome categories are balanced in the training and validation sets. .. code-block:: python - >>> train_dts, val_dts = dataset.train_val_split( - ... model_type='categorical', + >>> train_dts, val_dts = dataset.split( + ... model_type='classification', ... labels=labels, ... val_strategy='k-fold', ... val_k_fold=3, @@ -107,12 +109,11 @@ At this point, we can also add categorical balancing to our dataset (see :ref:`b Training ******** -Now that our dataset is prepared, we can begin setting up our model and trainer. Our model training parameters are configured with :class:`slideflow.model.ModelParams`. +Now that our dataset is prepared, we can begin setting up our model and trainer. Our model training parameters are configured with :class:`slideflow.ModelParams`. .. code-block:: python - >>> from slideflow.model import ModelParams, Trainer - >>> hp = ModelParams( + >>> hp = sf.ModelParams( ... tile_px=256, ... tile_um=128, ... model='xception', @@ -124,14 +125,13 @@ In addition to the above model parameters, our trainer will need the outcome lab .. code-block:: python - >>> trainer = Trainer( + >>> trainer = sf.model.build_trainer( ... hp=hp, ... outdir='/some/directory', ... labels=labels, - ... patients=dataset.patients() ... ) -Finally, we can start training. Pass the training and validation datasets to the :meth:`slideflow.model.Trainer.train` method of our trainer, assinging the output to a new variable ``results`` +Now we can start training. Pass the training and validation datasets to the :meth:`slideflow.model.Trainer.train` method of our trainer, assigning the output to a new variable ``results`` .. 
code-block:: python @@ -176,4 +176,4 @@ You'll see logs recording model structure, training progress across epochs, and } } -Training results are separated with nested dictionaries according to epoch. The raw training metrics and validation metrics are stored with the keys ``"train_metrics"`` and ``"val_metrics"``, and tile-, slide-, and patient-level metrics (AUC for categorical data, R-squared for linear outcomes, and concordance index for CPH models) is reported under the ``"tile"``, ``"slide"``, and ``"patient"`` keys for each outcome, respectively. \ No newline at end of file +Training results are separated with nested dictionaries according to epoch. The raw training metrics and validation metrics are stored with the keys ``"train_metrics"`` and ``"val_metrics"``, and tile-, slide-, and patient-level metrics (AUROC for classification, R-squared for regression outcomes, and concordance index for survival models) are reported under the ``"tile"``, ``"slide"``, and ``"patient"`` keys for each outcome, respectively. \ No newline at end of file diff --git a/docs/_sources/tutorial3.rst.txt b/docs/_sources/tutorial3.rst.txt index 0b286126b..c6ecda42f 100644 --- a/docs/_sources/tutorial3.rst.txt +++ b/docs/_sources/tutorial3.rst.txt @@ -1,3 +1,5 @@ +.. _tutorial3: + Tutorial 3: Using a custom architecture ======================================= @@ -141,8 +143,8 @@ First, define the model in a file that can be imported.
In this example, we will dim, depth, heads, mlp_dim): super().__init__() if not image_size % patch_size == 0: - msg = 'image dimensions must be divisible by the patch size' - raise ValueError(msg) + raise ValueError('image dimensions must be divisible by the ' + 'patch size') num_patches = (image_size // patch_size) ** 2 self.patch_size = patch_size self.dim = dim diff --git a/docs/_sources/tutorial4.rst.txt b/docs/_sources/tutorial4.rst.txt index 65bb3912a..316d2e5fa 100644 --- a/docs/_sources/tutorial4.rst.txt +++ b/docs/_sources/tutorial4.rst.txt @@ -84,7 +84,7 @@ If the referenced model was trained with digital stain normalization, this will The ``resolution`` parameter indicates the stride at which tiles should be extracted from slides to generate predictions. ``"low"`` resolution yields predictions on non-overlapping tiles (stride_div=1). ``"medium"`` resolution uses tiles with 50% overlap (stride_div=2), and ``"high"`` resolution uses tiles with 75% overlap (stride_div=4). -Heatmaps are colored and scaled in a manner optimized for categorical outcomes, with the colorscale 0 (blue) -> 0.5 (white) -> 1.0 (red). To change this colorscaling (particularly important for linear outcomes), set ``vmin``, ``vcenter``, and ``vmax`` accordingly. +Heatmaps are colored and scaled in a manner optimized for categorical outcomes, with the colorscale 0 (blue) -> 0.5 (white) -> 1.0 (red). To change this colorscaling (particularly important for regression outcomes), set ``vmin``, ``vcenter``, and ``vmax`` accordingly. Heatmaps are displayed without any color interpolation by default. To generate a smoothed heatmap, interpolate colors with any strategy supported by matplotlib (including, for example, "bicubic", "nearest", "bilinear", and many more) with the argument ``interpolation``.
diff --git a/docs/_sources/tutorial5.rst.txt b/docs/_sources/tutorial5.rst.txt index 6e9e8a1fc..719338c4f 100644 --- a/docs/_sources/tutorial5.rst.txt +++ b/docs/_sources/tutorial5.rst.txt @@ -115,7 +115,7 @@ Layer activations calculated on very large datasets may result in high memory us max_tiles=100 ) -This function will return an instance of :class:`slideflow.model.DatasetFeatures`, which contains tile-level predictions (in ``DatasetFeatures.logits``), tile X,Y locations from their respective slides (in ``DatasetFeatures.locations``), layer activations (in ``DatasetFeatures.activations``), and uncertainty (if applicable, in ``DatasetFeatures.uncertainty``). +This function will return an instance of :class:`slideflow.DatasetFeatures`, which contains tile-level predictions (in ``DatasetFeatures.predictions``), tile X,Y locations from their respective slides (in ``DatasetFeatures.locations``), layer activations (in ``DatasetFeatures.activations``), and uncertainty (if applicable, in ``DatasetFeatures.uncertainty``). Create the mosaic map @@ -147,14 +147,14 @@ Save corresponding UMAPs Now that we have the mosaic generated, we need to create corresponding labeled UMAP plots to aid in interpretability. UMAP plots are stored in :class:`slideflow.SlideMap` objects. A mosaic's underlying ``SlideMap`` can be accessed via ``mosaic.slide_map``. -The :class:`slideflow.SlideMap` class provides several functions useful for labeling. To start, we will label the umap according to the raw logits for each tile image. As this is a binary categorical outcome, there will be two logits. We will label the UMAP according to the second logit (id=1), and then save the image to disc. +The :class:`slideflow.SlideMap` class provides several functions useful for labeling. To start, we will label the umap according to the raw predictions for each tile image. As this is a binary categorical outcome, there will be two post-softmax predictions. 
We will label the UMAP according to the second prediction (id=1), and then save the image to disk. .. code-block:: python - # Label by raw logits + # Label by raw predictions umap = mosaic.slide_map - umap.label_by_logits(1) - umap.save('umap_logits.png') + umap.label_by_preds(1) + umap.save('umap_preds.png') .. image:: https://i.imgur.com/FT7nH90.png @@ -162,7 +162,7 @@ Next, we will discretize the predictions, showing the final prediction as a cate .. code-block:: python - # Label by raw logits + # Label by discretized prediction umap.label_by_meta('prediction') umap.save('umap_predictions.png') @@ -175,7 +175,7 @@ For reference, let's see the ground truth categorical labels. For this, we will # Get slide labels labels, unique = P.dataset().labels('cohort') - # Label by raw logits + # Label with slide labels umap.label_by_slide(labels) umap.save('umap_labels.png') @@ -185,7 +185,7 @@ Finally, if we are a using a model that was trained with uncertainty quantificat .. code-block:: python - # Label by raw logits + # Label by uncertainty umap.label_by_uncertainty() umap.save('umap_uncertainty.png') @@ -195,7 +195,6 @@ In all cases, the UMAP plots can be customized by passing keyword arguments acce .. code-block:: python - # Label by raw logits umap.save( 'umap_uncertainty.png', # Save path title='Uncertainty', # Title for plot diff --git a/docs/_sources/tutorial6.rst.txt b/docs/_sources/tutorial6.rst.txt new file mode 100644 index 000000000..6a077503c --- /dev/null +++ b/docs/_sources/tutorial6.rst.txt @@ -0,0 +1,50 @@ +.. currentmodule:: slideflow.slide + +.. _tutorial6: + +Tutorial 6: Custom slide filtering +================================== + +In this brief tutorial, we'll take a look at how you can implement and preview bespoke slide-level filtering methods. + +The slide-level filtering (QC) methods Slideflow currently supports include Otsu's thresholding and Gaussian blur filtering, which can be applied to a :class:`WSI` object with :meth:`WSI.qc`.
If you have a custom filtering algorithm you would like to apply to a slide, you can now use :meth:`WSI.apply_qc_mask()` to apply a boolean mask to filter a slide. + +For the purposes of this tutorial, we will generate a boolean mask using the already-available Otsu's thresholding algorithm, but you can replace this with whatever masking algorithm you like. + +First, we'll load a slide: + +.. code-block:: python + + import numpy as np + import slideflow as sf + + wsi = sf.WSI('slide.svs', tile_px=299, tile_um=302) + +Next, we'll apply Otsu's thresholding to get the boolean mask we'll use in subsequent steps, then remove the QC once we have the mask: + +.. code-block:: python + + wsi.qc('otsu') + qc_mask = np.copy(wsi.qc_mask) + wsi.remove_qc() + +Our mask should have two dimensions (y, x) and have a dtype of bool: + +.. code-block:: bash + + >>> qc_mask.shape + (1010, 2847) + >>> qc_mask.dtype + dtype('bool') + +Our :class:`WSI` object now has no QC applied. We can manually apply this boolean mask with :meth:`WSI.apply_qc_mask()`: + +.. code-block:: python + + wsi.apply_qc_mask(qc_mask) + +And that's it! We can preview how our mask affects tile filtering by using :meth:`WSI.preview()`: + +.. code-block:: python + + wsi.preview().show() diff --git a/docs/_sources/tutorial7.rst.txt b/docs/_sources/tutorial7.rst.txt new file mode 100644 index 000000000..8ac1697f4 --- /dev/null +++ b/docs/_sources/tutorial7.rst.txt @@ -0,0 +1,89 @@ +.. _tutorial7: + +Tutorial 7: Training with custom augmentations +============================================== + +In this tutorial, we'll take a look at how you can use custom image augmentations when training a model with Slideflow. This tutorial builds off of :ref:`tutorial2`, so if you haven't already, you should read that tutorial first. + +Our goal will be to train a model on a sparse outcome, such as ER status (roughly 4:1 positive:negative), with a custom augmentation that will oversample the minority class. 
This tutorial will use PyTorch, but the same principles apply when using TensorFlow. + +.. code-block:: python + + >>> import os + >>> os.environ['SF_BACKEND'] = 'torch' + +First, we'll start by loading a project and preparing our datasets, just like in :ref:`tutorial2`: + +.. code-block:: python + + >>> import slideflow as sf + >>> P = sf.load_project('/home/er_project') + >>> full_dataset = P.dataset( + ... tile_px=256, + ... tile_um=128, + ... filters={ + ... 'er_status_by_ihc': ['Positive', 'Negative'] + ... }) + >>> labels, _ = full_dataset.labels('er_status_by_ihc') + >>> train, val = full_dataset.split( + ... labels='er_status_by_ihc', + ... val_strategy='k-fold', + ... val_k_fold=3, + ... k_fold_iter=1 + ... ) + +If tiles have not yet been extracted from slides, do that now. + +.. code-block:: python + + >>> full_dataset.extract_tiles(qc='otsu') + +By default, Slideflow will equally sample from all slides / TFRecords during training, resulting in oversampling of slides with fewer tiles. In this case, we want to oversample the minority class (ER negative), so we'll use category-level balancing. Sampling strategies are discussed in detail in the :ref:`Developer Notes `. + +.. code-block:: python + + >>> train = train.balance('er_status_by_ihc', strategy='category') + +Next, we'll set up our model hyperparameters, using the same parameters as in :ref:`tutorial2`. We still want to use Slideflow's default augmentation (random flip/rotation and JPEG compression), so we'll use the hyperparameter ``augment=True``. Our custom augmentation will be applied after the default augmentation. + +.. code-block:: python + + >>> hp = sf.ModelParams( + ... tile_px=256, + ... tile_um=128, + ... model='xception', + ... batch_size=32, + ... epochs=[3], + ... augment=True + ... ) + +Now, we'll define our custom augmentation. Augmentations are functions that take a single Tensor (:class:`tf.Tensor` or :class:`torch.Tensor`) as input and return a single Tensor as output.
Our training augmentation will include a random color jitter, random gaussian blur, and random auto-contrast. + +.. code-block:: python + + >>> import torch + >>> from torchvision import transforms + >>> augment = transforms.Compose([ + ... transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5), + ... transforms.RandomAutocontrast(), + ... transforms.GaussianBlur(3) + ... ]) + +Transformations can be applied to training or validation data by passing a dictionary - with the keys 'train' and/or 'val' - to the ``transform`` argument of :class:`slideflow.Trainer`. If a transformation should be applied to both training and validation, it can be passed directly to the ``transform`` argument. In this case, we'll apply our custom augmentation to the training dataset only. + +.. code-block:: python + + >>> trainer = sf.model.build_trainer( + ... hp=hp, + ... outdir='/some/directory', + ... labels=labels, + ... transform={'train': augment}, + ... ) + +Now we can start training. Pass the training and validation datasets to the :meth:`slideflow.model.Trainer.train` method of our trainer, assigning the output to a new variable ``results``. + +.. code-block:: python + + >>> results = trainer.train(train, val) + +And that's it! You've trained a model with a custom augmentation. You can now use the model to make predictions on new data, or use the model to make predictions on the validation dataset. \ No newline at end of file diff --git a/docs/_sources/tutorial8.rst.txt b/docs/_sources/tutorial8.rst.txt new file mode 100644 index 000000000..b5f51d329 --- /dev/null +++ b/docs/_sources/tutorial8.rst.txt @@ -0,0 +1,165 @@ +.. _tutorial8: + +Tutorial 8: Multiple-Instance Learning +====================================== + +In contrast with tutorials 1-4, which focused on training and evaluating traditional tile-based models, this tutorial provides an example of training a multiple-instance learning (MIL) model. 
MIL models are particularly useful for heterogeneous tumors, when only parts of a whole-slide image may carry a distinctive histological signature. In this tutorial, we'll train a MIL model to predict the ER status of breast cancer patients from whole slide images. Note: MIL models require PyTorch. + +We'll start the same way as :ref:`tutorial1`, loading a project and preparing a dataset. + +.. code-block:: python + + >>> import slideflow as sf + >>> P = sf.load_project('/home/er_project') + >>> dataset = P.dataset( + ... tile_px=256, + ... tile_um=128, + ... filters={ + ... 'er_status_by_ihc': ['Positive', 'Negative'] + ... }) + +If tiles have not yet been :ref:`extracted ` for this dataset, do that now. + +.. code-block:: python + + >>> dataset.extract_tiles(qc='otsu') + +Once a dataset has been prepared, the next step in training an MIL model is :ref:`converting images into features `. For this example, we'll use the pretrained `Virchow `_ feature extractor, a vision transformer pretrained on 1.5M whole-slide images. Virchow has an input size of 224x224, so our images will be resized to match. + +.. code-block:: python + + >>> virchow = sf.build_feature_extractor('virchow', center_crop=True) + >>> virchow.cite() + @misc{vorontsov2024virchowmillionslidedigitalpathology, + title={Virchow: A Million-Slide Digital Pathology Foundation Model}, + author={Eugene Vorontsov and Alican Bozkurt and Adam Casson and George Shaikovski and Michal Zelechowski and Siqi Liu and Kristen Severson and Eric Zimmermann and James Hall and Neil Tenenholtz and Nicolo Fusi and Philippe Mathieu and Alexander van Eck and Donghun Lee and Julian Viret and Eric Robert and Yi Kan Wang and Jeremy D. Kunz and Matthew C. H. Lee and Jan Bernhard and Ran A. Godrich and Gerard Oakley and Ewan Millar and Matthew Hanna and Juan Retamero and William A. Moye and Razik Yousfi and Christopher Kanan and David Klimstra and Brandon Rothrock and Thomas J. 
Fuchs}, + year={2024}, + eprint={2309.07778}, + archivePrefix={arXiv}, + primaryClass={eess.IV}, + url={https://arxiv.org/abs/2309.07778}, + } + >>> virchow.num_features + 2560 + +The Virchow feature extractor produces a 2560-dimensional vector for each tile. We can generate and export :ref:`bags ` of these features for all slides in our dataset using :func:`slideflow.Project.generate_feature_bags`. + +.. code-block:: python + + >>> P.generate_feature_bags( + ... virchow, + ... dataset, + ... outdir='/bags/path' + ... ) + +The output directory, ``/bags/path``, should look like: + +.. code-block:: bash + + /bags/path + ├── slide1.pt + ├── slide1.index.npz + ├── slide2.pt + ├── slide2.index.npz + ├── ... + └── bags_config.json + +The ``*.pt`` files contain the feature vectors for tiles in each slide, and the ``*.index.npz`` files contain the corresponding X, Y coordinates for each tile. The ``bags_config.json`` file contains the feature extractor configuration. + +The next step is to create an MIL model configuration using :func:`slideflow.mil.mil_config`, specifying the architecture and relevant hyperparameters. For the architecture, we'll use :class:`slideflow.mil.models.Attention_MIL`. For the hyperparameters, we'll use a learning rate of 1e-4, a batch size of 32, 1cycle learning rate scheduling, and train for 10 epochs. + +.. code-block:: python + + >>> from slideflow.mil import mil_config + >>> config = mil_config( + ... model='attention_mil', + ... lr=1e-4, + ... batch_size=32, + ... epochs=10, + ... fit_one_cycle=True + ... ) + +Finally, we can train the model using :func:`slideflow.mil.train_mil`. We'll split our dataset into 70% training and 30% validation, training to the outcome "er_status_by_ihc" and saving the model to ``/model/path``. + +.. code-block:: python + + >>> from slideflow.mil import train_mil + >>> train, val = dataset.split(labels='er_status_by_ihc', val_fraction=0.3) + >>> train_mil( + ... config, + ... train_dataset=train, + ...
val_dataset=val, + ... outcomes='er_status_by_ihc', + ... bags='/bags/path', + ... outdir='/model/path' + ... ) + +During training, you'll see the training/validation loss and validation AUROC for each epoch. At the end of training, you'll see the validation metrics for each outcome. + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + [18:51:01] INFO Training FastAI MIL model with config: + INFO TrainerConfigFastAI( + aggregation_level='slide' + lr=0.0001 + wd=1e-05 + bag_size=512 + fit_one_cycle=True + epochs=10 + batch_size=32 + model='attention_mil' + apply_softmax=True + model_kwargs=None + use_lens=True + ) + [18:51:02] INFO Training dataset: 272 merged bags (from 272 possible slides) + INFO Validation dataset: 116 merged bags (from 116 possible slides) + [18:51:04] INFO Training model Attention_MIL (in=1024, out=2, loss=CrossEntropyLoss) + epoch train_loss valid_loss roc_auc_score time + 0 0.328032 0.285096 0.580233 00:01 + Better model found at epoch 0 with valid_loss value: 0.2850962281227112. + 1 0.319219 0.266496 0.733721 00:01 + Better model found at epoch 1 with valid_loss value: 0.266496479511261. + 2 0.293969 0.230561 0.859690 00:01 + Better model found at epoch 2 with valid_loss value: 0.23056122660636902. + 3 0.266627 0.190546 0.927519 00:01 + Better model found at epoch 3 with valid_loss value: 0.1905461698770523. + 4 0.236985 0.165320 0.939147 00:01 + Better model found at epoch 4 with valid_loss value: 0.16532012820243835. + 5 0.215019 0.153572 0.946512 00:01 + Better model found at epoch 5 with valid_loss value: 0.153572216629982. + 6 0.199093 0.144464 0.948837 00:01 + Better model found at epoch 6 with valid_loss value: 0.1444639265537262. + 7 0.185597 0.141776 0.952326 00:01 + Better model found at epoch 7 with valid_loss value: 0.14177580177783966. + 8 0.173794 0.141409 0.951938 00:01 + Better model found at epoch 8 with valid_loss value: 0.14140936732292175. 
+ 9 0.167547 0.140791 0.952713 00:01 + Better model found at epoch 9 with valid_loss value: 0.14079126715660095. + [18:51:18] INFO Predictions saved to {...}/predictions.parquet + INFO Validation metrics for outcome brs_class: + [18:51:18] INFO slide-level AUC (cat # 0): 0.953 AP: 0.984 (opt. threshold: 0.544) + INFO slide-level AUC (cat # 1): 0.953 AP: 0.874 (opt. threshold: 0.458) + INFO Category 0 acc: 88.4% (76/86) + INFO Category 1 acc: 83.3% (25/30) + +After training has completed, the output directory, ``/model/path``, should look like: + +.. code-block:: bash + + /model/path + ├── attention + │ ├── slide1_att.npz + │ └── ... + ├── models + │ └── best_valid.pth + ├── history.csv + ├── mil_params.json + ├── predictions.parquet + └── slide_manifest.csv + +The final model weights are saved in ``models/best_valid.pth``. Validation dataset predictions are saved in the "predictions.parquet" file. A manifest of training/validation data is saved in the "slide_manifest.csv" file, and training history is saved in the "history.csv" file. Attention values for all tiles in each slide are saved in the ``attention/`` directory. + +The final saved model can be used for evaluation (:class:`slideflow.mil.eval_mil`) or inference (:class:`slideflow.mil.predict_slide` or :ref:`Slideflow Studio `). The saved model path should be referenced by the parent directory (in this case, "/model/path") rather than the model file itself. For more information on MIL models, see :ref:`mil`. \ No newline at end of file diff --git a/docs/_sources/uq.rst.txt b/docs/_sources/uq.rst.txt index 2d2fad31d..fa334d7a7 100644 --- a/docs/_sources/uq.rst.txt +++ b/docs/_sources/uq.rst.txt @@ -1,9 +1,11 @@ -Uncertainty quantification +.. 
_uncertainty: + +Uncertainty Quantification ========================== Several uncertainty quantification (UQ) methods have been developed for deep learning models and tested in digital histopathology, including MC Dropout, deep ensembles, hyper-deep ensembles, and test-time augmentation. -In verison 1.1, we implemented a dropout-based method of uncertainty estimation (`arXiv paper `_). MC dropout UQ methods exploit the observation that neural networks with dropout approximate sampling of the Bayesian posterior. Images undergo multiple forward passes in a dropout-enabled network during inference, which results in a distribution of predictions. The standard deviation of such a distribution represents the uncertainty estimate. +Slideflow includes a dropout-based method of uncertainty estimation. MC dropout UQ methods exploit the observation that neural networks with dropout approximate sampling of the Bayesian posterior. Images undergo multiple forward passes in a dropout-enabled network during inference, which results in a distribution of predictions. The standard deviation of such a distribution represents the uncertainty estimate. Training with UQ **************** @@ -34,7 +36,134 @@ Uncertainty heatmaps If a model was trained with UQ enabled, the :meth:`slideflow.Project.generate_heatmaps()` function will automatically create uncertainty heatmaps alongside the prediction heatmaps. -Slide-level confidence & uncertainty thresholding -************************************************* +Uncertainty thresholding +************************ + +Uncertainty information can be exploited to separate slide- and patient-level predictions into low- and high-confidence. We developed an uncertainty thresholding algorithm (`BISCUIT `_) to accomplish this task, which is available in :mod:`slideflow.biscuit`. Algorithmic details and validation studies can be found in our `manuscript `_ detailing the method. 
+ +Here, we will run through an example of how to apply this UQ thresholding strategy for a weakly-supervised classification model. At present, ``biscuit`` only supports uncertainty estimation and confidence thresholding for binary classification. + +Prepare an Experiment +--------------------- + +Start by creating a Slideflow project and then initializing a ``biscuit`` experiment, including the outcome target and the two classes. We will be training models to predict ``"HPV_status"``, with the two classes ``"positive"`` and ``"negative"``. + +.. code-block:: python + + import slideflow as sf + from slideflow import biscuit + + # Create a Slideflow project + P = sf.Project(...) + + # Initialize a biscuit experiment + experiment = biscuit.Experiment( + train_project=P, + outcome='HPV_status', + outcome1='negative', + outcome2='positive' + ) + +Next, prepare the model hyperparameters. Here, we will use the hyperparameters used in the original manuscript. + +.. code-block:: python + + hp = biscuit.hp.nature2022() + +Train with cross-validation +--------------------------- + +We'll start by training models in cross-validation on the full dataset. We'll use the default three-fold cross-validation strategy. We need to supply a label for experiment model tracking, which will be used for the rest of our experiments. + +.. code-block:: python + + # Train outer cross-validation models. + experiment.train(hp=hp, label='HPV') + +Models will be saved in the project model folder. + +Train inner cross-validation +---------------------------- + +Next, for each of the three cross-validation models trained, we will perform 5-fold nested cross-validation. Uncertainty thresholds are determined from nested cross-validation results. + +.. code-block:: python + + # Train inner, nested cross-validation models. + experiment.train_nested_cv(hp=hp, label='HPV') + +Models will again be saved in the project model directory. 
We can view a summary of the results from these cross-validation studies using the :func:`biscuit.find_cv()` and :func:`biscuit.get_model_results()` functions. + +.. code-block:: python + + from slideflow.biscuit import find_cv, get_model_results + + # Print results from outer cross-validation + cv_models = find_cv( + project=P, + label='HPV', + outcome='HPV_status' + ) + for m in cv_models: + results = get_model_results(m, outcome='HPV_status', epoch=1) + print(m, results['pt_auc']) + +Uncertainty thresholds are calculated using results from the inner cross-validation studies. :func:`biscuit.Experiment.thresholds_from_nested_cv` will calculate and return uncertainty and prediction thresholds. + +.. code-block:: python + + # Calculate uncertainty thresholds + df, thresh = experiment.thresholds_from_nested_cv(label='HPV') + print(thresh) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none + + {'tile_uq': 0.02726791, + 'slide_uq': 0.0147878695, + 'tile_pred': 0.41621968, + 'slide_pred': 0.4756707} + + +Apply thresholds to test set +---------------------------- + +Finally, we can apply these thresholds to a held-out test set. First, generate predictions for a held-out test set as described in :ref:`evaluation`. Locate the parquet file containing the saved tile-level predictions and load it into a DataFrame. Rename the columns in the dataframe so that ground-truth is ``y_true``, predictions are ``y_pred``, and uncertainty is ``uncertainty``. + +.. code-block:: python + + import pandas as pd + + # Load tile-level predictions from a test set evaluation + df = pd.read_parquet('/path/to/tile_predictions.parquet.gzip') + + # Rename the columns to y_true, y_pred, and uncertainty + # (rename() returns a new DataFrame, so reassign the result) + df = df.rename(columns={ + 'HPV_status-y_true': 'y_true', + 'HPV_status-y_pred1': 'y_pred', + 'HPV_status-uncertainty1': 'uncertainty' + }) + +Use :func:`biscuit.threshold.apply` to apply the previously-determined thresholds to these predictions. 
This will return classifier metrics (AUROC, accuracy, sensitivity, specificity) for high-confidence predictions and a dataframe of slide-level high-confidence predictions. Slides with low-confidence predictions will be omitted. The percentage of slides with high-confidence predictions will be reported as ``'percent_incl'``. + +.. code-block:: python + + # Calculate high-confidence slide-level predictions + metrics, high_conf_df = biscuit.threshold.apply( + df, # Dataframe of tile-level predictions + **thresh, # Uncertainty thresholds + level='slide' # We want slide-level predictions + ) + print(metrics) + +.. rst-class:: sphx-glr-script-out + + .. code-block:: none -Uncertainty information can be exploited to separate slide- and patient-level predictions into low- and high-confidence. We developed an uncertainty thresholding algorithm (`BISCUIT `_) to accomplish this task. Further details about slide-level confidence estimation and uncertainty thresholding can be found in our manuscript `detailing the method `_. \ No newline at end of file + {'auc': 0.9703296703296704, + 'percent_incl': 0.907051282051282, + 'acc': 0.9222614840989399, + 'sensitivity': 0.9230769230769231, + 'specificity': 0.9214285714285714} \ No newline at end of file diff --git a/docs/_sources/validation.rst.txt b/docs/_sources/validation.rst.txt deleted file mode 100644 index 4c62b509c..000000000 --- a/docs/_sources/validation.rst.txt +++ /dev/null @@ -1,49 +0,0 @@ -.. _validation_planning: - -Validation Planning -=================== - -An important first step in creating a new project is to determine the validation plan. Three groups of data are required: - -1) **Training data** - data used for learning during training -2) **Validation data** - data used for testing during training, and early stopping (if applicable) -3) **Evaluation data** - data used for final evaluation once training has completed. Preferably an external cohort. 
- -Validation data is used to assess model performance and generalizability during training. Once the model and parameters have been tuned with training/validation, the final model's performance is assessed on the held-out evaluation set. - -Configuring a validation plan -***************************** - -There are several ways you can plan to validate your data. The validation settings available include: - -- **strategy**: *'bootstrap'*, *'k-fold'*, *k-fold-manual'*, *'k-fold-preserved-site'*, *'fixed'*, *'none'* -- **fraction**: (float between 0-1) [not used for k-fold validation] -- **k_fold**: int - -The default strategy is 'k-fold', with k=3. - -Validation strategy -^^^^^^^^^^^^^^^^^^^ - -The ``strategy`` option determines how the validation data is selected. - -If **fixed**, a certain percentage of your training data is set aside for testing (determined by ``fraction``). The chosen validation subset is saved to a log file and will be re-used for all training iterations. - -If **bootstrap**, validation data will be selected at random (percentage determined by ``fraction``), and all training iterations will be repeated a number of times equal to ``k_fold``. The saved and reported model training metrics will be an average of all bootstrap iterations. - -If **k-fold**, training data will be automatically separated into *k* number of groups (where *k* is equal to ``k_fold``), and all training iterations will be repeated *k* number of times using k-fold cross validation. The saved and reported model training metrics will be an average of all k-fold iterations. - -If you would like to manually separate your data into k-folds, you may do so with the **k-fold-manual** strategy. 
Assign each slide to a k-fold cohort in the annotations file, and designate the appropriate column header with ``k_fold_header`` - -The **k-fold-preserved-site** strategy is a cross-validation strategy that ensures site is preserved across the training/validation sets, in order to reduce bias from batch effect as described by `Howard, et al `_. This strategy is recommended when using data from The Cancer Genome Atlas (`TCGA `_). - -.. note:: - Preserved-site cross-validation requires `CPLEX `_. The original implementation of the preserved-site cross-validation algorithm described by Howard et al can be found `on GitHub `_. - -If **none**, no validation testing will be performed. - -Selecting an evaluation cohort -****************************** - -Designating an evaluation cohort is done using the project annotations file, with a column indicating whether a slide is set aside for evaluation. -The training and evaluation functions include a ``filter`` argument which will allow you to restrict your training or evaluation according to these annotations. This will be discussed in greater detail in subsequent sections. diff --git a/docs/_static/basic.css b/docs/_static/basic.css index bf18350b6..7577acb1a 100644 --- a/docs/_static/basic.css +++ b/docs/_static/basic.css @@ -4,7 +4,7 @@ * * Sphinx stylesheet -- basic theme. * - * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS. + * :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS. * :license: BSD, see LICENSE for details. 
* */ @@ -222,7 +222,7 @@ table.modindextable td { /* -- general body styles --------------------------------------------------- */ div.body { - min-width: 450px; + min-width: 360px; max-width: 800px; } @@ -237,16 +237,6 @@ a.headerlink { visibility: hidden; } -a.brackets:before, -span.brackets > a:before{ - content: "["; -} - -a.brackets:after, -span.brackets > a:after { - content: "]"; -} - h1:hover > a.headerlink, h2:hover > a.headerlink, h3:hover > a.headerlink, @@ -335,12 +325,16 @@ p.sidebar-title { font-weight: bold; } +nav.contents, +aside.topic, div.admonition, div.topic, blockquote { clear: left; } /* -- topics ---------------------------------------------------------------- */ +nav.contents, +aside.topic, div.topic { border: 1px solid #ccc; padding: 7px; @@ -379,6 +373,8 @@ div.body p.centered { div.sidebar > :last-child, aside.sidebar > :last-child, +nav.contents > :last-child, +aside.topic > :last-child, div.topic > :last-child, div.admonition > :last-child { margin-bottom: 0; @@ -386,6 +382,8 @@ div.admonition > :last-child { div.sidebar::after, aside.sidebar::after, +nav.contents::after, +aside.topic::after, div.topic::after, div.admonition::after, blockquote::after { @@ -428,10 +426,6 @@ table.docutils td, table.docutils th { border-bottom: 1px solid #aaa; } -table.footnote td, table.footnote th { - border: 0 !important; -} - th { text-align: left; padding-right: 5px; @@ -615,19 +609,26 @@ ul.simple p { margin-bottom: 0; } -dl.footnote > dt, -dl.citation > dt { +aside.footnote > span, +div.citation > span { float: left; - margin-right: 0.5em; } - -dl.footnote > dd, -dl.citation > dd { +aside.footnote > span:last-of-type, +div.citation > span:last-of-type { + padding-right: 0.5em; +} +aside.footnote > p { + margin-left: 2em; +} +div.citation > p { + margin-left: 4em; +} +aside.footnote > p:last-of-type, +div.citation > p:last-of-type { margin-bottom: 0em; } - -dl.footnote > dd:after, -dl.citation > dd:after { +aside.footnote > p:last-of-type:after, 
+div.citation > p:last-of-type:after { content: ""; clear: both; } @@ -644,10 +645,6 @@ dl.field-list > dt { padding-right: 5px; } -dl.field-list > dt:after { - content: ":"; -} - dl.field-list > dd { padding-left: 0.5em; margin-top: 0em; diff --git a/docs/_static/css/theme.css b/docs/_static/css/theme.css index 46a5dabed..bb40ec619 100644 --- a/docs/_static/css/theme.css +++ b/docs/_static/css/theme.css @@ -6321,6 +6321,9 @@ button.bg-dark:focus { height: 100%; border: 0; } +video { + width: 100%; +} .embed-responsive-21by9::before { padding-top: 42.8571428571%; @@ -9450,52 +9453,52 @@ a.text-dark:hover, a.text-dark:focus { } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 700; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-bold.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-bold.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Bold.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Bold.ttf") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 700; font-style: italic; - src: url("../fonts/FreightSans/freight-sans-bold-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-bold-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-BoldItalic.ttf") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-BoldItalic.ttf") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 500; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-medium.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-medium.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Regular.woff2") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Regular.woff2") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 500; font-style: italic; - src: 
url("../fonts/FreightSans/freight-sans-medium-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-medium-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Italic.woff2") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Italic.woff2") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 100; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-light.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-light.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Thin.woff2") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Thin.woff2") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 100; font-style: italic; - src: url("../fonts/FreightSans/freight-sans-light-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-light-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 400; font-style: italic; - src: url("../fonts/FreightSans/freight-sans-book-italic.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-book-italic.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff") format("woff2"); } @font-face { - font-family: FreightSans; + font-family: IBMPlexSans; font-weight: 400; font-style: normal; - src: url("../fonts/FreightSans/freight-sans-book.woff2") format("woff2"), url("../fonts/FreightSans/freight-sans-book.woff") format("woff"); + src: url("../fonts/IBMPlexSans/IBMPlexSans-Light.woff2") format("woff2"), url("../fonts/IBMPlexSans/IBMPlexSans-Light.woff2") format("woff2"); } @font-face { font-family: IBMPlexMono; @@ -9542,7 +9545,7 
@@ html { } body { - font-family: FreightSans, Helvetica Neue, Helvetica, Arial, sans-serif; + font-family: IBMPlexSans, Helvetica Neue, Helvetica, Arial, sans-serif; } a:link, @@ -10995,7 +10998,7 @@ article.pytorch-article .admonition > p:last-of-type { color: #262626; } .pytorch-article div.sphx-glr-download a code, .pytorch-article div.sphx-glr-download a kbd, .pytorch-article div.sphx-glr-download a pre, .pytorch-article div.sphx-glr-download a samp, .pytorch-article div.sphx-glr-download a span.pre { - font-family: FreightSans, Helvetica Neue, Helvetica, Arial, sans-serif; + font-family: IBMPlexSans, Helvetica Neue, Helvetica, Arial, sans-serif; } .pytorch-article p.sphx-glr-script-out { diff --git a/docs/_static/doctools.js b/docs/_static/doctools.js index e1bfd708b..d06a71d75 100644 --- a/docs/_static/doctools.js +++ b/docs/_static/doctools.js @@ -2,357 +2,155 @@ * doctools.js * ~~~~~~~~~~~ * - * Sphinx JavaScript utilities for all documentation. + * Base JavaScript utilities for all Sphinx HTML documentation. * - * :copyright: Copyright 2007-2022 by the Sphinx team, see AUTHORS. + * :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS. * :license: BSD, see LICENSE for details. 
* */ - -/** - * select a different prefix for underscore - */ -$u = _.noConflict(); - -/** - * make the code below compatible with browsers without - * an installed firebug like debugger -if (!window.console || !console.firebug) { - var names = ["log", "debug", "info", "warn", "error", "assert", "dir", - "dirxml", "group", "groupEnd", "time", "timeEnd", "count", "trace", - "profile", "profileEnd"]; - window.console = {}; - for (var i = 0; i < names.length; ++i) - window.console[names[i]] = function() {}; -} - */ - -/** - * small helper function to urldecode strings - * - * See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL - */ -jQuery.urldecode = function(x) { - if (!x) { - return x - } - return decodeURIComponent(x.replace(/\+/g, ' ')); -}; - -/** - * small helper function to urlencode strings - */ -jQuery.urlencode = encodeURIComponent; - -/** - * This function returns the parsed url parameters of the - * current request. Multiple values per key are supported, - * it will always return arrays of strings for the value parts. - */ -jQuery.getQueryParameters = function(s) { - if (typeof s === 'undefined') - s = document.location.search; - var parts = s.substr(s.indexOf('?') + 1).split('&'); - var result = {}; - for (var i = 0; i < parts.length; i++) { - var tmp = parts[i].split('=', 2); - var key = jQuery.urldecode(tmp[0]); - var value = jQuery.urldecode(tmp[1]); - if (key in result) - result[key].push(value); - else - result[key] = [value]; - } - return result; -}; - -/** - * highlight a given string on a jquery object by wrapping it in - * span elements with the given class name. 
- */ -jQuery.fn.highlightText = function(text, className) { - function highlight(node, addItems) { - if (node.nodeType === 3) { - var val = node.nodeValue; - var pos = val.toLowerCase().indexOf(text); - if (pos >= 0 && - !jQuery(node.parentNode).hasClass(className) && - !jQuery(node.parentNode).hasClass("nohighlight")) { - var span; - var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg"); - if (isInSVG) { - span = document.createElementNS("http://www.w3.org/2000/svg", "tspan"); - } else { - span = document.createElement("span"); - span.className = className; - } - span.appendChild(document.createTextNode(val.substr(pos, text.length))); - node.parentNode.insertBefore(span, node.parentNode.insertBefore( - document.createTextNode(val.substr(pos + text.length)), - node.nextSibling)); - node.nodeValue = val.substr(0, pos); - if (isInSVG) { - var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect"); - var bbox = node.parentElement.getBBox(); - rect.x.baseVal.value = bbox.x; - rect.y.baseVal.value = bbox.y; - rect.width.baseVal.value = bbox.width; - rect.height.baseVal.value = bbox.height; - rect.setAttribute('class', className); - addItems.push({ - "parent": node.parentNode, - "target": rect}); - } - } - } - else if (!jQuery(node).is("button, select, textarea")) { - jQuery.each(node.childNodes, function() { - highlight(this, addItems); - }); - } - } - var addItems = []; - var result = this.each(function() { - highlight(this, addItems); - }); - for (var i = 0; i < addItems.length; ++i) { - jQuery(addItems[i].parent).before(addItems[i].target); +"use strict"; + +const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([ + "TEXTAREA", + "INPUT", + "SELECT", + "BUTTON", +]); + +const _ready = (callback) => { + if (document.readyState !== "loading") { + callback(); + } else { + document.addEventListener("DOMContentLoaded", callback); } - return result; }; -/* - * backward compatibility for jQuery.browser - * This will be supported until firefox bug 
is fixed. - */ -if (!jQuery.browser) { - jQuery.uaMatch = function(ua) { - ua = ua.toLowerCase(); - - var match = /(chrome)[ \/]([\w.]+)/.exec(ua) || - /(webkit)[ \/]([\w.]+)/.exec(ua) || - /(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) || - /(msie) ([\w.]+)/.exec(ua) || - ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) || - []; - - return { - browser: match[ 1 ] || "", - version: match[ 2 ] || "0" - }; - }; - jQuery.browser = {}; - jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true; -} - /** * Small JavaScript module for the documentation. */ -var Documentation = { - - init : function() { - this.fixFirefoxAnchorBug(); - this.highlightSearchWords(); - this.initIndexTable(); - this.initOnKeyListeners(); +const Documentation = { + init: () => { + Documentation.initDomainIndexTable(); + Documentation.initOnKeyListeners(); }, /** * i18n support */ - TRANSLATIONS : {}, - PLURAL_EXPR : function(n) { return n === 1 ? 0 : 1; }, - LOCALE : 'unknown', + TRANSLATIONS: {}, + PLURAL_EXPR: (n) => (n === 1 ? 0 : 1), + LOCALE: "unknown", // gettext and ngettext don't access this so that the functions // can safely bound to a different name (_ = Documentation.gettext) - gettext : function(string) { - var translated = Documentation.TRANSLATIONS[string]; - if (typeof translated === 'undefined') - return string; - return (typeof translated === 'string') ? translated : translated[0]; - }, - - ngettext : function(singular, plural, n) { - var translated = Documentation.TRANSLATIONS[singular]; - if (typeof translated === 'undefined') - return (n == 1) ? 
singular : plural; - return translated[Documentation.PLURALEXPR(n)]; - }, - - addTranslations : function(catalog) { - for (var key in catalog.messages) - this.TRANSLATIONS[key] = catalog.messages[key]; - this.PLURAL_EXPR = new Function('n', 'return +(' + catalog.plural_expr + ')'); - this.LOCALE = catalog.locale; - }, - - /** - * add context elements like header anchor links - */ - addContextElements : function() { - $('div[id] > :header:first').each(function() { - $('\u00B6'). - attr('href', '#' + this.id). - attr('title', _('Permalink to this headline')). - appendTo(this); - }); - $('dt[id]').each(function() { - $('\u00B6'). - attr('href', '#' + this.id). - attr('title', _('Permalink to this definition')). - appendTo(this); - }); - }, - - /** - * workaround a firefox stupidity - * see: https://bugzilla.mozilla.org/show_bug.cgi?id=645075 - */ - fixFirefoxAnchorBug : function() { - if (document.location.hash && $.browser.mozilla) - window.setTimeout(function() { - document.location.href += ''; - }, 10); - }, - - /** - * highlight the search words provided in the url in the text - */ - highlightSearchWords : function() { - var params = $.getQueryParameters(); - var terms = (params.highlight) ? 
params.highlight[0].split(/\s+/) : []; - if (terms.length) { - var body = $('div.body'); - if (!body.length) { - body = $('body'); - } - window.setTimeout(function() { - $.each(terms, function() { - body.highlightText(this.toLowerCase(), 'highlighted'); - }); - }, 10); - $('') - .appendTo($('#searchbox')); + gettext: (string) => { + const translated = Documentation.TRANSLATIONS[string]; + switch (typeof translated) { + case "undefined": + return string; // no translation + case "string": + return translated; // translation exists + default: + return translated[0]; // (singular, plural) translation tuple exists } }, - /** - * init the domain index toggle buttons - */ - initIndexTable : function() { - var togglers = $('img.toggler').click(function() { - var src = $(this).attr('src'); - var idnum = $(this).attr('id').substr(7); - $('tr.cg-' + idnum).toggle(); - if (src.substr(-9) === 'minus.png') - $(this).attr('src', src.substr(0, src.length-9) + 'plus.png'); - else - $(this).attr('src', src.substr(0, src.length-8) + 'minus.png'); - }).css('display', ''); - if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) { - togglers.click(); - } + ngettext: (singular, plural, n) => { + const translated = Documentation.TRANSLATIONS[singular]; + if (typeof translated !== "undefined") + return translated[Documentation.PLURAL_EXPR(n)]; + return n === 1 ? 
singular : plural; }, - /** - * helper function to hide the search marks again - */ - hideSearchWords : function() { - $('#searchbox .highlight-link').fadeOut(300); - $('span.highlighted').removeClass('highlighted'); - var url = new URL(window.location); - url.searchParams.delete('highlight'); - window.history.replaceState({}, '', url); + addTranslations: (catalog) => { + Object.assign(Documentation.TRANSLATIONS, catalog.messages); + Documentation.PLURAL_EXPR = new Function( + "n", + `return (${catalog.plural_expr})` + ); + Documentation.LOCALE = catalog.locale; }, - /** + /** * helper function to focus on search bar */ - focusSearchBar : function() { - $('input[name=q]').first().focus(); + focusSearchBar: () => { + document.querySelectorAll("input[name=q]")[0]?.focus(); }, /** - * make the url absolute + * Initialise the domain index toggle buttons */ - makeURL : function(relativeURL) { - return DOCUMENTATION_OPTIONS.URL_ROOT + '/' + relativeURL; - }, + initDomainIndexTable: () => { + const toggler = (el) => { + const idNumber = el.id.substr(7); + const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`); + if (el.src.substr(-9) === "minus.png") { + el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`; + toggledRows.forEach((el) => (el.style.display = "none")); + } else { + el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`; + toggledRows.forEach((el) => (el.style.display = "")); + } + }; - /** - * get the current relative url - */ - getCurrentURL : function() { - var path = document.location.pathname; - var parts = path.split(/\//); - $.each(DOCUMENTATION_OPTIONS.URL_ROOT.split(/\//), function() { - if (this === '..') - parts.pop(); - }); - var url = parts.join('/'); - return path.substring(url.lastIndexOf('/') + 1, path.length - 1); + const togglerElements = document.querySelectorAll("img.toggler"); + togglerElements.forEach((el) => + el.addEventListener("click", (event) => toggler(event.currentTarget)) + ); + togglerElements.forEach((el) 
=> (el.style.display = "")); + if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler); }, - initOnKeyListeners: function() { + initOnKeyListeners: () => { // only install a listener if it is really needed - if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && - !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) - return; - - $(document).keydown(function(event) { - var activeElementType = document.activeElement.tagName; - // don't navigate when in search box, textarea, dropdown or button - if (activeElementType !== 'TEXTAREA' && activeElementType !== 'INPUT' && activeElementType !== 'SELECT' - && activeElementType !== 'BUTTON') { - if (event.altKey || event.ctrlKey || event.metaKey) - return; - - if (!event.shiftKey) { - switch (event.key) { - case 'ArrowLeft': - if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) - break; - var prevHref = $('link[rel="prev"]').prop('href'); - if (prevHref) { - window.location.href = prevHref; - return false; - } - break; - case 'ArrowRight': - if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) - break; - var nextHref = $('link[rel="next"]').prop('href'); - if (nextHref) { - window.location.href = nextHref; - return false; - } - break; - case 'Escape': - if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) - break; - Documentation.hideSearchWords(); - return false; - } - } - - // some keyboard layouts may need Shift to get / + if ( + !DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS && + !DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS + ) + return; + + document.addEventListener("keydown", (event) => { + // bail for input elements + if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return; + // bail with special keys + if (event.altKey || event.ctrlKey || event.metaKey) return; + + if (!event.shiftKey) { switch (event.key) { - case '/': - if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) - break; - Documentation.focusSearchBar(); - return false; + case "ArrowLeft": + if 
(!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const prevLink = document.querySelector('link[rel="prev"]'); + if (prevLink && prevLink.href) { + window.location.href = prevLink.href; + event.preventDefault(); + } + break; + case "ArrowRight": + if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break; + + const nextLink = document.querySelector('link[rel="next"]'); + if (nextLink && nextLink.href) { + window.location.href = nextLink.href; + event.preventDefault(); + } + break; } } + + // some keyboard layouts may need Shift to get / + switch (event.key) { + case "/": + if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break; + Documentation.focusSearchBar(); + event.preventDefault(); + } }); - } + }, }; // quick alias for translations -_ = Documentation.gettext; +const _ = Documentation.gettext; -$(document).ready(function() { - Documentation.init(); -}); +_ready(Documentation.init); diff --git a/docs/_static/documentation_options.js b/docs/_static/documentation_options.js index b6919f482..42ed3215e 100644 --- a/docs/_static/documentation_options.js +++ b/docs/_static/documentation_options.js @@ -1,9 +1,9 @@ var DOCUMENTATION_OPTIONS = { URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'), - VERSION: '1.1.1', - LANGUAGE: 'None', + VERSION: '3.0.0', + LANGUAGE: 'en', COLLAPSE_INDEX: false, - BUILDER: 'html', + BUILDER: 'dirhtml', FILE_SUFFIX: '.html', LINK_SUFFIX: '.html', HAS_SOURCE: true, diff --git a/docs/_static/fonts/FreightSans/freight-sans-bold-italic.woff b/docs/_static/fonts/FreightSans/freight-sans-bold-italic.woff deleted file mode 100644 index e31724842..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-bold-italic.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-bold-italic.woff2 b/docs/_static/fonts/FreightSans/freight-sans-bold-italic.woff2 deleted file mode 100644 index cec2dc94f..000000000 Binary files 
a/docs/_static/fonts/FreightSans/freight-sans-bold-italic.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-bold.woff b/docs/_static/fonts/FreightSans/freight-sans-bold.woff deleted file mode 100644 index de46625ed..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-bold.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-bold.woff2 b/docs/_static/fonts/FreightSans/freight-sans-bold.woff2 deleted file mode 100644 index dc05cd82b..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-bold.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-book-italic.woff b/docs/_static/fonts/FreightSans/freight-sans-book-italic.woff deleted file mode 100644 index a50e5038a..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-book-italic.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-book-italic.woff2 b/docs/_static/fonts/FreightSans/freight-sans-book-italic.woff2 deleted file mode 100644 index fe284db66..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-book-italic.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-book.woff b/docs/_static/fonts/FreightSans/freight-sans-book.woff deleted file mode 100644 index 6ab8775f0..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-book.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-book.woff2 b/docs/_static/fonts/FreightSans/freight-sans-book.woff2 deleted file mode 100644 index 2688739f1..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-book.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-light-italic.woff b/docs/_static/fonts/FreightSans/freight-sans-light-italic.woff deleted file mode 100644 index beda58d4e..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-light-italic.woff and 
/dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-light-italic.woff2 b/docs/_static/fonts/FreightSans/freight-sans-light-italic.woff2 deleted file mode 100644 index e2fa0134b..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-light-italic.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-light.woff b/docs/_static/fonts/FreightSans/freight-sans-light.woff deleted file mode 100644 index 226a0bf83..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-light.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-light.woff2 b/docs/_static/fonts/FreightSans/freight-sans-light.woff2 deleted file mode 100644 index 6d8ff2c04..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-light.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-medium-italic.woff b/docs/_static/fonts/FreightSans/freight-sans-medium-italic.woff deleted file mode 100644 index a42115d63..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-medium-italic.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-medium-italic.woff2 b/docs/_static/fonts/FreightSans/freight-sans-medium-italic.woff2 deleted file mode 100644 index 16a7713a4..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-medium-italic.woff2 and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-medium.woff b/docs/_static/fonts/FreightSans/freight-sans-medium.woff deleted file mode 100644 index 5ea34539c..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-medium.woff and /dev/null differ diff --git a/docs/_static/fonts/FreightSans/freight-sans-medium.woff2 b/docs/_static/fonts/FreightSans/freight-sans-medium.woff2 deleted file mode 100644 index c58b6a528..000000000 Binary files a/docs/_static/fonts/FreightSans/freight-sans-medium.woff2 and /dev/null differ diff --git 
a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Bold.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Bold.woff2 new file mode 100644 index 000000000..40c56d40f Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Bold.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff2 new file mode 100644 index 000000000..32dea0b48 Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-BoldItalic.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Italic.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Italic.woff2 new file mode 100644 index 000000000..c17e30da3 Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Italic.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Light.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Light.woff2 new file mode 100644 index 000000000..277251d49 Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Light.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2 new file mode 100644 index 000000000..6f54993aa Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-LightItalic.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Medium.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Medium.woff2 new file mode 100644 index 000000000..55c7f5fbc Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Medium.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-MediumItalic.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-MediumItalic.woff2 new file mode 100644 index 000000000..b2e190faa Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-MediumItalic.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Regular.woff2 
b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Regular.woff2 new file mode 100644 index 000000000..3149ce5d7 Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Regular.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Thin.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Thin.woff2 new file mode 100644 index 000000000..f2669835a Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-Thin.woff2 differ diff --git a/docs/_static/fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2 b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2 new file mode 100644 index 000000000..e7eb57b77 Binary files /dev/null and b/docs/_static/fonts/IBMPlexSans/IBMPlexSans-ThinItalic.woff2 differ diff --git a/docs/_static/images/logo-icon.svg b/docs/_static/images/logo-icon.svg index d9c2b2d6d..8dababf4e 100644 --- a/docs/_static/images/logo-icon.svg +++ b/docs/_static/images/logo-icon.svg @@ -1 +1,27 @@ - \ No newline at end of file + + + + + + + + + + + + + + + + + + + + diff --git a/docs/_static/images/slideflow-logo-name-large.png b/docs/_static/images/slideflow-logo-name-large.png index 23bb07023..4d122173a 100644 Binary files a/docs/_static/images/slideflow-logo-name-large.png and b/docs/_static/images/slideflow-logo-name-large.png differ diff --git a/docs/_static/jquery-3.5.1.js b/docs/_static/jquery-3.5.1.js deleted file mode 100644 index 50937333b..000000000 --- a/docs/_static/jquery-3.5.1.js +++ /dev/null @@ -1,10872 +0,0 @@ -/*! 
- * jQuery JavaScript Library v3.5.1 - * https://jquery.com/ - * - * Includes Sizzle.js - * https://sizzlejs.com/ - * - * Copyright JS Foundation and other contributors - * Released under the MIT license - * https://jquery.org/license - * - * Date: 2020-05-04T22:49Z - */ -( function( global, factory ) { - - "use strict"; - - if ( typeof module === "object" && typeof module.exports === "object" ) { - - // For CommonJS and CommonJS-like environments where a proper `window` - // is present, execute the factory and get jQuery. - // For environments that do not have a `window` with a `document` - // (such as Node.js), expose a factory as module.exports. - // This accentuates the need for the creation of a real `window`. - // e.g. var jQuery = require("jquery")(window); - // See ticket #14549 for more info. - module.exports = global.document ? - factory( global, true ) : - function( w ) { - if ( !w.document ) { - throw new Error( "jQuery requires a window with a document" ); - } - return factory( w ); - }; - } else { - factory( global ); - } - -// Pass this if window is not defined yet -} )( typeof window !== "undefined" ? window : this, function( window, noGlobal ) { - -// Edge <= 12 - 13+, Firefox <=18 - 45+, IE 10 - 11, Safari 5.1 - 9+, iOS 6 - 9.1 -// throw exceptions when non-strict code (e.g., ASP.NET 4.5) accesses strict mode -// arguments.callee.caller (trac-13335). But as of jQuery 3.0 (2016), strict mode should be common -// enough that all such attempts are guarded in a try block. -"use strict"; - -var arr = []; - -var getProto = Object.getPrototypeOf; - -var slice = arr.slice; - -var flat = arr.flat ? 
function( array ) { - return arr.flat.call( array ); -} : function( array ) { - return arr.concat.apply( [], array ); -}; - - -var push = arr.push; - -var indexOf = arr.indexOf; - -var class2type = {}; - -var toString = class2type.toString; - -var hasOwn = class2type.hasOwnProperty; - -var fnToString = hasOwn.toString; - -var ObjectFunctionString = fnToString.call( Object ); - -var support = {}; - -var isFunction = function isFunction( obj ) { - - // Support: Chrome <=57, Firefox <=52 - // In some browsers, typeof returns "function" for HTML elements - // (i.e., `typeof document.createElement( "object" ) === "function"`). - // We don't want to classify *any* DOM node as a function. - return typeof obj === "function" && typeof obj.nodeType !== "number"; - }; - - -var isWindow = function isWindow( obj ) { - return obj != null && obj === obj.window; - }; - - -var document = window.document; - - - - var preservedScriptAttributes = { - type: true, - src: true, - nonce: true, - noModule: true - }; - - function DOMEval( code, node, doc ) { - doc = doc || document; - - var i, val, - script = doc.createElement( "script" ); - - script.text = code; - if ( node ) { - for ( i in preservedScriptAttributes ) { - - // Support: Firefox 64+, Edge 18+ - // Some browsers don't support the "nonce" property on scripts. - // On the other hand, just using `getAttribute` is not enough as - // the `nonce` attribute is reset to an empty string whenever it - // becomes browsing-context connected. - // See https://github.com/whatwg/html/issues/2369 - // See https://html.spec.whatwg.org/#nonce-attributes - // The `node.getAttribute` check was added for the sake of - // `jQuery.globalEval` so that it can fake a nonce-containing node - // via an object. 
- val = node[ i ] || node.getAttribute && node.getAttribute( i ); - if ( val ) { - script.setAttribute( i, val ); - } - } - } - doc.head.appendChild( script ).parentNode.removeChild( script ); - } - - -function toType( obj ) { - if ( obj == null ) { - return obj + ""; - } - - // Support: Android <=2.3 only (functionish RegExp) - return typeof obj === "object" || typeof obj === "function" ? - class2type[ toString.call( obj ) ] || "object" : - typeof obj; -} -/* global Symbol */ -// Defining this global in .eslintrc.json would create a danger of using the global -// unguarded in another place, it seems safer to define global only for this module - - - -var - version = "3.5.1", - - // Define a local copy of jQuery - jQuery = function( selector, context ) { - - // The jQuery object is actually just the init constructor 'enhanced' - // Need init if jQuery is called (just allow error to be thrown if not included) - return new jQuery.fn.init( selector, context ); - }; - -jQuery.fn = jQuery.prototype = { - - // The current version of jQuery being used - jquery: version, - - constructor: jQuery, - - // The default length of a jQuery object is 0 - length: 0, - - toArray: function() { - return slice.call( this ); - }, - - // Get the Nth element in the matched element set OR - // Get the whole matched element set as a clean array - get: function( num ) { - - // Return all the elements in a clean array - if ( num == null ) { - return slice.call( this ); - } - - // Return just the one element from the set - return num < 0 ? 
this[ num + this.length ] : this[ num ]; - }, - - // Take an array of elements and push it onto the stack - // (returning the new matched element set) - pushStack: function( elems ) { - - // Build a new jQuery matched element set - var ret = jQuery.merge( this.constructor(), elems ); - - // Add the old object onto the stack (as a reference) - ret.prevObject = this; - - // Return the newly-formed element set - return ret; - }, - - // Execute a callback for every element in the matched set. - each: function( callback ) { - return jQuery.each( this, callback ); - }, - - map: function( callback ) { - return this.pushStack( jQuery.map( this, function( elem, i ) { - return callback.call( elem, i, elem ); - } ) ); - }, - - slice: function() { - return this.pushStack( slice.apply( this, arguments ) ); - }, - - first: function() { - return this.eq( 0 ); - }, - - last: function() { - return this.eq( -1 ); - }, - - even: function() { - return this.pushStack( jQuery.grep( this, function( _elem, i ) { - return ( i + 1 ) % 2; - } ) ); - }, - - odd: function() { - return this.pushStack( jQuery.grep( this, function( _elem, i ) { - return i % 2; - } ) ); - }, - - eq: function( i ) { - var len = this.length, - j = +i + ( i < 0 ? len : 0 ); - return this.pushStack( j >= 0 && j < len ? [ this[ j ] ] : [] ); - }, - - end: function() { - return this.prevObject || this.constructor(); - }, - - // For internal use only. - // Behaves like an Array's method, not like a jQuery method. 
- push: push, - sort: arr.sort, - splice: arr.splice -}; - -jQuery.extend = jQuery.fn.extend = function() { - var options, name, src, copy, copyIsArray, clone, - target = arguments[ 0 ] || {}, - i = 1, - length = arguments.length, - deep = false; - - // Handle a deep copy situation - if ( typeof target === "boolean" ) { - deep = target; - - // Skip the boolean and the target - target = arguments[ i ] || {}; - i++; - } - - // Handle case when target is a string or something (possible in deep copy) - if ( typeof target !== "object" && !isFunction( target ) ) { - target = {}; - } - - // Extend jQuery itself if only one argument is passed - if ( i === length ) { - target = this; - i--; - } - - for ( ; i < length; i++ ) { - - // Only deal with non-null/undefined values - if ( ( options = arguments[ i ] ) != null ) { - - // Extend the base object - for ( name in options ) { - copy = options[ name ]; - - // Prevent Object.prototype pollution - // Prevent never-ending loop - if ( name === "__proto__" || target === copy ) { - continue; - } - - // Recurse if we're merging plain objects or arrays - if ( deep && copy && ( jQuery.isPlainObject( copy ) || - ( copyIsArray = Array.isArray( copy ) ) ) ) { - src = target[ name ]; - - // Ensure proper type for the source value - if ( copyIsArray && !Array.isArray( src ) ) { - clone = []; - } else if ( !copyIsArray && !jQuery.isPlainObject( src ) ) { - clone = {}; - } else { - clone = src; - } - copyIsArray = false; - - // Never move original objects, clone them - target[ name ] = jQuery.extend( deep, clone, copy ); - - // Don't bring in undefined values - } else if ( copy !== undefined ) { - target[ name ] = copy; - } - } - } - } - - // Return the modified object - return target; -}; - -jQuery.extend( { - - // Unique for each copy of jQuery on the page - expando: "jQuery" + ( version + Math.random() ).replace( /\D/g, "" ), - - // Assume jQuery is ready without the ready module - isReady: true, - - error: function( msg ) { - throw new 
Error( msg ); - }, - - noop: function() {}, - - isPlainObject: function( obj ) { - var proto, Ctor; - - // Detect obvious negatives - // Use toString instead of jQuery.type to catch host objects - if ( !obj || toString.call( obj ) !== "[object Object]" ) { - return false; - } - - proto = getProto( obj ); - - // Objects with no prototype (e.g., `Object.create( null )`) are plain - if ( !proto ) { - return true; - } - - // Objects with prototype are plain iff they were constructed by a global Object function - Ctor = hasOwn.call( proto, "constructor" ) && proto.constructor; - return typeof Ctor === "function" && fnToString.call( Ctor ) === ObjectFunctionString; - }, - - isEmptyObject: function( obj ) { - var name; - - for ( name in obj ) { - return false; - } - return true; - }, - - // Evaluates a script in a provided context; falls back to the global one - // if not specified. - globalEval: function( code, options, doc ) { - DOMEval( code, { nonce: options && options.nonce }, doc ); - }, - - each: function( obj, callback ) { - var length, i = 0; - - if ( isArrayLike( obj ) ) { - length = obj.length; - for ( ; i < length; i++ ) { - if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { - break; - } - } - } else { - for ( i in obj ) { - if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { - break; - } - } - } - - return obj; - }, - - // results is for internal usage only - makeArray: function( arr, results ) { - var ret = results || []; - - if ( arr != null ) { - if ( isArrayLike( Object( arr ) ) ) { - jQuery.merge( ret, - typeof arr === "string" ? - [ arr ] : arr - ); - } else { - push.call( ret, arr ); - } - } - - return ret; - }, - - inArray: function( elem, arr, i ) { - return arr == null ? 
-1 : indexOf.call( arr, elem, i ); - }, - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - merge: function( first, second ) { - var len = +second.length, - j = 0, - i = first.length; - - for ( ; j < len; j++ ) { - first[ i++ ] = second[ j ]; - } - - first.length = i; - - return first; - }, - - grep: function( elems, callback, invert ) { - var callbackInverse, - matches = [], - i = 0, - length = elems.length, - callbackExpect = !invert; - - // Go through the array, only saving the items - // that pass the validator function - for ( ; i < length; i++ ) { - callbackInverse = !callback( elems[ i ], i ); - if ( callbackInverse !== callbackExpect ) { - matches.push( elems[ i ] ); - } - } - - return matches; - }, - - // arg is for internal usage only - map: function( elems, callback, arg ) { - var length, value, - i = 0, - ret = []; - - // Go through the array, translating each of the items to their new values - if ( isArrayLike( elems ) ) { - length = elems.length; - for ( ; i < length; i++ ) { - value = callback( elems[ i ], i, arg ); - - if ( value != null ) { - ret.push( value ); - } - } - - // Go through every key on the object, - } else { - for ( i in elems ) { - value = callback( elems[ i ], i, arg ); - - if ( value != null ) { - ret.push( value ); - } - } - } - - // Flatten any nested arrays - return flat( ret ); - }, - - // A global GUID counter for objects - guid: 1, - - // jQuery.support is not used in Core but other projects attach their - // properties to it so it needs to exist. 
- support: support -} ); - -if ( typeof Symbol === "function" ) { - jQuery.fn[ Symbol.iterator ] = arr[ Symbol.iterator ]; -} - -// Populate the class2type map -jQuery.each( "Boolean Number String Function Array Date RegExp Object Error Symbol".split( " " ), -function( _i, name ) { - class2type[ "[object " + name + "]" ] = name.toLowerCase(); -} ); - -function isArrayLike( obj ) { - - // Support: real iOS 8.2 only (not reproducible in simulator) - // `in` check used to prevent JIT error (gh-2145) - // hasOwn isn't used here due to false negatives - // regarding Nodelist length in IE - var length = !!obj && "length" in obj && obj.length, - type = toType( obj ); - - if ( isFunction( obj ) || isWindow( obj ) ) { - return false; - } - - return type === "array" || length === 0 || - typeof length === "number" && length > 0 && ( length - 1 ) in obj; -} -var Sizzle = -/*! - * Sizzle CSS Selector Engine v2.3.5 - * https://sizzlejs.com/ - * - * Copyright JS Foundation and other contributors - * Released under the MIT license - * https://js.foundation/ - * - * Date: 2020-03-14 - */ -( function( window ) { -var i, - support, - Expr, - getText, - isXML, - tokenize, - compile, - select, - outermostContext, - sortInput, - hasDuplicate, - - // Local document vars - setDocument, - document, - docElem, - documentIsHTML, - rbuggyQSA, - rbuggyMatches, - matches, - contains, - - // Instance-specific data - expando = "sizzle" + 1 * new Date(), - preferredDoc = window.document, - dirruns = 0, - done = 0, - classCache = createCache(), - tokenCache = createCache(), - compilerCache = createCache(), - nonnativeSelectorCache = createCache(), - sortOrder = function( a, b ) { - if ( a === b ) { - hasDuplicate = true; - } - return 0; - }, - - // Instance methods - hasOwn = ( {} ).hasOwnProperty, - arr = [], - pop = arr.pop, - pushNative = arr.push, - push = arr.push, - slice = arr.slice, - - // Use a stripped-down indexOf as it's faster than native - // https://jsperf.com/thor-indexof-vs-for/5 - 
indexOf = function( list, elem ) { - var i = 0, - len = list.length; - for ( ; i < len; i++ ) { - if ( list[ i ] === elem ) { - return i; - } - } - return -1; - }, - - booleans = "checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|" + - "ismap|loop|multiple|open|readonly|required|scoped", - - // Regular expressions - - // http://www.w3.org/TR/css3-selectors/#whitespace - whitespace = "[\\x20\\t\\r\\n\\f]", - - // https://www.w3.org/TR/css-syntax-3/#ident-token-diagram - identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace + - "?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+", - - // Attribute selectors: http://www.w3.org/TR/selectors/#attribute-selectors - attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace + - - // Operator (capture 2) - "*([*^$|!~]?=)" + whitespace + - - // "Attribute values must be CSS identifiers [capture 5] - // or strings [capture 3 or capture 4]" - "*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" + - whitespace + "*\\]", - - pseudos = ":(" + identifier + ")(?:\\((" + - - // To reduce the number of selectors needing tokenize in the preFilter, prefer arguments: - // 1. quoted (capture 3; capture 4 or capture 5) - "('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|" + - - // 2. simple (capture 6) - "((?:\\\\.|[^\\\\()[\\]]|" + attributes + ")*)|" + - - // 3. 
anything else (capture 2) - ".*" + - ")\\)|)", - - // Leading and non-escaped trailing whitespace, capturing some non-whitespace characters preceding the latter - rwhitespace = new RegExp( whitespace + "+", "g" ), - rtrim = new RegExp( "^" + whitespace + "+|((?:^|[^\\\\])(?:\\\\.)*)" + - whitespace + "+$", "g" ), - - rcomma = new RegExp( "^" + whitespace + "*," + whitespace + "*" ), - rcombinators = new RegExp( "^" + whitespace + "*([>+~]|" + whitespace + ")" + whitespace + - "*" ), - rdescend = new RegExp( whitespace + "|>" ), - - rpseudo = new RegExp( pseudos ), - ridentifier = new RegExp( "^" + identifier + "$" ), - - matchExpr = { - "ID": new RegExp( "^#(" + identifier + ")" ), - "CLASS": new RegExp( "^\\.(" + identifier + ")" ), - "TAG": new RegExp( "^(" + identifier + "|[*])" ), - "ATTR": new RegExp( "^" + attributes ), - "PSEUDO": new RegExp( "^" + pseudos ), - "CHILD": new RegExp( "^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\(" + - whitespace + "*(even|odd|(([+-]|)(\\d*)n|)" + whitespace + "*(?:([+-]|)" + - whitespace + "*(\\d+)|))" + whitespace + "*\\)|)", "i" ), - "bool": new RegExp( "^(?:" + booleans + ")$", "i" ), - - // For use in libraries implementing .is() - // We use this for POS matching in `select` - "needsContext": new RegExp( "^" + whitespace + - "*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace + - "*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" ) - }, - - rhtml = /HTML$/i, - rinputs = /^(?:input|select|textarea|button)$/i, - rheader = /^h\d$/i, - - rnative = /^[^{]+\{\s*\[native \w/, - - // Easily-parseable/retrievable ID or TAG or CLASS selectors - rquickExpr = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/, - - rsibling = /[+~]/, - - // CSS escapes - // http://www.w3.org/TR/CSS21/syndata.html#escaped-characters - runescape = new RegExp( "\\\\[\\da-fA-F]{1,6}" + whitespace + "?|\\\\([^\\r\\n\\f])", "g" ), - funescape = function( escape, nonHex ) { - var high = "0x" + escape.slice( 1 ) - 0x10000; - - return nonHex ? 
- - // Strip the backslash prefix from a non-hex escape sequence - nonHex : - - // Replace a hexadecimal escape sequence with the encoded Unicode code point - // Support: IE <=11+ - // For values outside the Basic Multilingual Plane (BMP), manually construct a - // surrogate pair - high < 0 ? - String.fromCharCode( high + 0x10000 ) : - String.fromCharCode( high >> 10 | 0xD800, high & 0x3FF | 0xDC00 ); - }, - - // CSS string/identifier serialization - // https://drafts.csswg.org/cssom/#common-serializing-idioms - rcssescape = /([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g, - fcssescape = function( ch, asCodePoint ) { - if ( asCodePoint ) { - - // U+0000 NULL becomes U+FFFD REPLACEMENT CHARACTER - if ( ch === "\0" ) { - return "\uFFFD"; - } - - // Control characters and (dependent upon position) numbers get escaped as code points - return ch.slice( 0, -1 ) + "\\" + - ch.charCodeAt( ch.length - 1 ).toString( 16 ) + " "; - } - - // Other potentially-special ASCII characters get backslash-escaped - return "\\" + ch; - }, - - // Used for iframes - // See setDocument() - // Removing the function wrapper causes a "Permission Denied" - // error in IE - unloadHandler = function() { - setDocument(); - }, - - inDisabledFieldset = addCombinator( - function( elem ) { - return elem.disabled === true && elem.nodeName.toLowerCase() === "fieldset"; - }, - { dir: "parentNode", next: "legend" } - ); - -// Optimize for push.apply( _, NodeList ) -try { - push.apply( - ( arr = slice.call( preferredDoc.childNodes ) ), - preferredDoc.childNodes - ); - - // Support: Android<4.0 - // Detect silently failing push.apply - // eslint-disable-next-line no-unused-expressions - arr[ preferredDoc.childNodes.length ].nodeType; -} catch ( e ) { - push = { apply: arr.length ? 
- - // Leverage slice if possible - function( target, els ) { - pushNative.apply( target, slice.call( els ) ); - } : - - // Support: IE<9 - // Otherwise append directly - function( target, els ) { - var j = target.length, - i = 0; - - // Can't trust NodeList.length - while ( ( target[ j++ ] = els[ i++ ] ) ) {} - target.length = j - 1; - } - }; -} - -function Sizzle( selector, context, results, seed ) { - var m, i, elem, nid, match, groups, newSelector, - newContext = context && context.ownerDocument, - - // nodeType defaults to 9, since context defaults to document - nodeType = context ? context.nodeType : 9; - - results = results || []; - - // Return early from calls with invalid selector or context - if ( typeof selector !== "string" || !selector || - nodeType !== 1 && nodeType !== 9 && nodeType !== 11 ) { - - return results; - } - - // Try to shortcut find operations (as opposed to filters) in HTML documents - if ( !seed ) { - setDocument( context ); - context = context || document; - - if ( documentIsHTML ) { - - // If the selector is sufficiently simple, try using a "get*By*" DOM method - // (excepting DocumentFragment context, where the methods don't exist) - if ( nodeType !== 11 && ( match = rquickExpr.exec( selector ) ) ) { - - // ID selector - if ( ( m = match[ 1 ] ) ) { - - // Document context - if ( nodeType === 9 ) { - if ( ( elem = context.getElementById( m ) ) ) { - - // Support: IE, Opera, Webkit - // TODO: identify versions - // getElementById can match elements by name instead of ID - if ( elem.id === m ) { - results.push( elem ); - return results; - } - } else { - return results; - } - - // Element context - } else { - - // Support: IE, Opera, Webkit - // TODO: identify versions - // getElementById can match elements by name instead of ID - if ( newContext && ( elem = newContext.getElementById( m ) ) && - contains( context, elem ) && - elem.id === m ) { - - results.push( elem ); - return results; - } - } - - // Type selector - } else if ( match[ 2 
] ) { - push.apply( results, context.getElementsByTagName( selector ) ); - return results; - - // Class selector - } else if ( ( m = match[ 3 ] ) && support.getElementsByClassName && - context.getElementsByClassName ) { - - push.apply( results, context.getElementsByClassName( m ) ); - return results; - } - } - - // Take advantage of querySelectorAll - if ( support.qsa && - !nonnativeSelectorCache[ selector + " " ] && - ( !rbuggyQSA || !rbuggyQSA.test( selector ) ) && - - // Support: IE 8 only - // Exclude object elements - ( nodeType !== 1 || context.nodeName.toLowerCase() !== "object" ) ) { - - newSelector = selector; - newContext = context; - - // qSA considers elements outside a scoping root when evaluating child or - // descendant combinators, which is not what we want. - // In such cases, we work around the behavior by prefixing every selector in the - // list with an ID selector referencing the scope context. - // The technique has to be used as well when a leading combinator is used - // as such selectors are not recognized by querySelectorAll. - // Thanks to Andrew Dupont for this technique. - if ( nodeType === 1 && - ( rdescend.test( selector ) || rcombinators.test( selector ) ) ) { - - // Expand context for sibling selectors - newContext = rsibling.test( selector ) && testContext( context.parentNode ) || - context; - - // We can use :scope instead of the ID hack if the browser - // supports it & if we're not changing the context. - if ( newContext !== context || !support.scope ) { - - // Capture the context ID, setting it first if necessary - if ( ( nid = context.getAttribute( "id" ) ) ) { - nid = nid.replace( rcssescape, fcssescape ); - } else { - context.setAttribute( "id", ( nid = expando ) ); - } - } - - // Prefix every selector in the list - groups = tokenize( selector ); - i = groups.length; - while ( i-- ) { - groups[ i ] = ( nid ? 
"#" + nid : ":scope" ) + " " + - toSelector( groups[ i ] ); - } - newSelector = groups.join( "," ); - } - - try { - push.apply( results, - newContext.querySelectorAll( newSelector ) - ); - return results; - } catch ( qsaError ) { - nonnativeSelectorCache( selector, true ); - } finally { - if ( nid === expando ) { - context.removeAttribute( "id" ); - } - } - } - } - } - - // All others - return select( selector.replace( rtrim, "$1" ), context, results, seed ); -} - -/** - * Create key-value caches of limited size - * @returns {function(string, object)} Returns the Object data after storing it on itself with - * property name the (space-suffixed) string and (if the cache is larger than Expr.cacheLength) - * deleting the oldest entry - */ -function createCache() { - var keys = []; - - function cache( key, value ) { - - // Use (key + " ") to avoid collision with native prototype properties (see Issue #157) - if ( keys.push( key + " " ) > Expr.cacheLength ) { - - // Only keep the most recent entries - delete cache[ keys.shift() ]; - } - return ( cache[ key + " " ] = value ); - } - return cache; -} - -/** - * Mark a function for special use by Sizzle - * @param {Function} fn The function to mark - */ -function markFunction( fn ) { - fn[ expando ] = true; - return fn; -} - -/** - * Support testing using an element - * @param {Function} fn Passed the created element and returns a boolean result - */ -function assert( fn ) { - var el = document.createElement( "fieldset" ); - - try { - return !!fn( el ); - } catch ( e ) { - return false; - } finally { - - // Remove from its parent by default - if ( el.parentNode ) { - el.parentNode.removeChild( el ); - } - - // release memory in IE - el = null; - } -} - -/** - * Adds the same handler for all of the specified attrs - * @param {String} attrs Pipe-separated list of attributes - * @param {Function} handler The method that will be applied - */ -function addHandle( attrs, handler ) { - var arr = attrs.split( "|" ), - i = 
arr.length; - - while ( i-- ) { - Expr.attrHandle[ arr[ i ] ] = handler; - } -} - -/** - * Checks document order of two siblings - * @param {Element} a - * @param {Element} b - * @returns {Number} Returns less than 0 if a precedes b, greater than 0 if a follows b - */ -function siblingCheck( a, b ) { - var cur = b && a, - diff = cur && a.nodeType === 1 && b.nodeType === 1 && - a.sourceIndex - b.sourceIndex; - - // Use IE sourceIndex if available on both nodes - if ( diff ) { - return diff; - } - - // Check if b follows a - if ( cur ) { - while ( ( cur = cur.nextSibling ) ) { - if ( cur === b ) { - return -1; - } - } - } - - return a ? 1 : -1; -} - -/** - * Returns a function to use in pseudos for input types - * @param {String} type - */ -function createInputPseudo( type ) { - return function( elem ) { - var name = elem.nodeName.toLowerCase(); - return name === "input" && elem.type === type; - }; -} - -/** - * Returns a function to use in pseudos for buttons - * @param {String} type - */ -function createButtonPseudo( type ) { - return function( elem ) { - var name = elem.nodeName.toLowerCase(); - return ( name === "input" || name === "button" ) && elem.type === type; - }; -} - -/** - * Returns a function to use in pseudos for :enabled/:disabled - * @param {Boolean} disabled true for :disabled; false for :enabled - */ -function createDisabledPseudo( disabled ) { - - // Known :disabled false positives: fieldset[disabled] > legend:nth-of-type(n+2) :can-disable - return function( elem ) { - - // Only certain elements can match :enabled or :disabled - // https://html.spec.whatwg.org/multipage/scripting.html#selector-enabled - // https://html.spec.whatwg.org/multipage/scripting.html#selector-disabled - if ( "form" in elem ) { - - // Check for inherited disabledness on relevant non-disabled elements: - // * listed form-associated elements in a disabled fieldset - // https://html.spec.whatwg.org/multipage/forms.html#category-listed - // 
https://html.spec.whatwg.org/multipage/forms.html#concept-fe-disabled - // * option elements in a disabled optgroup - // https://html.spec.whatwg.org/multipage/forms.html#concept-option-disabled - // All such elements have a "form" property. - if ( elem.parentNode && elem.disabled === false ) { - - // Option elements defer to a parent optgroup if present - if ( "label" in elem ) { - if ( "label" in elem.parentNode ) { - return elem.parentNode.disabled === disabled; - } else { - return elem.disabled === disabled; - } - } - - // Support: IE 6 - 11 - // Use the isDisabled shortcut property to check for disabled fieldset ancestors - return elem.isDisabled === disabled || - - // Where there is no isDisabled, check manually - /* jshint -W018 */ - elem.isDisabled !== !disabled && - inDisabledFieldset( elem ) === disabled; - } - - return elem.disabled === disabled; - - // Try to winnow out elements that can't be disabled before trusting the disabled property. - // Some victims get caught in our net (label, legend, menu, track), but it shouldn't - // even exist on them, let alone have a boolean value. 
- } else if ( "label" in elem ) { - return elem.disabled === disabled; - } - - // Remaining elements are neither :enabled nor :disabled - return false; - }; -} - -/** - * Returns a function to use in pseudos for positionals - * @param {Function} fn - */ -function createPositionalPseudo( fn ) { - return markFunction( function( argument ) { - argument = +argument; - return markFunction( function( seed, matches ) { - var j, - matchIndexes = fn( [], seed.length, argument ), - i = matchIndexes.length; - - // Match elements found at the specified indexes - while ( i-- ) { - if ( seed[ ( j = matchIndexes[ i ] ) ] ) { - seed[ j ] = !( matches[ j ] = seed[ j ] ); - } - } - } ); - } ); -} - -/** - * Checks a node for validity as a Sizzle context - * @param {Element|Object=} context - * @returns {Element|Object|Boolean} The input node if acceptable, otherwise a falsy value - */ -function testContext( context ) { - return context && typeof context.getElementsByTagName !== "undefined" && context; -} - -// Expose support vars for convenience -support = Sizzle.support = {}; - -/** - * Detects XML nodes - * @param {Element|Object} elem An element or a document - * @returns {Boolean} True iff elem is a non-HTML XML node - */ -isXML = Sizzle.isXML = function( elem ) { - var namespace = elem.namespaceURI, - docElem = ( elem.ownerDocument || elem ).documentElement; - - // Support: IE <=8 - // Assume HTML when documentElement doesn't yet exist, such as inside loading iframes - // https://bugs.jquery.com/ticket/4833 - return !rhtml.test( namespace || docElem && docElem.nodeName || "HTML" ); -}; - -/** - * Sets document-related variables once based on the current document - * @param {Element|Object} [doc] An element or document object to use to set the document - * @returns {Object} Returns the current document - */ -setDocument = Sizzle.setDocument = function( node ) { - var hasCompare, subWindow, - doc = node ? 
node.ownerDocument || node : preferredDoc; - - // Return early if doc is invalid or already selected - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( doc == document || doc.nodeType !== 9 || !doc.documentElement ) { - return document; - } - - // Update global variables - document = doc; - docElem = document.documentElement; - documentIsHTML = !isXML( document ); - - // Support: IE 9 - 11+, Edge 12 - 18+ - // Accessing iframe documents after unload throws "permission denied" errors (jQuery #13936) - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( preferredDoc != document && - ( subWindow = document.defaultView ) && subWindow.top !== subWindow ) { - - // Support: IE 11, Edge - if ( subWindow.addEventListener ) { - subWindow.addEventListener( "unload", unloadHandler, false ); - - // Support: IE 9 - 10 only - } else if ( subWindow.attachEvent ) { - subWindow.attachEvent( "onunload", unloadHandler ); - } - } - - // Support: IE 8 - 11+, Edge 12 - 18+, Chrome <=16 - 25 only, Firefox <=3.6 - 31 only, - // Safari 4 - 5 only, Opera <=11.6 - 12.x only - // IE/Edge & older browsers don't support the :scope pseudo-class. - // Support: Safari 6.0 only - // Safari 6.0 supports :scope but it's an alias of :root there. 
- support.scope = assert( function( el ) { - docElem.appendChild( el ).appendChild( document.createElement( "div" ) ); - return typeof el.querySelectorAll !== "undefined" && - !el.querySelectorAll( ":scope fieldset div" ).length; - } ); - - /* Attributes - ---------------------------------------------------------------------- */ - - // Support: IE<8 - // Verify that getAttribute really returns attributes and not properties - // (excepting IE8 booleans) - support.attributes = assert( function( el ) { - el.className = "i"; - return !el.getAttribute( "className" ); - } ); - - /* getElement(s)By* - ---------------------------------------------------------------------- */ - - // Check if getElementsByTagName("*") returns only elements - support.getElementsByTagName = assert( function( el ) { - el.appendChild( document.createComment( "" ) ); - return !el.getElementsByTagName( "*" ).length; - } ); - - // Support: IE<9 - support.getElementsByClassName = rnative.test( document.getElementsByClassName ); - - // Support: IE<10 - // Check if getElementById returns elements by name - // The broken getElementById methods don't pick up programmatically-set names, - // so use a roundabout getElementsByName test - support.getById = assert( function( el ) { - docElem.appendChild( el ).id = expando; - return !document.getElementsByName || !document.getElementsByName( expando ).length; - } ); - - // ID filter and find - if ( support.getById ) { - Expr.filter[ "ID" ] = function( id ) { - var attrId = id.replace( runescape, funescape ); - return function( elem ) { - return elem.getAttribute( "id" ) === attrId; - }; - }; - Expr.find[ "ID" ] = function( id, context ) { - if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { - var elem = context.getElementById( id ); - return elem ? 
[ elem ] : []; - } - }; - } else { - Expr.filter[ "ID" ] = function( id ) { - var attrId = id.replace( runescape, funescape ); - return function( elem ) { - var node = typeof elem.getAttributeNode !== "undefined" && - elem.getAttributeNode( "id" ); - return node && node.value === attrId; - }; - }; - - // Support: IE 6 - 7 only - // getElementById is not reliable as a find shortcut - Expr.find[ "ID" ] = function( id, context ) { - if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { - var node, i, elems, - elem = context.getElementById( id ); - - if ( elem ) { - - // Verify the id attribute - node = elem.getAttributeNode( "id" ); - if ( node && node.value === id ) { - return [ elem ]; - } - - // Fall back on getElementsByName - elems = context.getElementsByName( id ); - i = 0; - while ( ( elem = elems[ i++ ] ) ) { - node = elem.getAttributeNode( "id" ); - if ( node && node.value === id ) { - return [ elem ]; - } - } - } - - return []; - } - }; - } - - // Tag - Expr.find[ "TAG" ] = support.getElementsByTagName ? 
- function( tag, context ) { - if ( typeof context.getElementsByTagName !== "undefined" ) { - return context.getElementsByTagName( tag ); - - // DocumentFragment nodes don't have gEBTN - } else if ( support.qsa ) { - return context.querySelectorAll( tag ); - } - } : - - function( tag, context ) { - var elem, - tmp = [], - i = 0, - - // By happy coincidence, a (broken) gEBTN appears on DocumentFragment nodes too - results = context.getElementsByTagName( tag ); - - // Filter out possible comments - if ( tag === "*" ) { - while ( ( elem = results[ i++ ] ) ) { - if ( elem.nodeType === 1 ) { - tmp.push( elem ); - } - } - - return tmp; - } - return results; - }; - - // Class - Expr.find[ "CLASS" ] = support.getElementsByClassName && function( className, context ) { - if ( typeof context.getElementsByClassName !== "undefined" && documentIsHTML ) { - return context.getElementsByClassName( className ); - } - }; - - /* QSA/matchesSelector - ---------------------------------------------------------------------- */ - - // QSA and matchesSelector support - - // matchesSelector(:active) reports false when true (IE9/Opera 11.5) - rbuggyMatches = []; - - // qSa(:focus) reports false when true (Chrome 21) - // We allow this because of a bug in IE8/9 that throws an error - // whenever `document.activeElement` is accessed on an iframe - // So, we allow :focus to pass through QSA all the time to avoid the IE error - // See https://bugs.jquery.com/ticket/13378 - rbuggyQSA = []; - - if ( ( support.qsa = rnative.test( document.querySelectorAll ) ) ) { - - // Build QSA regex - // Regex strategy adopted from Diego Perini - assert( function( el ) { - - var input; - - // Select is set to empty string on purpose - // This is to test IE's treatment of not explicitly - // setting a boolean content attribute, - // since its presence should be enough - // https://bugs.jquery.com/ticket/12359 - docElem.appendChild( el ).innerHTML = "" + - ""; - - // Support: IE8, Opera 11-12.16 - // Nothing should 
be selected when empty strings follow ^= or $= or *= - // The test attribute must be unknown in Opera but "safe" for WinRT - // https://msdn.microsoft.com/en-us/library/ie/hh465388.aspx#attribute_section - if ( el.querySelectorAll( "[msallowcapture^='']" ).length ) { - rbuggyQSA.push( "[*^$]=" + whitespace + "*(?:''|\"\")" ); - } - - // Support: IE8 - // Boolean attributes and "value" are not treated correctly - if ( !el.querySelectorAll( "[selected]" ).length ) { - rbuggyQSA.push( "\\[" + whitespace + "*(?:value|" + booleans + ")" ); - } - - // Support: Chrome<29, Android<4.4, Safari<7.0+, iOS<7.0+, PhantomJS<1.9.8+ - if ( !el.querySelectorAll( "[id~=" + expando + "-]" ).length ) { - rbuggyQSA.push( "~=" ); - } - - // Support: IE 11+, Edge 15 - 18+ - // IE 11/Edge don't find elements on a `[name='']` query in some cases. - // Adding a temporary attribute to the document before the selection works - // around the issue. - // Interestingly, IE 10 & older don't seem to have the issue. - input = document.createElement( "input" ); - input.setAttribute( "name", "" ); - el.appendChild( input ); - if ( !el.querySelectorAll( "[name='']" ).length ) { - rbuggyQSA.push( "\\[" + whitespace + "*name" + whitespace + "*=" + - whitespace + "*(?:''|\"\")" ); - } - - // Webkit/Opera - :checked should return selected option elements - // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked - // IE8 throws error here and will not see later tests - if ( !el.querySelectorAll( ":checked" ).length ) { - rbuggyQSA.push( ":checked" ); - } - - // Support: Safari 8+, iOS 8+ - // https://bugs.webkit.org/show_bug.cgi?id=136851 - // In-page `selector#id sibling-combinator selector` fails - if ( !el.querySelectorAll( "a#" + expando + "+*" ).length ) { - rbuggyQSA.push( ".#.+[+~]" ); - } - - // Support: Firefox <=3.6 - 5 only - // Old Firefox doesn't throw on a badly-escaped identifier. 
- el.querySelectorAll( "\\\f" ); - rbuggyQSA.push( "[\\r\\n\\f]" ); - } ); - - assert( function( el ) { - el.innerHTML = "" + - ""; - - // Support: Windows 8 Native Apps - // The type and name attributes are restricted during .innerHTML assignment - var input = document.createElement( "input" ); - input.setAttribute( "type", "hidden" ); - el.appendChild( input ).setAttribute( "name", "D" ); - - // Support: IE8 - // Enforce case-sensitivity of name attribute - if ( el.querySelectorAll( "[name=d]" ).length ) { - rbuggyQSA.push( "name" + whitespace + "*[*^$|!~]?=" ); - } - - // FF 3.5 - :enabled/:disabled and hidden elements (hidden elements are still enabled) - // IE8 throws error here and will not see later tests - if ( el.querySelectorAll( ":enabled" ).length !== 2 ) { - rbuggyQSA.push( ":enabled", ":disabled" ); - } - - // Support: IE9-11+ - // IE's :disabled selector does not pick up the children of disabled fieldsets - docElem.appendChild( el ).disabled = true; - if ( el.querySelectorAll( ":disabled" ).length !== 2 ) { - rbuggyQSA.push( ":enabled", ":disabled" ); - } - - // Support: Opera 10 - 11 only - // Opera 10-11 does not throw on post-comma invalid pseudos - el.querySelectorAll( "*,:x" ); - rbuggyQSA.push( ",.*:" ); - } ); - } - - if ( ( support.matchesSelector = rnative.test( ( matches = docElem.matches || - docElem.webkitMatchesSelector || - docElem.mozMatchesSelector || - docElem.oMatchesSelector || - docElem.msMatchesSelector ) ) ) ) { - - assert( function( el ) { - - // Check to see if it's possible to do matchesSelector - // on a disconnected node (IE 9) - support.disconnectedMatch = matches.call( el, "*" ); - - // This should fail with an exception - // Gecko does not error, returns false instead - matches.call( el, "[s!='']:x" ); - rbuggyMatches.push( "!=", pseudos ); - } ); - } - - rbuggyQSA = rbuggyQSA.length && new RegExp( rbuggyQSA.join( "|" ) ); - rbuggyMatches = rbuggyMatches.length && new RegExp( rbuggyMatches.join( "|" ) ); - - /* Contains 
- ---------------------------------------------------------------------- */ - hasCompare = rnative.test( docElem.compareDocumentPosition ); - - // Element contains another - // Purposefully self-exclusive - // As in, an element does not contain itself - contains = hasCompare || rnative.test( docElem.contains ) ? - function( a, b ) { - var adown = a.nodeType === 9 ? a.documentElement : a, - bup = b && b.parentNode; - return a === bup || !!( bup && bup.nodeType === 1 && ( - adown.contains ? - adown.contains( bup ) : - a.compareDocumentPosition && a.compareDocumentPosition( bup ) & 16 - ) ); - } : - function( a, b ) { - if ( b ) { - while ( ( b = b.parentNode ) ) { - if ( b === a ) { - return true; - } - } - } - return false; - }; - - /* Sorting - ---------------------------------------------------------------------- */ - - // Document order sorting - sortOrder = hasCompare ? - function( a, b ) { - - // Flag for duplicate removal - if ( a === b ) { - hasDuplicate = true; - return 0; - } - - // Sort on method existence if only one input has compareDocumentPosition - var compare = !a.compareDocumentPosition - !b.compareDocumentPosition; - if ( compare ) { - return compare; - } - - // Calculate position if both inputs belong to the same document - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - compare = ( a.ownerDocument || a ) == ( b.ownerDocument || b ) ? - a.compareDocumentPosition( b ) : - - // Otherwise we know they are disconnected - 1; - - // Disconnected nodes - if ( compare & 1 || - ( !support.sortDetached && b.compareDocumentPosition( a ) === compare ) ) { - - // Choose the first element that is related to our preferred document - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. 
- // eslint-disable-next-line eqeqeq - if ( a == document || a.ownerDocument == preferredDoc && - contains( preferredDoc, a ) ) { - return -1; - } - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( b == document || b.ownerDocument == preferredDoc && - contains( preferredDoc, b ) ) { - return 1; - } - - // Maintain original order - return sortInput ? - ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : - 0; - } - - return compare & 4 ? -1 : 1; - } : - function( a, b ) { - - // Exit early if the nodes are identical - if ( a === b ) { - hasDuplicate = true; - return 0; - } - - var cur, - i = 0, - aup = a.parentNode, - bup = b.parentNode, - ap = [ a ], - bp = [ b ]; - - // Parentless nodes are either documents or disconnected - if ( !aup || !bup ) { - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - /* eslint-disable eqeqeq */ - return a == document ? -1 : - b == document ? 1 : - /* eslint-enable eqeqeq */ - aup ? -1 : - bup ? 1 : - sortInput ? - ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : - 0; - - // If the nodes are siblings, we can do a quick check - } else if ( aup === bup ) { - return siblingCheck( a, b ); - } - - // Otherwise we need full lists of their ancestors for comparison - cur = a; - while ( ( cur = cur.parentNode ) ) { - ap.unshift( cur ); - } - cur = b; - while ( ( cur = cur.parentNode ) ) { - bp.unshift( cur ); - } - - // Walk down the tree looking for a discrepancy - while ( ap[ i ] === bp[ i ] ) { - i++; - } - - return i ? 
- - // Do a sibling check if the nodes have a common ancestor - siblingCheck( ap[ i ], bp[ i ] ) : - - // Otherwise nodes in our document sort first - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - /* eslint-disable eqeqeq */ - ap[ i ] == preferredDoc ? -1 : - bp[ i ] == preferredDoc ? 1 : - /* eslint-enable eqeqeq */ - 0; - }; - - return document; -}; - -Sizzle.matches = function( expr, elements ) { - return Sizzle( expr, null, null, elements ); -}; - -Sizzle.matchesSelector = function( elem, expr ) { - setDocument( elem ); - - if ( support.matchesSelector && documentIsHTML && - !nonnativeSelectorCache[ expr + " " ] && - ( !rbuggyMatches || !rbuggyMatches.test( expr ) ) && - ( !rbuggyQSA || !rbuggyQSA.test( expr ) ) ) { - - try { - var ret = matches.call( elem, expr ); - - // IE 9's matchesSelector returns false on disconnected nodes - if ( ret || support.disconnectedMatch || - - // As well, disconnected nodes are said to be in a document - // fragment in IE 9 - elem.document && elem.document.nodeType !== 11 ) { - return ret; - } - } catch ( e ) { - nonnativeSelectorCache( expr, true ); - } - } - - return Sizzle( expr, document, null, [ elem ] ).length > 0; -}; - -Sizzle.contains = function( context, elem ) { - - // Set document vars if needed - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - if ( ( context.ownerDocument || context ) != document ) { - setDocument( context ); - } - return contains( context, elem ); -}; - -Sizzle.attr = function( elem, name ) { - - // Set document vars if needed - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. 
- // eslint-disable-next-line eqeqeq - if ( ( elem.ownerDocument || elem ) != document ) { - setDocument( elem ); - } - - var fn = Expr.attrHandle[ name.toLowerCase() ], - - // Don't get fooled by Object.prototype properties (jQuery #13807) - val = fn && hasOwn.call( Expr.attrHandle, name.toLowerCase() ) ? - fn( elem, name, !documentIsHTML ) : - undefined; - - return val !== undefined ? - val : - support.attributes || !documentIsHTML ? - elem.getAttribute( name ) : - ( val = elem.getAttributeNode( name ) ) && val.specified ? - val.value : - null; -}; - -Sizzle.escape = function( sel ) { - return ( sel + "" ).replace( rcssescape, fcssescape ); -}; - -Sizzle.error = function( msg ) { - throw new Error( "Syntax error, unrecognized expression: " + msg ); -}; - -/** - * Document sorting and removing duplicates - * @param {ArrayLike} results - */ -Sizzle.uniqueSort = function( results ) { - var elem, - duplicates = [], - j = 0, - i = 0; - - // Unless we *know* we can detect duplicates, assume their presence - hasDuplicate = !support.detectDuplicates; - sortInput = !support.sortStable && results.slice( 0 ); - results.sort( sortOrder ); - - if ( hasDuplicate ) { - while ( ( elem = results[ i++ ] ) ) { - if ( elem === results[ i ] ) { - j = duplicates.push( i ); - } - } - while ( j-- ) { - results.splice( duplicates[ j ], 1 ); - } - } - - // Clear input after sorting to release objects - // See https://github.com/jquery/sizzle/pull/225 - sortInput = null; - - return results; -}; - -/** - * Utility function for retrieving the text value of an array of DOM nodes - * @param {Array|Element} elem - */ -getText = Sizzle.getText = function( elem ) { - var node, - ret = "", - i = 0, - nodeType = elem.nodeType; - - if ( !nodeType ) { - - // If no nodeType, this is expected to be an array - while ( ( node = elem[ i++ ] ) ) { - - // Do not traverse comment nodes - ret += getText( node ); - } - } else if ( nodeType === 1 || nodeType === 9 || nodeType === 11 ) { - - // Use textContent 
for elements - // innerText usage removed for consistency of new lines (jQuery #11153) - if ( typeof elem.textContent === "string" ) { - return elem.textContent; - } else { - - // Traverse its children - for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { - ret += getText( elem ); - } - } - } else if ( nodeType === 3 || nodeType === 4 ) { - return elem.nodeValue; - } - - // Do not include comment or processing instruction nodes - - return ret; -}; - -Expr = Sizzle.selectors = { - - // Can be adjusted by the user - cacheLength: 50, - - createPseudo: markFunction, - - match: matchExpr, - - attrHandle: {}, - - find: {}, - - relative: { - ">": { dir: "parentNode", first: true }, - " ": { dir: "parentNode" }, - "+": { dir: "previousSibling", first: true }, - "~": { dir: "previousSibling" } - }, - - preFilter: { - "ATTR": function( match ) { - match[ 1 ] = match[ 1 ].replace( runescape, funescape ); - - // Move the given value to match[3] whether quoted or unquoted - match[ 3 ] = ( match[ 3 ] || match[ 4 ] || - match[ 5 ] || "" ).replace( runescape, funescape ); - - if ( match[ 2 ] === "~=" ) { - match[ 3 ] = " " + match[ 3 ] + " "; - } - - return match.slice( 0, 4 ); - }, - - "CHILD": function( match ) { - - /* matches from matchExpr["CHILD"] - 1 type (only|nth|...) - 2 what (child|of-type) - 3 argument (even|odd|\d*|\d*n([+-]\d+)?|...) - 4 xn-component of xn+y argument ([+-]?\d*n|) - 5 sign of xn-component - 6 x of xn-component - 7 sign of y-component - 8 y of y-component - */ - match[ 1 ] = match[ 1 ].toLowerCase(); - - if ( match[ 1 ].slice( 0, 3 ) === "nth" ) { - - // nth-* requires argument - if ( !match[ 3 ] ) { - Sizzle.error( match[ 0 ] ); - } - - // numeric x and y parameters for Expr.filter.CHILD - // remember that false/true cast respectively to 0/1 - match[ 4 ] = +( match[ 4 ] ? 
- match[ 5 ] + ( match[ 6 ] || 1 ) : - 2 * ( match[ 3 ] === "even" || match[ 3 ] === "odd" ) ); - match[ 5 ] = +( ( match[ 7 ] + match[ 8 ] ) || match[ 3 ] === "odd" ); - - // other types prohibit arguments - } else if ( match[ 3 ] ) { - Sizzle.error( match[ 0 ] ); - } - - return match; - }, - - "PSEUDO": function( match ) { - var excess, - unquoted = !match[ 6 ] && match[ 2 ]; - - if ( matchExpr[ "CHILD" ].test( match[ 0 ] ) ) { - return null; - } - - // Accept quoted arguments as-is - if ( match[ 3 ] ) { - match[ 2 ] = match[ 4 ] || match[ 5 ] || ""; - - // Strip excess characters from unquoted arguments - } else if ( unquoted && rpseudo.test( unquoted ) && - - // Get excess from tokenize (recursively) - ( excess = tokenize( unquoted, true ) ) && - - // advance to the next closing parenthesis - ( excess = unquoted.indexOf( ")", unquoted.length - excess ) - unquoted.length ) ) { - - // excess is a negative index - match[ 0 ] = match[ 0 ].slice( 0, excess ); - match[ 2 ] = unquoted.slice( 0, excess ); - } - - // Return only captures needed by the pseudo filter method (type and argument) - return match.slice( 0, 3 ); - } - }, - - filter: { - - "TAG": function( nodeNameSelector ) { - var nodeName = nodeNameSelector.replace( runescape, funescape ).toLowerCase(); - return nodeNameSelector === "*" ? 
- function() { - return true; - } : - function( elem ) { - return elem.nodeName && elem.nodeName.toLowerCase() === nodeName; - }; - }, - - "CLASS": function( className ) { - var pattern = classCache[ className + " " ]; - - return pattern || - ( pattern = new RegExp( "(^|" + whitespace + - ")" + className + "(" + whitespace + "|$)" ) ) && classCache( - className, function( elem ) { - return pattern.test( - typeof elem.className === "string" && elem.className || - typeof elem.getAttribute !== "undefined" && - elem.getAttribute( "class" ) || - "" - ); - } ); - }, - - "ATTR": function( name, operator, check ) { - return function( elem ) { - var result = Sizzle.attr( elem, name ); - - if ( result == null ) { - return operator === "!="; - } - if ( !operator ) { - return true; - } - - result += ""; - - /* eslint-disable max-len */ - - return operator === "=" ? result === check : - operator === "!=" ? result !== check : - operator === "^=" ? check && result.indexOf( check ) === 0 : - operator === "*=" ? check && result.indexOf( check ) > -1 : - operator === "$=" ? check && result.slice( -check.length ) === check : - operator === "~=" ? ( " " + result.replace( rwhitespace, " " ) + " " ).indexOf( check ) > -1 : - operator === "|=" ? result === check || result.slice( 0, check.length + 1 ) === check + "-" : - false; - /* eslint-enable max-len */ - - }; - }, - - "CHILD": function( type, what, _argument, first, last ) { - var simple = type.slice( 0, 3 ) !== "nth", - forward = type.slice( -4 ) !== "last", - ofType = what === "of-type"; - - return first === 1 && last === 0 ? - - // Shortcut for :nth-*(n) - function( elem ) { - return !!elem.parentNode; - } : - - function( elem, _context, xml ) { - var cache, uniqueCache, outerCache, node, nodeIndex, start, - dir = simple !== forward ? 
"nextSibling" : "previousSibling", - parent = elem.parentNode, - name = ofType && elem.nodeName.toLowerCase(), - useCache = !xml && !ofType, - diff = false; - - if ( parent ) { - - // :(first|last|only)-(child|of-type) - if ( simple ) { - while ( dir ) { - node = elem; - while ( ( node = node[ dir ] ) ) { - if ( ofType ? - node.nodeName.toLowerCase() === name : - node.nodeType === 1 ) { - - return false; - } - } - - // Reverse direction for :only-* (if we haven't yet done so) - start = dir = type === "only" && !start && "nextSibling"; - } - return true; - } - - start = [ forward ? parent.firstChild : parent.lastChild ]; - - // non-xml :nth-child(...) stores cache data on `parent` - if ( forward && useCache ) { - - // Seek `elem` from a previously-cached index - - // ...in a gzip-friendly way - node = parent; - outerCache = node[ expando ] || ( node[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ node.uniqueID ] || - ( outerCache[ node.uniqueID ] = {} ); - - cache = uniqueCache[ type ] || []; - nodeIndex = cache[ 0 ] === dirruns && cache[ 1 ]; - diff = nodeIndex && cache[ 2 ]; - node = nodeIndex && parent.childNodes[ nodeIndex ]; - - while ( ( node = ++nodeIndex && node && node[ dir ] || - - // Fallback to seeking `elem` from the start - ( diff = nodeIndex = 0 ) || start.pop() ) ) { - - // When found, cache indexes on `parent` and break - if ( node.nodeType === 1 && ++diff && node === elem ) { - uniqueCache[ type ] = [ dirruns, nodeIndex, diff ]; - break; - } - } - - } else { - - // Use previously-cached element index if available - if ( useCache ) { - - // ...in a gzip-friendly way - node = elem; - outerCache = node[ expando ] || ( node[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ node.uniqueID ] || - ( outerCache[ node.uniqueID ] = {} ); - - cache = uniqueCache[ type ] || []; - nodeIndex = cache[ 0 
] === dirruns && cache[ 1 ]; - diff = nodeIndex; - } - - // xml :nth-child(...) - // or :nth-last-child(...) or :nth(-last)?-of-type(...) - if ( diff === false ) { - - // Use the same loop as above to seek `elem` from the start - while ( ( node = ++nodeIndex && node && node[ dir ] || - ( diff = nodeIndex = 0 ) || start.pop() ) ) { - - if ( ( ofType ? - node.nodeName.toLowerCase() === name : - node.nodeType === 1 ) && - ++diff ) { - - // Cache the index of each encountered element - if ( useCache ) { - outerCache = node[ expando ] || - ( node[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ node.uniqueID ] || - ( outerCache[ node.uniqueID ] = {} ); - - uniqueCache[ type ] = [ dirruns, diff ]; - } - - if ( node === elem ) { - break; - } - } - } - } - } - - // Incorporate the offset, then check against cycle size - diff -= last; - return diff === first || ( diff % first === 0 && diff / first >= 0 ); - } - }; - }, - - "PSEUDO": function( pseudo, argument ) { - - // pseudo-class names are case-insensitive - // http://www.w3.org/TR/selectors/#pseudo-classes - // Prioritize by case sensitivity in case custom pseudos are added with uppercase letters - // Remember that setFilters inherits from pseudos - var args, - fn = Expr.pseudos[ pseudo ] || Expr.setFilters[ pseudo.toLowerCase() ] || - Sizzle.error( "unsupported pseudo: " + pseudo ); - - // The user may use createPseudo to indicate that - // arguments are needed to create the filter function - // just as Sizzle does - if ( fn[ expando ] ) { - return fn( argument ); - } - - // But maintain support for old signatures - if ( fn.length > 1 ) { - args = [ pseudo, pseudo, "", argument ]; - return Expr.setFilters.hasOwnProperty( pseudo.toLowerCase() ) ? 
- markFunction( function( seed, matches ) { - var idx, - matched = fn( seed, argument ), - i = matched.length; - while ( i-- ) { - idx = indexOf( seed, matched[ i ] ); - seed[ idx ] = !( matches[ idx ] = matched[ i ] ); - } - } ) : - function( elem ) { - return fn( elem, 0, args ); - }; - } - - return fn; - } - }, - - pseudos: { - - // Potentially complex pseudos - "not": markFunction( function( selector ) { - - // Trim the selector passed to compile - // to avoid treating leading and trailing - // spaces as combinators - var input = [], - results = [], - matcher = compile( selector.replace( rtrim, "$1" ) ); - - return matcher[ expando ] ? - markFunction( function( seed, matches, _context, xml ) { - var elem, - unmatched = matcher( seed, null, xml, [] ), - i = seed.length; - - // Match elements unmatched by `matcher` - while ( i-- ) { - if ( ( elem = unmatched[ i ] ) ) { - seed[ i ] = !( matches[ i ] = elem ); - } - } - } ) : - function( elem, _context, xml ) { - input[ 0 ] = elem; - matcher( input, null, xml, results ); - - // Don't keep the element (issue #299) - input[ 0 ] = null; - return !results.pop(); - }; - } ), - - "has": markFunction( function( selector ) { - return function( elem ) { - return Sizzle( selector, elem ).length > 0; - }; - } ), - - "contains": markFunction( function( text ) { - text = text.replace( runescape, funescape ); - return function( elem ) { - return ( elem.textContent || getText( elem ) ).indexOf( text ) > -1; - }; - } ), - - // "Whether an element is represented by a :lang() selector - // is based solely on the element's language value - // being equal to the identifier C, - // or beginning with the identifier C immediately followed by "-". - // The matching of C against the element's language value is performed case-insensitively. - // The identifier C does not have to be a valid language name." 
- // http://www.w3.org/TR/selectors/#lang-pseudo - "lang": markFunction( function( lang ) { - - // lang value must be a valid identifier - if ( !ridentifier.test( lang || "" ) ) { - Sizzle.error( "unsupported lang: " + lang ); - } - lang = lang.replace( runescape, funescape ).toLowerCase(); - return function( elem ) { - var elemLang; - do { - if ( ( elemLang = documentIsHTML ? - elem.lang : - elem.getAttribute( "xml:lang" ) || elem.getAttribute( "lang" ) ) ) { - - elemLang = elemLang.toLowerCase(); - return elemLang === lang || elemLang.indexOf( lang + "-" ) === 0; - } - } while ( ( elem = elem.parentNode ) && elem.nodeType === 1 ); - return false; - }; - } ), - - // Miscellaneous - "target": function( elem ) { - var hash = window.location && window.location.hash; - return hash && hash.slice( 1 ) === elem.id; - }, - - "root": function( elem ) { - return elem === docElem; - }, - - "focus": function( elem ) { - return elem === document.activeElement && - ( !document.hasFocus || document.hasFocus() ) && - !!( elem.type || elem.href || ~elem.tabIndex ); - }, - - // Boolean properties - "enabled": createDisabledPseudo( false ), - "disabled": createDisabledPseudo( true ), - - "checked": function( elem ) { - - // In CSS3, :checked should return both checked and selected elements - // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked - var nodeName = elem.nodeName.toLowerCase(); - return ( nodeName === "input" && !!elem.checked ) || - ( nodeName === "option" && !!elem.selected ); - }, - - "selected": function( elem ) { - - // Accessing this property makes selected-by-default - // options in Safari work properly - if ( elem.parentNode ) { - // eslint-disable-next-line no-unused-expressions - elem.parentNode.selectedIndex; - } - - return elem.selected === true; - }, - - // Contents - "empty": function( elem ) { - - // http://www.w3.org/TR/selectors/#empty-pseudo - // :empty is negated by element (1) or content nodes (text: 3; cdata: 4; entity ref: 5), - // but 
not by others (comment: 8; processing instruction: 7; etc.) - // nodeType < 6 works because attributes (2) do not appear as children - for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { - if ( elem.nodeType < 6 ) { - return false; - } - } - return true; - }, - - "parent": function( elem ) { - return !Expr.pseudos[ "empty" ]( elem ); - }, - - // Element/input types - "header": function( elem ) { - return rheader.test( elem.nodeName ); - }, - - "input": function( elem ) { - return rinputs.test( elem.nodeName ); - }, - - "button": function( elem ) { - var name = elem.nodeName.toLowerCase(); - return name === "input" && elem.type === "button" || name === "button"; - }, - - "text": function( elem ) { - var attr; - return elem.nodeName.toLowerCase() === "input" && - elem.type === "text" && - - // Support: IE<8 - // New HTML5 attribute values (e.g., "search") appear with elem.type === "text" - ( ( attr = elem.getAttribute( "type" ) ) == null || - attr.toLowerCase() === "text" ); - }, - - // Position-in-collection - "first": createPositionalPseudo( function() { - return [ 0 ]; - } ), - - "last": createPositionalPseudo( function( _matchIndexes, length ) { - return [ length - 1 ]; - } ), - - "eq": createPositionalPseudo( function( _matchIndexes, length, argument ) { - return [ argument < 0 ? argument + length : argument ]; - } ), - - "even": createPositionalPseudo( function( matchIndexes, length ) { - var i = 0; - for ( ; i < length; i += 2 ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ), - - "odd": createPositionalPseudo( function( matchIndexes, length ) { - var i = 1; - for ( ; i < length; i += 2 ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ), - - "lt": createPositionalPseudo( function( matchIndexes, length, argument ) { - var i = argument < 0 ? - argument + length : - argument > length ? 
- length : - argument; - for ( ; --i >= 0; ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ), - - "gt": createPositionalPseudo( function( matchIndexes, length, argument ) { - var i = argument < 0 ? argument + length : argument; - for ( ; ++i < length; ) { - matchIndexes.push( i ); - } - return matchIndexes; - } ) - } -}; - -Expr.pseudos[ "nth" ] = Expr.pseudos[ "eq" ]; - -// Add button/input type pseudos -for ( i in { radio: true, checkbox: true, file: true, password: true, image: true } ) { - Expr.pseudos[ i ] = createInputPseudo( i ); -} -for ( i in { submit: true, reset: true } ) { - Expr.pseudos[ i ] = createButtonPseudo( i ); -} - -// Easy API for creating new setFilters -function setFilters() {} -setFilters.prototype = Expr.filters = Expr.pseudos; -Expr.setFilters = new setFilters(); - -tokenize = Sizzle.tokenize = function( selector, parseOnly ) { - var matched, match, tokens, type, - soFar, groups, preFilters, - cached = tokenCache[ selector + " " ]; - - if ( cached ) { - return parseOnly ? 
0 : cached.slice( 0 ); - } - - soFar = selector; - groups = []; - preFilters = Expr.preFilter; - - while ( soFar ) { - - // Comma and first run - if ( !matched || ( match = rcomma.exec( soFar ) ) ) { - if ( match ) { - - // Don't consume trailing commas as valid - soFar = soFar.slice( match[ 0 ].length ) || soFar; - } - groups.push( ( tokens = [] ) ); - } - - matched = false; - - // Combinators - if ( ( match = rcombinators.exec( soFar ) ) ) { - matched = match.shift(); - tokens.push( { - value: matched, - - // Cast descendant combinators to space - type: match[ 0 ].replace( rtrim, " " ) - } ); - soFar = soFar.slice( matched.length ); - } - - // Filters - for ( type in Expr.filter ) { - if ( ( match = matchExpr[ type ].exec( soFar ) ) && ( !preFilters[ type ] || - ( match = preFilters[ type ]( match ) ) ) ) { - matched = match.shift(); - tokens.push( { - value: matched, - type: type, - matches: match - } ); - soFar = soFar.slice( matched.length ); - } - } - - if ( !matched ) { - break; - } - } - - // Return the length of the invalid excess - // if we're just parsing - // Otherwise, throw an error or return tokens - return parseOnly ? - soFar.length : - soFar ? - Sizzle.error( selector ) : - - // Cache the tokens - tokenCache( selector, groups ).slice( 0 ); -}; - -function toSelector( tokens ) { - var i = 0, - len = tokens.length, - selector = ""; - for ( ; i < len; i++ ) { - selector += tokens[ i ].value; - } - return selector; -} - -function addCombinator( matcher, combinator, base ) { - var dir = combinator.dir, - skip = combinator.next, - key = skip || dir, - checkNonElements = base && key === "parentNode", - doneName = done++; - - return combinator.first ? 
- - // Check against closest ancestor/preceding element - function( elem, context, xml ) { - while ( ( elem = elem[ dir ] ) ) { - if ( elem.nodeType === 1 || checkNonElements ) { - return matcher( elem, context, xml ); - } - } - return false; - } : - - // Check against all ancestor/preceding elements - function( elem, context, xml ) { - var oldCache, uniqueCache, outerCache, - newCache = [ dirruns, doneName ]; - - // We can't set arbitrary data on XML nodes, so they don't benefit from combinator caching - if ( xml ) { - while ( ( elem = elem[ dir ] ) ) { - if ( elem.nodeType === 1 || checkNonElements ) { - if ( matcher( elem, context, xml ) ) { - return true; - } - } - } - } else { - while ( ( elem = elem[ dir ] ) ) { - if ( elem.nodeType === 1 || checkNonElements ) { - outerCache = elem[ expando ] || ( elem[ expando ] = {} ); - - // Support: IE <9 only - // Defend against cloned attroperties (jQuery gh-1709) - uniqueCache = outerCache[ elem.uniqueID ] || - ( outerCache[ elem.uniqueID ] = {} ); - - if ( skip && skip === elem.nodeName.toLowerCase() ) { - elem = elem[ dir ] || elem; - } else if ( ( oldCache = uniqueCache[ key ] ) && - oldCache[ 0 ] === dirruns && oldCache[ 1 ] === doneName ) { - - // Assign to newCache so results back-propagate to previous elements - return ( newCache[ 2 ] = oldCache[ 2 ] ); - } else { - - // Reuse newcache so results back-propagate to previous elements - uniqueCache[ key ] = newCache; - - // A match means we're done; a fail means we have to keep checking - if ( ( newCache[ 2 ] = matcher( elem, context, xml ) ) ) { - return true; - } - } - } - } - } - return false; - }; -} - -function elementMatcher( matchers ) { - return matchers.length > 1 ? 
- function( elem, context, xml ) { - var i = matchers.length; - while ( i-- ) { - if ( !matchers[ i ]( elem, context, xml ) ) { - return false; - } - } - return true; - } : - matchers[ 0 ]; -} - -function multipleContexts( selector, contexts, results ) { - var i = 0, - len = contexts.length; - for ( ; i < len; i++ ) { - Sizzle( selector, contexts[ i ], results ); - } - return results; -} - -function condense( unmatched, map, filter, context, xml ) { - var elem, - newUnmatched = [], - i = 0, - len = unmatched.length, - mapped = map != null; - - for ( ; i < len; i++ ) { - if ( ( elem = unmatched[ i ] ) ) { - if ( !filter || filter( elem, context, xml ) ) { - newUnmatched.push( elem ); - if ( mapped ) { - map.push( i ); - } - } - } - } - - return newUnmatched; -} - -function setMatcher( preFilter, selector, matcher, postFilter, postFinder, postSelector ) { - if ( postFilter && !postFilter[ expando ] ) { - postFilter = setMatcher( postFilter ); - } - if ( postFinder && !postFinder[ expando ] ) { - postFinder = setMatcher( postFinder, postSelector ); - } - return markFunction( function( seed, results, context, xml ) { - var temp, i, elem, - preMap = [], - postMap = [], - preexisting = results.length, - - // Get initial elements from seed or context - elems = seed || multipleContexts( - selector || "*", - context.nodeType ? [ context ] : context, - [] - ), - - // Prefilter to get matcher input, preserving a map for seed-results synchronization - matcherIn = preFilter && ( seed || !selector ) ? - condense( elems, preMap, preFilter, context, xml ) : - elems, - - matcherOut = matcher ? - - // If we have a postFinder, or filtered seed, or non-seed postFilter or preexisting results, - postFinder || ( seed ? preFilter : preexisting || postFilter ) ? 
- - // ...intermediate processing is necessary - [] : - - // ...otherwise use results directly - results : - matcherIn; - - // Find primary matches - if ( matcher ) { - matcher( matcherIn, matcherOut, context, xml ); - } - - // Apply postFilter - if ( postFilter ) { - temp = condense( matcherOut, postMap ); - postFilter( temp, [], context, xml ); - - // Un-match failing elements by moving them back to matcherIn - i = temp.length; - while ( i-- ) { - if ( ( elem = temp[ i ] ) ) { - matcherOut[ postMap[ i ] ] = !( matcherIn[ postMap[ i ] ] = elem ); - } - } - } - - if ( seed ) { - if ( postFinder || preFilter ) { - if ( postFinder ) { - - // Get the final matcherOut by condensing this intermediate into postFinder contexts - temp = []; - i = matcherOut.length; - while ( i-- ) { - if ( ( elem = matcherOut[ i ] ) ) { - - // Restore matcherIn since elem is not yet a final match - temp.push( ( matcherIn[ i ] = elem ) ); - } - } - postFinder( null, ( matcherOut = [] ), temp, xml ); - } - - // Move matched elements from seed to results to keep them synchronized - i = matcherOut.length; - while ( i-- ) { - if ( ( elem = matcherOut[ i ] ) && - ( temp = postFinder ? indexOf( seed, elem ) : preMap[ i ] ) > -1 ) { - - seed[ temp ] = !( results[ temp ] = elem ); - } - } - } - - // Add elements to results, through postFinder if defined - } else { - matcherOut = condense( - matcherOut === results ? - matcherOut.splice( preexisting, matcherOut.length ) : - matcherOut - ); - if ( postFinder ) { - postFinder( null, results, matcherOut, xml ); - } else { - push.apply( results, matcherOut ); - } - } - } ); -} - -function matcherFromTokens( tokens ) { - var checkContext, matcher, j, - len = tokens.length, - leadingRelative = Expr.relative[ tokens[ 0 ].type ], - implicitRelative = leadingRelative || Expr.relative[ " " ], - i = leadingRelative ? 
1 : 0, - - // The foundational matcher ensures that elements are reachable from top-level context(s) - matchContext = addCombinator( function( elem ) { - return elem === checkContext; - }, implicitRelative, true ), - matchAnyContext = addCombinator( function( elem ) { - return indexOf( checkContext, elem ) > -1; - }, implicitRelative, true ), - matchers = [ function( elem, context, xml ) { - var ret = ( !leadingRelative && ( xml || context !== outermostContext ) ) || ( - ( checkContext = context ).nodeType ? - matchContext( elem, context, xml ) : - matchAnyContext( elem, context, xml ) ); - - // Avoid hanging onto element (issue #299) - checkContext = null; - return ret; - } ]; - - for ( ; i < len; i++ ) { - if ( ( matcher = Expr.relative[ tokens[ i ].type ] ) ) { - matchers = [ addCombinator( elementMatcher( matchers ), matcher ) ]; - } else { - matcher = Expr.filter[ tokens[ i ].type ].apply( null, tokens[ i ].matches ); - - // Return special upon seeing a positional matcher - if ( matcher[ expando ] ) { - - // Find the next relative operator (if any) for proper handling - j = ++i; - for ( ; j < len; j++ ) { - if ( Expr.relative[ tokens[ j ].type ] ) { - break; - } - } - return setMatcher( - i > 1 && elementMatcher( matchers ), - i > 1 && toSelector( - - // If the preceding token was a descendant combinator, insert an implicit any-element `*` - tokens - .slice( 0, i - 1 ) - .concat( { value: tokens[ i - 2 ].type === " " ? 
"*" : "" } ) - ).replace( rtrim, "$1" ), - matcher, - i < j && matcherFromTokens( tokens.slice( i, j ) ), - j < len && matcherFromTokens( ( tokens = tokens.slice( j ) ) ), - j < len && toSelector( tokens ) - ); - } - matchers.push( matcher ); - } - } - - return elementMatcher( matchers ); -} - -function matcherFromGroupMatchers( elementMatchers, setMatchers ) { - var bySet = setMatchers.length > 0, - byElement = elementMatchers.length > 0, - superMatcher = function( seed, context, xml, results, outermost ) { - var elem, j, matcher, - matchedCount = 0, - i = "0", - unmatched = seed && [], - setMatched = [], - contextBackup = outermostContext, - - // We must always have either seed elements or outermost context - elems = seed || byElement && Expr.find[ "TAG" ]( "*", outermost ), - - // Use integer dirruns iff this is the outermost matcher - dirrunsUnique = ( dirruns += contextBackup == null ? 1 : Math.random() || 0.1 ), - len = elems.length; - - if ( outermost ) { - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. - // eslint-disable-next-line eqeqeq - outermostContext = context == document || context || outermost; - } - - // Add elements passing elementMatchers directly to results - // Support: IE<9, Safari - // Tolerate NodeList properties (IE: "length"; Safari: ) matching elements by id - for ( ; i !== len && ( elem = elems[ i ] ) != null; i++ ) { - if ( byElement && elem ) { - j = 0; - - // Support: IE 11+, Edge 17 - 18+ - // IE/Edge sometimes throw a "Permission denied" error when strict-comparing - // two documents; shallow comparisons work. 
- // eslint-disable-next-line eqeqeq - if ( !context && elem.ownerDocument != document ) { - setDocument( elem ); - xml = !documentIsHTML; - } - while ( ( matcher = elementMatchers[ j++ ] ) ) { - if ( matcher( elem, context || document, xml ) ) { - results.push( elem ); - break; - } - } - if ( outermost ) { - dirruns = dirrunsUnique; - } - } - - // Track unmatched elements for set filters - if ( bySet ) { - - // They will have gone through all possible matchers - if ( ( elem = !matcher && elem ) ) { - matchedCount--; - } - - // Lengthen the array for every element, matched or not - if ( seed ) { - unmatched.push( elem ); - } - } - } - - // `i` is now the count of elements visited above, and adding it to `matchedCount` - // makes the latter nonnegative. - matchedCount += i; - - // Apply set filters to unmatched elements - // NOTE: This can be skipped if there are no unmatched elements (i.e., `matchedCount` - // equals `i`), unless we didn't visit _any_ elements in the above loop because we have - // no element matchers and no seed. - // Incrementing an initially-string "0" `i` allows `i` to remain a string only in that - // case, which will result in a "00" `matchedCount` that differs from `i` but is also - // numerically zero. 
- if ( bySet && i !== matchedCount ) { - j = 0; - while ( ( matcher = setMatchers[ j++ ] ) ) { - matcher( unmatched, setMatched, context, xml ); - } - - if ( seed ) { - - // Reintegrate element matches to eliminate the need for sorting - if ( matchedCount > 0 ) { - while ( i-- ) { - if ( !( unmatched[ i ] || setMatched[ i ] ) ) { - setMatched[ i ] = pop.call( results ); - } - } - } - - // Discard index placeholder values to get only actual matches - setMatched = condense( setMatched ); - } - - // Add matches to results - push.apply( results, setMatched ); - - // Seedless set matches succeeding multiple successful matchers stipulate sorting - if ( outermost && !seed && setMatched.length > 0 && - ( matchedCount + setMatchers.length ) > 1 ) { - - Sizzle.uniqueSort( results ); - } - } - - // Override manipulation of globals by nested matchers - if ( outermost ) { - dirruns = dirrunsUnique; - outermostContext = contextBackup; - } - - return unmatched; - }; - - return bySet ? - markFunction( superMatcher ) : - superMatcher; -} - -compile = Sizzle.compile = function( selector, match /* Internal Use Only */ ) { - var i, - setMatchers = [], - elementMatchers = [], - cached = compilerCache[ selector + " " ]; - - if ( !cached ) { - - // Generate a function of recursive functions that can be used to check each element - if ( !match ) { - match = tokenize( selector ); - } - i = match.length; - while ( i-- ) { - cached = matcherFromTokens( match[ i ] ); - if ( cached[ expando ] ) { - setMatchers.push( cached ); - } else { - elementMatchers.push( cached ); - } - } - - // Cache the compiled function - cached = compilerCache( - selector, - matcherFromGroupMatchers( elementMatchers, setMatchers ) - ); - - // Save selector and tokenization - cached.selector = selector; - } - return cached; -}; - -/** - * A low-level selection function that works with Sizzle's compiled - * selector functions - * @param {String|Function} selector A selector or a pre-compiled - * selector function built 
with Sizzle.compile - * @param {Element} context - * @param {Array} [results] - * @param {Array} [seed] A set of elements to match against - */ -select = Sizzle.select = function( selector, context, results, seed ) { - var i, tokens, token, type, find, - compiled = typeof selector === "function" && selector, - match = !seed && tokenize( ( selector = compiled.selector || selector ) ); - - results = results || []; - - // Try to minimize operations if there is only one selector in the list and no seed - // (the latter of which guarantees us context) - if ( match.length === 1 ) { - - // Reduce context if the leading compound selector is an ID - tokens = match[ 0 ] = match[ 0 ].slice( 0 ); - if ( tokens.length > 2 && ( token = tokens[ 0 ] ).type === "ID" && - context.nodeType === 9 && documentIsHTML && Expr.relative[ tokens[ 1 ].type ] ) { - - context = ( Expr.find[ "ID" ]( token.matches[ 0 ] - .replace( runescape, funescape ), context ) || [] )[ 0 ]; - if ( !context ) { - return results; - - // Precompiled matchers will still verify ancestry, so step up a level - } else if ( compiled ) { - context = context.parentNode; - } - - selector = selector.slice( tokens.shift().value.length ); - } - - // Fetch a seed set for right-to-left matching - i = matchExpr[ "needsContext" ].test( selector ) ? 
0 : tokens.length; - while ( i-- ) { - token = tokens[ i ]; - - // Abort if we hit a combinator - if ( Expr.relative[ ( type = token.type ) ] ) { - break; - } - if ( ( find = Expr.find[ type ] ) ) { - - // Search, expanding context for leading sibling combinators - if ( ( seed = find( - token.matches[ 0 ].replace( runescape, funescape ), - rsibling.test( tokens[ 0 ].type ) && testContext( context.parentNode ) || - context - ) ) ) { - - // If seed is empty or no tokens remain, we can return early - tokens.splice( i, 1 ); - selector = seed.length && toSelector( tokens ); - if ( !selector ) { - push.apply( results, seed ); - return results; - } - - break; - } - } - } - } - - // Compile and execute a filtering function if one is not provided - // Provide `match` to avoid retokenization if we modified the selector above - ( compiled || compile( selector, match ) )( - seed, - context, - !documentIsHTML, - results, - !context || rsibling.test( selector ) && testContext( context.parentNode ) || context - ); - return results; -}; - -// One-time assignments - -// Sort stability -support.sortStable = expando.split( "" ).sort( sortOrder ).join( "" ) === expando; - -// Support: Chrome 14-35+ -// Always assume duplicates if they aren't passed to the comparison function -support.detectDuplicates = !!hasDuplicate; - -// Initialize against the default document -setDocument(); - -// Support: Webkit<537.32 - Safari 6.0.3/Chrome 25 (fixed in Chrome 27) -// Detached nodes confoundingly follow *each other* -support.sortDetached = assert( function( el ) { - - // Should return 1, but returns 4 (following) - return el.compareDocumentPosition( document.createElement( "fieldset" ) ) & 1; -} ); - -// Support: IE<8 -// Prevent attribute/property "interpolation" -// https://msdn.microsoft.com/en-us/library/ms536429%28VS.85%29.aspx -if ( !assert( function( el ) { - el.innerHTML = "<a href='#'></a>"; - return el.firstChild.getAttribute( "href" ) === "#"; -} ) ) { - addHandle( "type|href|height|width", function( 
elem, name, isXML ) { - if ( !isXML ) { - return elem.getAttribute( name, name.toLowerCase() === "type" ? 1 : 2 ); - } - } ); -} - -// Support: IE<9 -// Use defaultValue in place of getAttribute("value") -if ( !support.attributes || !assert( function( el ) { - el.innerHTML = "<input/>"; - el.firstChild.setAttribute( "value", "" ); - return el.firstChild.getAttribute( "value" ) === ""; -} ) ) { - addHandle( "value", function( elem, _name, isXML ) { - if ( !isXML && elem.nodeName.toLowerCase() === "input" ) { - return elem.defaultValue; - } - } ); -} - -// Support: IE<9 -// Use getAttributeNode to fetch booleans when getAttribute lies -if ( !assert( function( el ) { - return el.getAttribute( "disabled" ) == null; -} ) ) { - addHandle( booleans, function( elem, name, isXML ) { - var val; - if ( !isXML ) { - return elem[ name ] === true ? name.toLowerCase() : - ( val = elem.getAttributeNode( name ) ) && val.specified ? - val.value : - null; - } - } ); -} - -return Sizzle; - -} )( window ); - - - -jQuery.find = Sizzle; -jQuery.expr = Sizzle.selectors; - -// Deprecated -jQuery.expr[ ":" ] = jQuery.expr.pseudos; -jQuery.uniqueSort = jQuery.unique = Sizzle.uniqueSort; -jQuery.text = Sizzle.getText; -jQuery.isXMLDoc = Sizzle.isXML; -jQuery.contains = Sizzle.contains; -jQuery.escapeSelector = Sizzle.escape; - - - - -var dir = function( elem, dir, until ) { - var matched = [], - truncate = until !== undefined; - - while ( ( elem = elem[ dir ] ) && elem.nodeType !== 9 ) { - if ( elem.nodeType === 1 ) { - if ( truncate && jQuery( elem ).is( until ) ) { - break; - } - matched.push( elem ); - } - } - return matched; -}; - - -var siblings = function( n, elem ) { - var matched = []; - - for ( ; n; n = n.nextSibling ) { - if ( n.nodeType === 1 && n !== elem ) { - matched.push( n ); - } - } - - return matched; -}; - - -var rneedsContext = jQuery.expr.match.needsContext; - - - -function nodeName( elem, name ) { - - return elem.nodeName && elem.nodeName.toLowerCase() === name.toLowerCase(); - 
-}; -var rsingleTag = ( /^<([a-z][^\/\0>:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i ); - - - -// Implement the identical functionality for filter and not -function winnow( elements, qualifier, not ) { - if ( isFunction( qualifier ) ) { - return jQuery.grep( elements, function( elem, i ) { - return !!qualifier.call( elem, i, elem ) !== not; - } ); - } - - // Single element - if ( qualifier.nodeType ) { - return jQuery.grep( elements, function( elem ) { - return ( elem === qualifier ) !== not; - } ); - } - - // Arraylike of elements (jQuery, arguments, Array) - if ( typeof qualifier !== "string" ) { - return jQuery.grep( elements, function( elem ) { - return ( indexOf.call( qualifier, elem ) > -1 ) !== not; - } ); - } - - // Filtered directly for both simple and complex selectors - return jQuery.filter( qualifier, elements, not ); -} - -jQuery.filter = function( expr, elems, not ) { - var elem = elems[ 0 ]; - - if ( not ) { - expr = ":not(" + expr + ")"; - } - - if ( elems.length === 1 && elem.nodeType === 1 ) { - return jQuery.find.matchesSelector( elem, expr ) ? [ elem ] : []; - } - - return jQuery.find.matches( expr, jQuery.grep( elems, function( elem ) { - return elem.nodeType === 1; - } ) ); -}; - -jQuery.fn.extend( { - find: function( selector ) { - var i, ret, - len = this.length, - self = this; - - if ( typeof selector !== "string" ) { - return this.pushStack( jQuery( selector ).filter( function() { - for ( i = 0; i < len; i++ ) { - if ( jQuery.contains( self[ i ], this ) ) { - return true; - } - } - } ) ); - } - - ret = this.pushStack( [] ); - - for ( i = 0; i < len; i++ ) { - jQuery.find( selector, self[ i ], ret ); - } - - return len > 1 ? 
jQuery.uniqueSort( ret ) : ret; - }, - filter: function( selector ) { - return this.pushStack( winnow( this, selector || [], false ) ); - }, - not: function( selector ) { - return this.pushStack( winnow( this, selector || [], true ) ); - }, - is: function( selector ) { - return !!winnow( - this, - - // If this is a positional/relative selector, check membership in the returned set - // so $("p:first").is("p:last") won't return true for a doc with two "p". - typeof selector === "string" && rneedsContext.test( selector ) ? - jQuery( selector ) : - selector || [], - false - ).length; - } -} ); - - -// Initialize a jQuery object - - -// A central reference to the root jQuery(document) -var rootjQuery, - - // A simple way to check for HTML strings - // Prioritize #id over <tag> to avoid XSS via location.hash (#9521) - // Strict HTML recognition (#11290: must start with <) - // Shortcut simple #id case for speed - rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/, - - init = jQuery.fn.init = function( selector, context, root ) { - var match, elem; - - // HANDLE: $(""), $(null), $(undefined), $(false) - if ( !selector ) { - return this; - } - - // Method init() accepts an alternate rootjQuery - // so migrate can support jQuery.sub (gh-2101) - root = root || rootjQuery; - - // Handle HTML strings - if ( typeof selector === "string" ) { - if ( selector[ 0 ] === "<" && - selector[ selector.length - 1 ] === ">" && - selector.length >= 3 ) { - - // Assume that strings that start and end with <> are HTML and skip the regex check - match = [ null, selector, null ]; - - } else { - match = rquickExpr.exec( selector ); - } - - // Match html or make sure no context is specified for #id - if ( match && ( match[ 1 ] || !context ) ) { - - // HANDLE: $(html) -> $(array) - if ( match[ 1 ] ) { - context = context instanceof jQuery ? 
context[ 0 ] : context; - - // Option to run scripts is true for back-compat - // Intentionally let the error be thrown if parseHTML is not present - jQuery.merge( this, jQuery.parseHTML( - match[ 1 ], - context && context.nodeType ? context.ownerDocument || context : document, - true - ) ); - - // HANDLE: $(html, props) - if ( rsingleTag.test( match[ 1 ] ) && jQuery.isPlainObject( context ) ) { - for ( match in context ) { - - // Properties of context are called as methods if possible - if ( isFunction( this[ match ] ) ) { - this[ match ]( context[ match ] ); - - // ...and otherwise set as attributes - } else { - this.attr( match, context[ match ] ); - } - } - } - - return this; - - // HANDLE: $(#id) - } else { - elem = document.getElementById( match[ 2 ] ); - - if ( elem ) { - - // Inject the element directly into the jQuery object - this[ 0 ] = elem; - this.length = 1; - } - return this; - } - - // HANDLE: $(expr, $(...)) - } else if ( !context || context.jquery ) { - return ( context || root ).find( selector ); - - // HANDLE: $(expr, context) - // (which is just equivalent to: $(context).find(expr) - } else { - return this.constructor( context ).find( selector ); - } - - // HANDLE: $(DOMElement) - } else if ( selector.nodeType ) { - this[ 0 ] = selector; - this.length = 1; - return this; - - // HANDLE: $(function) - // Shortcut for document ready - } else if ( isFunction( selector ) ) { - return root.ready !== undefined ? 
- root.ready( selector ) : - - // Execute immediately if ready is not present - selector( jQuery ); - } - - return jQuery.makeArray( selector, this ); - }; - -// Give the init function the jQuery prototype for later instantiation -init.prototype = jQuery.fn; - -// Initialize central reference -rootjQuery = jQuery( document ); - - -var rparentsprev = /^(?:parents|prev(?:Until|All))/, - - // Methods guaranteed to produce a unique set when starting from a unique set - guaranteedUnique = { - children: true, - contents: true, - next: true, - prev: true - }; - -jQuery.fn.extend( { - has: function( target ) { - var targets = jQuery( target, this ), - l = targets.length; - - return this.filter( function() { - var i = 0; - for ( ; i < l; i++ ) { - if ( jQuery.contains( this, targets[ i ] ) ) { - return true; - } - } - } ); - }, - - closest: function( selectors, context ) { - var cur, - i = 0, - l = this.length, - matched = [], - targets = typeof selectors !== "string" && jQuery( selectors ); - - // Positional selectors never match, since there's no _selection_ context - if ( !rneedsContext.test( selectors ) ) { - for ( ; i < l; i++ ) { - for ( cur = this[ i ]; cur && cur !== context; cur = cur.parentNode ) { - - // Always skip document fragments - if ( cur.nodeType < 11 && ( targets ? - targets.index( cur ) > -1 : - - // Don't pass non-elements to Sizzle - cur.nodeType === 1 && - jQuery.find.matchesSelector( cur, selectors ) ) ) { - - matched.push( cur ); - break; - } - } - } - } - - return this.pushStack( matched.length > 1 ? jQuery.uniqueSort( matched ) : matched ); - }, - - // Determine the position of an element within the set - index: function( elem ) { - - // No argument, return index in parent - if ( !elem ) { - return ( this[ 0 ] && this[ 0 ].parentNode ) ? 
this.first().prevAll().length : -1; - } - - // Index in selector - if ( typeof elem === "string" ) { - return indexOf.call( jQuery( elem ), this[ 0 ] ); - } - - // Locate the position of the desired element - return indexOf.call( this, - - // If it receives a jQuery object, the first element is used - elem.jquery ? elem[ 0 ] : elem - ); - }, - - add: function( selector, context ) { - return this.pushStack( - jQuery.uniqueSort( - jQuery.merge( this.get(), jQuery( selector, context ) ) - ) - ); - }, - - addBack: function( selector ) { - return this.add( selector == null ? - this.prevObject : this.prevObject.filter( selector ) - ); - } -} ); - -function sibling( cur, dir ) { - while ( ( cur = cur[ dir ] ) && cur.nodeType !== 1 ) {} - return cur; -} - -jQuery.each( { - parent: function( elem ) { - var parent = elem.parentNode; - return parent && parent.nodeType !== 11 ? parent : null; - }, - parents: function( elem ) { - return dir( elem, "parentNode" ); - }, - parentsUntil: function( elem, _i, until ) { - return dir( elem, "parentNode", until ); - }, - next: function( elem ) { - return sibling( elem, "nextSibling" ); - }, - prev: function( elem ) { - return sibling( elem, "previousSibling" ); - }, - nextAll: function( elem ) { - return dir( elem, "nextSibling" ); - }, - prevAll: function( elem ) { - return dir( elem, "previousSibling" ); - }, - nextUntil: function( elem, _i, until ) { - return dir( elem, "nextSibling", until ); - }, - prevUntil: function( elem, _i, until ) { - return dir( elem, "previousSibling", until ); - }, - siblings: function( elem ) { - return siblings( ( elem.parentNode || {} ).firstChild, elem ); - }, - children: function( elem ) { - return siblings( elem.firstChild ); - }, - contents: function( elem ) { - if ( elem.contentDocument != null && - - // Support: IE 11+ - // <object> elements with no `data` attribute has an object - // `contentDocument` with a `null` prototype. 
- getProto( elem.contentDocument ) ) { - - return elem.contentDocument; - } - - // Support: IE 9 - 11 only, iOS 7 only, Android Browser <=4.3 only - // Treat the template element as a regular one in browsers that - // don't support it. - if ( nodeName( elem, "template" ) ) { - elem = elem.content || elem; - } - - return jQuery.merge( [], elem.childNodes ); - } -}, function( name, fn ) { - jQuery.fn[ name ] = function( until, selector ) { - var matched = jQuery.map( this, fn, until ); - - if ( name.slice( -5 ) !== "Until" ) { - selector = until; - } - - if ( selector && typeof selector === "string" ) { - matched = jQuery.filter( selector, matched ); - } - - if ( this.length > 1 ) { - - // Remove duplicates - if ( !guaranteedUnique[ name ] ) { - jQuery.uniqueSort( matched ); - } - - // Reverse order for parents* and prev-derivatives - if ( rparentsprev.test( name ) ) { - matched.reverse(); - } - } - - return this.pushStack( matched ); - }; -} ); -var rnothtmlwhite = ( /[^\x20\t\r\n\f]+/g ); - - - -// Convert String-formatted options into Object-formatted ones -function createOptions( options ) { - var object = {}; - jQuery.each( options.match( rnothtmlwhite ) || [], function( _, flag ) { - object[ flag ] = true; - } ); - return object; -} - -/* - * Create a callback list using the following parameters: - * - * options: an optional list of space-separated options that will change how - * the callback list behaves or a more traditional option object - * - * By default a callback list will act like an event callback list and can be - * "fired" multiple times. 
- * - * Possible options: - * - * once: will ensure the callback list can only be fired once (like a Deferred) - * - * memory: will keep track of previous values and will call any callback added - * after the list has been fired right away with the latest "memorized" - * values (like a Deferred) - * - * unique: will ensure a callback can only be added once (no duplicate in the list) - * - * stopOnFalse: interrupt callings when a callback returns false - * - */ -jQuery.Callbacks = function( options ) { - - // Convert options from String-formatted to Object-formatted if needed - // (we check in cache first) - options = typeof options === "string" ? - createOptions( options ) : - jQuery.extend( {}, options ); - - var // Flag to know if list is currently firing - firing, - - // Last fire value for non-forgettable lists - memory, - - // Flag to know if list was already fired - fired, - - // Flag to prevent firing - locked, - - // Actual callback list - list = [], - - // Queue of execution data for repeatable lists - queue = [], - - // Index of currently firing callback (modified by add/remove as needed) - firingIndex = -1, - - // Fire callbacks - fire = function() { - - // Enforce single-firing - locked = locked || options.once; - - // Execute callbacks for all pending executions, - // respecting firingIndex overrides and runtime changes - fired = firing = true; - for ( ; queue.length; firingIndex = -1 ) { - memory = queue.shift(); - while ( ++firingIndex < list.length ) { - - // Run callback and check for early termination - if ( list[ firingIndex ].apply( memory[ 0 ], memory[ 1 ] ) === false && - options.stopOnFalse ) { - - // Jump to end and forget the data so .add doesn't re-fire - firingIndex = list.length; - memory = false; - } - } - } - - // Forget the data if we're done with it - if ( !options.memory ) { - memory = false; - } - - firing = false; - - // Clean up if we're done firing for good - if ( locked ) { - - // Keep an empty list if we have data for future 
add calls - if ( memory ) { - list = []; - - // Otherwise, this object is spent - } else { - list = ""; - } - } - }, - - // Actual Callbacks object - self = { - - // Add a callback or a collection of callbacks to the list - add: function() { - if ( list ) { - - // If we have memory from a past run, we should fire after adding - if ( memory && !firing ) { - firingIndex = list.length - 1; - queue.push( memory ); - } - - ( function add( args ) { - jQuery.each( args, function( _, arg ) { - if ( isFunction( arg ) ) { - if ( !options.unique || !self.has( arg ) ) { - list.push( arg ); - } - } else if ( arg && arg.length && toType( arg ) !== "string" ) { - - // Inspect recursively - add( arg ); - } - } ); - } )( arguments ); - - if ( memory && !firing ) { - fire(); - } - } - return this; - }, - - // Remove a callback from the list - remove: function() { - jQuery.each( arguments, function( _, arg ) { - var index; - while ( ( index = jQuery.inArray( arg, list, index ) ) > -1 ) { - list.splice( index, 1 ); - - // Handle firing indexes - if ( index <= firingIndex ) { - firingIndex--; - } - } - } ); - return this; - }, - - // Check if a given callback is in the list. - // If no argument is given, return whether or not list has callbacks attached. - has: function( fn ) { - return fn ? 
- jQuery.inArray( fn, list ) > -1 : - list.length > 0; - }, - - // Remove all callbacks from the list - empty: function() { - if ( list ) { - list = []; - } - return this; - }, - - // Disable .fire and .add - // Abort any current/pending executions - // Clear all callbacks and values - disable: function() { - locked = queue = []; - list = memory = ""; - return this; - }, - disabled: function() { - return !list; - }, - - // Disable .fire - // Also disable .add unless we have memory (since it would have no effect) - // Abort any pending executions - lock: function() { - locked = queue = []; - if ( !memory && !firing ) { - list = memory = ""; - } - return this; - }, - locked: function() { - return !!locked; - }, - - // Call all callbacks with the given context and arguments - fireWith: function( context, args ) { - if ( !locked ) { - args = args || []; - args = [ context, args.slice ? args.slice() : args ]; - queue.push( args ); - if ( !firing ) { - fire(); - } - } - return this; - }, - - // Call all the callbacks with the given arguments - fire: function() { - self.fireWith( this, arguments ); - return this; - }, - - // To know if the callbacks have already been called at least once - fired: function() { - return !!fired; - } - }; - - return self; -}; - - -function Identity( v ) { - return v; -} -function Thrower( ex ) { - throw ex; -} - -function adoptValue( value, resolve, reject, noValue ) { - var method; - - try { - - // Check for promise aspect first to privilege synchronous behavior - if ( value && isFunction( ( method = value.promise ) ) ) { - method.call( value ).done( resolve ).fail( reject ); - - // Other thenables - } else if ( value && isFunction( ( method = value.then ) ) ) { - method.call( value, resolve, reject ); - - // Other non-thenables - } else { - - // Control `resolve` arguments by letting Array#slice cast boolean `noValue` to integer: - // * false: [ value ].slice( 0 ) => resolve( value ) - // * true: [ value ].slice( 1 ) => resolve() - 
resolve.apply( undefined, [ value ].slice( noValue ) ); - } - - // For Promises/A+, convert exceptions into rejections - // Since jQuery.when doesn't unwrap thenables, we can skip the extra checks appearing in - // Deferred#then to conditionally suppress rejection. - } catch ( value ) { - - // Support: Android 4.0 only - // Strict mode functions invoked without .call/.apply get global-object context - reject.apply( undefined, [ value ] ); - } -} - -jQuery.extend( { - - Deferred: function( func ) { - var tuples = [ - - // action, add listener, callbacks, - // ... .then handlers, argument index, [final state] - [ "notify", "progress", jQuery.Callbacks( "memory" ), - jQuery.Callbacks( "memory" ), 2 ], - [ "resolve", "done", jQuery.Callbacks( "once memory" ), - jQuery.Callbacks( "once memory" ), 0, "resolved" ], - [ "reject", "fail", jQuery.Callbacks( "once memory" ), - jQuery.Callbacks( "once memory" ), 1, "rejected" ] - ], - state = "pending", - promise = { - state: function() { - return state; - }, - always: function() { - deferred.done( arguments ).fail( arguments ); - return this; - }, - "catch": function( fn ) { - return promise.then( null, fn ); - }, - - // Keep pipe for back-compat - pipe: function( /* fnDone, fnFail, fnProgress */ ) { - var fns = arguments; - - return jQuery.Deferred( function( newDefer ) { - jQuery.each( tuples, function( _i, tuple ) { - - // Map tuples (progress, done, fail) to arguments (done, fail, progress) - var fn = isFunction( fns[ tuple[ 4 ] ] ) && fns[ tuple[ 4 ] ]; - - // deferred.progress(function() { bind to newDefer or newDefer.notify }) - // deferred.done(function() { bind to newDefer or newDefer.resolve }) - // deferred.fail(function() { bind to newDefer or newDefer.reject }) - deferred[ tuple[ 1 ] ]( function() { - var returned = fn && fn.apply( this, arguments ); - if ( returned && isFunction( returned.promise ) ) { - returned.promise() - .progress( newDefer.notify ) - .done( newDefer.resolve ) - .fail( newDefer.reject ); - } 
else { - newDefer[ tuple[ 0 ] + "With" ]( - this, - fn ? [ returned ] : arguments - ); - } - } ); - } ); - fns = null; - } ).promise(); - }, - then: function( onFulfilled, onRejected, onProgress ) { - var maxDepth = 0; - function resolve( depth, deferred, handler, special ) { - return function() { - var that = this, - args = arguments, - mightThrow = function() { - var returned, then; - - // Support: Promises/A+ section 2.3.3.3.3 - // https://promisesaplus.com/#point-59 - // Ignore double-resolution attempts - if ( depth < maxDepth ) { - return; - } - - returned = handler.apply( that, args ); - - // Support: Promises/A+ section 2.3.1 - // https://promisesaplus.com/#point-48 - if ( returned === deferred.promise() ) { - throw new TypeError( "Thenable self-resolution" ); - } - - // Support: Promises/A+ sections 2.3.3.1, 3.5 - // https://promisesaplus.com/#point-54 - // https://promisesaplus.com/#point-75 - // Retrieve `then` only once - then = returned && - - // Support: Promises/A+ section 2.3.4 - // https://promisesaplus.com/#point-64 - // Only check objects and functions for thenability - ( typeof returned === "object" || - typeof returned === "function" ) && - returned.then; - - // Handle a returned thenable - if ( isFunction( then ) ) { - - // Special processors (notify) just wait for resolution - if ( special ) { - then.call( - returned, - resolve( maxDepth, deferred, Identity, special ), - resolve( maxDepth, deferred, Thrower, special ) - ); - - // Normal processors (resolve) also hook into progress - } else { - - // ...and disregard older resolution values - maxDepth++; - - then.call( - returned, - resolve( maxDepth, deferred, Identity, special ), - resolve( maxDepth, deferred, Thrower, special ), - resolve( maxDepth, deferred, Identity, - deferred.notifyWith ) - ); - } - - // Handle all other returned values - } else { - - // Only substitute handlers pass on context - // and multiple values (non-spec behavior) - if ( handler !== Identity ) { - that = 
undefined; - args = [ returned ]; - } - - // Process the value(s) - // Default process is resolve - ( special || deferred.resolveWith )( that, args ); - } - }, - - // Only normal processors (resolve) catch and reject exceptions - process = special ? - mightThrow : - function() { - try { - mightThrow(); - } catch ( e ) { - - if ( jQuery.Deferred.exceptionHook ) { - jQuery.Deferred.exceptionHook( e, - process.stackTrace ); - } - - // Support: Promises/A+ section 2.3.3.3.4.1 - // https://promisesaplus.com/#point-61 - // Ignore post-resolution exceptions - if ( depth + 1 >= maxDepth ) { - - // Only substitute handlers pass on context - // and multiple values (non-spec behavior) - if ( handler !== Thrower ) { - that = undefined; - args = [ e ]; - } - - deferred.rejectWith( that, args ); - } - } - }; - - // Support: Promises/A+ section 2.3.3.3.1 - // https://promisesaplus.com/#point-57 - // Re-resolve promises immediately to dodge false rejection from - // subsequent errors - if ( depth ) { - process(); - } else { - - // Call an optional hook to record the stack, in case of exception - // since it's otherwise lost when execution goes async - if ( jQuery.Deferred.getStackHook ) { - process.stackTrace = jQuery.Deferred.getStackHook(); - } - window.setTimeout( process ); - } - }; - } - - return jQuery.Deferred( function( newDefer ) { - - // progress_handlers.add( ... ) - tuples[ 0 ][ 3 ].add( - resolve( - 0, - newDefer, - isFunction( onProgress ) ? - onProgress : - Identity, - newDefer.notifyWith - ) - ); - - // fulfilled_handlers.add( ... ) - tuples[ 1 ][ 3 ].add( - resolve( - 0, - newDefer, - isFunction( onFulfilled ) ? - onFulfilled : - Identity - ) - ); - - // rejected_handlers.add( ... ) - tuples[ 2 ][ 3 ].add( - resolve( - 0, - newDefer, - isFunction( onRejected ) ? 
- onRejected : - Thrower - ) - ); - } ).promise(); - }, - - // Get a promise for this deferred - // If obj is provided, the promise aspect is added to the object - promise: function( obj ) { - return obj != null ? jQuery.extend( obj, promise ) : promise; - } - }, - deferred = {}; - - // Add list-specific methods - jQuery.each( tuples, function( i, tuple ) { - var list = tuple[ 2 ], - stateString = tuple[ 5 ]; - - // promise.progress = list.add - // promise.done = list.add - // promise.fail = list.add - promise[ tuple[ 1 ] ] = list.add; - - // Handle state - if ( stateString ) { - list.add( - function() { - - // state = "resolved" (i.e., fulfilled) - // state = "rejected" - state = stateString; - }, - - // rejected_callbacks.disable - // fulfilled_callbacks.disable - tuples[ 3 - i ][ 2 ].disable, - - // rejected_handlers.disable - // fulfilled_handlers.disable - tuples[ 3 - i ][ 3 ].disable, - - // progress_callbacks.lock - tuples[ 0 ][ 2 ].lock, - - // progress_handlers.lock - tuples[ 0 ][ 3 ].lock - ); - } - - // progress_handlers.fire - // fulfilled_handlers.fire - // rejected_handlers.fire - list.add( tuple[ 3 ].fire ); - - // deferred.notify = function() { deferred.notifyWith(...) } - // deferred.resolve = function() { deferred.resolveWith(...) } - // deferred.reject = function() { deferred.rejectWith(...) } - deferred[ tuple[ 0 ] ] = function() { - deferred[ tuple[ 0 ] + "With" ]( this === deferred ? undefined : this, arguments ); - return this; - }; - - // deferred.notifyWith = list.fireWith - // deferred.resolveWith = list.fireWith - // deferred.rejectWith = list.fireWith - deferred[ tuple[ 0 ] + "With" ] = list.fireWith; - } ); - - // Make the deferred a promise - promise.promise( deferred ); - - // Call given func if any - if ( func ) { - func.call( deferred, deferred ); - } - - // All done! 
- return deferred; - }, - - // Deferred helper - when: function( singleValue ) { - var - - // count of uncompleted subordinates - remaining = arguments.length, - - // count of unprocessed arguments - i = remaining, - - // subordinate fulfillment data - resolveContexts = Array( i ), - resolveValues = slice.call( arguments ), - - // the master Deferred - master = jQuery.Deferred(), - - // subordinate callback factory - updateFunc = function( i ) { - return function( value ) { - resolveContexts[ i ] = this; - resolveValues[ i ] = arguments.length > 1 ? slice.call( arguments ) : value; - if ( !( --remaining ) ) { - master.resolveWith( resolveContexts, resolveValues ); - } - }; - }; - - // Single- and empty arguments are adopted like Promise.resolve - if ( remaining <= 1 ) { - adoptValue( singleValue, master.done( updateFunc( i ) ).resolve, master.reject, - !remaining ); - - // Use .then() to unwrap secondary thenables (cf. gh-3000) - if ( master.state() === "pending" || - isFunction( resolveValues[ i ] && resolveValues[ i ].then ) ) { - - return master.then(); - } - } - - // Multiple arguments are aggregated like Promise.all array elements - while ( i-- ) { - adoptValue( resolveValues[ i ], updateFunc( i ), master.reject ); - } - - return master.promise(); - } -} ); - - -// These usually indicate a programmer mistake during development, -// warn about them ASAP rather than swallowing them by default. 
-var rerrorNames = /^(Eval|Internal|Range|Reference|Syntax|Type|URI)Error$/; - -jQuery.Deferred.exceptionHook = function( error, stack ) { - - // Support: IE 8 - 9 only - // Console exists when dev tools are open, which can happen at any time - if ( window.console && window.console.warn && error && rerrorNames.test( error.name ) ) { - window.console.warn( "jQuery.Deferred exception: " + error.message, error.stack, stack ); - } -}; - - - - -jQuery.readyException = function( error ) { - window.setTimeout( function() { - throw error; - } ); -}; - - - - -// The deferred used on DOM ready -var readyList = jQuery.Deferred(); - -jQuery.fn.ready = function( fn ) { - - readyList - .then( fn ) - - // Wrap jQuery.readyException in a function so that the lookup - // happens at the time of error handling instead of callback - // registration. - .catch( function( error ) { - jQuery.readyException( error ); - } ); - - return this; -}; - -jQuery.extend( { - - // Is the DOM ready to be used? Set to true once it occurs. - isReady: false, - - // A counter to track how many items to wait for before - // the ready event fires. See #6781 - readyWait: 1, - - // Handle when the DOM is ready - ready: function( wait ) { - - // Abort if there are pending holds or we're already ready - if ( wait === true ? 
--jQuery.readyWait : jQuery.isReady ) { - return; - } - - // Remember that the DOM is ready - jQuery.isReady = true; - - // If a normal DOM Ready event fired, decrement, and wait if need be - if ( wait !== true && --jQuery.readyWait > 0 ) { - return; - } - - // If there are functions bound, to execute - readyList.resolveWith( document, [ jQuery ] ); - } -} ); - -jQuery.ready.then = readyList.then; - -// The ready event handler and self cleanup method -function completed() { - document.removeEventListener( "DOMContentLoaded", completed ); - window.removeEventListener( "load", completed ); - jQuery.ready(); -} - -// Catch cases where $(document).ready() is called -// after the browser event has already occurred. -// Support: IE <=9 - 10 only -// Older IE sometimes signals "interactive" too soon -if ( document.readyState === "complete" || - ( document.readyState !== "loading" && !document.documentElement.doScroll ) ) { - - // Handle it asynchronously to allow scripts the opportunity to delay ready - window.setTimeout( jQuery.ready ); - -} else { - - // Use the handy event callback - document.addEventListener( "DOMContentLoaded", completed ); - - // A fallback to window.onload, that will always work - window.addEventListener( "load", completed ); -} - - - - -// Multifunctional method to get and set values of a collection -// The value/s can optionally be executed if it's a function -var access = function( elems, fn, key, value, chainable, emptyGet, raw ) { - var i = 0, - len = elems.length, - bulk = key == null; - - // Sets many values - if ( toType( key ) === "object" ) { - chainable = true; - for ( i in key ) { - access( elems, fn, i, key[ i ], true, emptyGet, raw ); - } - - // Sets one value - } else if ( value !== undefined ) { - chainable = true; - - if ( !isFunction( value ) ) { - raw = true; - } - - if ( bulk ) { - - // Bulk operations run against the entire set - if ( raw ) { - fn.call( elems, value ); - fn = null; - - // ...except when executing function 
values - } else { - bulk = fn; - fn = function( elem, _key, value ) { - return bulk.call( jQuery( elem ), value ); - }; - } - } - - if ( fn ) { - for ( ; i < len; i++ ) { - fn( - elems[ i ], key, raw ? - value : - value.call( elems[ i ], i, fn( elems[ i ], key ) ) - ); - } - } - } - - if ( chainable ) { - return elems; - } - - // Gets - if ( bulk ) { - return fn.call( elems ); - } - - return len ? fn( elems[ 0 ], key ) : emptyGet; -}; - - -// Matches dashed string for camelizing -var rmsPrefix = /^-ms-/, - rdashAlpha = /-([a-z])/g; - -// Used by camelCase as callback to replace() -function fcamelCase( _all, letter ) { - return letter.toUpperCase(); -} - -// Convert dashed to camelCase; used by the css and data modules -// Support: IE <=9 - 11, Edge 12 - 15 -// Microsoft forgot to hump their vendor prefix (#9572) -function camelCase( string ) { - return string.replace( rmsPrefix, "ms-" ).replace( rdashAlpha, fcamelCase ); -} -var acceptData = function( owner ) { - - // Accepts only: - // - Node - // - Node.ELEMENT_NODE - // - Node.DOCUMENT_NODE - // - Object - // - Any - return owner.nodeType === 1 || owner.nodeType === 9 || !( +owner.nodeType ); -}; - - - - -function Data() { - this.expando = jQuery.expando + Data.uid++; -} - -Data.uid = 1; - -Data.prototype = { - - cache: function( owner ) { - - // Check if the owner object already has a cache - var value = owner[ this.expando ]; - - // If not, create one - if ( !value ) { - value = {}; - - // We can accept data for non-element nodes in modern browsers, - // but we should not, see #8335. - // Always return an empty object. 
- if ( acceptData( owner ) ) { - - // If it is a node unlikely to be stringify-ed or looped over - // use plain assignment - if ( owner.nodeType ) { - owner[ this.expando ] = value; - - // Otherwise secure it in a non-enumerable property - // configurable must be true to allow the property to be - // deleted when data is removed - } else { - Object.defineProperty( owner, this.expando, { - value: value, - configurable: true - } ); - } - } - } - - return value; - }, - set: function( owner, data, value ) { - var prop, - cache = this.cache( owner ); - - // Handle: [ owner, key, value ] args - // Always use camelCase key (gh-2257) - if ( typeof data === "string" ) { - cache[ camelCase( data ) ] = value; - - // Handle: [ owner, { properties } ] args - } else { - - // Copy the properties one-by-one to the cache object - for ( prop in data ) { - cache[ camelCase( prop ) ] = data[ prop ]; - } - } - return cache; - }, - get: function( owner, key ) { - return key === undefined ? - this.cache( owner ) : - - // Always use camelCase key (gh-2257) - owner[ this.expando ] && owner[ this.expando ][ camelCase( key ) ]; - }, - access: function( owner, key, value ) { - - // In cases where either: - // - // 1. No key was specified - // 2. A string key was specified, but no value provided - // - // Take the "read" path and allow the get method to determine - // which value to return, respectively either: - // - // 1. The entire cache object - // 2. The data stored at the key - // - if ( key === undefined || - ( ( key && typeof key === "string" ) && value === undefined ) ) { - - return this.get( owner, key ); - } - - // When the key is not a string, or both a key and value - // are specified, set or extend (existing objects) with either: - // - // 1. An object of properties - // 2. 
A key and value - // - this.set( owner, key, value ); - - // Since the "set" path can have two possible entry points - // return the expected data based on which path was taken[*] - return value !== undefined ? value : key; - }, - remove: function( owner, key ) { - var i, - cache = owner[ this.expando ]; - - if ( cache === undefined ) { - return; - } - - if ( key !== undefined ) { - - // Support array or space separated string of keys - if ( Array.isArray( key ) ) { - - // If key is an array of keys... - // We always set camelCase keys, so remove that. - key = key.map( camelCase ); - } else { - key = camelCase( key ); - - // If a key with the spaces exists, use it. - // Otherwise, create an array by matching non-whitespace - key = key in cache ? - [ key ] : - ( key.match( rnothtmlwhite ) || [] ); - } - - i = key.length; - - while ( i-- ) { - delete cache[ key[ i ] ]; - } - } - - // Remove the expando if there's no more data - if ( key === undefined || jQuery.isEmptyObject( cache ) ) { - - // Support: Chrome <=35 - 45 - // Webkit & Blink performance suffers when deleting properties - // from DOM nodes, so set to undefined instead - // https://bugs.chromium.org/p/chromium/issues/detail?id=378607 (bug restricted) - if ( owner.nodeType ) { - owner[ this.expando ] = undefined; - } else { - delete owner[ this.expando ]; - } - } - }, - hasData: function( owner ) { - var cache = owner[ this.expando ]; - return cache !== undefined && !jQuery.isEmptyObject( cache ); - } -}; -var dataPriv = new Data(); - -var dataUser = new Data(); - - - -// Implementation Summary -// -// 1. Enforce API surface and semantic compatibility with 1.9.x branch -// 2. Improve the module's maintainability by reducing the storage -// paths to a single mechanism. -// 3. Use the same single mechanism to support "private" and "user" data. -// 4. _Never_ expose "private" data to user code (TODO: Drop _data, _removeData) -// 5. Avoid exposing implementation details on user objects (eg. 
expando properties) -// 6. Provide a clear path for implementation upgrade to WeakMap in 2014 - -var rbrace = /^(?:\{[\w\W]*\}|\[[\w\W]*\])$/, - rmultiDash = /[A-Z]/g; - -function getData( data ) { - if ( data === "true" ) { - return true; - } - - if ( data === "false" ) { - return false; - } - - if ( data === "null" ) { - return null; - } - - // Only convert to a number if it doesn't change the string - if ( data === +data + "" ) { - return +data; - } - - if ( rbrace.test( data ) ) { - return JSON.parse( data ); - } - - return data; -} - -function dataAttr( elem, key, data ) { - var name; - - // If nothing was found internally, try to fetch any - // data from the HTML5 data-* attribute - if ( data === undefined && elem.nodeType === 1 ) { - name = "data-" + key.replace( rmultiDash, "-$&" ).toLowerCase(); - data = elem.getAttribute( name ); - - if ( typeof data === "string" ) { - try { - data = getData( data ); - } catch ( e ) {} - - // Make sure we set the data so it isn't changed later - dataUser.set( elem, key, data ); - } else { - data = undefined; - } - } - return data; -} - -jQuery.extend( { - hasData: function( elem ) { - return dataUser.hasData( elem ) || dataPriv.hasData( elem ); - }, - - data: function( elem, name, data ) { - return dataUser.access( elem, name, data ); - }, - - removeData: function( elem, name ) { - dataUser.remove( elem, name ); - }, - - // TODO: Now that all calls to _data and _removeData have been replaced - // with direct calls to dataPriv methods, these can be deprecated. 
- _data: function( elem, name, data ) { - return dataPriv.access( elem, name, data ); - }, - - _removeData: function( elem, name ) { - dataPriv.remove( elem, name ); - } -} ); - -jQuery.fn.extend( { - data: function( key, value ) { - var i, name, data, - elem = this[ 0 ], - attrs = elem && elem.attributes; - - // Gets all values - if ( key === undefined ) { - if ( this.length ) { - data = dataUser.get( elem ); - - if ( elem.nodeType === 1 && !dataPriv.get( elem, "hasDataAttrs" ) ) { - i = attrs.length; - while ( i-- ) { - - // Support: IE 11 only - // The attrs elements can be null (#14894) - if ( attrs[ i ] ) { - name = attrs[ i ].name; - if ( name.indexOf( "data-" ) === 0 ) { - name = camelCase( name.slice( 5 ) ); - dataAttr( elem, name, data[ name ] ); - } - } - } - dataPriv.set( elem, "hasDataAttrs", true ); - } - } - - return data; - } - - // Sets multiple values - if ( typeof key === "object" ) { - return this.each( function() { - dataUser.set( this, key ); - } ); - } - - return access( this, function( value ) { - var data; - - // The calling jQuery object (element matches) is not empty - // (and therefore has an element appears at this[ 0 ]) and the - // `value` parameter was not undefined. An empty jQuery object - // will result in `undefined` for elem = this[ 0 ] which will - // throw an exception if an attempt to read a data cache is made. - if ( elem && value === undefined ) { - - // Attempt to get data from the cache - // The key will always be camelCased in Data - data = dataUser.get( elem, key ); - if ( data !== undefined ) { - return data; - } - - // Attempt to "discover" the data in - // HTML5 custom data-* attrs - data = dataAttr( elem, key ); - if ( data !== undefined ) { - return data; - } - - // We tried really hard, but the data doesn't exist. - return; - } - - // Set the data... 
- this.each( function() { - - // We always store the camelCased key - dataUser.set( this, key, value ); - } ); - }, null, value, arguments.length > 1, null, true ); - }, - - removeData: function( key ) { - return this.each( function() { - dataUser.remove( this, key ); - } ); - } -} ); - - -jQuery.extend( { - queue: function( elem, type, data ) { - var queue; - - if ( elem ) { - type = ( type || "fx" ) + "queue"; - queue = dataPriv.get( elem, type ); - - // Speed up dequeue by getting out quickly if this is just a lookup - if ( data ) { - if ( !queue || Array.isArray( data ) ) { - queue = dataPriv.access( elem, type, jQuery.makeArray( data ) ); - } else { - queue.push( data ); - } - } - return queue || []; - } - }, - - dequeue: function( elem, type ) { - type = type || "fx"; - - var queue = jQuery.queue( elem, type ), - startLength = queue.length, - fn = queue.shift(), - hooks = jQuery._queueHooks( elem, type ), - next = function() { - jQuery.dequeue( elem, type ); - }; - - // If the fx queue is dequeued, always remove the progress sentinel - if ( fn === "inprogress" ) { - fn = queue.shift(); - startLength--; - } - - if ( fn ) { - - // Add a progress sentinel to prevent the fx queue from being - // automatically dequeued - if ( type === "fx" ) { - queue.unshift( "inprogress" ); - } - - // Clear up the last queue stop function - delete hooks.stop; - fn.call( elem, next, hooks ); - } - - if ( !startLength && hooks ) { - hooks.empty.fire(); - } - }, - - // Not public - generate a queueHooks object, or return the current one - _queueHooks: function( elem, type ) { - var key = type + "queueHooks"; - return dataPriv.get( elem, key ) || dataPriv.access( elem, key, { - empty: jQuery.Callbacks( "once memory" ).add( function() { - dataPriv.remove( elem, [ type + "queue", key ] ); - } ) - } ); - } -} ); - -jQuery.fn.extend( { - queue: function( type, data ) { - var setter = 2; - - if ( typeof type !== "string" ) { - data = type; - type = "fx"; - setter--; - } - - if ( 
arguments.length < setter ) { - return jQuery.queue( this[ 0 ], type ); - } - - return data === undefined ? - this : - this.each( function() { - var queue = jQuery.queue( this, type, data ); - - // Ensure a hooks for this queue - jQuery._queueHooks( this, type ); - - if ( type === "fx" && queue[ 0 ] !== "inprogress" ) { - jQuery.dequeue( this, type ); - } - } ); - }, - dequeue: function( type ) { - return this.each( function() { - jQuery.dequeue( this, type ); - } ); - }, - clearQueue: function( type ) { - return this.queue( type || "fx", [] ); - }, - - // Get a promise resolved when queues of a certain type - // are emptied (fx is the type by default) - promise: function( type, obj ) { - var tmp, - count = 1, - defer = jQuery.Deferred(), - elements = this, - i = this.length, - resolve = function() { - if ( !( --count ) ) { - defer.resolveWith( elements, [ elements ] ); - } - }; - - if ( typeof type !== "string" ) { - obj = type; - type = undefined; - } - type = type || "fx"; - - while ( i-- ) { - tmp = dataPriv.get( elements[ i ], type + "queueHooks" ); - if ( tmp && tmp.empty ) { - count++; - tmp.empty.add( resolve ); - } - } - resolve(); - return defer.promise( obj ); - } -} ); -var pnum = ( /[+-]?(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/ ).source; - -var rcssNum = new RegExp( "^(?:([+-])=|)(" + pnum + ")([a-z%]*)$", "i" ); - - -var cssExpand = [ "Top", "Right", "Bottom", "Left" ]; - -var documentElement = document.documentElement; - - - - var isAttached = function( elem ) { - return jQuery.contains( elem.ownerDocument, elem ); - }, - composed = { composed: true }; - - // Support: IE 9 - 11+, Edge 12 - 18+, iOS 10.0 - 10.2 only - // Check attachment across shadow DOM boundaries when possible (gh-3504) - // Support: iOS 10.0-10.2 only - // Early iOS 10 versions support `attachShadow` but not `getRootNode`, - // leading to errors. We need to check for `getRootNode`. 
- if ( documentElement.getRootNode ) { - isAttached = function( elem ) { - return jQuery.contains( elem.ownerDocument, elem ) || - elem.getRootNode( composed ) === elem.ownerDocument; - }; - } -var isHiddenWithinTree = function( elem, el ) { - - // isHiddenWithinTree might be called from jQuery#filter function; - // in that case, element will be second argument - elem = el || elem; - - // Inline style trumps all - return elem.style.display === "none" || - elem.style.display === "" && - - // Otherwise, check computed style - // Support: Firefox <=43 - 45 - // Disconnected elements can have computed display: none, so first confirm that elem is - // in the document. - isAttached( elem ) && - - jQuery.css( elem, "display" ) === "none"; - }; - - - -function adjustCSS( elem, prop, valueParts, tween ) { - var adjusted, scale, - maxIterations = 20, - currentValue = tween ? - function() { - return tween.cur(); - } : - function() { - return jQuery.css( elem, prop, "" ); - }, - initial = currentValue(), - unit = valueParts && valueParts[ 3 ] || ( jQuery.cssNumber[ prop ] ? "" : "px" ), - - // Starting value computation is required for potential unit mismatches - initialInUnit = elem.nodeType && - ( jQuery.cssNumber[ prop ] || unit !== "px" && +initial ) && - rcssNum.exec( jQuery.css( elem, prop ) ); - - if ( initialInUnit && initialInUnit[ 3 ] !== unit ) { - - // Support: Firefox <=54 - // Halve the iteration target value to prevent interference from CSS upper bounds (gh-2144) - initial = initial / 2; - - // Trust units reported by jQuery.css - unit = unit || initialInUnit[ 3 ]; - - // Iteratively approximate from a nonzero starting point - initialInUnit = +initial || 1; - - while ( maxIterations-- ) { - - // Evaluate and update our best guess (doubling guesses that zero out). - // Finish if the scale equals or crosses 1 (making the old*new product non-positive). 
- jQuery.style( elem, prop, initialInUnit + unit ); - if ( ( 1 - scale ) * ( 1 - ( scale = currentValue() / initial || 0.5 ) ) <= 0 ) { - maxIterations = 0; - } - initialInUnit = initialInUnit / scale; - - } - - initialInUnit = initialInUnit * 2; - jQuery.style( elem, prop, initialInUnit + unit ); - - // Make sure we update the tween properties later on - valueParts = valueParts || []; - } - - if ( valueParts ) { - initialInUnit = +initialInUnit || +initial || 0; - - // Apply relative offset (+=/-=) if specified - adjusted = valueParts[ 1 ] ? - initialInUnit + ( valueParts[ 1 ] + 1 ) * valueParts[ 2 ] : - +valueParts[ 2 ]; - if ( tween ) { - tween.unit = unit; - tween.start = initialInUnit; - tween.end = adjusted; - } - } - return adjusted; -} - - -var defaultDisplayMap = {}; - -function getDefaultDisplay( elem ) { - var temp, - doc = elem.ownerDocument, - nodeName = elem.nodeName, - display = defaultDisplayMap[ nodeName ]; - - if ( display ) { - return display; - } - - temp = doc.body.appendChild( doc.createElement( nodeName ) ); - display = jQuery.css( temp, "display" ); - - temp.parentNode.removeChild( temp ); - - if ( display === "none" ) { - display = "block"; - } - defaultDisplayMap[ nodeName ] = display; - - return display; -} - -function showHide( elements, show ) { - var display, elem, - values = [], - index = 0, - length = elements.length; - - // Determine new display value for elements that need to change - for ( ; index < length; index++ ) { - elem = elements[ index ]; - if ( !elem.style ) { - continue; - } - - display = elem.style.display; - if ( show ) { - - // Since we force visibility upon cascade-hidden elements, an immediate (and slow) - // check is required in this first loop unless we have a nonempty display value (either - // inline or about-to-be-restored) - if ( display === "none" ) { - values[ index ] = dataPriv.get( elem, "display" ) || null; - if ( !values[ index ] ) { - elem.style.display = ""; - } - } - if ( elem.style.display === "" && 
isHiddenWithinTree( elem ) ) { - values[ index ] = getDefaultDisplay( elem ); - } - } else { - if ( display !== "none" ) { - values[ index ] = "none"; - - // Remember what we're overwriting - dataPriv.set( elem, "display", display ); - } - } - } - - // Set the display of the elements in a second loop to avoid constant reflow - for ( index = 0; index < length; index++ ) { - if ( values[ index ] != null ) { - elements[ index ].style.display = values[ index ]; - } - } - - return elements; -} - -jQuery.fn.extend( { - show: function() { - return showHide( this, true ); - }, - hide: function() { - return showHide( this ); - }, - toggle: function( state ) { - if ( typeof state === "boolean" ) { - return state ? this.show() : this.hide(); - } - - return this.each( function() { - if ( isHiddenWithinTree( this ) ) { - jQuery( this ).show(); - } else { - jQuery( this ).hide(); - } - } ); - } -} ); -var rcheckableType = ( /^(?:checkbox|radio)$/i ); - -var rtagName = ( /<([a-z][^\/\0>\x20\t\r\n\f]*)/i ); - -var rscriptType = ( /^$|^module$|\/(?:java|ecma)script/i ); - - - -( function() { - var fragment = document.createDocumentFragment(), - div = fragment.appendChild( document.createElement( "div" ) ), - input = document.createElement( "input" ); - - // Support: Android 4.0 - 4.3 only - // Check state lost if the name is set (#11217) - // Support: Windows Web Apps (WWA) - // `name` and `type` must use .setAttribute for WWA (#14901) - input.setAttribute( "type", "radio" ); - input.setAttribute( "checked", "checked" ); - input.setAttribute( "name", "t" ); - - div.appendChild( input ); - - // Support: Android <=4.1 only - // Older WebKit doesn't clone checked state correctly in fragments - support.checkClone = div.cloneNode( true ).cloneNode( true ).lastChild.checked; - - // Support: IE <=11 only - // Make sure textarea (and checkbox) defaultValue is properly cloned - div.innerHTML = "<textarea>x</textarea>"; - support.noCloneChecked = !!div.cloneNode( true ).lastChild.defaultValue; - - // Support: IE
<=9 only - // IE <=9 replaces <option> tags with their contents when inserted outside of - // the select element. - div.innerHTML = "<option></option>"; - support.option = !!div.lastChild; -} )(); - - -// We have to close these tags to support XHTML (#13200) -var wrapMap = { - - // XHTML parsers do not magically insert elements in the - // same way that tag soup parsers do. So we cannot shorten - // this by omitting <tbody> or other required elements. - thead: [ 1, "<table>", "</table>" ], - col: [ 2, "<table><colgroup>", "</colgroup></table>" ], - tr: [ 2, "<table><tbody>", "</tbody></table>" ], - td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ], - - _default: [ 0, "", "" ] -}; - -wrapMap.tbody = wrapMap.tfoot = wrapMap.colgroup = wrapMap.caption = wrapMap.thead; -wrapMap.th = wrapMap.td; - -// Support: IE <=9 only -if ( !support.option ) { - wrapMap.optgroup = wrapMap.option = [ 1, "<select multiple='multiple'>", "</select>" ]; -} - - -function getAll( context, tag ) { - - // Support: IE <=9 - 11 only - // Use typeof to avoid zero-argument method invocation on host objects (#15151) - var ret; - - if ( typeof context.getElementsByTagName !== "undefined" ) { - ret = context.getElementsByTagName( tag || "*" ); - - } else if ( typeof context.querySelectorAll !== "undefined" ) { - ret = context.querySelectorAll( tag || "*" ); - - } else { - ret = []; - } - - if ( tag === undefined || tag && nodeName( context, tag ) ) { - return jQuery.merge( [ context ], ret ); - } - - return ret; -} - - -// Mark scripts as having already been evaluated -function setGlobalEval( elems, refElements ) { - var i = 0, - l = elems.length; - - for ( ; i < l; i++ ) { - dataPriv.set( - elems[ i ], - "globalEval", - !refElements || dataPriv.get( refElements[ i ], "globalEval" ) - ); - } -} - - -var rhtml = /<|&#?\w+;/; - -function buildFragment( elems, context, scripts, selection, ignored ) { - var elem, tmp, tag, wrap, attached, j, - fragment = context.createDocumentFragment(), - nodes = [], - i = 0, - l = elems.length; - - for ( ; i < l; i++ ) { - elem = elems[ i ]; - - if ( elem || elem === 0 ) { - - // Add nodes directly - if ( toType( elem ) === "object" ) { - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - jQuery.merge( nodes, elem.nodeType ?
[ elem ] : elem ); - - // Convert non-html into a text node - } else if ( !rhtml.test( elem ) ) { - nodes.push( context.createTextNode( elem ) ); - - // Convert html into DOM nodes - } else { - tmp = tmp || fragment.appendChild( context.createElement( "div" ) ); - - // Deserialize a standard representation - tag = ( rtagName.exec( elem ) || [ "", "" ] )[ 1 ].toLowerCase(); - wrap = wrapMap[ tag ] || wrapMap._default; - tmp.innerHTML = wrap[ 1 ] + jQuery.htmlPrefilter( elem ) + wrap[ 2 ]; - - // Descend through wrappers to the right content - j = wrap[ 0 ]; - while ( j-- ) { - tmp = tmp.lastChild; - } - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - jQuery.merge( nodes, tmp.childNodes ); - - // Remember the top-level container - tmp = fragment.firstChild; - - // Ensure the created nodes are orphaned (#12392) - tmp.textContent = ""; - } - } - } - - // Remove wrapper from fragment - fragment.textContent = ""; - - i = 0; - while ( ( elem = nodes[ i++ ] ) ) { - - // Skip elements already in the context collection (trac-4087) - if ( selection && jQuery.inArray( elem, selection ) > -1 ) { - if ( ignored ) { - ignored.push( elem ); - } - continue; - } - - attached = isAttached( elem ); - - // Append to fragment - tmp = getAll( fragment.appendChild( elem ), "script" ); - - // Preserve script evaluation history - if ( attached ) { - setGlobalEval( tmp ); - } - - // Capture executables - if ( scripts ) { - j = 0; - while ( ( elem = tmp[ j++ ] ) ) { - if ( rscriptType.test( elem.type || "" ) ) { - scripts.push( elem ); - } - } - } - } - - return fragment; -} - - -var - rkeyEvent = /^key/, - rmouseEvent = /^(?:mouse|pointer|contextmenu|drag|drop)|click/, - rtypenamespace = /^([^.]*)(?:\.(.+)|)/; - -function returnTrue() { - return true; -} - -function returnFalse() { - return false; -} - -// Support: IE <=9 - 11+ -// focus() and blur() are asynchronous, except when they are no-op. 
-// So expect focus to be synchronous when the element is already active, -// and blur to be synchronous when the element is not already active. -// (focus and blur are always synchronous in other supported browsers, -// this just defines when we can count on it). -function expectSync( elem, type ) { - return ( elem === safeActiveElement() ) === ( type === "focus" ); -} - -// Support: IE <=9 only -// Accessing document.activeElement can throw unexpectedly -// https://bugs.jquery.com/ticket/13393 -function safeActiveElement() { - try { - return document.activeElement; - } catch ( err ) { } -} - -function on( elem, types, selector, data, fn, one ) { - var origFn, type; - - // Types can be a map of types/handlers - if ( typeof types === "object" ) { - - // ( types-Object, selector, data ) - if ( typeof selector !== "string" ) { - - // ( types-Object, data ) - data = data || selector; - selector = undefined; - } - for ( type in types ) { - on( elem, type, selector, data, types[ type ], one ); - } - return elem; - } - - if ( data == null && fn == null ) { - - // ( types, fn ) - fn = selector; - data = selector = undefined; - } else if ( fn == null ) { - if ( typeof selector === "string" ) { - - // ( types, selector, fn ) - fn = data; - data = undefined; - } else { - - // ( types, data, fn ) - fn = data; - data = selector; - selector = undefined; - } - } - if ( fn === false ) { - fn = returnFalse; - } else if ( !fn ) { - return elem; - } - - if ( one === 1 ) { - origFn = fn; - fn = function( event ) { - - // Can use an empty set, since event contains the info - jQuery().off( event ); - return origFn.apply( this, arguments ); - }; - - // Use same guid so caller can remove using origFn - fn.guid = origFn.guid || ( origFn.guid = jQuery.guid++ ); - } - return elem.each( function() { - jQuery.event.add( this, types, fn, data, selector ); - } ); -} - -/* - * Helper functions for managing events -- not part of the public interface. 
- * Props to Dean Edwards' addEvent library for many of the ideas. - */ -jQuery.event = { - - global: {}, - - add: function( elem, types, handler, data, selector ) { - - var handleObjIn, eventHandle, tmp, - events, t, handleObj, - special, handlers, type, namespaces, origType, - elemData = dataPriv.get( elem ); - - // Only attach events to objects that accept data - if ( !acceptData( elem ) ) { - return; - } - - // Caller can pass in an object of custom data in lieu of the handler - if ( handler.handler ) { - handleObjIn = handler; - handler = handleObjIn.handler; - selector = handleObjIn.selector; - } - - // Ensure that invalid selectors throw exceptions at attach time - // Evaluate against documentElement in case elem is a non-element node (e.g., document) - if ( selector ) { - jQuery.find.matchesSelector( documentElement, selector ); - } - - // Make sure that the handler has a unique ID, used to find/remove it later - if ( !handler.guid ) { - handler.guid = jQuery.guid++; - } - - // Init the element's event structure and main handler, if this is the first - if ( !( events = elemData.events ) ) { - events = elemData.events = Object.create( null ); - } - if ( !( eventHandle = elemData.handle ) ) { - eventHandle = elemData.handle = function( e ) { - - // Discard the second event of a jQuery.event.trigger() and - // when an event is called after a page has unloaded - return typeof jQuery !== "undefined" && jQuery.event.triggered !== e.type ? - jQuery.event.dispatch.apply( elem, arguments ) : undefined; - }; - } - - // Handle multiple events separated by a space - types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; - t = types.length; - while ( t-- ) { - tmp = rtypenamespace.exec( types[ t ] ) || []; - type = origType = tmp[ 1 ]; - namespaces = ( tmp[ 2 ] || "" ).split( "." 
).sort(); - - // There *must* be a type, no attaching namespace-only handlers - if ( !type ) { - continue; - } - - // If event changes its type, use the special event handlers for the changed type - special = jQuery.event.special[ type ] || {}; - - // If selector defined, determine special event api type, otherwise given type - type = ( selector ? special.delegateType : special.bindType ) || type; - - // Update special based on newly reset type - special = jQuery.event.special[ type ] || {}; - - // handleObj is passed to all event handlers - handleObj = jQuery.extend( { - type: type, - origType: origType, - data: data, - handler: handler, - guid: handler.guid, - selector: selector, - needsContext: selector && jQuery.expr.match.needsContext.test( selector ), - namespace: namespaces.join( "." ) - }, handleObjIn ); - - // Init the event handler queue if we're the first - if ( !( handlers = events[ type ] ) ) { - handlers = events[ type ] = []; - handlers.delegateCount = 0; - - // Only use addEventListener if the special events handler returns false - if ( !special.setup || - special.setup.call( elem, data, namespaces, eventHandle ) === false ) { - - if ( elem.addEventListener ) { - elem.addEventListener( type, eventHandle ); - } - } - } - - if ( special.add ) { - special.add.call( elem, handleObj ); - - if ( !handleObj.handler.guid ) { - handleObj.handler.guid = handler.guid; - } - } - - // Add to the element's handler list, delegates in front - if ( selector ) { - handlers.splice( handlers.delegateCount++, 0, handleObj ); - } else { - handlers.push( handleObj ); - } - - // Keep track of which events have ever been used, for event optimization - jQuery.event.global[ type ] = true; - } - - }, - - // Detach an event or set of events from an element - remove: function( elem, types, handler, selector, mappedTypes ) { - - var j, origCount, tmp, - events, t, handleObj, - special, handlers, type, namespaces, origType, - elemData = dataPriv.hasData( elem ) && dataPriv.get( 
elem ); - - if ( !elemData || !( events = elemData.events ) ) { - return; - } - - // Once for each type.namespace in types; type may be omitted - types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; - t = types.length; - while ( t-- ) { - tmp = rtypenamespace.exec( types[ t ] ) || []; - type = origType = tmp[ 1 ]; - namespaces = ( tmp[ 2 ] || "" ).split( "." ).sort(); - - // Unbind all events (on this namespace, if provided) for the element - if ( !type ) { - for ( type in events ) { - jQuery.event.remove( elem, type + types[ t ], handler, selector, true ); - } - continue; - } - - special = jQuery.event.special[ type ] || {}; - type = ( selector ? special.delegateType : special.bindType ) || type; - handlers = events[ type ] || []; - tmp = tmp[ 2 ] && - new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ); - - // Remove matching events - origCount = j = handlers.length; - while ( j-- ) { - handleObj = handlers[ j ]; - - if ( ( mappedTypes || origType === handleObj.origType ) && - ( !handler || handler.guid === handleObj.guid ) && - ( !tmp || tmp.test( handleObj.namespace ) ) && - ( !selector || selector === handleObj.selector || - selector === "**" && handleObj.selector ) ) { - handlers.splice( j, 1 ); - - if ( handleObj.selector ) { - handlers.delegateCount--; - } - if ( special.remove ) { - special.remove.call( elem, handleObj ); - } - } - } - - // Remove generic event handler if we removed something and no more handlers exist - // (avoids potential for endless recursion during removal of special event handlers) - if ( origCount && !handlers.length ) { - if ( !special.teardown || - special.teardown.call( elem, namespaces, elemData.handle ) === false ) { - - jQuery.removeEvent( elem, type, elemData.handle ); - } - - delete events[ type ]; - } - } - - // Remove data and the expando if it's no longer used - if ( jQuery.isEmptyObject( events ) ) { - dataPriv.remove( elem, "handle events" ); - } - }, - - dispatch: function( nativeEvent ) { - - 
var i, j, ret, matched, handleObj, handlerQueue, - args = new Array( arguments.length ), - - // Make a writable jQuery.Event from the native event object - event = jQuery.event.fix( nativeEvent ), - - handlers = ( - dataPriv.get( this, "events" ) || Object.create( null ) - )[ event.type ] || [], - special = jQuery.event.special[ event.type ] || {}; - - // Use the fix-ed jQuery.Event rather than the (read-only) native event - args[ 0 ] = event; - - for ( i = 1; i < arguments.length; i++ ) { - args[ i ] = arguments[ i ]; - } - - event.delegateTarget = this; - - // Call the preDispatch hook for the mapped type, and let it bail if desired - if ( special.preDispatch && special.preDispatch.call( this, event ) === false ) { - return; - } - - // Determine handlers - handlerQueue = jQuery.event.handlers.call( this, event, handlers ); - - // Run delegates first; they may want to stop propagation beneath us - i = 0; - while ( ( matched = handlerQueue[ i++ ] ) && !event.isPropagationStopped() ) { - event.currentTarget = matched.elem; - - j = 0; - while ( ( handleObj = matched.handlers[ j++ ] ) && - !event.isImmediatePropagationStopped() ) { - - // If the event is namespaced, then each handler is only invoked if it is - // specially universal or its namespaces are a superset of the event's. 
- if ( !event.rnamespace || handleObj.namespace === false || - event.rnamespace.test( handleObj.namespace ) ) { - - event.handleObj = handleObj; - event.data = handleObj.data; - - ret = ( ( jQuery.event.special[ handleObj.origType ] || {} ).handle || - handleObj.handler ).apply( matched.elem, args ); - - if ( ret !== undefined ) { - if ( ( event.result = ret ) === false ) { - event.preventDefault(); - event.stopPropagation(); - } - } - } - } - } - - // Call the postDispatch hook for the mapped type - if ( special.postDispatch ) { - special.postDispatch.call( this, event ); - } - - return event.result; - }, - - handlers: function( event, handlers ) { - var i, handleObj, sel, matchedHandlers, matchedSelectors, - handlerQueue = [], - delegateCount = handlers.delegateCount, - cur = event.target; - - // Find delegate handlers - if ( delegateCount && - - // Support: IE <=9 - // Black-hole SVG instance trees (trac-13180) - cur.nodeType && - - // Support: Firefox <=42 - // Suppress spec-violating clicks indicating a non-primary pointer button (trac-3861) - // https://www.w3.org/TR/DOM-Level-3-Events/#event-type-click - // Support: IE 11 only - // ...but not arrow key "clicks" of radio inputs, which can have `button` -1 (gh-2343) - !( event.type === "click" && event.button >= 1 ) ) { - - for ( ; cur !== this; cur = cur.parentNode || this ) { - - // Don't check non-elements (#13208) - // Don't process clicks on disabled elements (#6911, #8165, #11382, #11764) - if ( cur.nodeType === 1 && !( event.type === "click" && cur.disabled === true ) ) { - matchedHandlers = []; - matchedSelectors = {}; - for ( i = 0; i < delegateCount; i++ ) { - handleObj = handlers[ i ]; - - // Don't conflict with Object.prototype properties (#13203) - sel = handleObj.selector + " "; - - if ( matchedSelectors[ sel ] === undefined ) { - matchedSelectors[ sel ] = handleObj.needsContext ? 
- jQuery( sel, this ).index( cur ) > -1 : - jQuery.find( sel, this, null, [ cur ] ).length; - } - if ( matchedSelectors[ sel ] ) { - matchedHandlers.push( handleObj ); - } - } - if ( matchedHandlers.length ) { - handlerQueue.push( { elem: cur, handlers: matchedHandlers } ); - } - } - } - } - - // Add the remaining (directly-bound) handlers - cur = this; - if ( delegateCount < handlers.length ) { - handlerQueue.push( { elem: cur, handlers: handlers.slice( delegateCount ) } ); - } - - return handlerQueue; - }, - - addProp: function( name, hook ) { - Object.defineProperty( jQuery.Event.prototype, name, { - enumerable: true, - configurable: true, - - get: isFunction( hook ) ? - function() { - if ( this.originalEvent ) { - return hook( this.originalEvent ); - } - } : - function() { - if ( this.originalEvent ) { - return this.originalEvent[ name ]; - } - }, - - set: function( value ) { - Object.defineProperty( this, name, { - enumerable: true, - configurable: true, - writable: true, - value: value - } ); - } - } ); - }, - - fix: function( originalEvent ) { - return originalEvent[ jQuery.expando ] ? - originalEvent : - new jQuery.Event( originalEvent ); - }, - - special: { - load: { - - // Prevent triggered image.load events from bubbling to window.load - noBubble: true - }, - click: { - - // Utilize native event to ensure correct state for checkable inputs - setup: function( data ) { - - // For mutual compressibility with _default, replace `this` access with a local var. - // `|| data` is dead code meant only to preserve the variable through minification. - var el = this || data; - - // Claim the first handler - if ( rcheckableType.test( el.type ) && - el.click && nodeName( el, "input" ) ) { - - // dataPriv.set( el, "click", ... 
) - leverageNative( el, "click", returnTrue ); - } - - // Return false to allow normal processing in the caller - return false; - }, - trigger: function( data ) { - - // For mutual compressibility with _default, replace `this` access with a local var. - // `|| data` is dead code meant only to preserve the variable through minification. - var el = this || data; - - // Force setup before triggering a click - if ( rcheckableType.test( el.type ) && - el.click && nodeName( el, "input" ) ) { - - leverageNative( el, "click" ); - } - - // Return non-false to allow normal event-path propagation - return true; - }, - - // For cross-browser consistency, suppress native .click() on links - // Also prevent it if we're currently inside a leveraged native-event stack - _default: function( event ) { - var target = event.target; - return rcheckableType.test( target.type ) && - target.click && nodeName( target, "input" ) && - dataPriv.get( target, "click" ) || - nodeName( target, "a" ); - } - }, - - beforeunload: { - postDispatch: function( event ) { - - // Support: Firefox 20+ - // Firefox doesn't alert if the returnValue field is not set. - if ( event.result !== undefined && event.originalEvent ) { - event.originalEvent.returnValue = event.result; - } - } - } - } -}; - -// Ensure the presence of an event listener that handles manually-triggered -// synthetic events by interrupting progress until reinvoked in response to -// *native* events that it fires directly, ensuring that state changes have -// already occurred before other listeners are invoked. 
-function leverageNative( el, type, expectSync ) { - - // Missing expectSync indicates a trigger call, which must force setup through jQuery.event.add - if ( !expectSync ) { - if ( dataPriv.get( el, type ) === undefined ) { - jQuery.event.add( el, type, returnTrue ); - } - return; - } - - // Register the controller as a special universal handler for all event namespaces - dataPriv.set( el, type, false ); - jQuery.event.add( el, type, { - namespace: false, - handler: function( event ) { - var notAsync, result, - saved = dataPriv.get( this, type ); - - if ( ( event.isTrigger & 1 ) && this[ type ] ) { - - // Interrupt processing of the outer synthetic .trigger()ed event - // Saved data should be false in such cases, but might be a leftover capture object - // from an async native handler (gh-4350) - if ( !saved.length ) { - - // Store arguments for use when handling the inner native event - // There will always be at least one argument (an event object), so this array - // will not be confused with a leftover capture object. - saved = slice.call( arguments ); - dataPriv.set( this, type, saved ); - - // Trigger the native event and capture its result - // Support: IE <=9 - 11+ - // focus() and blur() are asynchronous - notAsync = expectSync( this, type ); - this[ type ](); - result = dataPriv.get( this, type ); - if ( saved !== result || notAsync ) { - dataPriv.set( this, type, false ); - } else { - result = {}; - } - if ( saved !== result ) { - - // Cancel the outer synthetic event - event.stopImmediatePropagation(); - event.preventDefault(); - return result.value; - } - - // If this is an inner synthetic event for an event with a bubbling surrogate - // (focus or blur), assume that the surrogate already propagated from triggering the - // native event and prevent that from happening again here. - // This technically gets the ordering wrong w.r.t. 
to `.trigger()` (in which the - // bubbling surrogate propagates *after* the non-bubbling base), but that seems - // less bad than duplication. - } else if ( ( jQuery.event.special[ type ] || {} ).delegateType ) { - event.stopPropagation(); - } - - // If this is a native event triggered above, everything is now in order - // Fire an inner synthetic event with the original arguments - } else if ( saved.length ) { - - // ...and capture the result - dataPriv.set( this, type, { - value: jQuery.event.trigger( - - // Support: IE <=9 - 11+ - // Extend with the prototype to reset the above stopImmediatePropagation() - jQuery.extend( saved[ 0 ], jQuery.Event.prototype ), - saved.slice( 1 ), - this - ) - } ); - - // Abort handling of the native event - event.stopImmediatePropagation(); - } - } - } ); -} - -jQuery.removeEvent = function( elem, type, handle ) { - - // This "if" is needed for plain objects - if ( elem.removeEventListener ) { - elem.removeEventListener( type, handle ); - } -}; - -jQuery.Event = function( src, props ) { - - // Allow instantiation without the 'new' keyword - if ( !( this instanceof jQuery.Event ) ) { - return new jQuery.Event( src, props ); - } - - // Event object - if ( src && src.type ) { - this.originalEvent = src; - this.type = src.type; - - // Events bubbling up the document may have been marked as prevented - // by a handler lower down the tree; reflect the correct value. - this.isDefaultPrevented = src.defaultPrevented || - src.defaultPrevented === undefined && - - // Support: Android <=2.3 only - src.returnValue === false ? - returnTrue : - returnFalse; - - // Create target properties - // Support: Safari <=6 - 7 only - // Target should not be a text node (#504, #13143) - this.target = ( src.target && src.target.nodeType === 3 ) ? 
- src.target.parentNode : - src.target; - - this.currentTarget = src.currentTarget; - this.relatedTarget = src.relatedTarget; - - // Event type - } else { - this.type = src; - } - - // Put explicitly provided properties onto the event object - if ( props ) { - jQuery.extend( this, props ); - } - - // Create a timestamp if incoming event doesn't have one - this.timeStamp = src && src.timeStamp || Date.now(); - - // Mark it as fixed - this[ jQuery.expando ] = true; -}; - -// jQuery.Event is based on DOM3 Events as specified by the ECMAScript Language Binding -// https://www.w3.org/TR/2003/WD-DOM-Level-3-Events-20030331/ecma-script-binding.html -jQuery.Event.prototype = { - constructor: jQuery.Event, - isDefaultPrevented: returnFalse, - isPropagationStopped: returnFalse, - isImmediatePropagationStopped: returnFalse, - isSimulated: false, - - preventDefault: function() { - var e = this.originalEvent; - - this.isDefaultPrevented = returnTrue; - - if ( e && !this.isSimulated ) { - e.preventDefault(); - } - }, - stopPropagation: function() { - var e = this.originalEvent; - - this.isPropagationStopped = returnTrue; - - if ( e && !this.isSimulated ) { - e.stopPropagation(); - } - }, - stopImmediatePropagation: function() { - var e = this.originalEvent; - - this.isImmediatePropagationStopped = returnTrue; - - if ( e && !this.isSimulated ) { - e.stopImmediatePropagation(); - } - - this.stopPropagation(); - } -}; - -// Includes all common event props including KeyEvent and MouseEvent specific props -jQuery.each( { - altKey: true, - bubbles: true, - cancelable: true, - changedTouches: true, - ctrlKey: true, - detail: true, - eventPhase: true, - metaKey: true, - pageX: true, - pageY: true, - shiftKey: true, - view: true, - "char": true, - code: true, - charCode: true, - key: true, - keyCode: true, - button: true, - buttons: true, - clientX: true, - clientY: true, - offsetX: true, - offsetY: true, - pointerId: true, - pointerType: true, - screenX: true, - screenY: true, - 
targetTouches: true, - toElement: true, - touches: true, - - which: function( event ) { - var button = event.button; - - // Add which for key events - if ( event.which == null && rkeyEvent.test( event.type ) ) { - return event.charCode != null ? event.charCode : event.keyCode; - } - - // Add which for click: 1 === left; 2 === middle; 3 === right - if ( !event.which && button !== undefined && rmouseEvent.test( event.type ) ) { - if ( button & 1 ) { - return 1; - } - - if ( button & 2 ) { - return 3; - } - - if ( button & 4 ) { - return 2; - } - - return 0; - } - - return event.which; - } -}, jQuery.event.addProp ); - -jQuery.each( { focus: "focusin", blur: "focusout" }, function( type, delegateType ) { - jQuery.event.special[ type ] = { - - // Utilize native event if possible so blur/focus sequence is correct - setup: function() { - - // Claim the first handler - // dataPriv.set( this, "focus", ... ) - // dataPriv.set( this, "blur", ... ) - leverageNative( this, type, expectSync ); - - // Return false to allow normal processing in the caller - return false; - }, - trigger: function() { - - // Force setup before trigger - leverageNative( this, type ); - - // Return non-false to allow normal event-path propagation - return true; - }, - - delegateType: delegateType - }; -} ); - -// Create mouseenter/leave events using mouseover/out and event-time checks -// so that event delegation works in jQuery. -// Do the same for pointerenter/pointerleave and pointerover/pointerout -// -// Support: Safari 7 only -// Safari sends mouseenter too often; see: -// https://bugs.chromium.org/p/chromium/issues/detail?id=470258 -// for the description of the bug (it existed in older Chrome versions as well). 
-jQuery.each( { - mouseenter: "mouseover", - mouseleave: "mouseout", - pointerenter: "pointerover", - pointerleave: "pointerout" -}, function( orig, fix ) { - jQuery.event.special[ orig ] = { - delegateType: fix, - bindType: fix, - - handle: function( event ) { - var ret, - target = this, - related = event.relatedTarget, - handleObj = event.handleObj; - - // For mouseenter/leave call the handler if related is outside the target. - // NB: No relatedTarget if the mouse left/entered the browser window - if ( !related || ( related !== target && !jQuery.contains( target, related ) ) ) { - event.type = handleObj.origType; - ret = handleObj.handler.apply( this, arguments ); - event.type = fix; - } - return ret; - } - }; -} ); - -jQuery.fn.extend( { - - on: function( types, selector, data, fn ) { - return on( this, types, selector, data, fn ); - }, - one: function( types, selector, data, fn ) { - return on( this, types, selector, data, fn, 1 ); - }, - off: function( types, selector, fn ) { - var handleObj, type; - if ( types && types.preventDefault && types.handleObj ) { - - // ( event ) dispatched jQuery.Event - handleObj = types.handleObj; - jQuery( types.delegateTarget ).off( - handleObj.namespace ? - handleObj.origType + "." + handleObj.namespace : - handleObj.origType, - handleObj.selector, - handleObj.handler - ); - return this; - } - if ( typeof types === "object" ) { - - // ( types-object [, selector] ) - for ( type in types ) { - this.off( type, selector, types[ type ] ); - } - return this; - } - if ( selector === false || typeof selector === "function" ) { - - // ( types [, fn] ) - fn = selector; - selector = undefined; - } - if ( fn === false ) { - fn = returnFalse; - } - return this.each( function() { - jQuery.event.remove( this, types, fn, selector ); - } ); - } -} ); - - -var - - // Support: IE <=10 - 11, Edge 12 - 13 only - // In IE/Edge using regex groups here causes severe slowdowns. 
- // See https://connect.microsoft.com/IE/feedback/details/1736512/ - rnoInnerhtml = /<script|<style|<link/i, - - // checked="checked" or checked - rchecked = /checked\s*(?:[^=]|=\s*.checked.)/i, - rcleanScript = /^\s*<!(?:\[CDATA\[|--)|(?:\]\]|--)>\s*$/g; - -// Prefer a tbody over its parent table for containing new rows -function manipulationTarget( elem, content ) { - if ( nodeName( elem, "table" ) && - nodeName( content.nodeType !== 11 ? content : content.firstChild, "tr" ) ) { - - return jQuery( elem ).children( "tbody" )[ 0 ] || elem; - } - - return elem; -} - -// Replace/restore the type attribute of script elements for safe DOM manipulation -function disableScript( elem ) { - elem.type = ( elem.getAttribute( "type" ) !== null ) + "/" + elem.type; - return elem; -} -function restoreScript( elem ) { - if ( ( elem.type || "" ).slice( 0, 5 ) === "true/" ) { - elem.type = elem.type.slice( 5 ); - } else { - elem.removeAttribute( "type" ); - } - - return elem; -} - -function cloneCopyEvent( src, dest ) { - var i, l, type, pdataOld, udataOld, udataCur, events; - - if ( dest.nodeType !== 1 ) { - return; - } - - // 1. Copy private data: events, handlers, etc. - if ( dataPriv.hasData( src ) ) { - pdataOld = dataPriv.get( src ); - events = pdataOld.events; - - if ( events ) { - dataPriv.remove( dest, "handle events" ); - - for ( type in events ) { - for ( i = 0, l = events[ type ].length; i < l; i++ ) { - jQuery.event.add( dest, type, events[ type ][ i ] ); - } - } - } - } - - // 2. Copy user data - if ( dataUser.hasData( src ) ) { - udataOld = dataUser.access( src ); - udataCur = jQuery.extend( {}, udataOld ); - - dataUser.set( dest, udataCur ); - } -} - -// Fix IE bugs, see support tests -function fixInput( src, dest ) { - var nodeName = dest.nodeName.toLowerCase(); - - // Fails to persist the checked state of a cloned checkbox or radio button.
- if ( nodeName === "input" && rcheckableType.test( src.type ) ) { - dest.checked = src.checked; - - // Fails to return the selected option to the default selected state when cloning options - } else if ( nodeName === "input" || nodeName === "textarea" ) { - dest.defaultValue = src.defaultValue; - } -} - -function domManip( collection, args, callback, ignored ) { - - // Flatten any nested arrays - args = flat( args ); - - var fragment, first, scripts, hasScripts, node, doc, - i = 0, - l = collection.length, - iNoClone = l - 1, - value = args[ 0 ], - valueIsFunction = isFunction( value ); - - // We can't cloneNode fragments that contain checked, in WebKit - if ( valueIsFunction || - ( l > 1 && typeof value === "string" && - !support.checkClone && rchecked.test( value ) ) ) { - return collection.each( function( index ) { - var self = collection.eq( index ); - if ( valueIsFunction ) { - args[ 0 ] = value.call( this, index, self.html() ); - } - domManip( self, args, callback, ignored ); - } ); - } - - if ( l ) { - fragment = buildFragment( args, collection[ 0 ].ownerDocument, false, collection, ignored ); - first = fragment.firstChild; - - if ( fragment.childNodes.length === 1 ) { - fragment = first; - } - - // Require either new content or an interest in ignored elements to invoke the callback - if ( first || ignored ) { - scripts = jQuery.map( getAll( fragment, "script" ), disableScript ); - hasScripts = scripts.length; - - // Use the original fragment for the last item - // instead of the first because it can end up - // being emptied incorrectly in certain situations (#8070). 
- for ( ; i < l; i++ ) { - node = fragment; - - if ( i !== iNoClone ) { - node = jQuery.clone( node, true, true ); - - // Keep references to cloned scripts for later restoration - if ( hasScripts ) { - - // Support: Android <=4.0 only, PhantomJS 1 only - // push.apply(_, arraylike) throws on ancient WebKit - jQuery.merge( scripts, getAll( node, "script" ) ); - } - } - - callback.call( collection[ i ], node, i ); - } - - if ( hasScripts ) { - doc = scripts[ scripts.length - 1 ].ownerDocument; - - // Reenable scripts - jQuery.map( scripts, restoreScript ); - - // Evaluate executable scripts on first document insertion - for ( i = 0; i < hasScripts; i++ ) { - node = scripts[ i ]; - if ( rscriptType.test( node.type || "" ) && - !dataPriv.access( node, "globalEval" ) && - jQuery.contains( doc, node ) ) { - - if ( node.src && ( node.type || "" ).toLowerCase() !== "module" ) { - - // Optional AJAX dependency, but won't run scripts if not present - if ( jQuery._evalUrl && !node.noModule ) { - jQuery._evalUrl( node.src, { - nonce: node.nonce || node.getAttribute( "nonce" ) - }, doc ); - } - } else { - DOMEval( node.textContent.replace( rcleanScript, "" ), node, doc ); - } - } - } - } - } - } - - return collection; -} - -function remove( elem, selector, keepData ) { - var node, - nodes = selector ? 
jQuery.filter( selector, elem ) : elem, - i = 0; - - for ( ; ( node = nodes[ i ] ) != null; i++ ) { - if ( !keepData && node.nodeType === 1 ) { - jQuery.cleanData( getAll( node ) ); - } - - if ( node.parentNode ) { - if ( keepData && isAttached( node ) ) { - setGlobalEval( getAll( node, "script" ) ); - } - node.parentNode.removeChild( node ); - } - } - - return elem; -} - -jQuery.extend( { - htmlPrefilter: function( html ) { - return html; - }, - - clone: function( elem, dataAndEvents, deepDataAndEvents ) { - var i, l, srcElements, destElements, - clone = elem.cloneNode( true ), - inPage = isAttached( elem ); - - // Fix IE cloning issues - if ( !support.noCloneChecked && ( elem.nodeType === 1 || elem.nodeType === 11 ) && - !jQuery.isXMLDoc( elem ) ) { - - // We eschew Sizzle here for performance reasons: https://jsperf.com/getall-vs-sizzle/2 - destElements = getAll( clone ); - srcElements = getAll( elem ); - - for ( i = 0, l = srcElements.length; i < l; i++ ) { - fixInput( srcElements[ i ], destElements[ i ] ); - } - } - - // Copy the events from the original to the clone - if ( dataAndEvents ) { - if ( deepDataAndEvents ) { - srcElements = srcElements || getAll( elem ); - destElements = destElements || getAll( clone ); - - for ( i = 0, l = srcElements.length; i < l; i++ ) { - cloneCopyEvent( srcElements[ i ], destElements[ i ] ); - } - } else { - cloneCopyEvent( elem, clone ); - } - } - - // Preserve script evaluation history - destElements = getAll( clone, "script" ); - if ( destElements.length > 0 ) { - setGlobalEval( destElements, !inPage && getAll( elem, "script" ) ); - } - - // Return the cloned set - return clone; - }, - - cleanData: function( elems ) { - var data, elem, type, - special = jQuery.event.special, - i = 0; - - for ( ; ( elem = elems[ i ] ) !== undefined; i++ ) { - if ( acceptData( elem ) ) { - if ( ( data = elem[ dataPriv.expando ] ) ) { - if ( data.events ) { - for ( type in data.events ) { - if ( special[ type ] ) { - jQuery.event.remove( 
elem, type ); - - // This is a shortcut to avoid jQuery.event.remove's overhead - } else { - jQuery.removeEvent( elem, type, data.handle ); - } - } - } - - // Support: Chrome <=35 - 45+ - // Assign undefined instead of using delete, see Data#remove - elem[ dataPriv.expando ] = undefined; - } - if ( elem[ dataUser.expando ] ) { - - // Support: Chrome <=35 - 45+ - // Assign undefined instead of using delete, see Data#remove - elem[ dataUser.expando ] = undefined; - } - } - } - } -} ); - -jQuery.fn.extend( { - detach: function( selector ) { - return remove( this, selector, true ); - }, - - remove: function( selector ) { - return remove( this, selector ); - }, - - text: function( value ) { - return access( this, function( value ) { - return value === undefined ? - jQuery.text( this ) : - this.empty().each( function() { - if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { - this.textContent = value; - } - } ); - }, null, value, arguments.length ); - }, - - append: function() { - return domManip( this, arguments, function( elem ) { - if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { - var target = manipulationTarget( this, elem ); - target.appendChild( elem ); - } - } ); - }, - - prepend: function() { - return domManip( this, arguments, function( elem ) { - if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { - var target = manipulationTarget( this, elem ); - target.insertBefore( elem, target.firstChild ); - } - } ); - }, - - before: function() { - return domManip( this, arguments, function( elem ) { - if ( this.parentNode ) { - this.parentNode.insertBefore( elem, this ); - } - } ); - }, - - after: function() { - return domManip( this, arguments, function( elem ) { - if ( this.parentNode ) { - this.parentNode.insertBefore( elem, this.nextSibling ); - } - } ); - }, - - empty: function() { - var elem, - i = 0; - - for ( ; ( elem = this[ i ] ) != null; i++ ) { - if ( elem.nodeType === 1 ) { - - // 
Prevent memory leaks - jQuery.cleanData( getAll( elem, false ) ); - - // Remove any remaining nodes - elem.textContent = ""; - } - } - - return this; - }, - - clone: function( dataAndEvents, deepDataAndEvents ) { - dataAndEvents = dataAndEvents == null ? false : dataAndEvents; - deepDataAndEvents = deepDataAndEvents == null ? dataAndEvents : deepDataAndEvents; - - return this.map( function() { - return jQuery.clone( this, dataAndEvents, deepDataAndEvents ); - } ); - }, - - html: function( value ) { - return access( this, function( value ) { - var elem = this[ 0 ] || {}, - i = 0, - l = this.length; - - if ( value === undefined && elem.nodeType === 1 ) { - return elem.innerHTML; - } - - // See if we can take a shortcut and just use innerHTML - if ( typeof value === "string" && !rnoInnerhtml.test( value ) && - !wrapMap[ ( rtagName.exec( value ) || [ "", "" ] )[ 1 ].toLowerCase() ] ) { - - value = jQuery.htmlPrefilter( value ); - - try { - for ( ; i < l; i++ ) { - elem = this[ i ] || {}; - - // Remove element nodes and prevent memory leaks - if ( elem.nodeType === 1 ) { - jQuery.cleanData( getAll( elem, false ) ); - elem.innerHTML = value; - } - } - - elem = 0; - - // If using innerHTML throws an exception, use the fallback method - } catch ( e ) {} - } - - if ( elem ) { - this.empty().append( value ); - } - }, null, value, arguments.length ); - }, - - replaceWith: function() { - var ignored = []; - - // Make the changes, replacing each non-ignored context element with the new content - return domManip( this, arguments, function( elem ) { - var parent = this.parentNode; - - if ( jQuery.inArray( this, ignored ) < 0 ) { - jQuery.cleanData( getAll( this ) ); - if ( parent ) { - parent.replaceChild( elem, this ); - } - } - - // Force callback invocation - }, ignored ); - } -} ); - -jQuery.each( { - appendTo: "append", - prependTo: "prepend", - insertBefore: "before", - insertAfter: "after", - replaceAll: "replaceWith" -}, function( name, original ) { - jQuery.fn[ name ] = 
function( selector ) { - var elems, - ret = [], - insert = jQuery( selector ), - last = insert.length - 1, - i = 0; - - for ( ; i <= last; i++ ) { - elems = i === last ? this : this.clone( true ); - jQuery( insert[ i ] )[ original ]( elems ); - - // Support: Android <=4.0 only, PhantomJS 1 only - // .get() because push.apply(_, arraylike) throws on ancient WebKit - push.apply( ret, elems.get() ); - } - - return this.pushStack( ret ); - }; -} ); -var rnumnonpx = new RegExp( "^(" + pnum + ")(?!px)[a-z%]+$", "i" ); - -var getStyles = function( elem ) { - - // Support: IE <=11 only, Firefox <=30 (#15098, #14150) - // IE throws on elements created in popups - // FF meanwhile throws on frame elements through "defaultView.getComputedStyle" - var view = elem.ownerDocument.defaultView; - - if ( !view || !view.opener ) { - view = window; - } - - return view.getComputedStyle( elem ); - }; - -var swap = function( elem, options, callback ) { - var ret, name, - old = {}; - - // Remember the old values, and insert the new ones - for ( name in options ) { - old[ name ] = elem.style[ name ]; - elem.style[ name ] = options[ name ]; - } - - ret = callback.call( elem ); - - // Revert the old values - for ( name in options ) { - elem.style[ name ] = old[ name ]; - } - - return ret; -}; - - -var rboxStyle = new RegExp( cssExpand.join( "|" ), "i" ); - - - -( function() { - - // Executing both pixelPosition & boxSizingReliable tests require only one layout - // so they're executed at the same time to save the second computation. 
- function computeStyleTests() { - - // This is a singleton, we need to execute it only once - if ( !div ) { - return; - } - - container.style.cssText = "position:absolute;left:-11111px;width:60px;" + - "margin-top:1px;padding:0;border:0"; - div.style.cssText = - "position:relative;display:block;box-sizing:border-box;overflow:scroll;" + - "margin:auto;border:1px;padding:1px;" + - "width:60%;top:1%"; - documentElement.appendChild( container ).appendChild( div ); - - var divStyle = window.getComputedStyle( div ); - pixelPositionVal = divStyle.top !== "1%"; - - // Support: Android 4.0 - 4.3 only, Firefox <=3 - 44 - reliableMarginLeftVal = roundPixelMeasures( divStyle.marginLeft ) === 12; - - // Support: Android 4.0 - 4.3 only, Safari <=9.1 - 10.1, iOS <=7.0 - 9.3 - // Some styles come back with percentage values, even though they shouldn't - div.style.right = "60%"; - pixelBoxStylesVal = roundPixelMeasures( divStyle.right ) === 36; - - // Support: IE 9 - 11 only - // Detect misreporting of content dimensions for box-sizing:border-box elements - boxSizingReliableVal = roundPixelMeasures( divStyle.width ) === 36; - - // Support: IE 9 only - // Detect overflow:scroll screwiness (gh-3699) - // Support: Chrome <=64 - // Don't get tricked when zoom affects offsetWidth (gh-4029) - div.style.position = "absolute"; - scrollboxSizeVal = roundPixelMeasures( div.offsetWidth / 3 ) === 12; - - documentElement.removeChild( container ); - - // Nullify the div so it wouldn't be stored in the memory and - // it will also be a sign that checks already performed - div = null; - } - - function roundPixelMeasures( measure ) { - return Math.round( parseFloat( measure ) ); - } - - var pixelPositionVal, boxSizingReliableVal, scrollboxSizeVal, pixelBoxStylesVal, - reliableTrDimensionsVal, reliableMarginLeftVal, - container = document.createElement( "div" ), - div = document.createElement( "div" ); - - // Finish early in limited (non-browser) environments - if ( !div.style ) { - return; - } - - 
// Support: IE <=9 - 11 only - // Style of cloned element affects source element cloned (#8908) - div.style.backgroundClip = "content-box"; - div.cloneNode( true ).style.backgroundClip = ""; - support.clearCloneStyle = div.style.backgroundClip === "content-box"; - - jQuery.extend( support, { - boxSizingReliable: function() { - computeStyleTests(); - return boxSizingReliableVal; - }, - pixelBoxStyles: function() { - computeStyleTests(); - return pixelBoxStylesVal; - }, - pixelPosition: function() { - computeStyleTests(); - return pixelPositionVal; - }, - reliableMarginLeft: function() { - computeStyleTests(); - return reliableMarginLeftVal; - }, - scrollboxSize: function() { - computeStyleTests(); - return scrollboxSizeVal; - }, - - // Support: IE 9 - 11+, Edge 15 - 18+ - // IE/Edge misreport `getComputedStyle` of table rows with width/height - // set in CSS while `offset*` properties report correct values. - // Behavior in IE 9 is more subtle than in newer versions & it passes - // some versions of this test; make sure not to make it pass there! 
- reliableTrDimensions: function() { - var table, tr, trChild, trStyle; - if ( reliableTrDimensionsVal == null ) { - table = document.createElement( "table" ); - tr = document.createElement( "tr" ); - trChild = document.createElement( "div" ); - - table.style.cssText = "position:absolute;left:-11111px"; - tr.style.height = "1px"; - trChild.style.height = "9px"; - - documentElement - .appendChild( table ) - .appendChild( tr ) - .appendChild( trChild ); - - trStyle = window.getComputedStyle( tr ); - reliableTrDimensionsVal = parseInt( trStyle.height ) > 3; - - documentElement.removeChild( table ); - } - return reliableTrDimensionsVal; - } - } ); -} )(); - - -function curCSS( elem, name, computed ) { - var width, minWidth, maxWidth, ret, - - // Support: Firefox 51+ - // Retrieving style before computed somehow - // fixes an issue with getting wrong values - // on detached elements - style = elem.style; - - computed = computed || getStyles( elem ); - - // getPropertyValue is needed for: - // .css('filter') (IE 9 only, #12537) - // .css('--customProperty) (#3144) - if ( computed ) { - ret = computed.getPropertyValue( name ) || computed[ name ]; - - if ( ret === "" && !isAttached( elem ) ) { - ret = jQuery.style( elem, name ); - } - - // A tribute to the "awesome hack by Dean Edwards" - // Android Browser returns percentage for some values, - // but width seems to be reliably pixels. - // This is against the CSSOM draft spec: - // https://drafts.csswg.org/cssom/#resolved-values - if ( !support.pixelBoxStyles() && rnumnonpx.test( ret ) && rboxStyle.test( name ) ) { - - // Remember the original values - width = style.width; - minWidth = style.minWidth; - maxWidth = style.maxWidth; - - // Put in the new values to get a computed value out - style.minWidth = style.maxWidth = style.width = ret; - ret = computed.width; - - // Revert the changed values - style.width = width; - style.minWidth = minWidth; - style.maxWidth = maxWidth; - } - } - - return ret !== undefined ? 
- - // Support: IE <=9 - 11 only - // IE returns zIndex value as an integer. - ret + "" : - ret; -} - - -function addGetHookIf( conditionFn, hookFn ) { - - // Define the hook, we'll check on the first run if it's really needed. - return { - get: function() { - if ( conditionFn() ) { - - // Hook not needed (or it's not possible to use it due - // to missing dependency), remove it. - delete this.get; - return; - } - - // Hook needed; redefine it so that the support test is not executed again. - return ( this.get = hookFn ).apply( this, arguments ); - } - }; -} - - -var cssPrefixes = [ "Webkit", "Moz", "ms" ], - emptyStyle = document.createElement( "div" ).style, - vendorProps = {}; - -// Return a vendor-prefixed property or undefined -function vendorPropName( name ) { - - // Check for vendor prefixed names - var capName = name[ 0 ].toUpperCase() + name.slice( 1 ), - i = cssPrefixes.length; - - while ( i-- ) { - name = cssPrefixes[ i ] + capName; - if ( name in emptyStyle ) { - return name; - } - } -} - -// Return a potentially-mapped jQuery.cssProps or vendor prefixed property -function finalPropName( name ) { - var final = jQuery.cssProps[ name ] || vendorProps[ name ]; - - if ( final ) { - return final; - } - if ( name in emptyStyle ) { - return name; - } - return vendorProps[ name ] = vendorPropName( name ) || name; -} - - -var - - // Swappable if display is none or starts with table - // except "table", "table-cell", or "table-caption" - // See here for display values: https://developer.mozilla.org/en-US/docs/CSS/display - rdisplayswap = /^(none|table(?!-c[ea]).+)/, - rcustomProp = /^--/, - cssShow = { position: "absolute", visibility: "hidden", display: "block" }, - cssNormalTransform = { - letterSpacing: "0", - fontWeight: "400" - }; - -function setPositiveNumber( _elem, value, subtract ) { - - // Any relative (+/-) values have already been - // normalized at this point - var matches = rcssNum.exec( value ); - return matches ? 
- - // Guard against undefined "subtract", e.g., when used as in cssHooks - Math.max( 0, matches[ 2 ] - ( subtract || 0 ) ) + ( matches[ 3 ] || "px" ) : - value; -} - -function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, computedVal ) { - var i = dimension === "width" ? 1 : 0, - extra = 0, - delta = 0; - - // Adjustment may not be necessary - if ( box === ( isBorderBox ? "border" : "content" ) ) { - return 0; - } - - for ( ; i < 4; i += 2 ) { - - // Both box models exclude margin - if ( box === "margin" ) { - delta += jQuery.css( elem, box + cssExpand[ i ], true, styles ); - } - - // If we get here with a content-box, we're seeking "padding" or "border" or "margin" - if ( !isBorderBox ) { - - // Add padding - delta += jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); - - // For "border" or "margin", add border - if ( box !== "padding" ) { - delta += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); - - // But still keep track of it otherwise - } else { - extra += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); - } - - // If we get here with a border-box (content + padding + border), we're seeking "content" or - // "padding" or "margin" - } else { - - // For "content", subtract padding - if ( box === "content" ) { - delta -= jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); - } - - // For "content" or "padding", subtract border - if ( box !== "margin" ) { - delta -= jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); - } - } - } - - // Account for positive content-box scroll gutter when requested by providing computedVal - if ( !isBorderBox && computedVal >= 0 ) { - - // offsetWidth/offsetHeight is a rounded sum of content, padding, scroll gutter, and border - // Assuming integer scroll gutter, subtract the rest and round down - delta += Math.max( 0, Math.ceil( - elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - - computedVal - - delta - - 
extra - - 0.5 - - // If offsetWidth/offsetHeight is unknown, then we can't determine content-box scroll gutter - // Use an explicit zero to avoid NaN (gh-3964) - ) ) || 0; - } - - return delta; -} - -function getWidthOrHeight( elem, dimension, extra ) { - - // Start with computed style - var styles = getStyles( elem ), - - // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-4322). - // Fake content-box until we know it's needed to know the true value. - boxSizingNeeded = !support.boxSizingReliable() || extra, - isBorderBox = boxSizingNeeded && - jQuery.css( elem, "boxSizing", false, styles ) === "border-box", - valueIsBorderBox = isBorderBox, - - val = curCSS( elem, dimension, styles ), - offsetProp = "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ); - - // Support: Firefox <=54 - // Return a confounding non-pixel value or feign ignorance, as appropriate. - if ( rnumnonpx.test( val ) ) { - if ( !extra ) { - return val; - } - val = "auto"; - } - - - // Support: IE 9 - 11 only - // Use offsetWidth/offsetHeight for when box sizing is unreliable. - // In those cases, the computed value can be trusted to be border-box. - if ( ( !support.boxSizingReliable() && isBorderBox || - - // Support: IE 10 - 11+, Edge 15 - 18+ - // IE/Edge misreport `getComputedStyle` of table rows with width/height - // set in CSS while `offset*` properties report correct values. - // Interestingly, in some cases IE 9 doesn't suffer from this issue. 
- !support.reliableTrDimensions() && nodeName( elem, "tr" ) || - - // Fall back to offsetWidth/offsetHeight when value is "auto" - // This happens for inline elements with no explicit setting (gh-3571) - val === "auto" || - - // Support: Android <=4.1 - 4.3 only - // Also use offsetWidth/offsetHeight for misreported inline dimensions (gh-3602) - !parseFloat( val ) && jQuery.css( elem, "display", false, styles ) === "inline" ) && - - // Make sure the element is visible & connected - elem.getClientRects().length ) { - - isBorderBox = jQuery.css( elem, "boxSizing", false, styles ) === "border-box"; - - // Where available, offsetWidth/offsetHeight approximate border box dimensions. - // Where not available (e.g., SVG), assume unreliable box-sizing and interpret the - // retrieved value as a content box dimension. - valueIsBorderBox = offsetProp in elem; - if ( valueIsBorderBox ) { - val = elem[ offsetProp ]; - } - } - - // Normalize "" and auto - val = parseFloat( val ) || 0; - - // Adjust for the element's box model - return ( val + - boxModelAdjustment( - elem, - dimension, - extra || ( isBorderBox ? "border" : "content" ), - valueIsBorderBox, - styles, - - // Provide the current computed size to request scroll gutter calculation (gh-3589) - val - ) - ) + "px"; -} - -jQuery.extend( { - - // Add in style property hooks for overriding the default - // behavior of getting and setting a style property - cssHooks: { - opacity: { - get: function( elem, computed ) { - if ( computed ) { - - // We should always get a number back from opacity - var ret = curCSS( elem, "opacity" ); - return ret === "" ? 
"1" : ret; - } - } - } - }, - - // Don't automatically add "px" to these possibly-unitless properties - cssNumber: { - "animationIterationCount": true, - "columnCount": true, - "fillOpacity": true, - "flexGrow": true, - "flexShrink": true, - "fontWeight": true, - "gridArea": true, - "gridColumn": true, - "gridColumnEnd": true, - "gridColumnStart": true, - "gridRow": true, - "gridRowEnd": true, - "gridRowStart": true, - "lineHeight": true, - "opacity": true, - "order": true, - "orphans": true, - "widows": true, - "zIndex": true, - "zoom": true - }, - - // Add in properties whose names you wish to fix before - // setting or getting the value - cssProps: {}, - - // Get and set the style property on a DOM Node - style: function( elem, name, value, extra ) { - - // Don't set styles on text and comment nodes - if ( !elem || elem.nodeType === 3 || elem.nodeType === 8 || !elem.style ) { - return; - } - - // Make sure that we're working with the right name - var ret, type, hooks, - origName = camelCase( name ), - isCustomProp = rcustomProp.test( name ), - style = elem.style; - - // Make sure that we're working with the right name. We don't - // want to query the value if it is a CSS custom property - // since they are user-defined. 
- if ( !isCustomProp ) { - name = finalPropName( origName ); - } - - // Gets hook for the prefixed version, then unprefixed version - hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; - - // Check if we're setting a value - if ( value !== undefined ) { - type = typeof value; - - // Convert "+=" or "-=" to relative numbers (#7345) - if ( type === "string" && ( ret = rcssNum.exec( value ) ) && ret[ 1 ] ) { - value = adjustCSS( elem, name, ret ); - - // Fixes bug #9237 - type = "number"; - } - - // Make sure that null and NaN values aren't set (#7116) - if ( value == null || value !== value ) { - return; - } - - // If a number was passed in, add the unit (except for certain CSS properties) - // The isCustomProp check can be removed in jQuery 4.0 when we only auto-append - // "px" to a few hardcoded values. - if ( type === "number" && !isCustomProp ) { - value += ret && ret[ 3 ] || ( jQuery.cssNumber[ origName ] ? "" : "px" ); - } - - // background-* props affect original clone's values - if ( !support.clearCloneStyle && value === "" && name.indexOf( "background" ) === 0 ) { - style[ name ] = "inherit"; - } - - // If a hook was provided, use that value, otherwise just set the specified value - if ( !hooks || !( "set" in hooks ) || - ( value = hooks.set( elem, value, extra ) ) !== undefined ) { - - if ( isCustomProp ) { - style.setProperty( name, value ); - } else { - style[ name ] = value; - } - } - - } else { - - // If a hook was provided get the non-computed value from there - if ( hooks && "get" in hooks && - ( ret = hooks.get( elem, false, extra ) ) !== undefined ) { - - return ret; - } - - // Otherwise just get the value from the style object - return style[ name ]; - } - }, - - css: function( elem, name, extra, styles ) { - var val, num, hooks, - origName = camelCase( name ), - isCustomProp = rcustomProp.test( name ); - - // Make sure that we're working with the right name. 
We don't - // want to modify the value if it is a CSS custom property - // since they are user-defined. - if ( !isCustomProp ) { - name = finalPropName( origName ); - } - - // Try prefixed name followed by the unprefixed name - hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; - - // If a hook was provided get the computed value from there - if ( hooks && "get" in hooks ) { - val = hooks.get( elem, true, extra ); - } - - // Otherwise, if a way to get the computed value exists, use that - if ( val === undefined ) { - val = curCSS( elem, name, styles ); - } - - // Convert "normal" to computed value - if ( val === "normal" && name in cssNormalTransform ) { - val = cssNormalTransform[ name ]; - } - - // Make numeric if forced or a qualifier was provided and val looks numeric - if ( extra === "" || extra ) { - num = parseFloat( val ); - return extra === true || isFinite( num ) ? num || 0 : val; - } - - return val; - } -} ); - -jQuery.each( [ "height", "width" ], function( _i, dimension ) { - jQuery.cssHooks[ dimension ] = { - get: function( elem, computed, extra ) { - if ( computed ) { - - // Certain elements can have dimension info if we invisibly show them - // but it must have a current display style that would benefit - return rdisplayswap.test( jQuery.css( elem, "display" ) ) && - - // Support: Safari 8+ - // Table columns in Safari have non-zero offsetWidth & zero - // getBoundingClientRect().width unless display is changed. - // Support: IE <=11 only - // Running getBoundingClientRect on a disconnected node - // in IE throws an error. - ( !elem.getClientRects().length || !elem.getBoundingClientRect().width ) ? - swap( elem, cssShow, function() { - return getWidthOrHeight( elem, dimension, extra ); - } ) : - getWidthOrHeight( elem, dimension, extra ); - } - }, - - set: function( elem, value, extra ) { - var matches, - styles = getStyles( elem ), - - // Only read styles.position if the test has a chance to fail - // to avoid forcing a reflow. 
- scrollboxSizeBuggy = !support.scrollboxSize() && - styles.position === "absolute", - - // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-3991) - boxSizingNeeded = scrollboxSizeBuggy || extra, - isBorderBox = boxSizingNeeded && - jQuery.css( elem, "boxSizing", false, styles ) === "border-box", - subtract = extra ? - boxModelAdjustment( - elem, - dimension, - extra, - isBorderBox, - styles - ) : - 0; - - // Account for unreliable border-box dimensions by comparing offset* to computed and - // faking a content-box to get border and padding (gh-3699) - if ( isBorderBox && scrollboxSizeBuggy ) { - subtract -= Math.ceil( - elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - - parseFloat( styles[ dimension ] ) - - boxModelAdjustment( elem, dimension, "border", false, styles ) - - 0.5 - ); - } - - // Convert to pixels if value adjustment is needed - if ( subtract && ( matches = rcssNum.exec( value ) ) && - ( matches[ 3 ] || "px" ) !== "px" ) { - - elem.style[ dimension ] = value; - value = jQuery.css( elem, dimension ); - } - - return setPositiveNumber( elem, value, subtract ); - } - }; -} ); - -jQuery.cssHooks.marginLeft = addGetHookIf( support.reliableMarginLeft, - function( elem, computed ) { - if ( computed ) { - return ( parseFloat( curCSS( elem, "marginLeft" ) ) || - elem.getBoundingClientRect().left - - swap( elem, { marginLeft: 0 }, function() { - return elem.getBoundingClientRect().left; - } ) - ) + "px"; - } - } -); - -// These hooks are used by animate to expand properties -jQuery.each( { - margin: "", - padding: "", - border: "Width" -}, function( prefix, suffix ) { - jQuery.cssHooks[ prefix + suffix ] = { - expand: function( value ) { - var i = 0, - expanded = {}, - - // Assumes a single number if not a string - parts = typeof value === "string" ? 
value.split( " " ) : [ value ]; - - for ( ; i < 4; i++ ) { - expanded[ prefix + cssExpand[ i ] + suffix ] = - parts[ i ] || parts[ i - 2 ] || parts[ 0 ]; - } - - return expanded; - } - }; - - if ( prefix !== "margin" ) { - jQuery.cssHooks[ prefix + suffix ].set = setPositiveNumber; - } -} ); - -jQuery.fn.extend( { - css: function( name, value ) { - return access( this, function( elem, name, value ) { - var styles, len, - map = {}, - i = 0; - - if ( Array.isArray( name ) ) { - styles = getStyles( elem ); - len = name.length; - - for ( ; i < len; i++ ) { - map[ name[ i ] ] = jQuery.css( elem, name[ i ], false, styles ); - } - - return map; - } - - return value !== undefined ? - jQuery.style( elem, name, value ) : - jQuery.css( elem, name ); - }, name, value, arguments.length > 1 ); - } -} ); - - -function Tween( elem, options, prop, end, easing ) { - return new Tween.prototype.init( elem, options, prop, end, easing ); -} -jQuery.Tween = Tween; - -Tween.prototype = { - constructor: Tween, - init: function( elem, options, prop, end, easing, unit ) { - this.elem = elem; - this.prop = prop; - this.easing = easing || jQuery.easing._default; - this.options = options; - this.start = this.now = this.cur(); - this.end = end; - this.unit = unit || ( jQuery.cssNumber[ prop ] ? "" : "px" ); - }, - cur: function() { - var hooks = Tween.propHooks[ this.prop ]; - - return hooks && hooks.get ? 
- hooks.get( this ) : - Tween.propHooks._default.get( this ); - }, - run: function( percent ) { - var eased, - hooks = Tween.propHooks[ this.prop ]; - - if ( this.options.duration ) { - this.pos = eased = jQuery.easing[ this.easing ]( - percent, this.options.duration * percent, 0, 1, this.options.duration - ); - } else { - this.pos = eased = percent; - } - this.now = ( this.end - this.start ) * eased + this.start; - - if ( this.options.step ) { - this.options.step.call( this.elem, this.now, this ); - } - - if ( hooks && hooks.set ) { - hooks.set( this ); - } else { - Tween.propHooks._default.set( this ); - } - return this; - } -}; - -Tween.prototype.init.prototype = Tween.prototype; - -Tween.propHooks = { - _default: { - get: function( tween ) { - var result; - - // Use a property on the element directly when it is not a DOM element, - // or when there is no matching style property that exists. - if ( tween.elem.nodeType !== 1 || - tween.elem[ tween.prop ] != null && tween.elem.style[ tween.prop ] == null ) { - return tween.elem[ tween.prop ]; - } - - // Passing an empty string as a 3rd parameter to .css will automatically - // attempt a parseFloat and fallback to a string if the parse fails. - // Simple values such as "10px" are parsed to Float; - // complex values such as "rotate(1rad)" are returned as-is. - result = jQuery.css( tween.elem, tween.prop, "" ); - - // Empty strings, null, undefined and "auto" are converted to 0. - return !result || result === "auto" ? 0 : result; - }, - set: function( tween ) { - - // Use step hook for back compat. - // Use cssHook if its there. - // Use .style if available and use plain properties where available. 
- if ( jQuery.fx.step[ tween.prop ] ) { - jQuery.fx.step[ tween.prop ]( tween ); - } else if ( tween.elem.nodeType === 1 && ( - jQuery.cssHooks[ tween.prop ] || - tween.elem.style[ finalPropName( tween.prop ) ] != null ) ) { - jQuery.style( tween.elem, tween.prop, tween.now + tween.unit ); - } else { - tween.elem[ tween.prop ] = tween.now; - } - } - } -}; - -// Support: IE <=9 only -// Panic based approach to setting things on disconnected nodes -Tween.propHooks.scrollTop = Tween.propHooks.scrollLeft = { - set: function( tween ) { - if ( tween.elem.nodeType && tween.elem.parentNode ) { - tween.elem[ tween.prop ] = tween.now; - } - } -}; - -jQuery.easing = { - linear: function( p ) { - return p; - }, - swing: function( p ) { - return 0.5 - Math.cos( p * Math.PI ) / 2; - }, - _default: "swing" -}; - -jQuery.fx = Tween.prototype.init; - -// Back compat <1.8 extension point -jQuery.fx.step = {}; - - - - -var - fxNow, inProgress, - rfxtypes = /^(?:toggle|show|hide)$/, - rrun = /queueHooks$/; - -function schedule() { - if ( inProgress ) { - if ( document.hidden === false && window.requestAnimationFrame ) { - window.requestAnimationFrame( schedule ); - } else { - window.setTimeout( schedule, jQuery.fx.interval ); - } - - jQuery.fx.tick(); - } -} - -// Animations created synchronously will run synchronously -function createFxNow() { - window.setTimeout( function() { - fxNow = undefined; - } ); - return ( fxNow = Date.now() ); -} - -// Generate parameters to create a standard animation -function genFx( type, includeWidth ) { - var which, - i = 0, - attrs = { height: type }; - - // If we include width, step value is 1 to do all cssExpand values, - // otherwise step value is 2 to skip over Left and Right - includeWidth = includeWidth ? 
1 : 0; - for ( ; i < 4; i += 2 - includeWidth ) { - which = cssExpand[ i ]; - attrs[ "margin" + which ] = attrs[ "padding" + which ] = type; - } - - if ( includeWidth ) { - attrs.opacity = attrs.width = type; - } - - return attrs; -} - -function createTween( value, prop, animation ) { - var tween, - collection = ( Animation.tweeners[ prop ] || [] ).concat( Animation.tweeners[ "*" ] ), - index = 0, - length = collection.length; - for ( ; index < length; index++ ) { - if ( ( tween = collection[ index ].call( animation, prop, value ) ) ) { - - // We're done with this property - return tween; - } - } -} - -function defaultPrefilter( elem, props, opts ) { - var prop, value, toggle, hooks, oldfire, propTween, restoreDisplay, display, - isBox = "width" in props || "height" in props, - anim = this, - orig = {}, - style = elem.style, - hidden = elem.nodeType && isHiddenWithinTree( elem ), - dataShow = dataPriv.get( elem, "fxshow" ); - - // Queue-skipping animations hijack the fx hooks - if ( !opts.queue ) { - hooks = jQuery._queueHooks( elem, "fx" ); - if ( hooks.unqueued == null ) { - hooks.unqueued = 0; - oldfire = hooks.empty.fire; - hooks.empty.fire = function() { - if ( !hooks.unqueued ) { - oldfire(); - } - }; - } - hooks.unqueued++; - - anim.always( function() { - - // Ensure the complete handler is called before this completes - anim.always( function() { - hooks.unqueued--; - if ( !jQuery.queue( elem, "fx" ).length ) { - hooks.empty.fire(); - } - } ); - } ); - } - - // Detect show/hide animations - for ( prop in props ) { - value = props[ prop ]; - if ( rfxtypes.test( value ) ) { - delete props[ prop ]; - toggle = toggle || value === "toggle"; - if ( value === ( hidden ? 
"hide" : "show" ) ) { - - // Pretend to be hidden if this is a "show" and - // there is still data from a stopped show/hide - if ( value === "show" && dataShow && dataShow[ prop ] !== undefined ) { - hidden = true; - - // Ignore all other no-op show/hide data - } else { - continue; - } - } - orig[ prop ] = dataShow && dataShow[ prop ] || jQuery.style( elem, prop ); - } - } - - // Bail out if this is a no-op like .hide().hide() - propTween = !jQuery.isEmptyObject( props ); - if ( !propTween && jQuery.isEmptyObject( orig ) ) { - return; - } - - // Restrict "overflow" and "display" styles during box animations - if ( isBox && elem.nodeType === 1 ) { - - // Support: IE <=9 - 11, Edge 12 - 15 - // Record all 3 overflow attributes because IE does not infer the shorthand - // from identically-valued overflowX and overflowY and Edge just mirrors - // the overflowX value there. - opts.overflow = [ style.overflow, style.overflowX, style.overflowY ]; - - // Identify a display type, preferring old show/hide data over the CSS cascade - restoreDisplay = dataShow && dataShow.display; - if ( restoreDisplay == null ) { - restoreDisplay = dataPriv.get( elem, "display" ); - } - display = jQuery.css( elem, "display" ); - if ( display === "none" ) { - if ( restoreDisplay ) { - display = restoreDisplay; - } else { - - // Get nonempty value(s) by temporarily forcing visibility - showHide( [ elem ], true ); - restoreDisplay = elem.style.display || restoreDisplay; - display = jQuery.css( elem, "display" ); - showHide( [ elem ] ); - } - } - - // Animate inline elements as inline-block - if ( display === "inline" || display === "inline-block" && restoreDisplay != null ) { - if ( jQuery.css( elem, "float" ) === "none" ) { - - // Restore the original display value at the end of pure show/hide animations - if ( !propTween ) { - anim.done( function() { - style.display = restoreDisplay; - } ); - if ( restoreDisplay == null ) { - display = style.display; - restoreDisplay = display === "none" ? 
"" : display; - } - } - style.display = "inline-block"; - } - } - } - - if ( opts.overflow ) { - style.overflow = "hidden"; - anim.always( function() { - style.overflow = opts.overflow[ 0 ]; - style.overflowX = opts.overflow[ 1 ]; - style.overflowY = opts.overflow[ 2 ]; - } ); - } - - // Implement show/hide animations - propTween = false; - for ( prop in orig ) { - - // General show/hide setup for this element animation - if ( !propTween ) { - if ( dataShow ) { - if ( "hidden" in dataShow ) { - hidden = dataShow.hidden; - } - } else { - dataShow = dataPriv.access( elem, "fxshow", { display: restoreDisplay } ); - } - - // Store hidden/visible for toggle so `.stop().toggle()` "reverses" - if ( toggle ) { - dataShow.hidden = !hidden; - } - - // Show elements before animating them - if ( hidden ) { - showHide( [ elem ], true ); - } - - /* eslint-disable no-loop-func */ - - anim.done( function() { - - /* eslint-enable no-loop-func */ - - // The final step of a "hide" animation is actually hiding the element - if ( !hidden ) { - showHide( [ elem ] ); - } - dataPriv.remove( elem, "fxshow" ); - for ( prop in orig ) { - jQuery.style( elem, prop, orig[ prop ] ); - } - } ); - } - - // Per-property setup - propTween = createTween( hidden ? 
dataShow[ prop ] : 0, prop, anim ); - if ( !( prop in dataShow ) ) { - dataShow[ prop ] = propTween.start; - if ( hidden ) { - propTween.end = propTween.start; - propTween.start = 0; - } - } - } -} - -function propFilter( props, specialEasing ) { - var index, name, easing, value, hooks; - - // camelCase, specialEasing and expand cssHook pass - for ( index in props ) { - name = camelCase( index ); - easing = specialEasing[ name ]; - value = props[ index ]; - if ( Array.isArray( value ) ) { - easing = value[ 1 ]; - value = props[ index ] = value[ 0 ]; - } - - if ( index !== name ) { - props[ name ] = value; - delete props[ index ]; - } - - hooks = jQuery.cssHooks[ name ]; - if ( hooks && "expand" in hooks ) { - value = hooks.expand( value ); - delete props[ name ]; - - // Not quite $.extend, this won't overwrite existing keys. - // Reusing 'index' because we have the correct "name" - for ( index in value ) { - if ( !( index in props ) ) { - props[ index ] = value[ index ]; - specialEasing[ index ] = easing; - } - } - } else { - specialEasing[ name ] = easing; - } - } -} - -function Animation( elem, properties, options ) { - var result, - stopped, - index = 0, - length = Animation.prefilters.length, - deferred = jQuery.Deferred().always( function() { - - // Don't match elem in the :animated selector - delete tick.elem; - } ), - tick = function() { - if ( stopped ) { - return false; - } - var currentTime = fxNow || createFxNow(), - remaining = Math.max( 0, animation.startTime + animation.duration - currentTime ), - - // Support: Android 2.3 only - // Archaic crash bug won't allow us to use `1 - ( 0.5 || 0 )` (#12497) - temp = remaining / animation.duration || 0, - percent = 1 - temp, - index = 0, - length = animation.tweens.length; - - for ( ; index < length; index++ ) { - animation.tweens[ index ].run( percent ); - } - - deferred.notifyWith( elem, [ animation, percent, remaining ] ); - - // If there's more to do, yield - if ( percent < 1 && length ) { - return 
remaining; - } - - // If this was an empty animation, synthesize a final progress notification - if ( !length ) { - deferred.notifyWith( elem, [ animation, 1, 0 ] ); - } - - // Resolve the animation and report its conclusion - deferred.resolveWith( elem, [ animation ] ); - return false; - }, - animation = deferred.promise( { - elem: elem, - props: jQuery.extend( {}, properties ), - opts: jQuery.extend( true, { - specialEasing: {}, - easing: jQuery.easing._default - }, options ), - originalProperties: properties, - originalOptions: options, - startTime: fxNow || createFxNow(), - duration: options.duration, - tweens: [], - createTween: function( prop, end ) { - var tween = jQuery.Tween( elem, animation.opts, prop, end, - animation.opts.specialEasing[ prop ] || animation.opts.easing ); - animation.tweens.push( tween ); - return tween; - }, - stop: function( gotoEnd ) { - var index = 0, - - // If we are going to the end, we want to run all the tweens - // otherwise we skip this part - length = gotoEnd ? 
animation.tweens.length : 0; - if ( stopped ) { - return this; - } - stopped = true; - for ( ; index < length; index++ ) { - animation.tweens[ index ].run( 1 ); - } - - // Resolve when we played the last frame; otherwise, reject - if ( gotoEnd ) { - deferred.notifyWith( elem, [ animation, 1, 0 ] ); - deferred.resolveWith( elem, [ animation, gotoEnd ] ); - } else { - deferred.rejectWith( elem, [ animation, gotoEnd ] ); - } - return this; - } - } ), - props = animation.props; - - propFilter( props, animation.opts.specialEasing ); - - for ( ; index < length; index++ ) { - result = Animation.prefilters[ index ].call( animation, elem, props, animation.opts ); - if ( result ) { - if ( isFunction( result.stop ) ) { - jQuery._queueHooks( animation.elem, animation.opts.queue ).stop = - result.stop.bind( result ); - } - return result; - } - } - - jQuery.map( props, createTween, animation ); - - if ( isFunction( animation.opts.start ) ) { - animation.opts.start.call( elem, animation ); - } - - // Attach callbacks from options - animation - .progress( animation.opts.progress ) - .done( animation.opts.done, animation.opts.complete ) - .fail( animation.opts.fail ) - .always( animation.opts.always ); - - jQuery.fx.timer( - jQuery.extend( tick, { - elem: elem, - anim: animation, - queue: animation.opts.queue - } ) - ); - - return animation; -} - -jQuery.Animation = jQuery.extend( Animation, { - - tweeners: { - "*": [ function( prop, value ) { - var tween = this.createTween( prop, value ); - adjustCSS( tween.elem, prop, rcssNum.exec( value ), tween ); - return tween; - } ] - }, - - tweener: function( props, callback ) { - if ( isFunction( props ) ) { - callback = props; - props = [ "*" ]; - } else { - props = props.match( rnothtmlwhite ); - } - - var prop, - index = 0, - length = props.length; - - for ( ; index < length; index++ ) { - prop = props[ index ]; - Animation.tweeners[ prop ] = Animation.tweeners[ prop ] || []; - Animation.tweeners[ prop ].unshift( callback ); - } - }, - 
- prefilters: [ defaultPrefilter ], - - prefilter: function( callback, prepend ) { - if ( prepend ) { - Animation.prefilters.unshift( callback ); - } else { - Animation.prefilters.push( callback ); - } - } -} ); - -jQuery.speed = function( speed, easing, fn ) { - var opt = speed && typeof speed === "object" ? jQuery.extend( {}, speed ) : { - complete: fn || !fn && easing || - isFunction( speed ) && speed, - duration: speed, - easing: fn && easing || easing && !isFunction( easing ) && easing - }; - - // Go to the end state if fx are off - if ( jQuery.fx.off ) { - opt.duration = 0; - - } else { - if ( typeof opt.duration !== "number" ) { - if ( opt.duration in jQuery.fx.speeds ) { - opt.duration = jQuery.fx.speeds[ opt.duration ]; - - } else { - opt.duration = jQuery.fx.speeds._default; - } - } - } - - // Normalize opt.queue - true/undefined/null -> "fx" - if ( opt.queue == null || opt.queue === true ) { - opt.queue = "fx"; - } - - // Queueing - opt.old = opt.complete; - - opt.complete = function() { - if ( isFunction( opt.old ) ) { - opt.old.call( this ); - } - - if ( opt.queue ) { - jQuery.dequeue( this, opt.queue ); - } - }; - - return opt; -}; - -jQuery.fn.extend( { - fadeTo: function( speed, to, easing, callback ) { - - // Show any hidden elements after setting opacity to 0 - return this.filter( isHiddenWithinTree ).css( "opacity", 0 ).show() - - // Animate to the value specified - .end().animate( { opacity: to }, speed, easing, callback ); - }, - animate: function( prop, speed, easing, callback ) { - var empty = jQuery.isEmptyObject( prop ), - optall = jQuery.speed( speed, easing, callback ), - doAnimation = function() { - - // Operate on a copy of prop so per-property easing won't be lost - var anim = Animation( this, jQuery.extend( {}, prop ), optall ); - - // Empty animations, or finishing resolves immediately - if ( empty || dataPriv.get( this, "finish" ) ) { - anim.stop( true ); - } - }; - doAnimation.finish = doAnimation; - - return empty || optall.queue 
=== false ? - this.each( doAnimation ) : - this.queue( optall.queue, doAnimation ); - }, - stop: function( type, clearQueue, gotoEnd ) { - var stopQueue = function( hooks ) { - var stop = hooks.stop; - delete hooks.stop; - stop( gotoEnd ); - }; - - if ( typeof type !== "string" ) { - gotoEnd = clearQueue; - clearQueue = type; - type = undefined; - } - if ( clearQueue ) { - this.queue( type || "fx", [] ); - } - - return this.each( function() { - var dequeue = true, - index = type != null && type + "queueHooks", - timers = jQuery.timers, - data = dataPriv.get( this ); - - if ( index ) { - if ( data[ index ] && data[ index ].stop ) { - stopQueue( data[ index ] ); - } - } else { - for ( index in data ) { - if ( data[ index ] && data[ index ].stop && rrun.test( index ) ) { - stopQueue( data[ index ] ); - } - } - } - - for ( index = timers.length; index--; ) { - if ( timers[ index ].elem === this && - ( type == null || timers[ index ].queue === type ) ) { - - timers[ index ].anim.stop( gotoEnd ); - dequeue = false; - timers.splice( index, 1 ); - } - } - - // Start the next in the queue if the last step wasn't forced. - // Timers currently will call their complete callbacks, which - // will dequeue but only if they were gotoEnd. - if ( dequeue || !gotoEnd ) { - jQuery.dequeue( this, type ); - } - } ); - }, - finish: function( type ) { - if ( type !== false ) { - type = type || "fx"; - } - return this.each( function() { - var index, - data = dataPriv.get( this ), - queue = data[ type + "queue" ], - hooks = data[ type + "queueHooks" ], - timers = jQuery.timers, - length = queue ? 
queue.length : 0; - - // Enable finishing flag on private data - data.finish = true; - - // Empty the queue first - jQuery.queue( this, type, [] ); - - if ( hooks && hooks.stop ) { - hooks.stop.call( this, true ); - } - - // Look for any active animations, and finish them - for ( index = timers.length; index--; ) { - if ( timers[ index ].elem === this && timers[ index ].queue === type ) { - timers[ index ].anim.stop( true ); - timers.splice( index, 1 ); - } - } - - // Look for any animations in the old queue and finish them - for ( index = 0; index < length; index++ ) { - if ( queue[ index ] && queue[ index ].finish ) { - queue[ index ].finish.call( this ); - } - } - - // Turn off finishing flag - delete data.finish; - } ); - } -} ); - -jQuery.each( [ "toggle", "show", "hide" ], function( _i, name ) { - var cssFn = jQuery.fn[ name ]; - jQuery.fn[ name ] = function( speed, easing, callback ) { - return speed == null || typeof speed === "boolean" ? - cssFn.apply( this, arguments ) : - this.animate( genFx( name, true ), speed, easing, callback ); - }; -} ); - -// Generate shortcuts for custom animations -jQuery.each( { - slideDown: genFx( "show" ), - slideUp: genFx( "hide" ), - slideToggle: genFx( "toggle" ), - fadeIn: { opacity: "show" }, - fadeOut: { opacity: "hide" }, - fadeToggle: { opacity: "toggle" } -}, function( name, props ) { - jQuery.fn[ name ] = function( speed, easing, callback ) { - return this.animate( props, speed, easing, callback ); - }; -} ); - -jQuery.timers = []; -jQuery.fx.tick = function() { - var timer, - i = 0, - timers = jQuery.timers; - - fxNow = Date.now(); - - for ( ; i < timers.length; i++ ) { - timer = timers[ i ]; - - // Run the timer and safely remove it when done (allowing for external removal) - if ( !timer() && timers[ i ] === timer ) { - timers.splice( i--, 1 ); - } - } - - if ( !timers.length ) { - jQuery.fx.stop(); - } - fxNow = undefined; -}; - -jQuery.fx.timer = function( timer ) { - jQuery.timers.push( timer ); - 
jQuery.fx.start(); -}; - -jQuery.fx.interval = 13; -jQuery.fx.start = function() { - if ( inProgress ) { - return; - } - - inProgress = true; - schedule(); -}; - -jQuery.fx.stop = function() { - inProgress = null; -}; - -jQuery.fx.speeds = { - slow: 600, - fast: 200, - - // Default speed - _default: 400 -}; - - -// Based off of the plugin by Clint Helfers, with permission. -// https://web.archive.org/web/20100324014747/http://blindsignals.com/index.php/2009/07/jquery-delay/ -jQuery.fn.delay = function( time, type ) { - time = jQuery.fx ? jQuery.fx.speeds[ time ] || time : time; - type = type || "fx"; - - return this.queue( type, function( next, hooks ) { - var timeout = window.setTimeout( next, time ); - hooks.stop = function() { - window.clearTimeout( timeout ); - }; - } ); -}; - - -( function() { - var input = document.createElement( "input" ), - select = document.createElement( "select" ), - opt = select.appendChild( document.createElement( "option" ) ); - - input.type = "checkbox"; - - // Support: Android <=4.3 only - // Default value for a checkbox should be "on" - support.checkOn = input.value !== ""; - - // Support: IE <=11 only - // Must access selectedIndex to make default options select - support.optSelected = opt.selected; - - // Support: IE <=11 only - // An input loses its value after becoming a radio - input = document.createElement( "input" ); - input.value = "t"; - input.type = "radio"; - support.radioValue = input.value === "t"; -} )(); - - -var boolHook, - attrHandle = jQuery.expr.attrHandle; - -jQuery.fn.extend( { - attr: function( name, value ) { - return access( this, jQuery.attr, name, value, arguments.length > 1 ); - }, - - removeAttr: function( name ) { - return this.each( function() { - jQuery.removeAttr( this, name ); - } ); - } -} ); - -jQuery.extend( { - attr: function( elem, name, value ) { - var ret, hooks, - nType = elem.nodeType; - - // Don't get/set attributes on text, comment and attribute nodes - if ( nType === 3 || nType === 8 || 
nType === 2 ) { - return; - } - - // Fallback to prop when attributes are not supported - if ( typeof elem.getAttribute === "undefined" ) { - return jQuery.prop( elem, name, value ); - } - - // Attribute hooks are determined by the lowercase version - // Grab necessary hook if one is defined - if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { - hooks = jQuery.attrHooks[ name.toLowerCase() ] || - ( jQuery.expr.match.bool.test( name ) ? boolHook : undefined ); - } - - if ( value !== undefined ) { - if ( value === null ) { - jQuery.removeAttr( elem, name ); - return; - } - - if ( hooks && "set" in hooks && - ( ret = hooks.set( elem, value, name ) ) !== undefined ) { - return ret; - } - - elem.setAttribute( name, value + "" ); - return value; - } - - if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { - return ret; - } - - ret = jQuery.find.attr( elem, name ); - - // Non-existent attributes return null, we normalize to undefined - return ret == null ? undefined : ret; - }, - - attrHooks: { - type: { - set: function( elem, value ) { - if ( !support.radioValue && value === "radio" && - nodeName( elem, "input" ) ) { - var val = elem.value; - elem.setAttribute( "type", value ); - if ( val ) { - elem.value = val; - } - return value; - } - } - } - }, - - removeAttr: function( elem, value ) { - var name, - i = 0, - - // Attribute names can contain non-HTML whitespace characters - // https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 - attrNames = value && value.match( rnothtmlwhite ); - - if ( attrNames && elem.nodeType === 1 ) { - while ( ( name = attrNames[ i++ ] ) ) { - elem.removeAttribute( name ); - } - } - } -} ); - -// Hooks for boolean attributes -boolHook = { - set: function( elem, value, name ) { - if ( value === false ) { - - // Remove boolean attributes when set to false - jQuery.removeAttr( elem, name ); - } else { - elem.setAttribute( name, name ); - } - return name; - } -}; - -jQuery.each( 
jQuery.expr.match.bool.source.match( /\w+/g ), function( _i, name ) { - var getter = attrHandle[ name ] || jQuery.find.attr; - - attrHandle[ name ] = function( elem, name, isXML ) { - var ret, handle, - lowercaseName = name.toLowerCase(); - - if ( !isXML ) { - - // Avoid an infinite loop by temporarily removing this function from the getter - handle = attrHandle[ lowercaseName ]; - attrHandle[ lowercaseName ] = ret; - ret = getter( elem, name, isXML ) != null ? - lowercaseName : - null; - attrHandle[ lowercaseName ] = handle; - } - return ret; - }; -} ); - - - - -var rfocusable = /^(?:input|select|textarea|button)$/i, - rclickable = /^(?:a|area)$/i; - -jQuery.fn.extend( { - prop: function( name, value ) { - return access( this, jQuery.prop, name, value, arguments.length > 1 ); - }, - - removeProp: function( name ) { - return this.each( function() { - delete this[ jQuery.propFix[ name ] || name ]; - } ); - } -} ); - -jQuery.extend( { - prop: function( elem, name, value ) { - var ret, hooks, - nType = elem.nodeType; - - // Don't get/set properties on text, comment and attribute nodes - if ( nType === 3 || nType === 8 || nType === 2 ) { - return; - } - - if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { - - // Fix name and attach hooks - name = jQuery.propFix[ name ] || name; - hooks = jQuery.propHooks[ name ]; - } - - if ( value !== undefined ) { - if ( hooks && "set" in hooks && - ( ret = hooks.set( elem, value, name ) ) !== undefined ) { - return ret; - } - - return ( elem[ name ] = value ); - } - - if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { - return ret; - } - - return elem[ name ]; - }, - - propHooks: { - tabIndex: { - get: function( elem ) { - - // Support: IE <=9 - 11 only - // elem.tabIndex doesn't always return the - // correct value when it hasn't been explicitly set - // https://web.archive.org/web/20141116233347/http://fluidproject.org/blog/2008/01/09/getting-setting-and-removing-tabindex-values-with-javascript/ - // Use 
proper attribute retrieval(#12072) - var tabindex = jQuery.find.attr( elem, "tabindex" ); - - if ( tabindex ) { - return parseInt( tabindex, 10 ); - } - - if ( - rfocusable.test( elem.nodeName ) || - rclickable.test( elem.nodeName ) && - elem.href - ) { - return 0; - } - - return -1; - } - } - }, - - propFix: { - "for": "htmlFor", - "class": "className" - } -} ); - -// Support: IE <=11 only -// Accessing the selectedIndex property -// forces the browser to respect setting selected -// on the option -// The getter ensures a default option is selected -// when in an optgroup -// eslint rule "no-unused-expressions" is disabled for this code -// since it considers such accessions noop -if ( !support.optSelected ) { - jQuery.propHooks.selected = { - get: function( elem ) { - - /* eslint no-unused-expressions: "off" */ - - var parent = elem.parentNode; - if ( parent && parent.parentNode ) { - parent.parentNode.selectedIndex; - } - return null; - }, - set: function( elem ) { - - /* eslint no-unused-expressions: "off" */ - - var parent = elem.parentNode; - if ( parent ) { - parent.selectedIndex; - - if ( parent.parentNode ) { - parent.parentNode.selectedIndex; - } - } - } - }; -} - -jQuery.each( [ - "tabIndex", - "readOnly", - "maxLength", - "cellSpacing", - "cellPadding", - "rowSpan", - "colSpan", - "useMap", - "frameBorder", - "contentEditable" -], function() { - jQuery.propFix[ this.toLowerCase() ] = this; -} ); - - - - - // Strip and collapse whitespace according to HTML spec - // https://infra.spec.whatwg.org/#strip-and-collapse-ascii-whitespace - function stripAndCollapse( value ) { - var tokens = value.match( rnothtmlwhite ) || []; - return tokens.join( " " ); - } - - -function getClass( elem ) { - return elem.getAttribute && elem.getAttribute( "class" ) || ""; -} - -function classesToArray( value ) { - if ( Array.isArray( value ) ) { - return value; - } - if ( typeof value === "string" ) { - return value.match( rnothtmlwhite ) || []; - } - return []; -} - 
-jQuery.fn.extend( { - addClass: function( value ) { - var classes, elem, cur, curValue, clazz, j, finalValue, - i = 0; - - if ( isFunction( value ) ) { - return this.each( function( j ) { - jQuery( this ).addClass( value.call( this, j, getClass( this ) ) ); - } ); - } - - classes = classesToArray( value ); - - if ( classes.length ) { - while ( ( elem = this[ i++ ] ) ) { - curValue = getClass( elem ); - cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); - - if ( cur ) { - j = 0; - while ( ( clazz = classes[ j++ ] ) ) { - if ( cur.indexOf( " " + clazz + " " ) < 0 ) { - cur += clazz + " "; - } - } - - // Only assign if different to avoid unneeded rendering. - finalValue = stripAndCollapse( cur ); - if ( curValue !== finalValue ) { - elem.setAttribute( "class", finalValue ); - } - } - } - } - - return this; - }, - - removeClass: function( value ) { - var classes, elem, cur, curValue, clazz, j, finalValue, - i = 0; - - if ( isFunction( value ) ) { - return this.each( function( j ) { - jQuery( this ).removeClass( value.call( this, j, getClass( this ) ) ); - } ); - } - - if ( !arguments.length ) { - return this.attr( "class", "" ); - } - - classes = classesToArray( value ); - - if ( classes.length ) { - while ( ( elem = this[ i++ ] ) ) { - curValue = getClass( elem ); - - // This expression is here for better compressibility (see addClass) - cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); - - if ( cur ) { - j = 0; - while ( ( clazz = classes[ j++ ] ) ) { - - // Remove *all* instances - while ( cur.indexOf( " " + clazz + " " ) > -1 ) { - cur = cur.replace( " " + clazz + " ", " " ); - } - } - - // Only assign if different to avoid unneeded rendering. 
- finalValue = stripAndCollapse( cur ); - if ( curValue !== finalValue ) { - elem.setAttribute( "class", finalValue ); - } - } - } - } - - return this; - }, - - toggleClass: function( value, stateVal ) { - var type = typeof value, - isValidValue = type === "string" || Array.isArray( value ); - - if ( typeof stateVal === "boolean" && isValidValue ) { - return stateVal ? this.addClass( value ) : this.removeClass( value ); - } - - if ( isFunction( value ) ) { - return this.each( function( i ) { - jQuery( this ).toggleClass( - value.call( this, i, getClass( this ), stateVal ), - stateVal - ); - } ); - } - - return this.each( function() { - var className, i, self, classNames; - - if ( isValidValue ) { - - // Toggle individual class names - i = 0; - self = jQuery( this ); - classNames = classesToArray( value ); - - while ( ( className = classNames[ i++ ] ) ) { - - // Check each className given, space separated list - if ( self.hasClass( className ) ) { - self.removeClass( className ); - } else { - self.addClass( className ); - } - } - - // Toggle whole class name - } else if ( value === undefined || type === "boolean" ) { - className = getClass( this ); - if ( className ) { - - // Store className if set - dataPriv.set( this, "__className__", className ); - } - - // If the element has a class name or if we're passed `false`, - // then remove the whole classname (if there was one, the above saved it). - // Otherwise bring back whatever was previously saved (if anything), - // falling back to the empty string if nothing was stored. - if ( this.setAttribute ) { - this.setAttribute( "class", - className || value === false ? 
- "" : - dataPriv.get( this, "__className__" ) || "" - ); - } - } - } ); - }, - - hasClass: function( selector ) { - var className, elem, - i = 0; - - className = " " + selector + " "; - while ( ( elem = this[ i++ ] ) ) { - if ( elem.nodeType === 1 && - ( " " + stripAndCollapse( getClass( elem ) ) + " " ).indexOf( className ) > -1 ) { - return true; - } - } - - return false; - } -} ); - - - - -var rreturn = /\r/g; - -jQuery.fn.extend( { - val: function( value ) { - var hooks, ret, valueIsFunction, - elem = this[ 0 ]; - - if ( !arguments.length ) { - if ( elem ) { - hooks = jQuery.valHooks[ elem.type ] || - jQuery.valHooks[ elem.nodeName.toLowerCase() ]; - - if ( hooks && - "get" in hooks && - ( ret = hooks.get( elem, "value" ) ) !== undefined - ) { - return ret; - } - - ret = elem.value; - - // Handle most common string cases - if ( typeof ret === "string" ) { - return ret.replace( rreturn, "" ); - } - - // Handle cases where value is null/undef or number - return ret == null ? "" : ret; - } - - return; - } - - valueIsFunction = isFunction( value ); - - return this.each( function( i ) { - var val; - - if ( this.nodeType !== 1 ) { - return; - } - - if ( valueIsFunction ) { - val = value.call( this, i, jQuery( this ).val() ); - } else { - val = value; - } - - // Treat null/undefined as ""; convert numbers to string - if ( val == null ) { - val = ""; - - } else if ( typeof val === "number" ) { - val += ""; - - } else if ( Array.isArray( val ) ) { - val = jQuery.map( val, function( value ) { - return value == null ? "" : value + ""; - } ); - } - - hooks = jQuery.valHooks[ this.type ] || jQuery.valHooks[ this.nodeName.toLowerCase() ]; - - // If set returns undefined, fall back to normal setting - if ( !hooks || !( "set" in hooks ) || hooks.set( this, val, "value" ) === undefined ) { - this.value = val; - } - } ); - } -} ); - -jQuery.extend( { - valHooks: { - option: { - get: function( elem ) { - - var val = jQuery.find.attr( elem, "value" ); - return val != null ? 
- val : - - // Support: IE <=10 - 11 only - // option.text throws exceptions (#14686, #14858) - // Strip and collapse whitespace - // https://html.spec.whatwg.org/#strip-and-collapse-whitespace - stripAndCollapse( jQuery.text( elem ) ); - } - }, - select: { - get: function( elem ) { - var value, option, i, - options = elem.options, - index = elem.selectedIndex, - one = elem.type === "select-one", - values = one ? null : [], - max = one ? index + 1 : options.length; - - if ( index < 0 ) { - i = max; - - } else { - i = one ? index : 0; - } - - // Loop through all the selected options - for ( ; i < max; i++ ) { - option = options[ i ]; - - // Support: IE <=9 only - // IE8-9 doesn't update selected after form reset (#2551) - if ( ( option.selected || i === index ) && - - // Don't return options that are disabled or in a disabled optgroup - !option.disabled && - ( !option.parentNode.disabled || - !nodeName( option.parentNode, "optgroup" ) ) ) { - - // Get the specific value for the option - value = jQuery( option ).val(); - - // We don't need an array for one selects - if ( one ) { - return value; - } - - // Multi-Selects return an array - values.push( value ); - } - } - - return values; - }, - - set: function( elem, value ) { - var optionSet, option, - options = elem.options, - values = jQuery.makeArray( value ), - i = options.length; - - while ( i-- ) { - option = options[ i ]; - - /* eslint-disable no-cond-assign */ - - if ( option.selected = - jQuery.inArray( jQuery.valHooks.option.get( option ), values ) > -1 - ) { - optionSet = true; - } - - /* eslint-enable no-cond-assign */ - } - - // Force browsers to behave consistently when non-matching value is set - if ( !optionSet ) { - elem.selectedIndex = -1; - } - return values; - } - } - } -} ); - -// Radios and checkboxes getter/setter -jQuery.each( [ "radio", "checkbox" ], function() { - jQuery.valHooks[ this ] = { - set: function( elem, value ) { - if ( Array.isArray( value ) ) { - return ( elem.checked = 
jQuery.inArray( jQuery( elem ).val(), value ) > -1 ); - } - } - }; - if ( !support.checkOn ) { - jQuery.valHooks[ this ].get = function( elem ) { - return elem.getAttribute( "value" ) === null ? "on" : elem.value; - }; - } -} ); - - - - -// Return jQuery for attributes-only inclusion - - -support.focusin = "onfocusin" in window; - - -var rfocusMorph = /^(?:focusinfocus|focusoutblur)$/, - stopPropagationCallback = function( e ) { - e.stopPropagation(); - }; - -jQuery.extend( jQuery.event, { - - trigger: function( event, data, elem, onlyHandlers ) { - - var i, cur, tmp, bubbleType, ontype, handle, special, lastElement, - eventPath = [ elem || document ], - type = hasOwn.call( event, "type" ) ? event.type : event, - namespaces = hasOwn.call( event, "namespace" ) ? event.namespace.split( "." ) : []; - - cur = lastElement = tmp = elem = elem || document; - - // Don't do events on text and comment nodes - if ( elem.nodeType === 3 || elem.nodeType === 8 ) { - return; - } - - // focus/blur morphs to focusin/out; ensure we're not firing them right now - if ( rfocusMorph.test( type + jQuery.event.triggered ) ) { - return; - } - - if ( type.indexOf( "." ) > -1 ) { - - // Namespaced trigger; create a regexp to match event type in handle() - namespaces = type.split( "." ); - type = namespaces.shift(); - namespaces.sort(); - } - ontype = type.indexOf( ":" ) < 0 && "on" + type; - - // Caller can pass in a jQuery.Event object, Object, or just an event type string - event = event[ jQuery.expando ] ? - event : - new jQuery.Event( type, typeof event === "object" && event ); - - // Trigger bitmask: & 1 for native handlers; & 2 for jQuery (always true) - event.isTrigger = onlyHandlers ? 2 : 3; - event.namespace = namespaces.join( "." ); - event.rnamespace = event.namespace ? 
- new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ) : - null; - - // Clean up the event in case it is being reused - event.result = undefined; - if ( !event.target ) { - event.target = elem; - } - - // Clone any incoming data and prepend the event, creating the handler arg list - data = data == null ? - [ event ] : - jQuery.makeArray( data, [ event ] ); - - // Allow special events to draw outside the lines - special = jQuery.event.special[ type ] || {}; - if ( !onlyHandlers && special.trigger && special.trigger.apply( elem, data ) === false ) { - return; - } - - // Determine event propagation path in advance, per W3C events spec (#9951) - // Bubble up to document, then to window; watch for a global ownerDocument var (#9724) - if ( !onlyHandlers && !special.noBubble && !isWindow( elem ) ) { - - bubbleType = special.delegateType || type; - if ( !rfocusMorph.test( bubbleType + type ) ) { - cur = cur.parentNode; - } - for ( ; cur; cur = cur.parentNode ) { - eventPath.push( cur ); - tmp = cur; - } - - // Only add window if we got to document (e.g., not plain obj or detached DOM) - if ( tmp === ( elem.ownerDocument || document ) ) { - eventPath.push( tmp.defaultView || tmp.parentWindow || window ); - } - } - - // Fire handlers on the event path - i = 0; - while ( ( cur = eventPath[ i++ ] ) && !event.isPropagationStopped() ) { - lastElement = cur; - event.type = i > 1 ? 
- bubbleType : - special.bindType || type; - - // jQuery handler - handle = ( - dataPriv.get( cur, "events" ) || Object.create( null ) - )[ event.type ] && - dataPriv.get( cur, "handle" ); - if ( handle ) { - handle.apply( cur, data ); - } - - // Native handler - handle = ontype && cur[ ontype ]; - if ( handle && handle.apply && acceptData( cur ) ) { - event.result = handle.apply( cur, data ); - if ( event.result === false ) { - event.preventDefault(); - } - } - } - event.type = type; - - // If nobody prevented the default action, do it now - if ( !onlyHandlers && !event.isDefaultPrevented() ) { - - if ( ( !special._default || - special._default.apply( eventPath.pop(), data ) === false ) && - acceptData( elem ) ) { - - // Call a native DOM method on the target with the same name as the event. - // Don't do default actions on window, that's where global variables be (#6170) - if ( ontype && isFunction( elem[ type ] ) && !isWindow( elem ) ) { - - // Don't re-trigger an onFOO event when we call its FOO() method - tmp = elem[ ontype ]; - - if ( tmp ) { - elem[ ontype ] = null; - } - - // Prevent re-triggering of the same event, since we already bubbled it above - jQuery.event.triggered = type; - - if ( event.isPropagationStopped() ) { - lastElement.addEventListener( type, stopPropagationCallback ); - } - - elem[ type ](); - - if ( event.isPropagationStopped() ) { - lastElement.removeEventListener( type, stopPropagationCallback ); - } - - jQuery.event.triggered = undefined; - - if ( tmp ) { - elem[ ontype ] = tmp; - } - } - } - } - - return event.result; - }, - - // Piggyback on a donor event to simulate a different one - // Used only for `focus(in | out)` events - simulate: function( type, elem, event ) { - var e = jQuery.extend( - new jQuery.Event(), - event, - { - type: type, - isSimulated: true - } - ); - - jQuery.event.trigger( e, null, elem ); - } - -} ); - -jQuery.fn.extend( { - - trigger: function( type, data ) { - return this.each( function() { - 
jQuery.event.trigger( type, data, this ); - } ); - }, - triggerHandler: function( type, data ) { - var elem = this[ 0 ]; - if ( elem ) { - return jQuery.event.trigger( type, data, elem, true ); - } - } -} ); - - -// Support: Firefox <=44 -// Firefox doesn't have focus(in | out) events -// Related ticket - https://bugzilla.mozilla.org/show_bug.cgi?id=687787 -// -// Support: Chrome <=48 - 49, Safari <=9.0 - 9.1 -// focus(in | out) events fire after focus & blur events, -// which is spec violation - http://www.w3.org/TR/DOM-Level-3-Events/#events-focusevent-event-order -// Related ticket - https://bugs.chromium.org/p/chromium/issues/detail?id=449857 -if ( !support.focusin ) { - jQuery.each( { focus: "focusin", blur: "focusout" }, function( orig, fix ) { - - // Attach a single capturing handler on the document while someone wants focusin/focusout - var handler = function( event ) { - jQuery.event.simulate( fix, event.target, jQuery.event.fix( event ) ); - }; - - jQuery.event.special[ fix ] = { - setup: function() { - - // Handle: regular nodes (via `this.ownerDocument`), window - // (via `this.document`) & document (via `this`). 
- var doc = this.ownerDocument || this.document || this, - attaches = dataPriv.access( doc, fix ); - - if ( !attaches ) { - doc.addEventListener( orig, handler, true ); - } - dataPriv.access( doc, fix, ( attaches || 0 ) + 1 ); - }, - teardown: function() { - var doc = this.ownerDocument || this.document || this, - attaches = dataPriv.access( doc, fix ) - 1; - - if ( !attaches ) { - doc.removeEventListener( orig, handler, true ); - dataPriv.remove( doc, fix ); - - } else { - dataPriv.access( doc, fix, attaches ); - } - } - }; - } ); -} -var location = window.location; - -var nonce = { guid: Date.now() }; - -var rquery = ( /\?/ ); - - - -// Cross-browser xml parsing -jQuery.parseXML = function( data ) { - var xml; - if ( !data || typeof data !== "string" ) { - return null; - } - - // Support: IE 9 - 11 only - // IE throws on parseFromString with invalid input. - try { - xml = ( new window.DOMParser() ).parseFromString( data, "text/xml" ); - } catch ( e ) { - xml = undefined; - } - - if ( !xml || xml.getElementsByTagName( "parsererror" ).length ) { - jQuery.error( "Invalid XML: " + data ); - } - return xml; -}; - - -var - rbracket = /\[\]$/, - rCRLF = /\r?\n/g, - rsubmitterTypes = /^(?:submit|button|image|reset|file)$/i, - rsubmittable = /^(?:input|select|textarea|keygen)/i; - -function buildParams( prefix, obj, traditional, add ) { - var name; - - if ( Array.isArray( obj ) ) { - - // Serialize array item. - jQuery.each( obj, function( i, v ) { - if ( traditional || rbracket.test( prefix ) ) { - - // Treat each array item as a scalar. - add( prefix, v ); - - } else { - - // Item is non-scalar (array or object), encode its numeric index. - buildParams( - prefix + "[" + ( typeof v === "object" && v != null ? i : "" ) + "]", - v, - traditional, - add - ); - } - } ); - - } else if ( !traditional && toType( obj ) === "object" ) { - - // Serialize object item. 
- for ( name in obj ) { - buildParams( prefix + "[" + name + "]", obj[ name ], traditional, add ); - } - - } else { - - // Serialize scalar item. - add( prefix, obj ); - } -} - -// Serialize an array of form elements or a set of -// key/values into a query string -jQuery.param = function( a, traditional ) { - var prefix, - s = [], - add = function( key, valueOrFunction ) { - - // If value is a function, invoke it and use its return value - var value = isFunction( valueOrFunction ) ? - valueOrFunction() : - valueOrFunction; - - s[ s.length ] = encodeURIComponent( key ) + "=" + - encodeURIComponent( value == null ? "" : value ); - }; - - if ( a == null ) { - return ""; - } - - // If an array was passed in, assume that it is an array of form elements. - if ( Array.isArray( a ) || ( a.jquery && !jQuery.isPlainObject( a ) ) ) { - - // Serialize the form elements - jQuery.each( a, function() { - add( this.name, this.value ); - } ); - - } else { - - // If traditional, encode the "old" way (the way 1.3.2 or older - // did it), otherwise encode params recursively. - for ( prefix in a ) { - buildParams( prefix, a[ prefix ], traditional, add ); - } - } - - // Return the resulting serialization - return s.join( "&" ); -}; - -jQuery.fn.extend( { - serialize: function() { - return jQuery.param( this.serializeArray() ); - }, - serializeArray: function() { - return this.map( function() { - - // Can add propHook for "elements" to filter or add form elements - var elements = jQuery.prop( this, "elements" ); - return elements ? 
jQuery.makeArray( elements ) : this; - } ) - .filter( function() { - var type = this.type; - - // Use .is( ":disabled" ) so that fieldset[disabled] works - return this.name && !jQuery( this ).is( ":disabled" ) && - rsubmittable.test( this.nodeName ) && !rsubmitterTypes.test( type ) && - ( this.checked || !rcheckableType.test( type ) ); - } ) - .map( function( _i, elem ) { - var val = jQuery( this ).val(); - - if ( val == null ) { - return null; - } - - if ( Array.isArray( val ) ) { - return jQuery.map( val, function( val ) { - return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; - } ); - } - - return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; - } ).get(); - } -} ); - - -var - r20 = /%20/g, - rhash = /#.*$/, - rantiCache = /([?&])_=[^&]*/, - rheaders = /^(.*?):[ \t]*([^\r\n]*)$/mg, - - // #7653, #8125, #8152: local protocol detection - rlocalProtocol = /^(?:about|app|app-storage|.+-extension|file|res|widget):$/, - rnoContent = /^(?:GET|HEAD)$/, - rprotocol = /^\/\//, - - /* Prefilters - * 1) They are useful to introduce custom dataTypes (see ajax/jsonp.js for an example) - * 2) These are called: - * - BEFORE asking for a transport - * - AFTER param serialization (s.data is a string if s.processData is true) - * 3) key is the dataType - * 4) the catchall symbol "*" can be used - * 5) execution will start with transport dataType and THEN continue down to "*" if needed - */ - prefilters = {}, - - /* Transports bindings - * 1) key is the dataType - * 2) the catchall symbol "*" can be used - * 3) selection will start with transport dataType and THEN go to "*" if needed - */ - transports = {}, - - // Avoid comment-prolog char sequence (#10098); must appease lint and evade compression - allTypes = "*/".concat( "*" ), - - // Anchor tag for parsing the document origin - originAnchor = document.createElement( "a" ); - originAnchor.href = location.href; - -// Base "constructor" for jQuery.ajaxPrefilter and jQuery.ajaxTransport -function 
addToPrefiltersOrTransports( structure ) { - - // dataTypeExpression is optional and defaults to "*" - return function( dataTypeExpression, func ) { - - if ( typeof dataTypeExpression !== "string" ) { - func = dataTypeExpression; - dataTypeExpression = "*"; - } - - var dataType, - i = 0, - dataTypes = dataTypeExpression.toLowerCase().match( rnothtmlwhite ) || []; - - if ( isFunction( func ) ) { - - // For each dataType in the dataTypeExpression - while ( ( dataType = dataTypes[ i++ ] ) ) { - - // Prepend if requested - if ( dataType[ 0 ] === "+" ) { - dataType = dataType.slice( 1 ) || "*"; - ( structure[ dataType ] = structure[ dataType ] || [] ).unshift( func ); - - // Otherwise append - } else { - ( structure[ dataType ] = structure[ dataType ] || [] ).push( func ); - } - } - } - }; -} - -// Base inspection function for prefilters and transports -function inspectPrefiltersOrTransports( structure, options, originalOptions, jqXHR ) { - - var inspected = {}, - seekingTransport = ( structure === transports ); - - function inspect( dataType ) { - var selected; - inspected[ dataType ] = true; - jQuery.each( structure[ dataType ] || [], function( _, prefilterOrFactory ) { - var dataTypeOrTransport = prefilterOrFactory( options, originalOptions, jqXHR ); - if ( typeof dataTypeOrTransport === "string" && - !seekingTransport && !inspected[ dataTypeOrTransport ] ) { - - options.dataTypes.unshift( dataTypeOrTransport ); - inspect( dataTypeOrTransport ); - return false; - } else if ( seekingTransport ) { - return !( selected = dataTypeOrTransport ); - } - } ); - return selected; - } - - return inspect( options.dataTypes[ 0 ] ) || !inspected[ "*" ] && inspect( "*" ); -} - -// A special extend for ajax options -// that takes "flat" options (not to be deep extended) -// Fixes #9887 -function ajaxExtend( target, src ) { - var key, deep, - flatOptions = jQuery.ajaxSettings.flatOptions || {}; - - for ( key in src ) { - if ( src[ key ] !== undefined ) { - ( flatOptions[ key ] ? 
target : ( deep || ( deep = {} ) ) )[ key ] = src[ key ]; - } - } - if ( deep ) { - jQuery.extend( true, target, deep ); - } - - return target; -} - -/* Handles responses to an ajax request: - * - finds the right dataType (mediates between content-type and expected dataType) - * - returns the corresponding response - */ -function ajaxHandleResponses( s, jqXHR, responses ) { - - var ct, type, finalDataType, firstDataType, - contents = s.contents, - dataTypes = s.dataTypes; - - // Remove auto dataType and get content-type in the process - while ( dataTypes[ 0 ] === "*" ) { - dataTypes.shift(); - if ( ct === undefined ) { - ct = s.mimeType || jqXHR.getResponseHeader( "Content-Type" ); - } - } - - // Check if we're dealing with a known content-type - if ( ct ) { - for ( type in contents ) { - if ( contents[ type ] && contents[ type ].test( ct ) ) { - dataTypes.unshift( type ); - break; - } - } - } - - // Check to see if we have a response for the expected dataType - if ( dataTypes[ 0 ] in responses ) { - finalDataType = dataTypes[ 0 ]; - } else { - - // Try convertible dataTypes - for ( type in responses ) { - if ( !dataTypes[ 0 ] || s.converters[ type + " " + dataTypes[ 0 ] ] ) { - finalDataType = type; - break; - } - if ( !firstDataType ) { - firstDataType = type; - } - } - - // Or just use first one - finalDataType = finalDataType || firstDataType; - } - - // If we found a dataType - // We add the dataType to the list if needed - // and return the corresponding response - if ( finalDataType ) { - if ( finalDataType !== dataTypes[ 0 ] ) { - dataTypes.unshift( finalDataType ); - } - return responses[ finalDataType ]; - } -} - -/* Chain conversions given the request and the original response - * Also sets the responseXXX fields on the jqXHR instance - */ -function ajaxConvert( s, response, jqXHR, isSuccess ) { - var conv2, current, conv, tmp, prev, - converters = {}, - - // Work with a copy of dataTypes in case we need to modify it for conversion - dataTypes = 
s.dataTypes.slice(); - - // Create converters map with lowercased keys - if ( dataTypes[ 1 ] ) { - for ( conv in s.converters ) { - converters[ conv.toLowerCase() ] = s.converters[ conv ]; - } - } - - current = dataTypes.shift(); - - // Convert to each sequential dataType - while ( current ) { - - if ( s.responseFields[ current ] ) { - jqXHR[ s.responseFields[ current ] ] = response; - } - - // Apply the dataFilter if provided - if ( !prev && isSuccess && s.dataFilter ) { - response = s.dataFilter( response, s.dataType ); - } - - prev = current; - current = dataTypes.shift(); - - if ( current ) { - - // There's only work to do if current dataType is non-auto - if ( current === "*" ) { - - current = prev; - - // Convert response if prev dataType is non-auto and differs from current - } else if ( prev !== "*" && prev !== current ) { - - // Seek a direct converter - conv = converters[ prev + " " + current ] || converters[ "* " + current ]; - - // If none found, seek a pair - if ( !conv ) { - for ( conv2 in converters ) { - - // If conv2 outputs current - tmp = conv2.split( " " ); - if ( tmp[ 1 ] === current ) { - - // If prev can be converted to accepted input - conv = converters[ prev + " " + tmp[ 0 ] ] || - converters[ "* " + tmp[ 0 ] ]; - if ( conv ) { - - // Condense equivalence converters - if ( conv === true ) { - conv = converters[ conv2 ]; - - // Otherwise, insert the intermediate dataType - } else if ( converters[ conv2 ] !== true ) { - current = tmp[ 0 ]; - dataTypes.unshift( tmp[ 1 ] ); - } - break; - } - } - } - } - - // Apply converter (if not an equivalence) - if ( conv !== true ) { - - // Unless errors are allowed to bubble, catch and return them - if ( conv && s.throws ) { - response = conv( response ); - } else { - try { - response = conv( response ); - } catch ( e ) { - return { - state: "parsererror", - error: conv ? 
e : "No conversion from " + prev + " to " + current - }; - } - } - } - } - } - } - - return { state: "success", data: response }; -} - -jQuery.extend( { - - // Counter for holding the number of active queries - active: 0, - - // Last-Modified header cache for next request - lastModified: {}, - etag: {}, - - ajaxSettings: { - url: location.href, - type: "GET", - isLocal: rlocalProtocol.test( location.protocol ), - global: true, - processData: true, - async: true, - contentType: "application/x-www-form-urlencoded; charset=UTF-8", - - /* - timeout: 0, - data: null, - dataType: null, - username: null, - password: null, - cache: null, - throws: false, - traditional: false, - headers: {}, - */ - - accepts: { - "*": allTypes, - text: "text/plain", - html: "text/html", - xml: "application/xml, text/xml", - json: "application/json, text/javascript" - }, - - contents: { - xml: /\bxml\b/, - html: /\bhtml/, - json: /\bjson\b/ - }, - - responseFields: { - xml: "responseXML", - text: "responseText", - json: "responseJSON" - }, - - // Data converters - // Keys separate source (or catchall "*") and destination types with a single space - converters: { - - // Convert anything to text - "* text": String, - - // Text to html (true = no transformation) - "text html": true, - - // Evaluate text as a json expression - "text json": JSON.parse, - - // Parse text as xml - "text xml": jQuery.parseXML - }, - - // For options that shouldn't be deep extended: - // you can add your own custom options here if - // and when you create one that shouldn't be - // deep extended (see ajaxExtend) - flatOptions: { - url: true, - context: true - } - }, - - // Creates a full fledged settings object into target - // with both ajaxSettings and settings fields. - // If target is omitted, writes into ajaxSettings. - ajaxSetup: function( target, settings ) { - return settings ? 
- - // Building a settings object - ajaxExtend( ajaxExtend( target, jQuery.ajaxSettings ), settings ) : - - // Extending ajaxSettings - ajaxExtend( jQuery.ajaxSettings, target ); - }, - - ajaxPrefilter: addToPrefiltersOrTransports( prefilters ), - ajaxTransport: addToPrefiltersOrTransports( transports ), - - // Main method - ajax: function( url, options ) { - - // If url is an object, simulate pre-1.5 signature - if ( typeof url === "object" ) { - options = url; - url = undefined; - } - - // Force options to be an object - options = options || {}; - - var transport, - - // URL without anti-cache param - cacheURL, - - // Response headers - responseHeadersString, - responseHeaders, - - // timeout handle - timeoutTimer, - - // Url cleanup var - urlAnchor, - - // Request state (becomes false upon send and true upon completion) - completed, - - // To know if global events are to be dispatched - fireGlobals, - - // Loop variable - i, - - // uncached part of the url - uncached, - - // Create the final options object - s = jQuery.ajaxSetup( {}, options ), - - // Callbacks context - callbackContext = s.context || s, - - // Context for global events is callbackContext if it is a DOM node or jQuery collection - globalEventContext = s.context && - ( callbackContext.nodeType || callbackContext.jquery ) ? 
- jQuery( callbackContext ) : - jQuery.event, - - // Deferreds - deferred = jQuery.Deferred(), - completeDeferred = jQuery.Callbacks( "once memory" ), - - // Status-dependent callbacks - statusCode = s.statusCode || {}, - - // Headers (they are sent all at once) - requestHeaders = {}, - requestHeadersNames = {}, - - // Default abort message - strAbort = "canceled", - - // Fake xhr - jqXHR = { - readyState: 0, - - // Builds headers hashtable if needed - getResponseHeader: function( key ) { - var match; - if ( completed ) { - if ( !responseHeaders ) { - responseHeaders = {}; - while ( ( match = rheaders.exec( responseHeadersString ) ) ) { - responseHeaders[ match[ 1 ].toLowerCase() + " " ] = - ( responseHeaders[ match[ 1 ].toLowerCase() + " " ] || [] ) - .concat( match[ 2 ] ); - } - } - match = responseHeaders[ key.toLowerCase() + " " ]; - } - return match == null ? null : match.join( ", " ); - }, - - // Raw string - getAllResponseHeaders: function() { - return completed ? responseHeadersString : null; - }, - - // Caches the header - setRequestHeader: function( name, value ) { - if ( completed == null ) { - name = requestHeadersNames[ name.toLowerCase() ] = - requestHeadersNames[ name.toLowerCase() ] || name; - requestHeaders[ name ] = value; - } - return this; - }, - - // Overrides response content-type header - overrideMimeType: function( type ) { - if ( completed == null ) { - s.mimeType = type; - } - return this; - }, - - // Status-dependent callbacks - statusCode: function( map ) { - var code; - if ( map ) { - if ( completed ) { - - // Execute the appropriate callbacks - jqXHR.always( map[ jqXHR.status ] ); - } else { - - // Lazy-add the new callbacks in a way that preserves old ones - for ( code in map ) { - statusCode[ code ] = [ statusCode[ code ], map[ code ] ]; - } - } - } - return this; - }, - - // Cancel the request - abort: function( statusText ) { - var finalText = statusText || strAbort; - if ( transport ) { - transport.abort( finalText ); - } - done( 
0, finalText ); - return this; - } - }; - - // Attach deferreds - deferred.promise( jqXHR ); - - // Add protocol if not provided (prefilters might expect it) - // Handle falsy url in the settings object (#10093: consistency with old signature) - // We also use the url parameter if available - s.url = ( ( url || s.url || location.href ) + "" ) - .replace( rprotocol, location.protocol + "//" ); - - // Alias method option to type as per ticket #12004 - s.type = options.method || options.type || s.method || s.type; - - // Extract dataTypes list - s.dataTypes = ( s.dataType || "*" ).toLowerCase().match( rnothtmlwhite ) || [ "" ]; - - // A cross-domain request is in order when the origin doesn't match the current origin. - if ( s.crossDomain == null ) { - urlAnchor = document.createElement( "a" ); - - // Support: IE <=8 - 11, Edge 12 - 15 - // IE throws exception on accessing the href property if url is malformed, - // e.g. http://example.com:80x/ - try { - urlAnchor.href = s.url; - - // Support: IE <=8 - 11 only - // Anchor's host property isn't correctly set when s.url is relative - urlAnchor.href = urlAnchor.href; - s.crossDomain = originAnchor.protocol + "//" + originAnchor.host !== - urlAnchor.protocol + "//" + urlAnchor.host; - } catch ( e ) { - - // If there is an error parsing the URL, assume it is crossDomain, - // it can be rejected by the transport if it is invalid - s.crossDomain = true; - } - } - - // Convert data if not already a string - if ( s.data && s.processData && typeof s.data !== "string" ) { - s.data = jQuery.param( s.data, s.traditional ); - } - - // Apply prefilters - inspectPrefiltersOrTransports( prefilters, s, options, jqXHR ); - - // If request was aborted inside a prefilter, stop there - if ( completed ) { - return jqXHR; - } - - // We can fire global events as of now if asked to - // Don't fire events if jQuery.event is undefined in an AMD-usage scenario (#15118) - fireGlobals = jQuery.event && s.global; - - // Watch for a new set of 
requests - if ( fireGlobals && jQuery.active++ === 0 ) { - jQuery.event.trigger( "ajaxStart" ); - } - - // Uppercase the type - s.type = s.type.toUpperCase(); - - // Determine if request has content - s.hasContent = !rnoContent.test( s.type ); - - // Save the URL in case we're toying with the If-Modified-Since - // and/or If-None-Match header later on - // Remove hash to simplify url manipulation - cacheURL = s.url.replace( rhash, "" ); - - // More options handling for requests with no content - if ( !s.hasContent ) { - - // Remember the hash so we can put it back - uncached = s.url.slice( cacheURL.length ); - - // If data is available and should be processed, append data to url - if ( s.data && ( s.processData || typeof s.data === "string" ) ) { - cacheURL += ( rquery.test( cacheURL ) ? "&" : "?" ) + s.data; - - // #9682: remove data so that it's not used in an eventual retry - delete s.data; - } - - // Add or update anti-cache param if needed - if ( s.cache === false ) { - cacheURL = cacheURL.replace( rantiCache, "$1" ); - uncached = ( rquery.test( cacheURL ) ? "&" : "?" ) + "_=" + ( nonce.guid++ ) + - uncached; - } - - // Put hash and anti-cache on the URL that will be requested (gh-1732) - s.url = cacheURL + uncached; - - // Change '%20' to '+' if this is encoded form body content (gh-2658) - } else if ( s.data && s.processData && - ( s.contentType || "" ).indexOf( "application/x-www-form-urlencoded" ) === 0 ) { - s.data = s.data.replace( r20, "+" ); - } - - // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
- if ( s.ifModified ) { - if ( jQuery.lastModified[ cacheURL ] ) { - jqXHR.setRequestHeader( "If-Modified-Since", jQuery.lastModified[ cacheURL ] ); - } - if ( jQuery.etag[ cacheURL ] ) { - jqXHR.setRequestHeader( "If-None-Match", jQuery.etag[ cacheURL ] ); - } - } - - // Set the correct header, if data is being sent - if ( s.data && s.hasContent && s.contentType !== false || options.contentType ) { - jqXHR.setRequestHeader( "Content-Type", s.contentType ); - } - - // Set the Accepts header for the server, depending on the dataType - jqXHR.setRequestHeader( - "Accept", - s.dataTypes[ 0 ] && s.accepts[ s.dataTypes[ 0 ] ] ? - s.accepts[ s.dataTypes[ 0 ] ] + - ( s.dataTypes[ 0 ] !== "*" ? ", " + allTypes + "; q=0.01" : "" ) : - s.accepts[ "*" ] - ); - - // Check for headers option - for ( i in s.headers ) { - jqXHR.setRequestHeader( i, s.headers[ i ] ); - } - - // Allow custom headers/mimetypes and early abort - if ( s.beforeSend && - ( s.beforeSend.call( callbackContext, jqXHR, s ) === false || completed ) ) { - - // Abort if not done already and return - return jqXHR.abort(); - } - - // Aborting is no longer a cancellation - strAbort = "abort"; - - // Install callbacks on deferreds - completeDeferred.add( s.complete ); - jqXHR.done( s.success ); - jqXHR.fail( s.error ); - - // Get transport - transport = inspectPrefiltersOrTransports( transports, s, options, jqXHR ); - - // If no transport, we auto-abort - if ( !transport ) { - done( -1, "No Transport" ); - } else { - jqXHR.readyState = 1; - - // Send global event - if ( fireGlobals ) { - globalEventContext.trigger( "ajaxSend", [ jqXHR, s ] ); - } - - // If request was aborted inside ajaxSend, stop there - if ( completed ) { - return jqXHR; - } - - // Timeout - if ( s.async && s.timeout > 0 ) { - timeoutTimer = window.setTimeout( function() { - jqXHR.abort( "timeout" ); - }, s.timeout ); - } - - try { - completed = false; - transport.send( requestHeaders, done ); - } catch ( e ) { - - // Rethrow post-completion 
exceptions - if ( completed ) { - throw e; - } - - // Propagate others as results - done( -1, e ); - } - } - - // Callback for when everything is done - function done( status, nativeStatusText, responses, headers ) { - var isSuccess, success, error, response, modified, - statusText = nativeStatusText; - - // Ignore repeat invocations - if ( completed ) { - return; - } - - completed = true; - - // Clear timeout if it exists - if ( timeoutTimer ) { - window.clearTimeout( timeoutTimer ); - } - - // Dereference transport for early garbage collection - // (no matter how long the jqXHR object will be used) - transport = undefined; - - // Cache response headers - responseHeadersString = headers || ""; - - // Set readyState - jqXHR.readyState = status > 0 ? 4 : 0; - - // Determine if successful - isSuccess = status >= 200 && status < 300 || status === 304; - - // Get response data - if ( responses ) { - response = ajaxHandleResponses( s, jqXHR, responses ); - } - - // Use a noop converter for missing script - if ( !isSuccess && jQuery.inArray( "script", s.dataTypes ) > -1 ) { - s.converters[ "text script" ] = function() {}; - } - - // Convert no matter what (that way responseXXX fields are always set) - response = ajaxConvert( s, response, jqXHR, isSuccess ); - - // If successful, handle type chaining - if ( isSuccess ) { - - // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
- if ( s.ifModified ) { - modified = jqXHR.getResponseHeader( "Last-Modified" ); - if ( modified ) { - jQuery.lastModified[ cacheURL ] = modified; - } - modified = jqXHR.getResponseHeader( "etag" ); - if ( modified ) { - jQuery.etag[ cacheURL ] = modified; - } - } - - // if no content - if ( status === 204 || s.type === "HEAD" ) { - statusText = "nocontent"; - - // if not modified - } else if ( status === 304 ) { - statusText = "notmodified"; - - // If we have data, let's convert it - } else { - statusText = response.state; - success = response.data; - error = response.error; - isSuccess = !error; - } - } else { - - // Extract error from statusText and normalize for non-aborts - error = statusText; - if ( status || !statusText ) { - statusText = "error"; - if ( status < 0 ) { - status = 0; - } - } - } - - // Set data for the fake xhr object - jqXHR.status = status; - jqXHR.statusText = ( nativeStatusText || statusText ) + ""; - - // Success/Error - if ( isSuccess ) { - deferred.resolveWith( callbackContext, [ success, statusText, jqXHR ] ); - } else { - deferred.rejectWith( callbackContext, [ jqXHR, statusText, error ] ); - } - - // Status-dependent callbacks - jqXHR.statusCode( statusCode ); - statusCode = undefined; - - if ( fireGlobals ) { - globalEventContext.trigger( isSuccess ? "ajaxSuccess" : "ajaxError", - [ jqXHR, s, isSuccess ? 
success : error ] ); - } - - // Complete - completeDeferred.fireWith( callbackContext, [ jqXHR, statusText ] ); - - if ( fireGlobals ) { - globalEventContext.trigger( "ajaxComplete", [ jqXHR, s ] ); - - // Handle the global AJAX counter - if ( !( --jQuery.active ) ) { - jQuery.event.trigger( "ajaxStop" ); - } - } - } - - return jqXHR; - }, - - getJSON: function( url, data, callback ) { - return jQuery.get( url, data, callback, "json" ); - }, - - getScript: function( url, callback ) { - return jQuery.get( url, undefined, callback, "script" ); - } -} ); - -jQuery.each( [ "get", "post" ], function( _i, method ) { - jQuery[ method ] = function( url, data, callback, type ) { - - // Shift arguments if data argument was omitted - if ( isFunction( data ) ) { - type = type || callback; - callback = data; - data = undefined; - } - - // The url can be an options object (which then must have .url) - return jQuery.ajax( jQuery.extend( { - url: url, - type: method, - dataType: type, - data: data, - success: callback - }, jQuery.isPlainObject( url ) && url ) ); - }; -} ); - -jQuery.ajaxPrefilter( function( s ) { - var i; - for ( i in s.headers ) { - if ( i.toLowerCase() === "content-type" ) { - s.contentType = s.headers[ i ] || ""; - } - } -} ); - - -jQuery._evalUrl = function( url, options, doc ) { - return jQuery.ajax( { - url: url, - - // Make this explicit, since user can override this through ajaxSetup (#11264) - type: "GET", - dataType: "script", - cache: true, - async: false, - global: false, - - // Only evaluate the response if it is successful (gh-4126) - // dataFilter is not invoked for failure responses, so using it instead - // of the default converter is kludgy but it works. 
- converters: { - "text script": function() {} - }, - dataFilter: function( response ) { - jQuery.globalEval( response, options, doc ); - } - } ); -}; - - -jQuery.fn.extend( { - wrapAll: function( html ) { - var wrap; - - if ( this[ 0 ] ) { - if ( isFunction( html ) ) { - html = html.call( this[ 0 ] ); - } - - // The elements to wrap the target around - wrap = jQuery( html, this[ 0 ].ownerDocument ).eq( 0 ).clone( true ); - - if ( this[ 0 ].parentNode ) { - wrap.insertBefore( this[ 0 ] ); - } - - wrap.map( function() { - var elem = this; - - while ( elem.firstElementChild ) { - elem = elem.firstElementChild; - } - - return elem; - } ).append( this ); - } - - return this; - }, - - wrapInner: function( html ) { - if ( isFunction( html ) ) { - return this.each( function( i ) { - jQuery( this ).wrapInner( html.call( this, i ) ); - } ); - } - - return this.each( function() { - var self = jQuery( this ), - contents = self.contents(); - - if ( contents.length ) { - contents.wrapAll( html ); - - } else { - self.append( html ); - } - } ); - }, - - wrap: function( html ) { - var htmlIsFunction = isFunction( html ); - - return this.each( function( i ) { - jQuery( this ).wrapAll( htmlIsFunction ? 
html.call( this, i ) : html ); - } ); - }, - - unwrap: function( selector ) { - this.parent( selector ).not( "body" ).each( function() { - jQuery( this ).replaceWith( this.childNodes ); - } ); - return this; - } -} ); - - -jQuery.expr.pseudos.hidden = function( elem ) { - return !jQuery.expr.pseudos.visible( elem ); -}; -jQuery.expr.pseudos.visible = function( elem ) { - return !!( elem.offsetWidth || elem.offsetHeight || elem.getClientRects().length ); -}; - - - - -jQuery.ajaxSettings.xhr = function() { - try { - return new window.XMLHttpRequest(); - } catch ( e ) {} -}; - -var xhrSuccessStatus = { - - // File protocol always yields status code 0, assume 200 - 0: 200, - - // Support: IE <=9 only - // #1450: sometimes IE returns 1223 when it should be 204 - 1223: 204 - }, - xhrSupported = jQuery.ajaxSettings.xhr(); - -support.cors = !!xhrSupported && ( "withCredentials" in xhrSupported ); -support.ajax = xhrSupported = !!xhrSupported; - -jQuery.ajaxTransport( function( options ) { - var callback, errorCallback; - - // Cross domain only allowed if supported through XMLHttpRequest - if ( support.cors || xhrSupported && !options.crossDomain ) { - return { - send: function( headers, complete ) { - var i, - xhr = options.xhr(); - - xhr.open( - options.type, - options.url, - options.async, - options.username, - options.password - ); - - // Apply custom fields if provided - if ( options.xhrFields ) { - for ( i in options.xhrFields ) { - xhr[ i ] = options.xhrFields[ i ]; - } - } - - // Override mime type if needed - if ( options.mimeType && xhr.overrideMimeType ) { - xhr.overrideMimeType( options.mimeType ); - } - - // X-Requested-With header - // For cross-domain requests, seeing as conditions for a preflight are - // akin to a jigsaw puzzle, we simply never set it to be sure. - // (it can always be set on a per-request basis or even using ajaxSetup) - // For same-domain requests, won't change header if already provided. 
- if ( !options.crossDomain && !headers[ "X-Requested-With" ] ) { - headers[ "X-Requested-With" ] = "XMLHttpRequest"; - } - - // Set headers - for ( i in headers ) { - xhr.setRequestHeader( i, headers[ i ] ); - } - - // Callback - callback = function( type ) { - return function() { - if ( callback ) { - callback = errorCallback = xhr.onload = - xhr.onerror = xhr.onabort = xhr.ontimeout = - xhr.onreadystatechange = null; - - if ( type === "abort" ) { - xhr.abort(); - } else if ( type === "error" ) { - - // Support: IE <=9 only - // On a manual native abort, IE9 throws - // errors on any property access that is not readyState - if ( typeof xhr.status !== "number" ) { - complete( 0, "error" ); - } else { - complete( - - // File: protocol always yields status 0; see #8605, #14207 - xhr.status, - xhr.statusText - ); - } - } else { - complete( - xhrSuccessStatus[ xhr.status ] || xhr.status, - xhr.statusText, - - // Support: IE <=9 only - // IE9 has no XHR2 but throws on binary (trac-11426) - // For XHR2 non-text, let the caller handle it (gh-2498) - ( xhr.responseType || "text" ) !== "text" || - typeof xhr.responseText !== "string" ? 
- { binary: xhr.response } : - { text: xhr.responseText }, - xhr.getAllResponseHeaders() - ); - } - } - }; - }; - - // Listen to events - xhr.onload = callback(); - errorCallback = xhr.onerror = xhr.ontimeout = callback( "error" ); - - // Support: IE 9 only - // Use onreadystatechange to replace onabort - // to handle uncaught aborts - if ( xhr.onabort !== undefined ) { - xhr.onabort = errorCallback; - } else { - xhr.onreadystatechange = function() { - - // Check readyState before timeout as it changes - if ( xhr.readyState === 4 ) { - - // Allow onerror to be called first, - // but that will not handle a native abort - // Also, save errorCallback to a variable - // as xhr.onerror cannot be accessed - window.setTimeout( function() { - if ( callback ) { - errorCallback(); - } - } ); - } - }; - } - - // Create the abort callback - callback = callback( "abort" ); - - try { - - // Do send the request (this may raise an exception) - xhr.send( options.hasContent && options.data || null ); - } catch ( e ) { - - // #14683: Only rethrow if this hasn't been notified as an error yet - if ( callback ) { - throw e; - } - } - }, - - abort: function() { - if ( callback ) { - callback(); - } - } - }; - } -} ); - - - - -// Prevent auto-execution of scripts when no explicit dataType was provided (See gh-2432) -jQuery.ajaxPrefilter( function( s ) { - if ( s.crossDomain ) { - s.contents.script = false; - } -} ); - -// Install script dataType -jQuery.ajaxSetup( { - accepts: { - script: "text/javascript, application/javascript, " + - "application/ecmascript, application/x-ecmascript" - }, - contents: { - script: /\b(?:java|ecma)script\b/ - }, - converters: { - "text script": function( text ) { - jQuery.globalEval( text ); - return text; - } - } -} ); - -// Handle cache's special case and crossDomain -jQuery.ajaxPrefilter( "script", function( s ) { - if ( s.cache === undefined ) { - s.cache = false; - } - if ( s.crossDomain ) { - s.type = "GET"; - } -} ); - -// Bind script tag hack 
transport -jQuery.ajaxTransport( "script", function( s ) { - - // This transport only deals with cross domain or forced-by-attrs requests - if ( s.crossDomain || s.scriptAttrs ) { - var script, callback; - return { - send: function( _, complete ) { - script = jQuery( " - - - - - - - - - - - - - - - - - - - - - - - - - - -

Appendix

-
-

Model Architecture

-

All standard models available in tf.keras.applications (Tensorflow) and torchvision (PyTorch) can be trained. Custom models can also be trained by importing the model and setting the slideflow.model.ModelParams.model parameter equal to the model class.

-

Model inputs are an X by X by 3 array of standardized image data (R, G, and B image data layers converted to floats with range 0 -> 1). If desired, the core model is initialized with pre-trained weights, either from ImageNet or from a pre-trained model specified by the user.

-

The model core is then optionally connected to an additional set of fully-connected hidden layers as specified in the hyperparameter options, which then connects to outputs with softmax (categorical models) or linear (linear models) activations.

-
-
-

A Note on Input Balancing

-

When training, it is important to consider whether category-level balancing should be performed on your input in order to reduce bias against sparse categories. There is no established best practice for input balancing when training on histology images; the balancing method you choose to use is up to you.

-

Suppose you have five slides, labeled A through E, each from a different patient. Slides A and B belong to category 1, while C, D, and E belong to category 2. Assume the tumors in all slides are roughly the same physical size, except for B, which is three times as large.

-

You perform tile extraction, and all the patients except B produce roughly the same number of image tiles. The training optimizer is ready for the next batch of images. Let’s say the batch size is 32. How does it select the next 32 images?

-

If tile-level balancing (“tile”) is used, tiles will be selected randomly. Because slide B has so many more tiles than the other slides, B will be over-represented in the batch. This means that the model will inherently learn a bias towards patient B. If patients like patient B are truly of greater prevalence in the real-world population, this is fine; the model is learning an appropriate bias. Otherwise, it is learning a bias which will hurt the model’s generalizability, which will result in poor performance on our test set.

-

If patient-based balancing (“patient”) is used, the input stream will balance tiles in a given batch across the patients. Now the model has no bias towards any given patient. However, you’ll notice that category 1 (patients A and B) only has 13 tiles, whereas category 2 (patients C, D, and E) has 19 tiles. With this type of balancing, models will learn bias towards categories with more patients (in this case category 2).

-

If category-based balancing (“category”) is used, the input stream balances tiles based on the category. There are now an equal number of tiles from category 1 and category 2, 16 from each. Category 1 is still imbalanced internally, as slide B has more tiles than slide A. However, because this imbalance does not occur between categories, which is what the algorithm is training on, the bias effect is less prominent. The algorithm will expect category 1 to look more like slide B than slide A, but it is not clear whether this is avoidable. Unless you dispose of excess tiles, your model will be exposed to more tiles from B than from A, whether it happens on a per-batch basis or throughout its training across epochs.
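The batch compositions described above can be checked with a little arithmetic. The sketch below is illustrative only and does not use Slideflow: the relative tile counts, the `expected_per_slide` helper, and the assumption that tiles are drawn at random within a category are all hypothetical choices made to mirror the five-slide example.

```python
from fractions import Fraction

# Relative tile counts from the example: slide B yields three times as many tiles.
tiles = {"A": 1, "B": 3, "C": 1, "D": 1, "E": 1}
category = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 2}
BATCH_SIZE = 32


def expected_per_slide(strategy):
    """Expected number of tiles drawn from each slide in one batch."""
    if strategy == "tile":
        # Every tile is equally likely, so slides are weighted by tile count.
        weights = {s: Fraction(n) for s, n in tiles.items()}
    elif strategy == "patient":
        # Every slide (one slide per patient here) is equally likely.
        weights = {s: Fraction(1) for s in tiles}
    elif strategy == "category":
        # Every category is equally likely; within a category, tiles are
        # drawn at random, so larger slides still contribute more.
        weights = {}
        for s, n in tiles.items():
            in_cat = sum(m for t, m in tiles.items() if category[t] == category[s])
            weights[s] = Fraction(n, in_cat)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights.values())
    return {s: float(BATCH_SIZE * w / total) for s, w in weights.items()}


def expected_per_category(strategy):
    """Sum the per-slide expectations within each category."""
    out = {}
    for s, n in expected_per_slide(strategy).items():
        out[category[s]] = out.get(category[s], 0.0) + n
    return out


for strategy in ("tile", "patient", "category"):
    print(strategy, expected_per_slide(strategy), expected_per_category(strategy))
```

Under these assumptions, patient balancing gives about 12.8 and 19.2 tiles per category (the 13 and 19 quoted above), and category balancing gives 16 and 16, with slide B still contributing three times as many tiles as slide A.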

-
-
- - -
- -
- - -
-
- - -
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-
-
- - -
-
-
- - -
- - - - - - - - - - \ No newline at end of file diff --git a/docs/backend.html b/docs/backend.html deleted file mode 100644 index 6a5c0c548..000000000 --- a/docs/backend.html +++ /dev/null @@ -1,435 +0,0 @@ - - - - - - - - - - - - - Switching backends — slideflow 1.1.0 documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Switching backends

-

The default backend for this package is Tensorflow/Keras, but a full PyTorch backend is also included, with a dedicated TFRecord reader/writer that ensures saved image tiles can be served to both Tensorflow and PyTorch models in cross-compatible fashion.

-

If using the Tensorflow backend, PyTorch does not need to be installed; the reverse is true as well.

-

To switch backends, set the environment variable SF_BACKEND to either torch or tensorflow:

-
export SF_BACKEND=torch
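The same variable can also be set from inside a Python script. This is a minimal sketch that assumes Slideflow reads SF_BACKEND when the package is first imported, so the assignment has to happen before any `import slideflow` line; the validation step is an illustrative addition, not part of Slideflow.

```python
import os

# Set the backend before importing slideflow (assumed to be read at import time).
os.environ["SF_BACKEND"] = "torch"  # or "tensorflow"

backend = os.environ["SF_BACKEND"]
if backend not in ("torch", "tensorflow"):
    raise ValueError(f"Unsupported backend: {backend}")
print(f"Slideflow backend requested: {backend}")

# import slideflow as sf  # import only after the variable is set
```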
-
-
-
-

TFRecord DataLoader

-

In addition to using the built-in training tools, you can use tiles that have been extracted with Slideflow in completely external projects. The slideflow.Dataset class includes both torch() and tensorflow() functions to prepare a DataLoader or Tensorflow tf.data.Dataset instance that interleaves and processes images from stored TFRecords.

-
from slideflow import Project
-
-P = Project('/project/path', ...)
-dts = P.dataset(tile_px=299, tile_um=302, filters=None)
-
-
-

If you want to perform any balancing, use the slideflow.Dataset.balance() method:

-
dts = dts.balance('HPV_status', strategy='category')
-
-
-

Finally, use the slideflow.Dataset.torch() method to create a DataLoader object:

-
dataloader = dts.torch(
-    labels       = ...,      # Your outcome label
-    batch_size   = 64,       # Batch size
-    num_workers  = 6,        # Number of workers reading tfrecords
-    infinite     = True,     # True for training, False for validation
-    augment      = True,     # Flip/rotate/compression augmentation
-    standardize  = True,     # Standardize images: mean 0, variance of 1
-    pin_memory   = False,    # Pin memory to GPUs
-)
-
-
-

The returned dataloader can then be used directly with your external PyTorch applications.
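One practical consequence of `infinite=True` is that a plain `for` loop over the dataloader never terminates, so the consuming code has to bound the number of steps itself. The sketch below shows one way to do that with `itertools.islice`. It uses a stand-in generator rather than a real Slideflow dataloader, and it assumes each batch unpacks as an `(images, labels)` pair; both `fake_infinite_loader` and `train` are hypothetical names.

```python
from itertools import count, islice


def fake_infinite_loader(batch_size=64):
    """Stand-in for the dataloader: yields (images, labels) batches forever."""
    for step in count():
        images = [[0.0] * 4 for _ in range(batch_size)]  # placeholder "images"
        labels = [step % 2] * batch_size                 # placeholder labels
        yield images, labels


def train(loader, max_steps):
    """Consume at most `max_steps` batches from a possibly endless loader."""
    steps_run = 0
    for images, labels in islice(loader, max_steps):
        # A real loop would run the forward and backward pass here.
        assert len(images) == len(labels)
        steps_run += 1
    return steps_run


steps = train(fake_infinite_loader(), max_steps=100)
print(f"Ran {steps} training steps")
```

For validation, where the text above suggests `infinite=False`, the loader is finite and an ordinary loop over it ends on its own.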

-
-
- - -
- -
- - -
-
- -
- -
-
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-
-
- - -
-
-
- - -
- - - - - - - - - - \ No newline at end of file diff --git a/docs/biscuit/index.html b/docs/biscuit/index.html new file mode 100644 index 000000000..d813b19b4 --- /dev/null +++ b/docs/biscuit/index.html @@ -0,0 +1,1219 @@ + + + + + + + + + + + + + slideflow.biscuit — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

slideflow.biscuit

+

This module contains an official implementation of BISCUIT, an uncertainty quantification and confidence thresholding algorithm for whole-slide images. The original implementation, which includes instructions for reproducing experimental results reported in the manuscript, is available on GitHub.

+

This module requires the slideflow-noncommercial package, which can be installed with:

+
pip install slideflow-noncommercial
+
+
+

See Uncertainty Quantification for more information.

+
+
+find_cv(project, label, outcome, epoch=None, k=3)[source]
+

Finds paths to cross-validation models.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • outcome (str, optional) – Outcome name.

  • +
  • epoch (int, optional) – Epoch number of saved model. Defaults to None.

  • +
  • kfold (int, optional) – K-fold iteration. Defaults to None.

  • +
+
+
Returns:
+

Paths to cross-validation models.

+
+
Return type:
+

list(str)

+
+
+
+ +
+
+get_model_results(path, epoch, outcome)[source]
+

Reads results/metrics from a trained model.

+
+
Parameters:
+
    +
  • path (str) – Path to model.

  • +
  • outcome (str) – Outcome name.

  • +
+
+
Returns:
+

+
pt_auc, pt_ap, slide_auc, slide_ap,

tile_auc, tile_ap, opt_thresh

+
+
+

+
+
Return type:
+

Dict of results with the keys

+
+
+
+ +
+

biscuit.Experiment

+
+
+class Experiment(train_project, eval_projects=None, outcome='cohort', outcome1='LUAD', outcome2='LUSC', outdir='results')[source]
+

Supervises uncertainty thresholding experiments.

+
+ +
+
+display(self, df, eval_dfs, hue='uq', palette='tab10', relplot_uq_compare=True, boxplot_uq_compare=True, ttest_uq_groups=['all', 'include'], prefix='')
+

Creates plots from assembled results, exports results to CSV.

+
+
Parameters:
+
    +
  • df (pandas.DataFrame) – Cross-validation results metrics, as generated +by results()

  • +
  • eval_dfs (dict(pandas.DataFrame)) – Dict of external eval dataset names +(keys) mapped to pandas DataFrame of result metrics (values).

  • +
  • hue (str, optional) – Comparison to show with different hue on plots. +Defaults to ‘uq’.

  • +
  • palette (str, optional) – Seaborn color palette. Defaults to ‘tab10’.

  • +
  • relplot_uq_compare (bool, optional) – For the Relplot display, ensure +non-UQ and UQ results are generated from the same models/preds.

  • +
  • boxplot_uq_compare (bool, optional) – For the boxplot display, ensure +non-UQ and UQ results are generated from the same models/preds.

  • +
  • ttest_uq_groups (list(str)) – UQ groups to compare via t-test. Defaults +to [‘all’, ‘include’].

  • +
  • prefix (str, optional) – Prefix to use when saving figures. +Defaults to empty string.

  • +
+
+
Returns:
+

None

+
+
+
+ +
+
+plot_uq_calibration(self, label, tile_uq, slide_uq, slide_pred, epoch=1)
+

Plots a graph of predictions vs. uncertainty.

+
+
Parameters:
+
    +
  • label (str) – Experiment label.

  • +
  • kfold (int) – Validation k-fold.

  • +
  • tile_uq (float) – Tile-level uncertainty threshold.

  • +
  • slide_uq (float) – Slide-level uncertainty threshold.

  • +
  • slide_pred (float) – Slide-level prediction threshold.

  • +
+
+
Returns:
+

None

+
+
+
+ +
+
+results(self, exp_to_run, uq=True, eval=True, plot=False)
+

Assembles results from experiments, applies UQ thresholding, +and returns pandas dataframes with metrics.

+
+
Parameters:
+
    +
  • exp_to_run (list) – List of experiment IDs to search for results.

  • +
  • uq (bool, optional) – Apply UQ thresholds. Defaults to True.

  • +
  • eval (bool, optional) – Calculate results of external evaluation models. +Defaults to True.

  • +
  • plot (bool, optional) – Show plots. Defaults to False.

  • +
+
+
Returns:
+

Cross-val results, +pandas.DataFrame: External eval results

+
+
Return type:
+

pandas.DataFrame

+
+
+
+ +
+
+thresholds_from_nested_cv(self, label, outer_k=3, inner_k=5, id=None, threshold_params=None, epoch=1, tile_filename='tile_predictions_val_epoch1.csv', y_true=None, y_pred=None, uncertainty=None)
+

Detects tile- and slide-level UQ thresholds and slide-level prediction +thresholds from nested cross-validation.

+
+ +
+
+train(self, hp, label, filters=None, save_predictions='csv', validate_on_batch=32, validation_steps=32, **kwargs)
+

Train outer cross-validation models.

+
+
Parameters:
+
    +
  • hp (slideflow.ModelParams) – Hyperparameters object.

  • +
  • label (str) – Experimental label.

  • +
  • filters (dict, optional) – Dataset filters to use for +selecting slides. See slideflow.Dataset.filter() for +more information. Defaults to None.

  • +
  • save_predictions (bool, optional) – Save validation predictions to +model folder. Defaults to ‘csv’.

  • +
+
+
Keyword Arguments:
+
    +
  • validate_on_batch (int) – Frequency of validation checks during +training, in steps. Defaults to 32.

  • +
  • validation_steps (int) – Number of validation steps to perform +during each mid-training evaluation check. Defaults to 32.

  • +
  • **kwargs – All remaining keyword arguments are passed to +slideflow.Project.train().

  • +
+
+
Returns:
+

None

+
+
+
+ +
+
+train_nested_cv(self, hp, label, outer_k=3, inner_k=5, **kwargs)
+

Train models using nested cross-validation (outer_k=3, inner_k=5), +skipping already-generated models.

+
+
Parameters:
+
+
+
Keyword Arguments:
+
    +
  • outer_k (int) – Number of outer cross-folds. Defaults to 3.

  • +
  • inner_k (int) – Number of inner cross-folds. Defaults to 5.

  • +
  • **kwargs – All remaining keyword arguments are passed to +slideflow.Project.train().

  • +
+
+
Returns:
+

None
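To make the nested structure concrete, the sketch below enumerates which models a 3 x 5 nested cross-validation would train. It is an illustration of nested cross-validation in general, not of how Slideflow partitions data: the strided `k_folds` split and the `nested_cv_plan` helper are hypothetical, and a real pipeline would typically split at the patient level.

```python
def k_folds(items, k):
    """Split items into k roughly equal folds by striding (no shuffling)."""
    return [items[i::k] for i in range(k)]


def nested_cv_plan(slides, outer_k=3, inner_k=5):
    """Return (outer_fold, inner_fold, train, val) tuples.

    An inner_fold of None marks an outer model; inner models are trained
    only on the training portion of their outer fold.
    """
    plan = []
    outer = k_folds(slides, outer_k)
    for i, outer_val in enumerate(outer):
        outer_train = [s for j, fold in enumerate(outer) if j != i for s in fold]
        plan.append((i + 1, None, outer_train, outer_val))
        inner = k_folds(outer_train, inner_k)
        for m, inner_val in enumerate(inner):
            inner_train = [s for n, fold in enumerate(inner) if n != m for s in fold]
            plan.append((i + 1, m + 1, inner_train, inner_val))
    return plan


slides = [f"slide_{n:02d}" for n in range(30)]
plan = nested_cv_plan(slides)
n_outer = sum(1 for _, inner, _, _ in plan if inner is None)
n_inner = len(plan) - n_outer
print(n_outer, "outer models,", n_inner, "inner models")
```

With the default `outer_k=3` and `inner_k=5`, this yields 3 outer models and 15 inner models, and no inner model ever sees the validation slides of its outer fold.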

+
+
+
+ +
+
+

biscuit.hp

+
+
+nature2022()[source]
+

Hyperparameters used in the associated manuscript.

+

Dolezal, J.M., Srisuwananukorn, A., Karpeyev, D. et al. +Uncertainty-informed deep learning models enable high-confidence +predictions for digital histopathology. Nat Commun 13, 6572 (2022). +https://doi.org/10.1038/s41467-022-34025-x

+
+
Returns:
+

sf.ModelParams

+
+
+
+ +
+
+

biscuit.threshold

+
+
+apply(df, tile_uq, slide_uq, tile_pred=0.5, slide_pred=0.5, plot=False, keep='high_confidence', title=None, patients=None, level='slide')[source]
+

Apply pre-calculated tile- and group-level uncertainty thresholds.

+
+
Parameters:
+
    +
  • df (pandas.DataFrame) – Must contain columns ‘y_true’, ‘y_pred’, +and ‘uncertainty’.

  • +
  • tile_uq (float) – Tile-level uncertainty threshold.

  • +
  • slide_uq (float) – Slide-level uncertainty threshold.

  • +
  • tile_pred (float, optional) – Tile-level prediction threshold. +Defaults to 0.5.

  • +
  • slide_pred (float, optional) – Slide-level prediction threshold. +Defaults to 0.5.

  • +
  • plot (bool, optional) – Plot slide-level uncertainty. Defaults to False.

  • +
  • keep (str, optional) – Either ‘high_confidence’ or ‘low_confidence’. +Cohort to keep after thresholding. Defaults to ‘high_confidence’.

  • +
  • title (str, optional) – Title for uncertainty plot. Defaults to None.

  • +
  • patients (dict, optional) – Dictionary mapping slides to patients. Adds +a ‘patient’ column in the tile prediction dataframe, enabling +patient-level thresholding. Defaults to None.

  • +
  • level (str, optional) – Either ‘slide’ or ‘patient’. Level at which to +apply threshold. If ‘patient’, requires patient dict be supplied. +Defaults to ‘slide’.

  • +
+
+
Returns:
+

+
Dictionary of results, with keys auc, percent_incl, accuracy,

sensitivity, and specificity

+
+
+

DataFrame of thresholded group-level predictions
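The general idea of two-stage thresholding can be sketched in plain Python. This is not the module's implementation: it assumes that low-confidence tiles are discarded first, that slide-level prediction and uncertainty are simple means over the remaining tiles, and that a slide is kept when its mean uncertainty falls below `slide_uq`. The `apply_thresholds` helper and the toy rows are hypothetical, and the tile-level prediction threshold is left out for brevity.

```python
from collections import defaultdict
from statistics import mean


def apply_thresholds(rows, tile_uq, slide_uq, slide_pred=0.5):
    """Return {slide: (prediction, uncertainty, call)} for high-confidence slides."""
    by_slide = defaultdict(list)
    for row in rows:
        # Step 1: discard low-confidence tiles.
        if row["uncertainty"] < tile_uq:
            by_slide[row["slide"]].append(row)

    kept = {}
    for slide, slide_tiles in by_slide.items():
        # Step 2: aggregate the remaining tiles to slide level.
        pred = mean(t["y_pred"] for t in slide_tiles)
        unc = mean(t["uncertainty"] for t in slide_tiles)
        # Step 3: keep only high-confidence slides, then binarize the prediction.
        if unc < slide_uq:
            kept[slide] = (pred, unc, int(pred > slide_pred))
    return kept


rows = [
    {"slide": "s1", "y_pred": 0.9, "uncertainty": 0.02},
    {"slide": "s1", "y_pred": 0.7, "uncertainty": 0.04},
    {"slide": "s1", "y_pred": 0.2, "uncertainty": 0.30},  # dropped at tile level
    {"slide": "s2", "y_pred": 0.4, "uncertainty": 0.08},
    {"slide": "s2", "y_pred": 0.6, "uncertainty": 0.09},  # slide dropped: mean 0.085
]
result = apply_thresholds(rows, tile_uq=0.10, slide_uq=0.05)
print(result)
```

In this toy data only slide `s1` survives both thresholds, which corresponds to keeping the 'high_confidence' cohort.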

+

+
+
+
+ +
+
+detect(df, tile_uq='detect', slide_uq='detect', tile_pred='detect', slide_pred='detect', plot=False, patients=None)[source]
+

Detect optimal tile- and slide-level uncertainty thresholds.

+
+
Parameters:
+
    +
  • df (pandas.DataFrame) – Tile-level predictions. Must contain columns +‘y_true’, ‘y_pred’, and ‘uncertainty’.

  • +
  • tile_uq (str or float) – Either ‘detect’ or float. If ‘detect’, +will detect tile-level uncertainty threshold. If float, will use +the specified tile-level uncertainty threshold.

  • +
  • slide_uq (str or float) – Either ‘detect’ or float. If ‘detect’, +will detect slide-level uncertainty threshold. If float, will use +the specified slide-level uncertainty threshold.

  • +
  • tile_pred (str or float) – Either ‘detect’ or float. If ‘detect’, +will detect tile-level prediction threshold. If float, will use the +specified tile-level prediction threshold.

  • +
  • slide_pred (str or float) – Either ‘detect’ or float. If ‘detect’ +will detect slide-level prediction threshold. If float, will use +the specified slide-level prediction threshold.

  • +
  • plot (bool, optional) – Plot slide-level uncertainty. Defaults to False.

  • +
  • patients (dict, optional) – Dict mapping slides to patients. Required +for patient-level thresholding.

  • +
+
+
Returns:
+

+
Dictionary with tile- and slide-level UQ and prediction thresholds,

with keys: ‘tile_uq’, ‘tile_pred’, ‘slide_uq’, ‘slide_pred’

+
+
+

Float: Slide-level AUROC

+

+
+
+
+ +
+
+from_cv(dfs, **kwargs)[source]
+

Finds the optimal tile and slide-level thresholds from a set of nested +cross-validation experiments.

+
+
Parameters:
+

dfs (list(DataFrame)) – List of DataFrames with tile predictions, +containing headers ‘y_true’, ‘y_pred’, ‘uncertainty’, ‘slide’, +and ‘patient’.

+
+
Keyword Arguments:
+
    +
  • tile_uq (str or float) – Either ‘detect’ or float. If ‘detect’, +will detect tile-level uncertainty threshold. If float, will use +the specified tile-level uncertainty threshold.

  • +
  • slide_uq (str or float) – Either ‘detect’ or float. If ‘detect’, +will detect slide-level uncertainty threshold. If float, will use +the specified slide-level uncertainty threshold.

  • +
  • tile_pred (str or float) – Either ‘detect’ or float. If ‘detect’, +will detect tile-level prediction threshold. If float, will use the +specified tile-level prediction threshold.

  • +
  • slide_pred (str or float) – Either ‘detect’ or float. If ‘detect’ +will detect slide-level prediction threshold. If float, will use +the specified slide-level prediction threshold.

  • +
  • plot (bool, optional) – Plot slide-level uncertainty. Defaults to False.

  • +
  • patients (dict, optional) – Dict mapping slides to patients. Required +for patient-level thresholding.

  • +
+
+
Returns:
+

+
Dictionary with tile- and slide-level UQ and prediction thresholds,

with keys: ‘tile_uq’, ‘tile_pred’, ‘slide_uq’, ‘slide_pred’

+
+
+

+
+
+
+ +
+
+plot_uncertainty(df, kind, threshold=None, title=None)[source]
+

Plots figure of tile or slide-level predictions vs. uncertainty.

+
+
Parameters:
+
    +
  • df (pandas.DataFrame) – Processed dataframe containing columns +‘uncertainty’, ‘correct’, ‘y_pred’.

  • +
  • kind (str) – Kind of plot. If ‘tile’, subsample to only 1000 points. +Included in title.

  • +
  • threshold (float, optional) – Uncertainty threshold. +Defaults to None.

  • +
  • title (str, optional) – Title for plots. Defaults to None.

  • +
+
+
Returns:
+

None

+
+
+
+ +
+
+process_group_predictions(df, pred_thresh, level)[source]
+

From a given dataframe of tile-level predictions, calculate group-level +predictions and uncertainty.

+
+ +
+
+process_tile_predictions(df, pred_thresh=0.5, patients=None)[source]
+

Load and process tile-level predictions from CSV.

+
+
Parameters:
+
    +
  • df (pandas.DataFrame) – Unprocessed DataFrame from reading tile-level +predictions.

  • +
  • pred_thresh (float or str, optional) – Tile-level prediction threshold. +If ‘detect’, will auto-detect via Youden’s J. Defaults to 0.5.

  • +
  • patients (dict, optional) – Dict mapping slides to patients, used for +patient-level thresholding. Defaults to None.

  • +
+
+
Returns:
+

pandas.DataFrame, tile prediction threshold

+
+
+
+ +
+
+

biscuit.utils

+
+
+auc(y_true, y_pred)[source]
+

Calculate Area Under the Receiver Operating Characteristic curve (AUC / AUROC)

+
+
Parameters:
+
    +
  • y_true (np.ndarray) – True labels.

  • +
  • y_pred (np.ndarray) – Predictions.

  • +
+
+
Returns:
+

AUC

+
+
Return type:
+

Float
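For intuition, AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python; `pairwise_auc` is a hypothetical name, it works on lists rather than NumPy arrays, and it is not the module's implementation.

```python
def pairwise_auc(y_true, y_pred):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Ties count as half a win (Mann-Whitney U formulation).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(pairwise_auc(y_true, y_pred), 4))
```

Here eight of the nine positive/negative pairs are ordered correctly, so the result is 8/9.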

+
+
+
+ +
+
+auc_and_threshold(y_true, y_pred)[source]
+

Calculates AUC and optimal threshold (via Youden’s J)

+
+
Parameters:
+
    +
  • y_true (np.ndarray) – Y true (labels).

  • +
  • y_pred (np.ndarray) – Y pred (predictions).

  • +
+
+
Returns:
+

AUC +float: Optimal threshold

+
+
Return type:
+

float
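Youden's J statistic is sensitivity + specificity - 1, and the optimal threshold is the cut-off that maximizes it. The following sketch searches the observed scores for that cut-off. The `youden_threshold` helper, the "predict positive when score >= cut-off" convention, and the toy data are assumptions for illustration and may differ from what the module does internally.

```python
def youden_threshold(y_true, y_pred):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    best_thresh, best_j = None, -1.0
    # Candidate cut-offs are the observed scores; positive when score >= cut-off.
    for thresh in sorted(set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p >= thresh)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p >= thresh)
        j = tp / n_pos - fp / n_neg  # TPR - FPR, identical to Youden's J
        if j > best_j:
            best_thresh, best_j = thresh, j
    return best_thresh, best_j


y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [0.9, 0.8, 0.75, 0.4, 0.7, 0.3, 0.2]
threshold, j_stat = youden_threshold(y_true, y_pred)
print(threshold, j_stat)
```

With this data, a cut-off of 0.75 captures three of four positives and no negatives, giving J = 0.75.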

+
+
+
+ +
+
+df_from_cv(project, label, outcome, epoch=None, k=3, y_true=None, y_pred=None, uncertainty=None)[source]
+

Loads tile predictions from cross-fold models & renames columns.

+
+
Parameters:
+
    +
  • project (sf.Project) – Slideflow project.

  • +
  • label (str) – Experimental label.

  • +
  • epoch (int, optional) – Epoch number of saved model. Defaults to None.

  • +
  • k (int, optional) – K-fold iteration. Defaults to 3.

  • +
  • outcome (str, optional) – Outcome name.

  • +
  • y_true (str, optional) – Column name for ground truth labels. +Defaults to {outcome}_y_true0.

  • +
  • y_pred (str, optional) – Column name for predictions. +Defaults to {outcome}_y_pred1.

  • +
  • uncertainty (str, optional) – Column name for uncertainty. +Defaults to {outcome}_y_uncertainty1.

  • +
+
+
Returns:
+

DataFrame for each k-fold.

+
+
Return type:
+

list(DataFrame)

+
+
+
+ +
+
+eval_exists(project, label, outcome, epoch=1)[source]
+

Check if matching eval exists.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • epoch (int, optional) – Epoch number of saved model. Defaults to None.

  • +
+
+
Returns:
+

If eval exists

+
+
Return type:
+

bool

+
+
+
+ +
+
+find_cv(project, label, outcome, epoch=None, k=3)[source]
+

Finds paths to cross-validation models.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • outcome (str, optional) – Outcome name.

  • +
  • epoch (int, optional) – Epoch number of saved model. Defaults to None.

  • +
  • kfold (int, optional) – K-fold iteration. Defaults to None.

  • +
+
+
Returns:
+

Paths to cross-validation models.

+
+
Return type:
+

list(str)

+
+
+
+ +
+
+find_cv_early_stop(project, label, outcome, k=3)[source]
+

Detects early stop batch from cross-val trained models.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • k (int, optional) – Number of k-fold iterations. Defaults to 3.

  • +
  • outcome (str) – Outcome name.

  • +
+
+
Returns:
+

Early stop batch.

+
+
Return type:
+

int

+
+
+
+ +
+
+find_eval(project, label, outcome, epoch=1)[source]
+

Finds matching eval directory.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • outcome (str, optional) – Outcome name.

  • +
  • epoch (int, optional) – Epoch number of saved model. Defaults to None.

  • +
+
+
Raises:
+
    +
  • MultipleModelsFoundError – If multiple matches are found.

  • +
  • ModelNotFoundError – If no match is found.

  • +
+
+
Returns:
+

path to eval directory

+
+
Return type:
+

str

+
+
+
+ +
+
+find_model(project, label, outcome, epoch=None, kfold=None)[source]
+

Searches for a model in a project model directory.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • outcome (str) – Outcome name.

  • +
  • epoch (int, optional) – Epoch to search for. If not None, returns +path to the saved model. If None, returns path to parent model +folder. Defaults to None.

  • +
  • kfold (int, optional) – K-fold iteration. Defaults to None.

  • +
+
+
Raises:
+
    +
  • MultipleModelsFoundError – If multiple potential matches are found.

  • +
  • ModelNotFoundError – If no matching model is found.

  • +
+
+
Returns:
+

Path to matching model.

+
+
Return type:
+

str

+
+
+
+ +
+
+get_model_results(path, epoch, outcome)[source]
+

Reads results/metrics from a trained model.

+
+
Parameters:
+
    +
  • path (str) – Path to model.

  • +
  • outcome (str) – Outcome name.

  • +
+
+
Returns:
+

+
pt_auc, pt_ap, slide_auc, slide_ap,

tile_auc, tile_ap, opt_thresh

+
+
+

+
+
Return type:
+

Dict of results with the keys

+
+
+
+ +
+
+get_eval_results(path, outcome)[source]
+

Reads results/metrics from a trained model.

+
+
Parameters:
+
    +
  • path (str) – Path to model.

  • +
  • outcome (str) – Outcome name.

  • +
+
+
Returns:
+

+
pt_auc, pt_ap, slide_auc, slide_ap,

tile_auc, tile_ap, opt_thresh

+
+
+

+
+
Return type:
+

Dict of results with the keys

+
+
+
+ +
+
+model_exists(project, label, outcome, epoch=None, kfold=None)[source]
+

Check if matching model exists.

+
+
Parameters:
+
    +
  • project (slideflow.Project) – Project.

  • +
  • label (str) – Experimental label.

  • +
  • outcome (str, optional) – Outcome name.

  • +
  • epoch (int, optional) – Epoch number of saved model. Defaults to None.

  • +
  • kfold (int, optional) – K-fold iteration. Defaults to None.

  • +
+
+
Returns:
+

If model exists

+
+
Return type:
+

bool

+
+
+
+ +
+
+prediction_metrics(y_true, y_pred, threshold)[source]
+

Calculate prediction metrics (AUC, sensitivity/specificity, etc)

+
+
Parameters:
+
    +
  • y_true (np.ndarray) – True labels.

  • +
  • y_pred (np.ndarray) – Predictions.

  • +
  • threshold (float) – Prediction threshold.

  • +
+
+
Returns:
+

Prediction metrics.

+
+
Return type:
+

dict
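The threshold-dependent metrics can be illustrated with a small confusion-matrix calculation. This is a sketch, not the module's code: `threshold_metrics` is a hypothetical helper, the returned keys are illustrative, and it assumes a prediction counts as positive when it exceeds the threshold.

```python
def threshold_metrics(y_true, y_pred, threshold):
    """Accuracy, sensitivity, and specificity for binary labels at a threshold."""
    calls = [int(p > threshold) for p in y_pred]
    tp = sum(1 for t, c in zip(y_true, calls) if t == 1 and c == 1)
    tn = sum(1 for t, c in zip(y_true, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(y_true, calls) if t == 0 and c == 1)
    fn = sum(1 for t, c in zip(y_true, calls) if t == 1 and c == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [0.9, 0.8, 0.75, 0.4, 0.7, 0.3, 0.2]
metrics = threshold_metrics(y_true, y_pred, threshold=0.5)
print(metrics)
```

At a threshold of 0.5, this toy data yields three true positives, two true negatives, one false positive, and one false negative.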

+
+
+
+ +
+
+read_group_predictions(path)[source]
+

Reads patient- or slide-level predictions CSV or parquet file, +returning y_true and y_pred.

+

Expects a binary categorical outcome.

+

Compatible with Slideflow 1.1 and 1.2.

+
+ +
+
+truncate_colormap(cmap, minval=0.0, maxval=1.0, n=100)[source]
+

Truncates matplotlib colormap.

+
+ +
+
+

biscuit.delong

+
+
+fastDeLong(predictions_sorted_transposed, label_1_count)[source]
+

The fast version of DeLong’s method for computing the covariance of +unadjusted AUC.

+
+
Parameters:
+

predictions_sorted_transposed – a 2D numpy.array[n_classifiers, n_examples] +sorted such as the examples with label “1” are first

+
+
Returns:
+

(AUC value, DeLong covariance)

+
+
+
+ +
+
+delong_roc_variance(ground_truth, predictions)[source]
+

Computes ROC AUC variance for a single set of predictions

+
+
Parameters:
+
    +
  • ground_truth – np.array of 0 and 1

  • +
  • predictions – np.array of floats of the probability of being class 1

  • +
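For readers unfamiliar with DeLong's method, the variance of a single AUC can be written directly from its definition. The sketch below is a naive O(m x n) version in plain Python, intended only to show what is being estimated; the fast algorithm above reaches the same quantity through rank computations. The `naive_delong_variance` name and the toy data are hypothetical.

```python
from statistics import mean, variance


def naive_delong_variance(ground_truth, predictions):
    """Return (AUC, DeLong variance) using the direct pairwise definition."""
    pos = [p for t, p in zip(ground_truth, predictions) if t == 1]
    neg = [p for t, p in zip(ground_truth, predictions) if t == 0]

    def psi(x, y):
        # 1 if the positive outranks the negative, 0.5 on ties, else 0.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    # Structural components: per-positive and per-negative placement values.
    v10 = [mean(psi(x, y) for y in neg) for x in pos]
    v01 = [mean(psi(x, y) for x in pos) for y in neg]

    auc = mean(v10)
    # Sample variances (n - 1 denominator) scaled by the group sizes.
    var = variance(v10) / len(pos) + variance(v01) / len(neg)
    return auc, var


ground_truth = [1, 1, 1, 0, 0, 0]
predictions = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc_value, auc_var = naive_delong_variance(ground_truth, predictions)
print(auc_value, auc_var)
```

At least two positive and two negative examples are needed, since the sample variance is undefined for a single value.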
+
+
+
+ +
+
+delong_roc_test(ground_truth, predictions_one, predictions_two)[source]
+

Computes log(p-value) for hypothesis that two ROC AUCs are different

+
+
Parameters:
+
    +
  • ground_truth – np.array of 0 and 1

  • +
  • predictions_one – predictions of the first model, +np.array of floats of the probability of being class 1

  • +
  • predictions_two – predictions of the second model, +np.array of floats of the probability of being class 1

  • +
+
+
+
+ +
+
+ + +
+ +
+ + +
+
+ + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/cellseg/index.html b/docs/cellseg/index.html new file mode 100644 index 000000000..6b209ca6c --- /dev/null +++ b/docs/cellseg/index.html @@ -0,0 +1,711 @@ + + + + + + + + + + + + + Cell Segmentation — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Cell Segmentation

+

Many tasks in digital pathology rely on analysis of cellular features, as opposed to higher-level architectural features. Slideflow supports whole-slide analysis of cellular features with a cell detection and segmentation pipeline based on Cellpose. To start, ensure cellpose has been installed via pip:

+
pip install cellpose
+
+
+
+

Approach

+
+../_images/cell_segmentation.png +
+

The general approach for cell detection and segmentation in Slideflow is illustrated above and discussed in the following sections. In short: tune the cell segmentation parameters on a single slide, use these parameters to detect cells in all of your slides, then extract cell images at the detected locations.

+
+
+

Slideflow Studio

+

Cellpose models have several configurable parameters which will affect the quality of your segmentation masks, namely the pretrained model and cell diameter. The best way to determine the optimal parameters to use for your dataset is through interactive visualization using Slideflow Studio.

+

Use Cellpose-based cell segmentation in Slideflow Studio by enabling the extension, or start Studio with the --cellpose flag:

+
python -m slideflow.studio --cellpose
+
+
+
+

Control panel

+

Open the Cell Segmentation section in the control panel to access the segmentation controls.

+
+../_images/cellseg_workbench_panel.png +
+

The Model & Cell Diameter subsection is used to customize the segmentation model (defaults to ‘cyto2’) and cell diameter (defaults to 10 microns). Selecting “Auto-detect diameter” and then clicking “Preview” will perform cell segmentation on the portion of the slide currently in view. Once complete, the diameter text box will be updated with the detected cell diameter. Any user-trained models will be listed in the model dropdown selection.

+
+
+

Viewing cell segmentations

+
+../_images/cellseg_workbench_masks.png +
+

The View Controls subsection provides options for customizing how cell segmentations are displayed. By default, cell segmentation masks are shown in cyan on a black background. The black background can be removed by unchecking “Black BG”. You can add a green dot at each cell’s detected centroid by selecting the “Centroid” option. The “Alpha” slider controls transparency for the mask overlay.

+

You can also choose to view the segmentation masks as outlines. The “Outline” button will +convert any masks currently in view to outlines, allowing you to more easily see how the +masks match cells visible on the slide.

+
+../_images/cellseg_workbench_outlines.png +
+

Finally, the “gradXY” option will show the flow gradients calculated during cell segmentation.

+
+../_images/cellseg_workbench_flows.png +
+
+
+

Preparing WSI segmentation

+

Once you are satisfied with a chosen model and cell diameter, set the cell diameter to a manual value in microns. Once the cell diameter has been set, the middle control panel will activate, allowing you to perform whole-slide segmentation.

+

The Otsu threshold option will perform strict Otsu’s thresholding on the whole slide image, +only performing cell segmentation in non-background areas (reducing computational time). +You can preview the Otsu’s thresholding algorithm in the Slide section. This option is disabled by default, as Otsu’s thresholding does not +work well for all slides (particularly cytology slides).

+

The Save flows option saves gradients during cell segmentation, allowing you to generate +visualizations as shown with the gradXY option above. This is disabled by default, as +calculation requires high RAM usage and may not be practical on all systems.

+ ++++ + + + + + +

The Advanced subsection provides additional options for controlling the cell segmentation process.

+

Window controls the window size during cell segmentation; cell segmentation is performed on images of this pixel size and then stitched together. The Tile option permits further sub-tiling of each window, reducing GPU and CPU memory utilization.

+

Downscale will scale down the final generated cell segmentation mask, reducing memory +utilization (both RAM and disk). Enable spawn workers enables a multiprocessing technique that improves cell segmentation speed at the cost of higher RAM usage.

+
../_images/cellseg_workbench_advanced.png +
+
+
+

Running WSI segmentation

+

Once you are satisfied with the settings, whole-slide cell segmentation can be started by clicking Segment. You will see a notification in the bottom-right corner of the screen when segmentation is complete. In the meantime, a progress bar will be shown in the terminal along with an ETA.

+
+
+

Exporting results

+

Once segmentation is complete, masks can be saved to disk for later use with Export. +Masks are saved in *.zip format, and can be loaded in Studio with drag-and-drop.

+
+
+
+

Segmenting cells

+
+

Single slide segmentation

+

Once the segmentation parameters have been determined, you can run segmentation for a single slide using slideflow.cellseg.segment_slide().

+
import slideflow as sf
+from slideflow.cellseg import segment_slide
+
+segmentation = segment_slide(
+    '.../slide.svs',
+    model='cyto2',
+    diam_um=10,
+    ...
+)
+segmentation.save('...masks.zip')
+
+
+
+
+

Project-wide segmentation

+

Cell segmentation can also be performed automatically for all slides in a Slideflow project. +Cell segmentation masks (and associated cell centroids) are calculated for all slides in the project using slideflow.Project.cell_segmentation().

+
import slideflow as sf
+
+# Load a slideflow project
+P = sf.Project(...)
+
+# Perform cell segmentation
+P.cell_segmentation(
+    model='cyto2',
+    diam_um=10
+)
+
+
+

Relevant arguments for this function include:

+
    +
  • model : Cell segmentation model. All cellpose models are supported, including ‘cyto’, +‘cyto2’, ‘nuclei’, and more.

  • +
  • diam_um : Cell diameter, in microns.

  • +
  • buffer : Path to a buffer, which significantly speeds up segmentation when running from an HDD (same as P.extract_tiles())

  • +
  • window_size : Integer. Defaults to 256. Increasing this to 512 will make things slightly +faster, but will use a bit more GPU memory.

  • +
  • downscale : Factor by which to downscale the masks, to save memory. Defaults to 1 +(no downscaling, full resolution). Downscale of 2 is a nice balance between memory +size and fidelity.

  • +
+
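The effect of downscale on mask size can be estimated with simple arithmetic. The sketch below assumes the factor divides each mask dimension and that masks are stored as 4-byte integers (np.uint32, as described for Segmentation.masks); the helper name mask_bytes and the slide dimensions are hypothetical, and the figures cover only the uncompressed mask array.

```python
def mask_bytes(width, height, downscale=1, bytes_per_pixel=4):
    """Estimate the uncompressed size of a mask array, in bytes.

    Assumes `downscale` divides each dimension (an assumption, not
    confirmed behavior) and a 4-byte integer per pixel.
    """
    return (width // downscale) * (height // downscale) * bytes_per_pixel


# Hypothetical slide of 100,000 x 80,000 pixels.
full_resolution = mask_bytes(100_000, 80_000)          # downscale=1
half_resolution = mask_bytes(100_000, 80_000, downscale=2)

print(f"downscale=1: {full_resolution / 1e9:.0f} GB")
print(f"downscale=2: {half_resolution / 1e9:.0f} GB")
```

Under these assumptions, halving each dimension cuts the mask array to a quarter of its full-resolution size.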

Depending on the size of the slide, segmentation may take 5-25 minutes per slide.

+

Masks will be saved in the project subfolder masks/ . As described above, +these masks can be loaded in Studio for interactive visualization via drag-and-drop. +They can also be used for downstream analysis and cell extraction, as described in the next +section.

+
+
+

Accessing segmentation masks

+

Saved cell segmentation masks (in *.zip format) can be loaded with slideflow.cellseg.Segmentation.

+
from slideflow.cellseg import Segmentation
+seg = Segmentation.load('.../slide-masks.zip')
+
+
+

The mask array, Segmentation.masks , is a np.ndarray with dtype of np.uint32. Zero values are background, and masks for each cell are represented by a unique integer. Flows/gradients, +if calculated, will be available in Segmentation.flows.

+

Centroids for detected cells can be calculated with Segmentation.centroids(), returning an array of centroid locations. By default, coordinates are returned in mask dimension space. With the argument wsi_dim=True, centroid coordinates will be in the slide dimension space.
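To make the mask layout concrete, the sketch below computes centroids from a small labeled array in which zero is background and each cell has a unique integer, as described above. It is a plain NumPy illustration, not the implementation behind Segmentation.centroids(); the helper name centroids_from_mask and the (x, y) ordering are assumptions.

```python
import numpy as np


def centroids_from_mask(masks):
    """Return an (n_cells, 2) array of (x, y) centroids in mask space."""
    labels = np.unique(masks)
    labels = labels[labels != 0]  # zero is background
    centroids = np.zeros((len(labels), 2))
    for i, label in enumerate(labels):
        ys, xs = np.nonzero(masks == label)
        centroids[i] = (xs.mean(), ys.mean())
    return centroids


# Toy mask with two "cells", stored as uint32 like Segmentation.masks.
masks = np.zeros((6, 6), dtype=np.uint32)
masks[0:2, 0:2] = 1
masks[3:6, 2:5] = 2

centroids = centroids_from_mask(masks)
```

These coordinates are in mask space. If the masks were downscaled, converting them to slide space would require scaling by the corresponding factor, which is presumably what wsi_dim=True handles.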

+
+
+

Caveats

+

There are some caveats to the cell segmentation process, including:

+
    +
  • Memory usage: Cell segmentation requires at minimum 32 GB of RAM. Larger slides (particularly cytology) may require up to 64 GB of RAM.

  • +
  • Stitching artifacts: At present, due to the algorithm by which whole-slide cell segmentations are stitched together, you may see some cells that are not detected, missing in a grid-like pattern. Work is ongoing to reduce these stitching artifacts.

  • +
  • Cell diameter: The quality of cell segmentation results is highly dependent on an appropriately chosen cell diameter. Use Slideflow Studio to find the best cell diameter for your application.

  • +
+
+
+
+

Extracting cells from slides

+

Once segmentation masks have been calculated, images of individual cells can be extracted from a whole-slide image. This can be performed for either a single slide, or all slides in a project.

+
+

From a single slide

+

Start by loading the saved segmentation, as described above. Then, use slideflow.WSI.apply_segmentation(), followed by slideflow.WSI.extract_cells().

+
import slideflow as sf
+from slideflow.cellseg import Segmentation
+
+# Load WSI.
+wsi = sf.WSI('../slide.svs', tile_px=96, tile_um='40x')
+
+# Load cell segmentations.
+seg = Segmentation.load('.../slide-masks.zip')
+
+# Apply segmentations to the slide.
+wsi.apply_segmentation(seg)
+
+# Extract images of cells.
+wsi.extract_cells(tiles_dir=...)
+
+
+ ++++ + + + + + + + + +

By default, segmentation masks will be applied to the extracted cell images:

../_images/cell_masked.png +

However, you can choose not to apply masks by using the argument apply_masks=False.

../_images/cell_unmasked.png +
+

Tile extraction is then performed as usual. Cell images (tiles) can either be saved as loose images or in TFRecord format. See slideflow.WSI.extract_cells() for more information.

+
+
+

From all slides

+

Additionally, cell images can be extracted from all slides in a project. This should only be +done after slideflow.Project.cell_segmentation().

+
P.extract_cells(
+    tile_px=96,
+    tile_um='40x',
+    apply_masks=True
+)
+
+
+

Extracted cell images are saved by default in TFRecord format, and are otherwise handled +identically to tile images generated through slideflow.Project.extract_tiles().

+
+
+
+

Complete example

+

An example of a complete cell segmentation pipeline is shown below, from parameter tuning +to final tile extraction from detected cells.

+
+

1. Slideflow Studio

+

Determine optimal cell segmentation parameters using Studio, as described above:

+
python -m slideflow.studio --cellpose
+
+
+
+
+

2. Cell segmentation

+

Segment cells for all slides in a Slideflow project.

+
P = sf.Project(...)
+P.cell_segmentation(
+    model='cyto2',
+    diam_um=10,
+    window_size=512,
+    downscale=2
+)
+
+
+
+
+

3. Cell image extraction

+

Extract image tiles of segmented cells, in this case using segmentation masks.

+
P.extract_cells(
+    tile_px=96,
+    tile_um='40x',
+    apply_masks=True,
+    grayspace_fraction=1
+)
+
+
+
+
+
+ + +
+ +
+ + +
+
+ + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/clam.html b/docs/clam.html deleted file mode 100644 index 140f7a066..000000000 --- a/docs/clam.html +++ /dev/null @@ -1,452 +0,0 @@ - - - - - - - - - - - - CLAM — slideflow 1.1.1 documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-
- - - - - -
-
-
- - - - - - - - - - - -
-
-
- - - - - - - - - - - - - - - - -
- - - - -
-
- -
- Shortcuts -
-
- -
-
- - - -
- -
-
- -
-

CLAM

-

In addition to standard Tensorflow/Keras model applications, slideflow supports training models with CLAM. A slightly modified version of CLAM which supports slideflow dataset and input pipelines is included in slideflow.clam.

-
-

Creating slide activations

-

The first step in the CLAM pipeline is generating tile-level activations across whole-slide images. While the original CLAM paper used features generated from an imagenet-trained model, we have found it useful to generate feature activations from models pretrained with histology images. To this end, the project function slideflow.Project.generate_features_for_clam() accepts any model as input and will generate feature vectors from the specified intermediate layers. For example:

-
P.generate_features_for_clam(
-    model='/path/to/saved/model',
-    outdir='/clam/path',
-    layers=['postconv']
-)
-
-
-
-
-

Training

-

To train a CLAM model, use the project function slideflow.Project.train_clam(). Clam arguments are configured with slideflow.clam.get_args():

-
dataset = P.dataset(tile_px=299, tile_um=302)
-P.generate_features_for_clam(..., outdir='/clam/path')
-
-clam_args = sf.clam.get_args(k=3, bag_loss='svm', ...)
-
-P.train_clam(
-    exp_name='test_experiment',
-    pt_files='/clam/path',
-    outcomes='category1',
-    dataset=dataset,
-    clam_args=clam_args
-)
-
-
-

The training function will, by default, save heatmaps of the attention layers for each of the validation slides. This behavior can be disabled by passing attention_heatmaps=False.

-
-
-

Evaluation

-

To evaluate a saved CLAM model on an external dataset, first extract features from this dataset, then use the project function slideflow.Project.evaluate_clam():

-
P.generate_features_for_clam(..., outdir='/eval/clam/path')
-
-P.evaluate_clam(
-    exp_name='evaluation',
-    pt_files='/eval/clam/path',
-    outcomes='category1',
-    tile_px=299,
-    tile_um=302
-)
-
-
-
-
- - -
- -
- - -
-
- - -
-
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-
-
- - -
-
-
- - -
- - - - - - - - - - \ No newline at end of file diff --git a/docs/custom_extractors/index.html b/docs/custom_extractors/index.html new file mode 100644 index 000000000..ce87cc481 --- /dev/null +++ b/docs/custom_extractors/index.html @@ -0,0 +1,657 @@ + + + + + + + + + + + + + Custom Feature Extractors — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+ + + + + +
+
+
+ + + + + + + + + + + +
+
+
+ + + + + + + + + + + + + + + + +
+ +
+ + +
+
+ +
+ Shortcuts +
+
+ +
+
+ + + +
+ +
+
+ +
+

Custom Feature Extractors

+

Slideflow includes several pretrained feature extractors for converting image tiles into feature vectors as well as tools to assist with building your own feature extractor. In this note, we’ll walk through the process of building a custom feature extractor from both a PyTorch and Tensorflow model.

+
+

PyTorch

+

Feature extractors are implemented as a subclass of slideflow.model.extractors._factory_torch.TorchFeatureExtractor. The base class provides core functionality and helper methods for generating features from image tiles (dtype uint8) or whole-slide images (type slideflow.WSI).

+

The initializer should create the feature extraction model and move it to the appropriate device (e.g., a GPU). The model should be a torch.nn.Module that accepts an image tensor as input and returns a feature tensor as output.

+
# Import your custom torch.nn.Module,
+# which generates features from an image.
+from my_module import MyModel
+
+from slideflow.model.extractors._factory_torch import TorchFeatureExtractor
+
+class MyFeatureExtractor(TorchFeatureExtractor):
+
+    tag = 'my_feature_extractor'  # Human-readable identifier
+
+    def __init__(self):
+        super().__init__()
+
+        # Create the model, move it to the GPU, and set evaluation mode.
+        self.model = MyModel()
+        self.model.to('cuda')
+        self.model.eval()
+
+
+

Next, the initializer should set the number of features expected to be returned by the model.

+
...
+
+    def __init__(self):
+        ...
+
+        self.num_features = 1024
+
+
+

The initializer is also responsible for registering image preprocessing. The image preprocessing transformation, a function which converts a raw uint8 image to a float32 tensor for model input, should be stored in self.transform. If the transformation standardizes the images, then the parameter self.preprocess_kwargs should be set to {'standardize': False}, indicating that Slideflow should not perform any additional standardization. You can use the class method .build_transform() to use the standard preprocessing pipeline.

+
from torchvision import transforms
+
+...
+
+    def __init__(self):
+        ...
+
+        # Image preprocessing.
+        self.transform = self.build_transform(img_size=256)
+        # Disable Slideflow standardization,
+        # as we are standardizing with transforms.Normalize
+        self.preprocess_kwargs = {'standardize': False}
+
+
+

The final required method is .dump_config(), which returns a dictionary of configuration parameters needed to regenerate this class. It should return a dictionary with "class" and "kwargs" attributes. This configuration is saved to a JSON configuration file when generating bags for MIL training.

+
...
+
+    def dump_config(self):
+        return self._dump_config(
+            class_name='my_module.MyFeatureExtractor'
+        )
+
+
+
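The purpose of the "class" and "kwargs" entries can be illustrated with a small round trip through JSON. This is a minimal sketch under stated assumptions, not how Slideflow rebuilds extractors: DummyExtractor, CLASS_LOOKUP, and rebuild are hypothetical names, and the lookup table stands in for whatever import mechanism resolves the class name.

```python
import json


class DummyExtractor:
    """Hypothetical stand-in for a feature extractor."""

    def __init__(self, num_features=1024):
        self.num_features = num_features

    def dump_config(self):
        # Everything needed to regenerate this object.
        return {
            'class': 'DummyExtractor',
            'kwargs': {'num_features': self.num_features},
        }


# Hypothetical lookup from class name to class.
CLASS_LOOKUP = {'DummyExtractor': DummyExtractor}


def rebuild(config_json):
    """Recreate an extractor from its saved JSON configuration."""
    config = json.loads(config_json)
    return CLASS_LOOKUP[config['class']](**config['kwargs'])


original = DummyExtractor(num_features=512)
saved = json.dumps(original.dump_config())
restored = rebuild(saved)
```

Because the configuration passes through JSON, the values in "kwargs" should be limited to JSON-serializable types such as numbers, strings, lists, and dictionaries.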

The final class should look like this:

+
from my_module import MyModel
+from slideflow.model.extractors._factory_torch import TorchFeatureExtractor
+from torchvision import transforms
+
+class MyFeatureExtractor(TorchFeatureExtractor):
+
+    tag = 'my_feature_extractor'  # Human-readable identifier
+
+    def __init__(self):
+        super().__init__()
+
+        # Create the model, move it to the GPU, and set evaluation mode.
+        self.model = MyModel()
+        self.model.to('cuda')
+        self.model.eval()
+        self.num_features = 1024
+
+        # Image preprocessing.
+        self.transform = self.build_transform(img_size=256)
+        # Disable Slideflow standardization,
+        # as we are standardizing with transforms.Normalize
+        self.preprocess_kwargs = {'standardize': False}
+
+    def dump_config(self):
+        return self._dump_config(
+            class_name='my_module.MyFeatureExtractor'
+        )
+
+
+

You can then use the feature extractor for generating bags for MIL training, as described in Multiple-Instance Learning (MIL).

+
# Build the feature extractor.
+myfeatures = MyFeatureExtractor()
+
+# Load a dataset.
+project = slideflow.load_project(...)
+dataset = project.dataset(...)
+
+# Generate bags.
+project.generate_feature_bags(myfeatures, dataset)
+
+
+

You can also generate features across whole-slide images, returning a grid of features for each slide. The size of the returned grid reflects the slide’s tile grid. For example, for a slide with 24 columns and 33 rows of tiles, the returned grid will have shape (24, 33, n_features).

+
>>> myfeatures = MyFeatureExtractor()
+>>> wsi = sf.WSI('path/to/wsi', tile_px=256, tile_um=302)
+>>> features = myfeatures(wsi)
+>>> features.shape
+(24, 33, 1024)
+
+
+

Finally, the feature extractor can also be used to perform latent space analysis and generate mosaic maps, as described in Layer Activations.

+

Slideflow includes a registration system for keeping track of all available feature extractors. To register your feature extractor, use the slideflow.model.extractors.register_torch() decorator.

+
from slideflow.model.extractors import register_torch
+
+@register_torch
+def my_feature_extractor(**kwargs):
+    return MyFeatureExtractor(**kwargs)
+
+
+

Once registered, a feature extractor can be built by name:

+
import slideflow as sf
+extractor = sf.build_feature_extractor('my_feature_extractor')
+
+
+
+
+

Tensorflow

+

Tensorflow feature extractors are implemented much like PyTorch feature extractors, as subclasses of slideflow.model.extractors._tensorflow_base.TensorflowFeatureExtractor.

+

The initializer should create the model and set the expected number of features.

+
from my_module import MyModel
+from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor
+
+class MyFeatureExtractor(TensorflowFeatureExtractor):
+
+    tag = 'my_feature_extractor'  # Unique identifier
+
+    def __init__(self):
+        super().__init__()
+
+        # Create the model.
+        self.model = MyModel()
+        self.num_features = 1024
+
+
+

The initializer is also responsible for registering image preprocessing and transformations. Preprocessing steps are stored in the .preprocess_kwargs dictionary, which should have the keys standardize and transform. If standardize=True, images will be standardized using tf.image.per_image_standardization. If transform is not None, it should be a callable that accepts a single image tensor and returns a transformed image tensor.

+

For example, to only perform standardization and no further preprocessing:

+
...
+
+    def __init__(self):
+        ...
+
+        # Image preprocessing.
+        self.preprocess_kwargs = {
+            'standardize': True,
+            'transform': None
+        }
+
+
+
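For reference, the per-image standardization that standardize=True refers to can be sketched in NumPy. This mirrors the documented behavior of tf.image.per_image_standardization (subtract the mean, divide by the standard deviation, with a lower bound of 1/sqrt(N) on the divisor); the function below is an illustrative re-implementation with a hypothetical sample tile, not Slideflow or TensorFlow code.

```python
import numpy as np


def per_image_standardization(image):
    """Scale an image to zero mean and unit variance."""
    x = image.astype(np.float32)
    # The lower bound on the divisor avoids division by zero
    # for uniform images.
    min_std = np.float32(1.0 / np.sqrt(x.size))
    return (x - x.mean()) / max(x.std(), min_std)


# Hypothetical 4x4 RGB tile with uint8 pixel values.
tile = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
standardized = per_image_standardization(tile)

# A uniform image maps to all zeros instead of raising an error.
blank = per_image_standardization(np.zeros((4, 4, 3), dtype=np.uint8))
```

The statistics are computed over the whole image, across all pixels and channels, so the relative differences between channels are preserved.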

To perform standardization and resize images to 256x256:

+
import tensorflow as tf
+
+@tf.function
+def resize_256(x):
+    return tf.image.resize(x, (256, 256))
+
+...
+
+    def __init__(self):
+        ...
+
+        # Image preprocessing.
+        self.preprocess_kwargs = {
+            'standardize': True,
+            'transform': resize_256
+        }
+
+
+

Finally, define the .dump_config() method, which returns a dictionary of the configuration parameters needed to regenerate this class. The dictionary should have "class" and "kwargs" keys. This configuration is saved to a JSON configuration file when generating bags for MIL training.

+
...
+
+    def dump_config(self):
+        return {
+            'class': 'MyFeatureExtractor',
+            'kwargs': {}
+        }
+
+
+

The final class should look like this:

+
from my_module import MyModel
+from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor
+
+class MyFeatureExtractor(TensorflowFeatureExtractor):
+
+    tag = 'my_feature_extractor'  # Unique identifier
+
+    def __init__(self):
+        super().__init__()
+
+        # Create the model.
+        self.model = MyModel()
+        self.num_features = 1024
+
+        # Image preprocessing.
+        self.preprocess_kwargs = {
+            'standardize': True,
+            'transform': None
+        }
+
+    def dump_config(self):
+        return {
+            'class': 'MyFeatureExtractor',
+            'kwargs': {}
+        }
+
+
+

As described above, this feature extractor can then be used to create bags for MIL training, generate features across whole-slide images, or perform feature space analysis across a dataset.

+

To register your feature extractor, use the slideflow.model.extractors.register_tf() decorator.

+
from slideflow.model.extractors import register_tf
+
+@register_tf
+def my_feature_extractor(**kwargs):
+    return MyFeatureExtractor(**kwargs)
+
+
+

…which will allow the feature extractor to be built by name:

+
import slideflow as sf
+extractor = sf.build_feature_extractor('my_feature_extractor')
+
+
+
+
+ + +
+ +
+ + +
+
+ + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+ + +
+
+
+ + +
+ + + + + + + + + + \ No newline at end of file diff --git a/docs/custom_loops.html b/docs/custom_loops/index.html similarity index 51% rename from docs/custom_loops.html rename to docs/custom_loops/index.html index af3836e1a..7f901c591 100644 --- a/docs/custom_loops.html +++ b/docs/custom_loops/index.html @@ -6,48 +6,48 @@ - + + - - Custom training loops — slideflow 1.1.1 documentation - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + Custom Training Loops — slideflow 3.0.0 documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - + + + + + - - @@ -61,15 +61,7 @@ - - - +
@@ -84,11 +76,11 @@
  • - Tutorials + Tutorials
  • - GitHub + GitHub
  • @@ -100,9 +92,9 @@ - - + +